Yi Liu's Projects
🚀 A simple way to train and use PyTorch models with multi-GPU, TPU, and mixed precision
AI-based Pull Request Summarizer and Reviewer with Chat Capabilities.
AutoAWQ implements the AWQ algorithm for 4-bit weight quantization, with a 2x speedup during inference.
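To make the idea of 4-bit weight-only quantization concrete, here is a minimal sketch of symmetric per-group quantization in NumPy. This is an illustration only, not the AWQ algorithm itself: AWQ additionally rescales salient weight channels using activation statistics before quantizing, and the function names here are hypothetical.

```python
import numpy as np

def quantize_4bit(w, group_size=128):
    """Symmetric per-group 4-bit quantization (illustration only).

    Each group of `group_size` weights shares one FP scale; the
    4-bit signed integer range is [-8, 7].
    """
    w = w.reshape(-1, group_size)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0  # one scale per group
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate FP32 weights from int4 codes + scales.
    return q.astype(np.float32) * scale

# Round-trip a random weight matrix through 4-bit quantization.
w = np.random.randn(256, 128).astype(np.float32)
q, s = quantize_4bit(w.reshape(-1), group_size=128)
w_hat = dequantize(q, s).reshape(w.shape)
```

In real 4-bit runtimes the `int8` codes above would be packed two per byte, and inference kernels (as in AutoAWQ) fuse the dequantization into the matmul rather than materializing `w_hat`.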
SOTA Weight-only Quantization Algorithm for LLMs
A list of papers, docs, and code about model quantization. This repo aims to collect resources for model quantization research and is continuously improved. PRs adding works (papers, repositories) the repo has missed are welcome.
PyTorch implementation of BRECQ (ICLR 2021)
brpc is an industrial-grade RPC framework written in C++, commonly used in high-performance systems such as search, storage, machine learning, advertising, and recommendation. "brpc" means "better RPC".
CodeXGLUE
Introduction to Parallel Programming class code
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Official code for "Writing Distributed Applications with PyTorch", PyTorch Tutorial
A set of examples around pytorch in Vision, Text, Reinforcement Learning, etc.
C++ extensions in PyTorch
Simple transformer implementation from scratch in PyTorch.
Lightweight, standalone C++ inference engine for Google's Gemma models.
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Quantization library for PyTorch. Support low-precision and mixed-precision quantization, with hardware implementation through TVM.
Official implementation of Half-Quadratic Quantization (HQQ)
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values.
A resource for learning about Machine learning & Deep Learning
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups at batch sizes of up to 16-32 tokens.
Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.)
MPI programming lessons in C and executable code examples
Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) aims to provide unified APIs for network compression techniques, such as low-precision quantization, sparsity, pruning, and knowledge distillation, across different deep learning frameworks, in pursuit of optimal inference performance.
Natural Language Processing Tutorial for Deep Learning Researchers
Neural Networks: Zero to Hero