Topic: quantization
Something interesting about quantization
quantization,micronet: a model compression and deployment library. Compression: 1) quantization: quantization-aware training (QAT), covering high-bit (>2b) methods (DoReFa; "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary methods (TWN/BNN/XNOR-Net), plus post-training quantization (PTQ) at 8 bits (TensorRT); 2) pruning: normal, regular, and group-convolution channel pruning; 3) group convolution structures; 4) batch-normalization fusion for quantization. Deployment: TensorRT with fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), and dynamic shapes.
User: 666dzy666
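The QAT schemes listed in the micronet entry above share one core trick: fake quantization, where values are quantized and immediately dequantized so training sees the rounding error while staying in floating point. A minimal sketch in plain Python (an illustration under assumed symmetric per-tensor int8, not micronet's API):

```python
# Minimal sketch of symmetric uniform "fake quantization" as used in
# quantization-aware training: values are rounded onto an int8 grid and
# immediately mapped back to floats, so the rounding error is visible.

def fake_quantize(x, num_bits=8):
    """Quantize-dequantize a list of floats on a symmetric integer grid."""
    qmax = 2 ** (num_bits - 1) - 1                # 127 for 8 bits
    scale = max(abs(v) for v in x) / qmax or 1.0  # avoid zero scale
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in x]
    return [qi * scale for qi in q]

print(fake_quantize([0.1, -0.5, 1.0]))
```

Each output differs from its input by at most half a quantization step, which is exactly the error QAT teaches the network to tolerate.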
quantization,Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)
User: aaron-xichen
quantization,An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.
Organization: autogptq
quantization,Embedded and mobile deep learning research resources
User: csarron
quantization,PyTorch Project Specification.
Organization: deepvac
quantization,Run Mixtral-8x7B models in Colab or consumer desktops
User: dvmazur
quantization,QKeras: a quantization deep learning library for Tensorflow Keras
Organization: google
quantization,A list of high-quality (newest) AutoML works and lightweight models including 1.) Neural Architecture Search, 2.) Lightweight Structures, 3.) Model Compression, Quantization and Acceleration, 4.) Hyperparameter Optimization, 5.) Automated Feature Engineering.
User: guan-yuan
quantization,Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
User: hiyouga
Home Page: https://arxiv.org/abs/2403.13372
quantization,A list of papers, docs, codes about model quantization. This repo is aimed to provide the info for model quantization research, we are continuously improving the project. Welcome to PR the works (papers, repositories) that are missed by the repo.
User: htqin
quantization,Efficient computing methods developed by Huawei Noah's Ark Lab
Organization: huawei-noah
quantization,Pretrained language model and its related optimization techniques developed by Huawei Noah's Ark Lab.
Organization: huawei-noah
quantization,🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy-to-use hardware optimization tools
Organization: huggingface
Home Page: https://huggingface.co/docs/optimum/main/
quantization,A pytorch quantization backend for optimum
Organization: huggingface
quantization,Palette quantization library that powers pngquant and other PNG optimizers
Organization: imageoptim
Home Page: https://pngquant.org/lib
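Palette quantization, as in the libimagequant entry above, maps each pixel to the nearest entry of a small color palette. A toy sketch of the idea (libimagequant itself uses a median-cut variant with dithering; the fixed 2-bit-per-channel palette here is an assumption purely for illustration):

```python
# Toy palette quantization: truncate each RGB channel to `bits` bits and
# snap to the center of the resulting bucket, giving a fixed palette of
# (2**bits)**3 colors (64 colors for bits=2).

def quantize_pixel(rgb, bits=2):
    step = 256 // (1 << bits)                     # bucket width, 64 for 2 bits
    return tuple((c // step) * step + step // 2 for c in rgb)

def quantize_image(pixels, bits=2):
    """Quantize a flat list of (r, g, b) tuples to the fixed palette."""
    return [quantize_pixel(p, bits) for p in pixels]
```

Real palette quantizers instead build the palette adaptively from the image's own color distribution, which is why pngquant's output looks far better than this uniform grid.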
quantization,A Python package that extends official PyTorch to deliver improved performance on Intel platforms
Organization: intel
quantization,SOTA low-bit LLM quantization (INT8/FP8/INT4/FP4/NF4) & sparsity; leading model compression techniques on TensorFlow, PyTorch, and ONNX Runtime
Organization: intel
Home Page: https://intel.github.io/neural-compressor/
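Post-training INT8 quantization tools like the one above calibrate a scale and zero point from observed tensor ranges. A hedged sketch of the standard asymmetric (affine) uint8 scheme, using plain min/max calibration where real tools refine the range with histograms or percentile clipping:

```python
# Asymmetric (affine) quantization: real value x maps to
# q = round(x / scale) + zero_point, clamped to the uint8 range.

def affine_quant_params(xmin, xmax, num_bits=8):
    """Derive scale and zero point from a calibrated [xmin, xmax] range."""
    qmin, qmax = 0, 2 ** num_bits - 1             # uint8: [0, 255]
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)   # grid must contain zero
    scale = (xmax - xmin) / (qmax - qmin) or 1.0
    zero_point = round(qmin - xmin / scale)
    return scale, zero_point

def quantize(x, scale, zero_point, num_bits=8):
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, round(v / scale) + zero_point)) for v in x]
```

Forcing zero into the representable range matters in practice: zero padding and ReLU outputs must quantize exactly, or error accumulates through the network.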
quantization,Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
Organization: intellabs
quantization,A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks
Organization: intellabs
Home Page: https://intellabs.github.io/nlp-architect
quantization,FP16xINT4 LLM inference kernel that achieves near-ideal ~4x speedups at medium batch sizes of up to 16-32 tokens.
Organization: ist-daslab
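Kernels like the one above depend on weights being stored two 4-bit values per byte. A sketch of that packing in plain Python (the low-nibble-first layout here is an assumption for illustration, not the kernel's actual tile format):

```python
# Pack pairs of unsigned 4-bit values into bytes (low nibble first),
# halving storage relative to int8; the matching unpack recovers them.

def pack_int4(vals):
    assert len(vals) % 2 == 0 and all(0 <= v < 16 for v in vals)
    return bytes(vals[i] | (vals[i + 1] << 4) for i in range(0, len(vals), 2))

def unpack_int4(packed):
    out = []
    for b in packed:
        out.extend((b & 0x0F, b >> 4))
    return out
```

The GPU kernel's job is then to unpack these nibbles and multiply against fp16 activations without the unpacking cost erasing the bandwidth savings.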
quantization,⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.
User: ki6an
quantization,Lossy PNG compressor — pngquant command based on libimagequant library
User: kornelski
Home Page: https://pngquant.org
quantization,Port of MiniGPT4 in C++ (4bit, 5bit, 6bit, 8bit, 16bit CPU inference with GGML)
User: maknee
quantization,TinyChatEngine: On-Device LLM Inference Library
Organization: mit-han-lab
Home Page: https://mit-han-lab.github.io/TinyChatEngine/
quantization,[NeurIPS 2020] MCUNet: Tiny Deep Learning on IoT Devices; [NeurIPS 2021] MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning; [NeurIPS 2022] MCUNetV3: On-Device Training Under 256KB Memory
Organization: mit-han-lab
Home Page: https://mcunet.mit.edu
quantization,Official implementation of Half-Quadratic Quantization (HQQ)
Organization: mobiusml
Home Page: https://mobiusml.github.io/hqq_blog/
quantization,Tool for onnx->keras or onnx->tflite. Hope this tool can help you.
User: mpolaris
quantization,Sparsity-aware deep learning inference runtime for CPUs
Organization: neuralmagic
Home Page: https://neuralmagic.com/deepsparse/
quantization,OpenMMLab Model Compression Toolbox and Benchmark.
Organization: open-mmlab
Home Page: https://mmrazor.readthedocs.io/en/latest/
quantization,[ICLR2024 spotlight] OmniQuant is a simple and powerful quantization technique for LLMs.
Organization: opengvlab
quantization,Fast inference engine for Transformer models
Organization: opennmt
Home Page: https://opennmt.net/CTranslate2
quantization,PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
Organization: openppl
quantization,Neural Network Compression Framework for enhanced OpenVINO™ inference
Organization: openvinotoolkit
quantization,Train, Evaluate, Optimize, Deploy Computer Vision Models via OpenVINO™
Organization: openvinotoolkit
Home Page: https://openvinotoolkit.github.io/training_extensions/
quantization,PaddleSlim is an open-source library for deep model compression and architecture search.
Organization: paddlepaddle
Home Page: https://paddleslim.readthedocs.io/zh_CN/latest/
quantization,Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
User: pinto0309
quantization,AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.
Organization: quic
Home Page: https://quic.github.io/aimet-pages/index.html
quantization,Calculate token/s & GPU memory requirement for any LLM. Supports llama.cpp/ggml/bnb/QLoRA quantization
User: rahulschand
Home Page: https://rahulschand.github.io/gpu_poor/
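The weight-memory part of such a calculator is simple arithmetic: parameter count times bytes per parameter. A back-of-the-envelope sketch (KV cache and activation memory, which the real tool also estimates, are omitted here):

```python
# Weight memory for an LLM at common precisions; 4-bit quantization
# stores roughly half a byte per parameter.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(num_params, dtype):
    """Weights-only memory footprint in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024 ** 3

# A 7B-parameter model: ~13 GiB in fp16, ~3.3 GiB in int4.
print(round(weight_memory_gb(7e9, "fp16"), 1))
```

This is why int4 quantization is what makes 7B-class models fit on consumer GPUs with 4-6 GB of VRAM.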
quantization,INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
Organization: rwkv
quantization,A Pytorch Knowledge Distillation library for benchmarking and extending works in the domains of Knowledge Distillation, Pruning, and Quantization.
Organization: sforaidl
Home Page: https://kd-lib.readthedocs.io/
quantization,[ICML 2024] SqueezeLLM: Dense-and-Sparse Quantization
Organization: squeezeailab
Home Page: https://arxiv.org/abs/2306.07629
quantization,Build, customize and control your own LLMs. From data pre-processing to fine-tuning, xTuring provides an easy way to personalize open-source LLMs. Join our discord community: https://discord.gg/TgHXuSJEk6
Organization: stochasticai
Home Page: https://xturing.stochastic.ai
quantization,Faster Whisper transcription with CTranslate2
Organization: systran
quantization,A toolkit to optimize ML models for deployment for Keras and TensorFlow, including quantization and pruning.
Organization: tensorflow
Home Page: https://www.tensorflow.org/model_optimization
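Alongside quantization, the toolkit above supports pruning. A toy sketch of the magnitude-pruning idea it builds on (a plain-Python illustration, not the tfmot API): zero out the fraction of weights with the smallest absolute value.

```python
# Magnitude pruning: the smallest-magnitude weights contribute least to
# the output, so zeroing them yields a sparse, compressible tensor.

def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of smallest-magnitude weights."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k - 1] if k else float("-inf")
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

Toolkits apply this gradually over training steps so the network can recover accuracy as sparsity ramps up.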
quantization,An Open-Source Package for Deep Learning to Hash (DeepHash)
Organization: thulab
quantization,[🔥updating ...] AI-powered automatic quantitative trading bot (fully local deployment); an AI-powered quantitative investment research platform. 📃 online docs: https://ufund-me.github.io/Qbot ✨ qbot-mini: https://github.com/Charmve/iQuant
Organization: ufund-me
Home Page: https://github.com/Charmve
quantization,Brevitas: neural network quantization in PyTorch
Organization: xilinx
Home Page: https://xilinx.github.io/brevitas/
quantization,Dataflow compiler for QNN inference on FPGAs
Organization: xilinx
Home Page: https://xilinx.github.io/finn
quantization,Chinese LLaMA & Alpaca large language models with local CPU/GPU training and deployment (Chinese LLaMA & Alpaca LLMs)
User: ymcui
Home Page: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki