ExLlama is not the same as GPTQ. ExLlama is a library built with a ton of optimizations specifically for Llama models; it just happens to use GPTQ for quantization. If you compare plain GPTQ to AWQ, you will find that AWQ is faster.
I was able to get 133 tokens/s on a 7B model with TinyChat, which is pretty good.
from llm-awq.
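To put the 133 tokens/s figure in perspective, here is a back-of-envelope calculation. It is only a sketch under stated assumptions that are not from the thread: the model has 7e9 parameters (rounded), weights are stored at 4 bits, and every weight is read once per generated token. KV-cache traffic, scales/zero points, and activations are ignored, so the result is a rough lower bound on memory traffic, not a measurement.

```python
# Rough decode-speed arithmetic (assumptions: 7e9 params, 4-bit weights,
# all weights read once per token; KV cache, scales and activations ignored).

def decode_stats(tokens_per_s: float, n_params: float, bits_per_weight: int):
    """Return (ms per token, GB of weights, implied GB/s of weight reads)."""
    ms_per_token = 1000.0 / tokens_per_s
    weight_bytes = n_params * bits_per_weight / 8
    implied_gb_per_s = weight_bytes * tokens_per_s / 1e9
    return ms_per_token, weight_bytes / 1e9, implied_gb_per_s


ms, gb, bw = decode_stats(133, 7e9, 4)
print(f"{ms:.2f} ms/token, {gb:.1f} GB of weights, ~{bw:.1f} GB/s of weight reads")
```

Under these assumptions, 133 tokens/s works out to about 7.5 ms per token and roughly 465 GB/s of weight reads, which is why single-batch decoding is usually discussed in terms of memory bandwidth rather than compute.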
I know that ExLlama has some CUDA-level optimizations. What I really want to know is: if AWQ used similar GPU optimizations, would it perform better than GPTQ with reordering (gptq-reorder)?
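Yes, AWQ is faster because it needs no reordering tricks: unlike GPTQ with act-order reordering, it does not permute the weight channels before grouping them.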
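A minimal NumPy sketch of what "reordering" means for dequantization may help. It assumes a toy single weight row with asymmetric 4-bit codes and one scale and zero point per group; the function names and the way the per-channel group index (often called `g_idx` in GPTQ checkpoints) is built here are hypothetical illustrations, not code from llm-awq, GPTQ, or ExLlama. Without reordering, channel `i` simply belongs to group `i // group_size`, so scales apply to contiguous runs. With act-order, groups are formed in a permuted channel order, so dequantization has to gather scales through `g_idx`; permuting the weights (and the activations) restores contiguous groups, which is the kind of extra trick the thread refers to.

```python
import numpy as np

# Hypothetical toy setup: one weight row of 4-bit codes, grouped along input channels.
in_features, group_size = 8, 4
n_groups = in_features // group_size
rng = np.random.default_rng(0)
q = rng.integers(0, 16, size=in_features)             # 4-bit codes in [0, 15]
scales = rng.random(n_groups).astype(np.float32) + 0.5
zeros = np.full(n_groups, 8, dtype=np.int64)


def dequant_contiguous(q, scales, zeros, group_size):
    # No reordering: channel i uses group i // group_size (contiguous runs).
    g = np.arange(q.size) // group_size
    return (q - zeros[g]) * scales[g]


def dequant_with_g_idx(q, scales, zeros, g_idx):
    # Act-order style: each channel looks up its group through g_idx,
    # so scales and zeros are gathered in an irregular order.
    return (q - zeros[g_idx]) * scales[g_idx]


# Assumed act-order permutation: channel perm[j] is the j-th channel processed,
# and groups are formed from consecutive positions j in that processing order.
perm = rng.permutation(in_features)
g_idx = np.empty(in_features, dtype=np.int64)
g_idx[perm] = np.arange(in_features) // group_size

w_gidx = dequant_with_g_idx(q, scales, zeros, g_idx)
# Reordering trick: permute the codes so groups become contiguous again.
w_reordered = dequant_contiguous(q[perm], scales, zeros, group_size)

x = rng.standard_normal(in_features)
# The dot product is unchanged as long as the activations are permuted the same way.
print(np.allclose(w_gidx[perm], w_reordered), np.isclose(x @ w_gidx, x[perm] @ w_reordered))
```

The design point is that the reordered path needs an extra permutation of the activations (or a gather per channel), while the unreordered layout needs neither.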