Comments (6)
Here's the performance:

| Method (LLaMA-7B) | BPW | WikiText-2 PPL |
|---|---|---|
| FP16 | 16 | 5.68 |
| RTN | 4.00 | 6.29 |
| RTN | 3.00 | 25.54 |
| GPTQ-128g | 4.15 | 5.85 |
| GPTQ-128g | 3.15 | 6.61 |
| AWQ-128g | 4.15 | 5.81 |
| AWQ-128g | 3.15 | 6.46 |
| AWQ-32g | 3.60 | 6.10 |
| SpQR-3b-16g-3b-32g-0.4% | 3.63 | 5.73 |
AWQ is more hardware-efficient and simpler to implement than SpQR, but its compression ratio seems worse than SpQR's: SpQR reaches lower perplexity (5.73 vs. 5.81) at fewer bits per weight (3.63 vs. 4.15).
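(In case anyone wonders where fractional BPW figures like 4.15 come from: a common accounting, which is my assumption here rather than something stated in this thread, amortizes each group's quantization metadata over its weights.)

```python
# Back-of-the-envelope BPW accounting (an assumption, not from the thread):
# effective bits per weight = weight bits + metadata bits amortized per group,
# e.g. one FP16 scale plus one 4-bit zero point shared by each weight group.
def effective_bpw(wbits: int, group_size: int,
                  scale_bits: int = 16, zero_bits: int = 4) -> float:
    return wbits + (scale_bits + zero_bits) / group_size

print(effective_bpw(4, 128))  # 4.15625 -- close to the 4.15 of the 128g rows
print(effective_bpw(3, 128))  # 3.15625 -- close to the 3.15 of the 128g rows
```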
Hi @loklok-infi,
Thanks for your interest in our work!
I think there are two potential reasons for the difference in the results:
(i) We used lm-eval-harness for evaluation, while GPTQ used their own implementation for evaluation (please refer here). There might be some differences in the experiment settings between them.
(ii) Regarding the 3/4-bit results, the results from GPTQ's paper are based on per-channel quantization without using group quantization. Our results are based on group quantization with a group size of 128.
Hope this answers your question :)
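To make the per-channel vs. group-quantization distinction concrete, here is a minimal round-to-nearest sketch (my own illustration, not code from llm-awq or GPTQ). Per-channel uses one scale for an entire output row; grouping splits each row into blocks with their own scales, which track the local weight range more tightly:

```python
import torch

def rtn_quantize(w: torch.Tensor, n_bits: int = 4, group_size: int = -1):
    """Round-to-nearest with asymmetric min/max scaling (simulated: returns
    dequantized FP weights). group_size=-1 means per-channel, i.e. one scale
    per output row; group_size=128 corresponds to the 128g rows above."""
    out_f, in_f = w.shape
    gs = in_f if group_size == -1 else group_size
    wg = w.reshape(out_f, in_f // gs, gs)          # split each row into groups
    w_min = wg.amin(dim=-1, keepdim=True)
    w_max = wg.amax(dim=-1, keepdim=True)
    scale = (w_max - w_min).clamp(min=1e-5) / (2**n_bits - 1)
    zero = (-w_min / scale).round()
    q = torch.clamp((wg / scale).round() + zero, 0, 2**n_bits - 1)
    return ((q - zero) * scale).reshape(out_f, in_f)
```

Smaller groups cost more metadata bits but cut quantization error, which is consistent with AWQ-32g (3.60 BPW, 6.10 PPL) beating AWQ-128g at 3-bit (3.15 BPW, 6.46 PPL) in the table above.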
Thank you for answering! Actually, what confuses me more is that the FP16 results also differ by ~10%, but as you said, I guess that comes from the difference between lm-eval-harness and GPTQ's own evaluation implementation.
I guess it's a problem for the whole community today; a similar discrepancy seems to have happened between Hugging Face's LLM leaderboard and LLaMA's official results. Hopefully we'll soon have a single standard benchmark implementation for LLMs.
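For reference, even a "standard" WikiText-2 perplexity number hides several free choices (raw vs. pre-tokenized split, how the test documents are joined, window length, stride). Here is a minimal GPTQ-style evaluation sketch, assuming a Hugging Face LLaMA checkpoint and non-overlapping 2048-token windows; changing any of these choices can easily move the FP16 number by a few percent:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "huggyllama/llama-7b"  # assumption: any LLaMA-7B checkpoint
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
).eval()

# One common recipe: join the raw test split with blank lines and score
# non-overlapping 2048-token windows; other recipes use strided windows.
text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="test")["text"])
ids = tok(text, return_tensors="pt").input_ids
seqlen, nlls = 2048, []
for i in range(0, ids.shape[1] - seqlen + 1, seqlen):
    window = ids[:, i : i + seqlen].to(model.device)
    with torch.no_grad():
        # .loss is the mean next-token cross-entropy over this window
        nlls.append(model(window, labels=window).loss.float() * seqlen)
ppl = torch.exp(torch.stack(nlls).sum() / (len(nlls) * seqlen))
print(f"WikiText-2 PPL: {ppl.item():.2f}")
```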
Thank you! It's very helpful!
Hey, do you have a repo to benchmark all these quant methods?
> Hey, do you have a repo to benchmark all these quant methods?
Yes, code for this benchmark would be appreciated.
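I don't know of an official combined benchmark repo either, but a tiny harness is easy to sketch on top of the snippets above. `rtn_quantize` is the helper defined earlier, and `quantize_model` below is likewise a hypothetical stand-in; real GPTQ/AWQ/SpQR runs each need their own calibration step:

```python
import copy
import torch

def quantize_model(model, n_bits, group_size):
    # Hypothetical stand-in: simulate quantization by round-tripping every
    # nn.Linear weight through rtn_quantize (defined in an earlier snippet).
    model_q = copy.deepcopy(model)  # note: doubles memory for a 7B model
    for module in model_q.modules():
        if isinstance(module, torch.nn.Linear):
            module.weight.data = rtn_quantize(
                module.weight.data.float(), n_bits, group_size
            ).to(module.weight.dtype)
    return model_q

for name, n_bits, gs in [("RTN-4bit", 4, -1), ("RTN-4bit-128g", 4, 128)]:
    model_q = quantize_model(model, n_bits, gs)
    # ...then rerun the perplexity loop above with model_q to fill the table
    print(name)
```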
Related Issues (20)
- awq_inference_engine is missing from source, so quantizing custom models fails
- Support for Qwen models
- AWQ for non-Transformer implementation
- Error while quantizing OWLv2 model
- No module named 'awq_inference_engine'
- No such file or directory: "VILA1.5-13b-AWQ/llm/model-00001-of-00006.safetensors"
- tinychat.serve.model_worker_new.py AWQ model in training mode
- How to support a custom module like MLA in DeepSeek-V2
- OpenAI-compatible TinyChat API?
- AWQ kernel issue
- Can you provide example code to run inference on video QA?
- AWQ and VILA dependency compatibility issue
- google.protobuf.message.DecodeError: Error parsing message
- Is this a bug in the quantization phase?
- ROCm support request
- Invalid characters
- Memory increases significantly during inference
- Invalid compute capability when building Docker pytorch:23.12
- Request for semi-structured sparse matrix support in AWQ kernel
- Illegal memory access for Llama-3-70B