Hi I used weight int4, but when I run inference, finding that weight is actually int16

<a target="_blank" rel="noopener noreferrer" href="https://private-user-images.githubusercontent.com

below is my to do quant python -m awq.entry --model_path $MOD

Weight int4 quantization, but actually it is int16 about llm-awq HOT 4 OPEN

dongxuemin666 commented on July 4, 2024

Weight int4 quantization, but actually it is int16

from llm-awq.

Comments (4)

dongxuemin666 commented on July 4, 2024 1

I get this, weight is fake int4, in calculation, actually is int16

from llm-awq.

dongxuemin666 commented on July 4, 2024

image seems to be broken, please see this one

from llm-awq.

dongxuemin666 commented on July 4, 2024

below is my script to do quant

python -m awq.entry --model_path $MODEL
--w_bit 4 --q_group_size 128
--run_awq --dump_awq awq/llava_w4/llava-v1.6-vicuna-7b-w4-g128.pt

python -m awq.entry --model_path $MODEL
--w_bit 4 --q_group_size 128
--load_awq awq/llava_w4/llava-v1.6-vicuna-7b-w4-g128.pt
--q_backend real --dump_quant awq/llava_w4/llava-v1.6-vicuna-7b-w4-g128-awq.pt

from llm-awq.

ponytaill commented on July 4, 2024

I get this, weight is fake int4, in calculation, actually is int16

If it's convenient for you, could you explain it?

from llm-awq.

Weight int4 quantization, but actually it is int16 about llm-awq HOT 4 OPEN

Comments (4)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent