
Comments (5)

hsb1995 commented on August 21, 2024

w=16, a=16
I can reproduce the uncompressed results with w=16 and a=16, but once quantization is enabled (w=6, a=6), problems arise:
[screenshot of error output]


ChenMnZ commented on August 21, 2024

@hsb1995
LLaMA-3-8B uses GQA (Grouped-Query Attention), which is not supported by the current 'let' (learnable equivalent transformation).
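For context, here is a minimal sketch of why GQA conflicts with a per-channel equivalent transform between q_proj and k_proj. The head counts are LLaMA-3-8B's published config; the scaling-pair reading of 'let' is an interpretation, not code taken from OmniQuant:

# Sketch: under GQA, q_proj and k_proj have different output widths, so a
# scale on Q's output channels has no one-to-one inverse partner on K's
# output channels -- which is what a Q/K equivalent-transform pair needs.
num_attention_heads = 32   # LLaMA-3-8B query heads
num_key_value_heads = 8    # GQA: every 4 query heads share one KV head
head_dim = 128

q_out = num_attention_heads * head_dim   # 4096 output channels
k_out = num_key_value_heads * head_dim   # 1024 output channels
assert q_out != k_out                    # the per-channel scale pairs no longer line up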


hsb1995 commented on August 21, 2024

> @hsb1995 LLaMA-3-8B uses GQA (Grouped-Query Attention), which is not supported by the current 'let' (learnable equivalent transformation).

Professor, thank you for your work. I don't really understand how GQA is handled, as you mentioned.
Do I understand correctly that I should keep the original generate_act_scale_shift.py script unchanged to obtain the act_scales and act_shifts files,
and then run the weight quantization on top of that?
Parameter settings:
CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/LLaMA/llama-8b \
--epochs 20 --output_dir ./log/llama-8b-w6a6 \
--eval_ppl --wbits 6 --abits 6 --lwc
Is the above workable?
I only removed the 'let' step.
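For comparison, the W6A6 recipes in the OmniQuant README pass --let alongside --lwc (the 'let' part is what GQA breaks), with act_scales and act_shifts generated beforehand by generate_act_scale_shift.py. The command below is an assumed variant of that recipe, not a verified one:

CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/LLaMA/llama-8b \
--epochs 20 --output_dir ./log/llama-8b-w6a6 \
--eval_ppl --wbits 6 --abits 6 --lwc --let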


hsb1995 commented on August 21, 2024

Hey, professor. I gave it a try.
It's really hard to get this working. The current errors are as follows; what should I do about them?

[2024-04-24 17:14:17 root](omniquant.py 50): INFO Starting ...
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at /home/sam/Doctorproject/weight/llama-3-8b/LLM-Research/Llama-3-8b/ and are newly initialized: ['model.layers.17.self_attn.rotary_emb.inv_freq', 'model.layers.1.self_attn.rotary_emb.inv_freq', 'model.layers.3.self_attn.rotary_emb.inv_freq', 'model.layers.4.self_attn.rotary_emb.inv_freq', 'model.layers.16.self_attn.rotary_emb.inv_freq', 'model.layers.31.self_attn.rotary_emb.inv_freq', 'model.layers.21.self_attn.rotary_emb.inv_freq', 'model.layers.10.self_attn.rotary_emb.inv_freq', 'model.layers.24.self_attn.rotary_emb.inv_freq', 'model.layers.28.self_attn.rotary_emb.inv_freq', 'model.layers.11.self_attn.rotary_emb.inv_freq', 'model.layers.13.self_attn.rotary_emb.inv_freq', 'model.layers.14.self_attn.rotary_emb.inv_freq', 'model.layers.15.self_attn.rotary_emb.inv_freq', 'model.layers.2.self_attn.rotary_emb.inv_freq', 'model.layers.20.self_attn.rotary_emb.inv_freq', 'model.layers.27.self_attn.rotary_emb.inv_freq', 'model.layers.0.self_attn.rotary_emb.inv_freq', 'model.layers.7.self_attn.rotary_emb.inv_freq', 'model.layers.6.self_attn.rotary_emb.inv_freq', 'model.layers.9.self_attn.rotary_emb.inv_freq', 'model.layers.29.self_attn.rotary_emb.inv_freq', 'model.layers.26.self_attn.rotary_emb.inv_freq', 'model.layers.22.self_attn.rotary_emb.inv_freq', 'model.layers.19.self_attn.rotary_emb.inv_freq', 'model.layers.12.self_attn.rotary_emb.inv_freq', 'model.layers.8.self_attn.rotary_emb.inv_freq', 'model.layers.30.self_attn.rotary_emb.inv_freq', 'model.layers.25.self_attn.rotary_emb.inv_freq', 'model.layers.5.self_attn.rotary_emb.inv_freq', 'model.layers.18.self_attn.rotary_emb.inv_freq', 'model.layers.23.self_attn.rotary_emb.inv_freq']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "/home/sam/Doctorproject/OmniQuant-main/main.py", line 419, in <module>
    main()
  File "/home/sam/Doctorproject/OmniQuant-main/main.py", line 383, in main
    omniquant(
  File "/home/sam/Doctorproject/OmniQuant-main/quantize/omniquant.py", line 102, in omniquant
    raise ValueError("Only support for opt/llama/Llama-2/Llama-3/falcon/mixtral now")
ValueError: Only support for opt/llama/Llama-2/Llama-3/falcon/mixtral now
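One thing worth noting (an assumption, not something confirmed in this thread): the ValueError fires even though Llama-3 appears in the supported list, which would be consistent with main.py deriving the network name from the last path component of --model; the trailing slash in .../Llama-3-8b/ would leave that component empty. A minimal Python sketch of that failure mode, with the parsing logic assumed rather than copied from the source:

model_path = "/home/sam/Doctorproject/weight/llama-3-8b/LLM-Research/Llama-3-8b/"
net = model_path.split('/')[-1]              # "" -- trailing slash leaves an empty last component
print('llama' in net.lower())                # False -> would trip the ValueError
net = model_path.rstrip('/').split('/')[-1]  # "Llama-3-8b"
print('llama' in net.lower())                # True

If that is the cause, dropping the trailing slash from --model is a cheap first thing to try.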


kimoji919 commented on August 21, 2024

@ChenMnZ Hello, I also ran into problems like this.
I tried your code from runing_falcon180b_on_single_a100_80g.ipynb with llama2-7b: I ran quantization and saved with real quant. However, when loading the pre-computed quantized weights it prints a warning like this:
[screenshot of warning]
and then it fails while executing model = model.cuda(), with an error like this:
[screenshot of error]
I also tried your weights from Hugging Face, but they don't seem to work either.

