Comments (4)
Hi, thanks for reaching out. At first glance, the model just uses the LlamaForCausalLM class, so the current codebase should already be able to support it. Have you tried applying it to the model to see if it works? Thanks!
from llm-awq.
Actually, people tried applying the codebase to another Salesforce model in #32, which turned out to work.
XGen is a LLaMA-style model and is already supported. Posting an easy-to-use script here for you to get started. Note that this assumes you have:
- built AWQ
- installed tiktoken (pip install tiktoken)
- saved the following script in the llm-awq directory
On a 3090, I am getting 76 tokens/s with this model.
hfuser="Salesforce"
model_name="xgen-7b-8k-inst"
group_size=128
repo_path="$hfuser/$model_name"
model_path="/workspace/llm-awq/$model_name"
search_result_path="/workspace/llm-awq/$model_name/$model_name-awq-search.pt"
quantized_model_path="/workspace/llm-awq/$model_name/$model_name-w4-g$group_size.pt"

# Download the model weights from the Hugging Face Hub
git clone "https://huggingface.co/$repo_path"

# Step 1: run the AWQ search and dump the resulting scales
python3 -m awq.entry --model_path "$model_path" \
    --w_bit 4 --q_group_size "$group_size" \
    --run_awq --dump_awq "$search_result_path"

# Step 2: apply the scales and dump real-quantized INT4 weights
python3 -m awq.entry --model_path "$model_path" \
    --w_bit 4 --q_group_size "$group_size" \
    --load_awq "$search_result_path" \
    --q_backend real \
    --dump_quant "$quantized_model_path"

# Step 3: chat with the quantized model via TinyChat
cd tinychat
python3 demo.py --model_type llama \
    --model_path "$model_path" \
    --q_group_size "$group_size" \
    --load_quant "$quantized_model_path" \
    --precision W4A16
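To reproduce a throughput figure like the 76 tokens/s above, a generic timing helper is enough. This is only a sketch: generate_fn is a placeholder for whatever blocking generation call your runtime exposes, not part of the llm-awq API.

```python
import time

def tokens_per_second(generate_fn, n_tokens: int) -> float:
    """Time one generation call and return decode throughput in tokens/s.

    generate_fn(n_tokens) is assumed to block until n_tokens tokens
    have been produced (placeholder for your runtime's generate call).
    """
    start = time.perf_counter()
    generate_fn(n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed
```

For steady-state numbers, run a short warm-up generation first so CUDA kernel compilation and cache allocation are excluded from the measurement.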
Great, thanks so much.
Related Issues (20)
- No module named 'awq_inference_engine'
- No such file or directory: "VILA1.5-13b-AWQ/llm/model-00001-of-00006.safetensors"
- tinychat.serve.model_worker_new.py AWQ model in training mode
- How to support a custom module like MLA in DeepSeek-V2
- OpenAI-compatible TinyChat API?
- AWQ kernel issue
- Can you provide example code to run inference on video QA?
- AWQ and VILA dependency compatibility issue
- google.protobuf.message.DecodeError: Error parsing message
- Is this a bug in the quantization phase?
- ROCm support request
- Invalid characters
- Memory increases significantly during inference
- Invalid compute capability when building Docker pytorch:23.12
- Request for semi-structured sparse matrix support in the AWQ kernel
- Illegal memory access for Llama-3-70B
- GPU requirements
- How to load and infer the VILA-1.5-40B-AWQ model on multiple GPUs? I currently have 4 A30 (24 GB) GPUs and get a CUDA out-of-memory error.
- Add support for GPUs with compute capability lower than 8.0 for awq/kernels installation
- Plans for running the model on other devices?