
Comments (7)

xiechengmude commented on June 20, 2024

```shell
python main.py $MODEL_PATH $DATASET_PATH --nsamples=1024 \
  --num_codebooks=1 --nbits_per_codebook=16 --in_group_size=8 \
  --relative_mse_tolerance=0.01 --finetune_relative_mse_tolerance=0.001 \
  --finetune_batch_size=32 --local_batch_size=1 --offload_activations \
  --wandb --save $SAVE_PATH
```
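As a side note on the flags above: in AQLM, each group of `in_group_size` weights is encoded by `num_codebooks` codes of `nbits_per_codebook` bits each, so these three flags together determine the effective bitrate. A minimal sketch of that arithmetic (ignoring the small overhead of the codebooks and per-channel scales):

```python
def bits_per_weight(num_codebooks: int, nbits_per_codebook: int, in_group_size: int) -> float:
    # Each group of `in_group_size` weights is encoded by `num_codebooks`
    # codes, each `nbits_per_codebook` bits wide, so the per-weight cost is:
    return num_codebooks * nbits_per_codebook / in_group_size

# The settings above: 1 codebook x 16 bits over groups of 8 weights.
print(bits_per_weight(1, 16, 8))  # -> 2.0 bits per weight
```

This is why this configuration is commonly described as "2-bit" quantization.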

from aqlm.

Vahe1994 commented on June 20, 2024

Hello!
Thank you for your interest in the project. Yes, AQLM quantization indeed takes considerably longer to calibrate than simpler quantization methods such as GPTQ. This only impacts quantization time, not inference time.
Quantization time depends on the model size, your hardware (number and model of GPUs, etc.), and the quantization parameters.
I added more details on quantization time to the README.
Hope this helps. If you have any additional questions, please feel free to ask.


xiechengmude commented on June 20, 2024

Could you share an example script for quantizing a 70B model on 8×A100?


Vahe1994 commented on June 20, 2024

Hi!
Hope this helps:
```shell
WANDB_PROJECT="wandb_project" WANDB_NAME="wandb_name" HF_HOME="/mnt/LLM" \
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OMP_NUM_THREADS=16 MKL_NUM_THREADS=16 \
python main.py meta-llama/Llama-2-70b-hf "pajama" \
  --relative_mse_tolerance=0.01 --finetune_relative_mse_tolerance=0.001 \
  --nsamples=2048 --num_codebooks=1 --nbits_per_codebook=16 --in_group_size=8 \
  --finetune_batch_size=32 --local_batch_size=2 --wandb --save="path_to_save"
```
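For a rough sense of the output size: with these flags (1 codebook × 16 bits over groups of 8 weights, i.e. ~2 bits per weight), the quantized weights of a 70B model are quite compact. A back-of-the-envelope sketch, assuming ~70e9 parameters and ignoring codebooks, scales, and layers left unquantized (such as embeddings):

```python
# Approximate size of the quantized Llama-2-70b weights produced by the
# command above. Assumptions: ~70e9 parameters total, and an effective
# rate of num_codebooks * nbits_per_codebook / in_group_size bits/weight.
params = 70e9
bits_per_weight = 1 * 16 / 8          # = 2.0 bits per weight
size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

print(f"~{size_gb:.1f} GB")  # -> ~17.5 GB
```

The actual checkpoint will be somewhat larger because of codebooks, scales, and unquantized layers, but this explains why the quantized model fits comfortably on a single A100 for inference even though calibration uses all eight GPUs.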


Vahe1994 commented on June 20, 2024

If you want to further improve perplexity, you can additionally run global fine-tuning after you have obtained the quantized model; see #50 for the code and #49 for an example of how to run it.


github-actions commented on June 20, 2024

This issue is stale because it has been open for 30 days with no activity.


github-actions commented on June 20, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

