wimh966 / qdrop

The official PyTorch implementation of the ICLR2022 paper, QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization

Python 99.34% Shell 0.66%

qdrop's Introduction

QDrop

Introduction

This repository contains the official implementation of our paper QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization.

In this work, we investigate how activation quantization affects weight tuning. QDrop establishes the relationship between activation quantization and the flatness of the quantized weights, and then proposes randomly dropping the activation quantization during tuning to reach flatter optimized weights.
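As a rough illustration of the random-drop idea described above, the following plain-Python sketch keeps each activation in full precision with some probability and quantizes it otherwise. It is not the repository's PyTorch code: the names `fake_quant`, `qdrop_activation`, and `drop_prob`, as well as the unsigned uniform quantizer, are assumptions made for this example.

```python
import random

def fake_quant(x, scale, n_bits):
    """Unsigned uniform quantize-dequantize of one value (illustrative only)."""
    q_max = 2 ** n_bits - 1
    q = min(max(round(x / scale), 0), q_max)  # round to the grid, then clip
    return q * scale

def qdrop_activation(xs, scale, n_bits, drop_prob, rng):
    """Per element: drop quantization with probability drop_prob, else quantize."""
    out = []
    for x in xs:
        if rng.random() < drop_prob:
            out.append(x)                             # quantization dropped
        else:
            out.append(fake_quant(x, scale, n_bits))  # quantized as usual
    return out

rng = random.Random(0)  # fixed seed so the example is repeatable
activations = [0.26, 0.9, -0.3, 0.12]
print(qdrop_activation(activations, scale=0.1, n_bits=2, drop_prob=0.5, rng=rng))
```

With `drop_prob=0.0` every element is quantized, and with `drop_prob=1.0` the input passes through unchanged, so the two extremes correspond to ordinary activation quantization and no activation quantization at all.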

2023/07/11: Updated the code for the detection task on branch qdrop_coco.

File Organization

The branch qdrop contains the code for the classification task on the ImageNet dataset. Another branch, qdrop_coco, contains the code for the detection task on the MSCOCO dataset. For the MSCOCO code, please see the README in that branch.

QDrop/qdrop/
├── quantization/        [Quantization tools]
│   ├── fake_quant.py    [Implement quantize and dequantize functions]
│   ├── observer.py      [Collect the information of distribution and calculate quantization clipping range]
│   ├── state.py         [Set quantization states]
├── models/              [Definition of quantized models]
├── solver/
│   ├── main_imagenet.py [Run quantization on imagenet dataset]
│   ├── imagenet_util.py [Load imagenet dataset]
│   ├── recon.py         [Reconstruct models]

Usage

Go into the exp/w2a4 directory, which contains a config.yaml and a run.sh for each architecture. Execute run.sh to quantize the model. For other bit settings, only the corresponding bit numbers in the yaml file need to be changed.

Results

Results on low-bit activation in terms of accuracy on ImageNet.

Methods	Bits (W/A)	Res18	Res50	MNV2	Reg600M	Reg3.2G	MNasx2
Full Prec.	32/32	71.06	77.00	72.49	73.71	78.36	76.68
QDrop	4/4	69.13	75.12	67.83	70.95	76.46	73.04
QDrop	2/4	64.38	70.31	54.29	63.07	71.84	63.28
QDrop	3/3	65.68	71.28	54.38	64.65	71.69	64.05
QDrop	2/2	51.69	55.18	11.95	39.13	54.40	23.66

Reference

If you find this repo useful for your research, please consider citing the paper:

@article{wei2022qdrop,
title={QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization},
author={Wei, Xiuying and Gong, Ruihao and Li, Yuhang and Liu, Xianglong and Yu, Fengwei},
journal={arXiv preprint arXiv:2203.05740},
year={2022}
}

qdrop's Issues

RuntimeError in experiment on mmdetection

Hi! Thanks for your excellent work. When I try to reproduce the results based on mmdetection, I get the following error: RuntimeError: a leaf Variable that requires grad is being used in an in-place operation. Could you help me solve this problem?
The details are as follows:
Traceback (most recent call last):
  File "qdrop/solver/quant_coco.py", line 365, in <module>
    main()
  File "qdrop/solver/quant_coco.py", line 265, in main
    recon_model(model, fp_model, cali_data, q_config.recon)
  File "qdrop/solver/quant_coco.py", line 361, in recon_model
    _recon_model(model, fp_model)
  File "qdrop/solver/quant_coco.py", line 359, in _recon_model
    _recon_model(child_module, getattr(fp_module, name))
  File "qdrop/solver/quant_coco.py", line 359, in _recon_model
    _recon_model(child_module, getattr(fp_module, name))
  File "qdrop/solver/quant_coco.py", line 357, in _recon_model
    reconstruction(model, fp_model, child_module, getattr(fp_module, name), cali_data, recon_config)
  File "/QDrop-qdrop_coco/qdrop/solver/recon.py", line 148, in reconstruction
    module_ddp = build_ddp(
  File "/QDrop-qdrop_coco/mmdet/utils/util_distribution.py", line 71, in build_ddp
    return ddp_factory[device](model, *args, **kwargs)
  File "/home/xxx/anaconda3/envs/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 580, in __init__
    self._sync_params_and_buffers(authoritative_rank=0)
  File "/home/xxx/anaconda3/envs/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 597, in _sync_params_and_buffers
    self._distributed_broadcast_coalesced(
  File "/home/xxx/anaconda3/envs/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1334, in _distributed_broadcast_coalesced
    dist._broadcast_coalesced(
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2421226) of binary: /home/xxx/anaconda3/envs/bin/python

Unable to reproduce the results in paper for MobileNetV2

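Hello! I cannot reproduce the MobileNetV2 result for QDrop (W2A2) under the BRECQ quantization setting in Table 3 of the paper with this code. The paper reports 13.05, but after setting the output of the network's first layer to 8 bits (the only change I made to the source code), I only get 9.848. Is this expected? The modified code is as follows: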
a_list[0].bitwidth_refactor(8)

about the first layer quantization

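Hello, I read the code and found that the input is already normalized. So whatever the setting, could you confirm that the first layer only has its weights quantized?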

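Source of the FP model weights for the object detection networks

Where can the full-precision model weights for the Faster R-CNN and RetinaNet networks mentioned in the paper be downloaded?

Downloading the original models

Where can I download the ResNet model used by the program? The code defines ResNet itself, but the weight file is on your local machine.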

Getting 0% accuracy as the result

I'm trying to reproduce the results reported in this repository, but I only get an abnormal result: 0% accuracy.

I used the ImageNet dataset and the resnet18 architecture.

I've attached the ImageNet directory structure, the code that I executed, and screenshots of the architecture and results.

Thank you so much.

train directory

ILSVRC/Data/CLS-LOC/train/folders/images.jpeg

val directory

ILSVRC/Data/CLS-LOC/val2/val(single folder)/images.jpeg

code that I executed

[screenshot not preserved]

Architecture & results

[screenshot not preserved]

CUDA out of memory

Traceback (most recent call last):
  File "../../../qdrop/solver/main_imagenet.py", line 124, in <module>
    main(args.config)
  File "../../../qdrop/solver/main_imagenet.py", line 112, in main
    recon_model(model, fp_model)
  File "../../../qdrop/solver/main_imagenet.py", line 110, in recon_model
    recon_model(child_module, getattr(fp_module, name))
  File "../../../qdrop/solver/main_imagenet.py", line 108, in recon_model
    reconstruction(model, fp_model, child_module, getattr(fp_module, name), cali_data, config.quant.recon)
  File "/code/QDrop-o/qdrop/solver/recon.py", line 121, in reconstruction
    fp_inp, fp_oup = save_inp_oup_data(fp_model, fp_module, cali_data, store_inp=True, store_oup=True, bs=config.batch_size, keep_gpu=config.keep_gpu)
  File "/code/QDrop-o/qdrop/solver/recon.py", line 36, in save_inp_oup_data
    cached[1] = torch.cat([x for x in cached[1]])
RuntimeError: CUDA out of memory. Tried to allocate 3.06 GiB (GPU 0; 14.62 GiB total capacity; 9.48 GiB already allocated; 2.55 GiB free; 11.03 GiB reserved in total by PyTorch)

I get an "out of memory" error. How much GPU memory does this code need, and how can I run it on multiple GPUs?
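A back-of-the-envelope estimate can show where an allocation of this size might come from. The sketch below assumes 1024 calibration images and a 64 x 112 x 112 float32 feature map (the shape of ResNet-18's first convolution output for 224 x 224 inputs); neither number is stated in the issue, and `tensor_gib` is a helper made up for this example.

```python
def tensor_gib(num_samples, channels, height, width, bytes_per_elem=4):
    """Size in GiB of a dense (N, C, H, W) tensor."""
    return num_samples * channels * height * width * bytes_per_elem / 2 ** 30

# Assumed shapes, not taken from the issue: 1024 calibration samples,
# 64 x 112 x 112 float32 activations per sample.
cached = tensor_gib(1024, 64, 112, 112)
print(f"{cached:.2f} GiB")  # 3.06 GiB
```

Under these assumptions a single cached feature tensor is already about 3.06 GiB, which is the size of the failed allocation in the traceback. Since torch.cat allocates a new output tensor while the per-batch chunks are still alive, the peak is roughly twice that for each cached tensor. The keep_gpu argument visible in the traceback suggests that the cached data can be kept off the GPU, although that behavior is not confirmed here.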

Can't reproduce results on Resnet18

Hello,

I'm trying to reproduce the results from the paper (Table 3) for ResNet18 on ImageNet, but the accuracy I get is 0.5-0.7% lower in all cases.
I tried the given config for W2A4 and also W4A4, both with different random seeds. Each individual run, as well as the average, came out lower than expected.
I also compared the results without the 8-bit restriction on the first and last layers and got a similar accuracy drop relative to the corresponding results in the paper.

I would appreciate your help in understanding whether there is an issue here, given that I didn't change anything in the code.

Thanks in advance.

Question for case setting in paper

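Hello, thank you very much for your work. I have a question about Case 2 and Case 3 in the paper. You state that Case 3 does not quantize the activations of the current block, but when reading your code I found that during block reconstruction the loss is computed after the activations have been quantized.
err = loss_func(out_quant, cur_out)
Here out_quant is the feature after activation quantization, so I would like to know the difference between Case 2 and Case 3.
Thanks!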

Questions about “set_first_last_layer_to_8bit” function

Hello, in the "set_first_last_layer_to_8bit" function in quant_model.py, what role does the statement "module_list[-2].act_quantizer.bitwidth_refactor(8)" play? It is not the last layer in resnet18. I actually find that it is the stage4.unit2.conv2 module, but its act_quantizer is disabled.
I would like to know the purpose of this statement. Thank you very much for any answer.

Unable to reproduce the results in paper

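Hello, I used the default parameters in run.sh and the pretrained model, with the first and last layers set to 8 bits, and obtained the following quantization results on ResNet18:

w2a2: 50.73
w3a3: 65.32
w2a4: 64.57
None of these reach the results in the paper.
Do you have any idea what the cause might be?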

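Code for the detection models

Hello, thank you very much for this excellent work. I only see ImageNet-related code in the repository. Could you also open-source the code for the detection task? Thank you very much!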

confused about activation quantization

A few questions about activation quantization.

In real deployment, we always fuse the activation functions, so why do we need to quantize the activation output?
To the best of my knowledge, an activation function normally clips the values computed by the previous layer. With activation quantization, we clip the quantized values using a quantized activation function, which is quite reasonable. So could you please give some explanation or intuition about why we need to adjust or fine-tune the quantized activations?

about the perform_1D_search and perform_2D_search

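perform_1D_search searches over tensors that contain only positive or only negative values, and perform_2D_search searches over tensors that contain both. However, I cannot tell what principle the search inside them is based on. Is there any reference material?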

QDrop/quant/quant_layer.py

Line 167 in f1162c9

def perform_1D_search(self, x):
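For readers with the same question, here is a generic sketch of what a one-dimensional clipping search can look like for non-negative data: try a grid of candidate upper bounds and keep the one with the smallest quantization error. This is not the repository's perform_1D_search, whose internals are not reproduced here; `search_clip_1d`, `lp_error`, the 100-point grid, and the unsigned quantizer are all assumptions made for illustration.

```python
def fake_quant_unsigned(x, x_max, n_bits):
    """Quantize-dequantize x on a uniform grid covering [0, x_max]."""
    levels = 2 ** n_bits - 1
    scale = x_max / levels
    q = min(max(round(x / scale), 0), levels)  # round, then clip to the grid
    return q * scale

def lp_error(xs, x_max, n_bits, p):
    """Mean |x - Q(x)|^p when clipping at x_max."""
    return sum(abs(x - fake_quant_unsigned(x, x_max, n_bits)) ** p for x in xs) / len(xs)

def search_clip_1d(xs, n_bits, p=2.4, num_candidates=100):
    """Grid search over upper clipping bounds for non-negative data (max(xs) > 0)."""
    full_max = max(xs)
    best_max, best_err = full_max, float("inf")
    for i in range(1, num_candidates + 1):
        cand = full_max * i / num_candidates  # candidate upper bound
        err = lp_error(xs, cand, n_bits, p)
        if err < best_err:
            best_max, best_err = cand, err
    return best_max

data = [0.05 * k for k in range(40)] + [6.0]  # mostly small values plus one outlier
print(search_clip_1d(data, n_bits=4))
```

A two-dimensional variant would search over a lower and an upper bound at the same time, which is one plausible reading of why signed data needs a 2D search.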

About transformers on NLP tasks

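Hello! Your paper includes quantization of BERT. Could you release the code for quantizing BERT?

Thanks to the authors for sharing. May I ask whether QDrop can be applied to yolov5?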

Qdrop-BERT experiment configurations

Hello :)

First of all, it was a pleasure to see your cool papers and experimental results.

There is one question that I am facing, so I leave this issue here.

Among the experimental results, I am curious about the experimental setting for the BERT model on GLUE task.
The paper did not explain how the block structure was established.
I would be very grateful if you could tell me about the detailed experimental settings.

iters of reconstruction

Hello, in your code I notice that the number of reconstruction iterations for each resnet layer is 20,000. I tried reducing it to 1,000 and 5,000 and found that 20,000 is best, but there is not much difference between 1,000 and 5,000. So I am curious how strongly the iteration count determines the final accuracy, and whether it can safely be reduced to somewhere between 1,000 and 5,000.

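How to obtain the quantized weight and input

Hello, I tried adding hooks in imagenet_utils.validate_model(val_loader, model) to capture the quantized input and weight, but it failed. The computation inside validate_model still seems to run in floating point. How can I obtain the integer input and weight of the quantized model?
Thanks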
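In simulated ("fake") quantization, tensors stay in floating point but take values on the grid (q - zero_point) * scale, which would be consistent with the floating-point computation observed in this issue. Assuming a per-tensor scale and zero_point can be read from the quantizer (how the repository exposes them is not confirmed here), the integer codes can be recovered as in this sketch; all function names are hypothetical.

```python
def fake_quantize(values, scale, zero_point, n_bits):
    """Simulated quantization: floats in, floats on the quantization grid out."""
    q_max = 2 ** n_bits - 1
    out = []
    for v in values:
        q = min(max(round(v / scale) + zero_point, 0), q_max)
        out.append((q - zero_point) * scale)
    return out

def to_integer_codes(fake_quant_values, scale, zero_point):
    """Map fake-quantized floats back to their integer grid indices."""
    return [round(v / scale) + zero_point for v in fake_quant_values]

weights = [-1.1, -0.4, 0.0, 0.6, 3.0]
fq = fake_quantize(weights, scale=0.5, zero_point=2, n_bits=2)
print(fq)                                               # still floats
print(to_integer_codes(fq, scale=0.5, zero_point=2))    # integers in [0, 3]
```

The rounding in `to_integer_codes` only absorbs floating-point error, because the inputs already lie on the grid.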

Value of p in "mse"

Hello, I would like to ask why p is set to 2.4 here.

# L_p norm minimization as described in LAPQ
# https://arxiv.org/abs/1911.07190
score = lp_loss(x, x_q, 2.4, reduction='all')
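To make the role of p concrete, the sketch below implements an L_p loss as the mean of |x - x_q|^p over all elements, which is an assumption about what reduction='all' means in the snippet above, and compares how p = 2 and p = 2.4 weight a small error against a large one. It does not explain why the authors chose 2.4.

```python
def lp_loss(xs, xqs, p=2.0):
    """Mean of |x - x_q|^p over all elements (assumed 'all' reduction)."""
    return sum(abs(a - b) ** p for a, b in zip(xs, xqs)) / len(xs)

small_err = lp_loss([0.0], [0.5], p=2.0), lp_loss([0.0], [0.5], p=2.4)
large_err = lp_loss([0.0], [2.0], p=2.0), lp_loss([0.0], [2.0], p=2.4)
print("error 0.5:", small_err)  # shrinks as p grows
print("error 2.0:", large_err)  # grows as p grows
```

Errors with magnitude below 1 are penalized less as p increases, while errors above 1 are penalized more, so a larger p shifts the balance toward avoiding large deviations such as those caused by clipping.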

The optimization order

Thanks for your work!
I have a question regarding the optimization order.
In BRECQ, the authors first perform AdaRound separately and then optimize the input quantization parameters, whereas your implementation optimizes both sets of parameters jointly. Does this approach give better performance than BRECQ's?