qdrop's Introduction

QDrop

Introduction

This repository contains the official implementation of our paper "QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization".

In this work, we investigate how activation quantization affects weight tuning. QDrop establishes the relationship between activation quantization and the flatness of the quantized weights, and then proposes randomly dropping activation quantization during tuning to reach flatter optimized weights.
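
The dropping idea can be illustrated with a minimal pure-Python sketch. It assumes the drop is applied element by element with a fixed probability; the function name `qdrop`, the parameter `keep_quant_prob`, and the sample values are hypothetical and are not taken from the repository.

```python
import random


def qdrop(x_fp, x_q, keep_quant_prob=0.5, rng=random):
    """Mix quantized and full-precision activations element by element.

    Each element keeps its quantized value with probability keep_quant_prob;
    otherwise the quantization is "dropped" and the full-precision value is used.
    """
    return [q if rng.random() < keep_quant_prob else x
            for x, q in zip(x_fp, x_q)]


rng = random.Random(0)            # seeded only to make the sketch repeatable
x_fp = [0.12, -0.57, 0.98, 0.33]  # hypothetical full-precision activations
x_q = [0.0, -0.5, 1.0, 0.5]       # hypothetical quantized counterparts
mixed = qdrop(x_fp, x_q, rng=rng)
print(mixed)
```

With `keep_quant_prob=1.0` every activation stays quantized, and with `0.0` none does, so the parameter interpolates between the two extremes.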

  • 2023/07/11: Updated the code for the detection task on branch qdrop_coco

File Organization

This branch, qdrop, contains the code for the classification task on the ImageNet dataset. Another branch, qdrop_coco, contains the code for the detection task on the MSCOCO dataset. For the MSCOCO code, please see the README in that branch.

QDrop/qdrop/
├── quantization/        [Quantization tools]
│   ├── fake_quant.py    [Implement quantize and dequantize functions]
│   ├── observer.py      [Collect distribution statistics and calculate the quantization clipping range]
│   └── state.py         [Set quantization states]
├── models/              [Definition of quantized models]
└── solver/
    ├── main_imagenet.py [Run quantization on the ImageNet dataset]
    ├── imagenet_util.py [Load the ImageNet dataset]
    └── recon.py         [Reconstruct models]
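
To make the roles of an observer and a fake quantizer concrete, here is a minimal pure-Python sketch. It assumes uniform affine quantization with a min/max clipping range; the function names `minmax_range` and `fake_quantize` are hypothetical and do not mirror the actual contents of observer.py or fake_quant.py.

```python
def minmax_range(values):
    """Observer role: derive a clipping range from observed values."""
    return min(values), max(values)


def fake_quantize(values, lo, hi, n_bits):
    """Quantize to n_bits integer codes, then dequantize back to floats."""
    if hi <= lo:
        raise ValueError("clipping range must satisfy lo < hi")
    levels = 2 ** n_bits - 1            # highest integer code
    scale = (hi - lo) / levels          # step size of the quantization grid
    zero_point = round(-lo / scale)     # integer code that represents 0.0
    out = []
    for v in values:
        q = round(v / scale) + zero_point   # quantize
        q = max(0, min(levels, q))          # clip to the valid code range
        out.append((q - zero_point) * scale)  # dequantize
    return out


acts = [-1.0, -0.2, 0.1, 0.7, 2.0]      # hypothetical activations
lo, hi = minmax_range(acts)
print(fake_quantize(acts, lo, hi, n_bits=2))  # [-1.0, 0.0, 0.0, 1.0, 2.0]
```

The output stays in floating point but only takes values on the 2-bit grid, which is what "fake" quantization means here.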

Usage

Go into the exp/w2a4 directory, where you will find a config.yaml and a run.sh for each architecture. Execute run.sh to quantize the model. For other bit settings, only the corresponding bit numbers in the yaml file need to be changed.

Results

Results on low-bit activation in terms of accuracy on ImageNet.

Methods      Bits (W/A)   Res18   Res50   MNV2    Reg600M   Reg3.2G   MNasx2
Full Prec.   32/32        71.06   77.00   72.49   73.71     78.36     76.68
QDrop        4/4          69.13   75.12   67.83   70.95     76.46     73.04
QDrop        2/4          64.38   70.31   54.29   63.07     71.84     63.28
QDrop        3/3          65.68   71.28   54.38   64.65     71.69     64.05
QDrop        2/2          51.69   55.18   11.95   39.13     54.40     23.66

Reference

If you find this repo useful for your research, please consider citing the paper:

@article{wei2022qdrop,
title={QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization},
author={Wei, Xiuying and Gong, Ruihao and Li, Yuhang and Liu, Xianglong and Yu, Fengwei},
journal={arXiv preprint arXiv:2203.05740},
year={2022}
}

qdrop's Issues

RuntimeError in experiment on mmdetection

Hi! Thanks for your excellent work. When I try to reproduce the results based on mmdetection, I get the following error: RuntimeError: a leaf Variable that requires grad is being used in an in-place operation. Could you help me solve this problem?
The details are as follows:
Traceback (most recent call last):
  File "qdrop/solver/quant_coco.py", line 365, in <module>
    main()
  File "qdrop/solver/quant_coco.py", line 265, in main
    recon_model(model, fp_model, cali_data, q_config.recon)
  File "qdrop/solver/quant_coco.py", line 361, in recon_model
    _recon_model(model, fp_model)
  File "qdrop/solver/quant_coco.py", line 359, in _recon_model
    _recon_model(child_module, getattr(fp_module, name))
  File "qdrop/solver/quant_coco.py", line 359, in _recon_model
    _recon_model(child_module, getattr(fp_module, name))
  File "qdrop/solver/quant_coco.py", line 357, in _recon_model
    reconstruction(model, fp_model, child_module, getattr(fp_module, name), cali_data, recon_config)
  File "/QDrop-qdrop_coco/qdrop/solver/recon.py", line 148, in reconstruction
    module_ddp = build_ddp(
  File "/QDrop-qdrop_coco/mmdet/utils/util_distribution.py", line 71, in build_ddp
    return ddp_factory[device](model, *args, **kwargs)
  File "/home/xxx/anaconda3/envs/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 580, in __init__
    self._sync_params_and_buffers(authoritative_rank=0)
  File "/home/xxx/anaconda3/envs/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 597, in _sync_params_and_buffers
    self._distributed_broadcast_coalesced(
  File "/home/xxxx/anaconda3/envs/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1334, in _distributed_broadcast_coalesced
    dist._broadcast_coalesced(
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2421226) of binary: /home/xxx/anaconda3/envs/bin/python

Unable to reproduce the results in paper for MobileNetV2

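Hello! I cannot reproduce the QDrop (W2A2) MobileNetV2 result under the BRECQ quantization setting in Table 3 of the paper using the code. The paper reports 13.05, but after setting the output of the first layer to 8 bits (the only change I made to the source code), I get only 9.848. Is this expected? The modified code is as follows: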
a_list[0].bitwidth_refactor(8)

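About downloading the original models

Where can I download the ResNet models used in the program? The ResNet in the code is your own implementation, but the parameter files are stored on your local machine.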

I only get 0% accuracy

I'm trying to reproduce the results reported on GitHub, but I only get an abnormal result: 0% accuracy.

I used the ImageNet dataset and the ResNet-18 architecture.

I'll attach some pictures:

  • ImageNet folder and directory structure
  • the code that I executed
  • architecture and results screenshots

Thank you so much

train directory

  • ILSVRC/Data/CLS-LOC/train/folders/images.jpeg
  [screenshot: train]

val directory

  • ILSVRC/Data/CLS-LOC/val2/val(single folder)/images.jpeg
  [screenshot: val9]

code that I executed

  [screenshot: run10]

Architecture & results

  [screenshots: 1, 2, 3]

CUDA out of memory

Traceback (most recent call last):
  File "../../../qdrop/solver/main_imagenet.py", line 124, in <module>
    main(args.config)
  File "../../../qdrop/solver/main_imagenet.py", line 112, in main
    recon_model(model, fp_model)
  File "../../../qdrop/solver/main_imagenet.py", line 110, in recon_model
    recon_model(child_module, getattr(fp_module, name))
  File "../../../qdrop/solver/main_imagenet.py", line 108, in recon_model
    reconstruction(model, fp_model, child_module, getattr(fp_module, name), cali_data, config.quant.recon)
  File "/code/QDrop-o/qdrop/solver/recon.py", line 121, in reconstruction
    fp_inp, fp_oup = save_inp_oup_data(fp_model, fp_module, cali_data, store_inp=True, store_oup=True, bs=config.batch_size, keep_gpu=config.keep_gpu)
  File "/code/QDrop-o/qdrop/solver/recon.py", line 36, in save_inp_oup_data
    cached[1] = torch.cat([x for x in cached[1]])
RuntimeError: CUDA out of memory. Tried to allocate 3.06 GiB (GPU 0; 14.62 GiB total capacity; 9.48 GiB already allocated; 2.55 GiB free; 11.03 GiB reserved in total by PyTorch)

I hit the "CUDA out of memory" error shown above. How much GPU memory does this code need, and how can I run it on multiple GPUs?
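
As a rough sanity check on the failed allocation, the size of a cached activation tensor can be estimated by hand. The sketch below assumes 1024 calibration images and a float32 activation of shape 64 x 112 x 112 (the shape of the first convolution output of a ResNet on 224 x 224 inputs); both numbers are assumptions for illustration and are not read from the repository's config.

```python
def cached_tensor_gib(num_samples, channels, height, width, bytes_per_value=4):
    """Size in GiB of a dense tensor of shape (num_samples, channels, height, width)."""
    total_bytes = num_samples * channels * height * width * bytes_per_value
    return total_bytes / 2 ** 30


# Assumed: 1024 calibration samples, float32 activations of shape 64 x 112 x 112.
size = cached_tensor_gib(1024, 64, 112, 112)
print(f"{size:.4f} GiB")
```

Under these assumptions the estimate is 3.0625 GiB, which is consistent with the 3.06 GiB that torch.cat tried to allocate. The traceback also shows a keep_gpu option being passed to save_inp_oup_data; whether disabling it moves this cache off the GPU should be checked in recon.py.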

Can't reproduce results on Resnet18

Hello,

I'm trying to reproduce the results from the paper (Table 3) for ResNet18 (on ImageNet), but the accuracy I get is 0.5-0.7% lower in all cases.
I tried the given config for W2A4 and also W4A4, both with different random seeds. Every individual run, as well as the average, came out lower than expected.
I also compared the results without the 8-bit restriction on the first and last layers and saw a similar accuracy drop relative to the corresponding results in the paper.

I would appreciate your help in understanding whether there is an issue here, given that I didn't change anything in the code.

Thanks in advance.

Question for case setting in paper

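Hello, thank you very much for your work. I have a question about Case 2 and Case 3 in the paper. The paper states that Case 3 does not quantize the activations of the current block, but when reading your code I found that during block reconstruction the loss is computed after the activations have been quantized:
err = loss_func(out_quant, cur_out)
Here out_quant is the feature after activation quantization, so I would like to know the difference between Case 2 and Case 3.
Thanks!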

Questions about “set_first_last_layer_to_8bit” function

Hello, in the "set_first_last_layer_to_8bit" function in quant_model.py, what role does the statement "module_list[-2].act_quantizer.bitwidth_refactor(8)" play? It does not refer to the last layer of ResNet18. I found that it is actually the stage4.unit2.conv2 module, but its act_quantizer is disabled.
I would like to know the purpose of this statement. Thank you very much for any answer.

Unable to reproduce the results in paper

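Hello, I used the default parameters in run.sh and the pretrained model, with the first and last layers set to 8 bits. On ResNet18 I got the following quantization results:

  1. w2a2: 50.73
  2. w3a3: 65.32
  3. w2a4: 64.57

None of them reach the results in the paper. Do you have an idea of the possible cause?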

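About the code for detection models

Hello, thank you very much for this excellent work. I only see the ImageNet code in the repository. Could you also open-source the code for the detection task? Thank you very much!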

Confused about activation quantization

A few questions about activation quantization:

  • In real deployment, the activation functions are always fused, so why do we need to quantize the activation output?
  • To the best of my knowledge, an activation function normally just clips the values computed by the previous layer. With activation quantization, the quantized values are clipped by a quantized activation function, which seems reasonable. Could you give some explanation or intuition for why the quantized activation needs to be adjusted or fine-tuned?

Qdrop-BERT experiment configurations

Hello :)

First of all, it was a pleasure to see your cool papers and experimental results.

There is one question that I am facing, so I leave this issue here.

Among the experimental results, I am curious about the experimental setting for the BERT model on GLUE task.
The paper did not explain how the block structure was established.
I would be very grateful if you could tell me about the detailed experimental settings.

iters of reconstruction

Hello, in your code I noticed that the number of reconstruction iterations for each ResNet layer is 20,000. I tried changing it to 1000 and 5000 and found that 20,000 is best, but there is not much difference between 1000 and 5000. So I am curious how strongly the number of iterations determines the final accuracy, and whether it can safely be reduced to somewhere between 1000 and 5000.

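How to obtain the quantized weight and input

Hello, I tried adding hooks in imagenet_utils.validate_model(val_loader, model) to capture the quantized input and weight, but it failed. The computation inside validate_model still seems to be carried out in floating point. How can I obtain the integer input and weight of the quantized model?
Thanks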
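
As general background on why the values appear as floats: fake quantization dequantizes immediately after quantizing, so the tensors remain floating point while lying on a discrete grid. Assuming uniform affine quantization, integer codes can be recovered from such values when the scale and zero point are known. The sketch below is pure Python; `to_int_codes`, `scale`, and `zero_point` are hypothetical names, not the repository's API.

```python
def to_int_codes(dequantized, scale, zero_point):
    """Map fake-quantized floats back to the integer codes they came from."""
    return [round(v / scale) + zero_point for v in dequantized]


# Hypothetical fake-quantized values on a grid with step 0.5 and zero point 2.
values = [-1.0, 0.0, 0.5, 1.5]
codes = to_int_codes(values, scale=0.5, zero_point=2)
print(codes)  # [0, 2, 3, 5]
```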

Value of p in "mse"

Hello, I would like to ask why p is set to 2.4 here.

# L_p norm minimization as described in LAPQ
# https://arxiv.org/abs/1911.07190
score = lp_loss(x, x_q, 2.4, reduction='all')
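
For readers unfamiliar with the L_p loss used in the snippet above, here is a minimal pure-Python sketch. It assumes the loss is the mean of |x - x_q|^p over all elements; the simplified signature (no reduction argument) and the sample values are assumptions and may differ from the repository's lp_loss.

```python
def lp_loss(pred, target, p=2.0):
    """Mean of |pred - target| ** p over all elements."""
    return sum(abs(a - b) ** p for a, b in zip(pred, target)) / len(pred)


x = [0.0, 1.0, 2.0, 3.0]     # hypothetical original values
x_q = [0.0, 1.0, 2.0, 5.0]   # hypothetical quantized values, one large error
print(lp_loss(x, x_q, 2.0))  # 1.0
print(lp_loss(x, x_q, 2.4))  # larger than the p = 2 result
```

For errors larger than 1, raising p above 2 weights them more heavily than the squared error does, so the choice of p shifts the balance between clipping error and rounding error when searching for a clipping range.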

The optimization order

Thanks for your work!
I have a question regarding the optimization order.
In BRECQ, the authors first perform AdaRound on its own and then optimize the input quantization parameters. In your implementation, however, both sets of parameters are optimized jointly. Does this approach give better performance than BRECQ's?
