
kill-the-bits's Introduction

And the bit goes down

This repository contains the implementation of our paper: And the bit goes down: Revisiting the quantization of neural networks (ICLR 2020) as well as the compressed models we obtain (ResNets and Mask R-CNN).

Our compression method is based on vector quantization. It takes as input an already trained neural network and, through a distillation procedure at all layers and a fine-tuning stage, optimizes the accuracy of the network.
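To make this concrete, here is a minimal sketch of the vector quantization baseline the method builds on: split a weight matrix into small blocks, learn a codebook with k-means, and keep only the centroids and per-block assignments. The paper's actual method replaces this plain objective with an activation-weighted one and fine-tunes by distillation, which this sketch omits.

import torch

def quantize_weights(W, block_size=4, n_centroids=256, n_iter=20):
    # Plain k-means over weight blocks (sketch only, not the repo's code)
    blocks = W.reshape(-1, block_size)                  # (n_blocks, d)
    perm = torch.randperm(blocks.size(0))[:n_centroids]
    centroids = blocks[perm].clone()                    # init from the data
    for _ in range(n_iter):
        # E-step: assign every block to its nearest centroid
        assignments = torch.cdist(blocks, centroids).argmin(dim=1)
        # M-step: move each centroid to the mean of its assigned blocks
        for k in range(n_centroids):
            members = blocks[assignments == k]
            if members.numel() > 0:
                centroids[k] = members.mean(dim=0)
    return centroids, assignments                       # compressed form

W = torch.randn(512, 512)
centroids, assignments = quantize_weights(W)
W_hat = centroids[assignments].reshape(W.shape)         # reconstruction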

This approach outperforms the state-of-the-art w.r.t. compression/accuracy trade-off for standard networks like ResNet-18 and ResNet-50 (see Compressed models).

Illustration of our method. We approximate a binary classifier $\varphi$ that labels images $x$ as dogs or cats by quantizing its weights. Standard method: quantizing $\varphi$ with the standard objective function leads to a classifier $\widehat \varphi_{\text{bad}}$ that tries to approximate $\varphi$ over the entire input space and thus performs badly on in-domain inputs. Our method: quantizing $\varphi$ with our objective function leads to a classifier $\widehat \varphi_{\text{good}}$ that performs well on in-domain inputs.

Installation

Our code works with Python 3.6 and newer. To run the code, you must have the required packages installed.

These dependencies can be installed with: pip install -r requirements.txt

Compressed Models

The compressed models (centroids + assignments) are available in the models/compressed folder, together with code to evaluate them on their standard benchmarks (ImageNet/COCO). Inference can be performed on either GPU or CPU. Note that we did not optimize this part of the code for speed; the inference code should rather be regarded as a proof of concept: based on the centroids and the assignments, we instantiate the full, non-compressed model and recover the accuracies reported in the tables below.
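To illustrate this proof-of-concept inference, the sketch below rebuilds each full weight tensor from a (centroids, assignments) pair and loads it into the uncompressed architecture. The per-layer dict layout used here is hypothetical, not the exact format of the provided .pth files; see inference.py for the real loading logic.

import torch
import torchvision

model = torchvision.models.resnet18()
compressed = torch.load("models/compressed/resnet18_small_blocks.pth")

state_dict = {}
for name, entry in compressed.items():
    if isinstance(entry, dict) and "centroids" in entry:   # quantized layer (hypothetical layout)
        blocks = entry["centroids"][entry["assignments"]]  # (n_blocks, block_size)
        state_dict[name] = blocks.reshape(entry["shape"])  # back to the conv/fc shape
    else:                                                  # non-quantized tensor stored raw
        state_dict[name] = entry
model.load_state_dict(state_dict)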

Vanilla ResNets

We provide the vanilla compressed ResNet-18 and ResNet-50 models for 256 centroids in the low and high compression regimes. As mentioned in the paper, the low compression regime corresponds to a block size of 9 for standard 3x3 convolutions and to a block size of 4 for 1x1 pointwise convolutions. Similarly, the high compression regime corresponds to a block size of 18 for standard 3x3 convolutions and to a block size of 8 for 1x1 pointwise convolutions.
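As a sketch of what these block sizes mean in practice (not the repo's exact implementation), a convolutional weight tensor is cut into fixed-size subvectors before clustering:

import torch

def blocks_for_conv(weight, block_size):
    # Reshape a conv weight into the subvectors ("blocks") to be quantized.
    # For a 3x3 conv of shape (C_out, C_in, 3, 3) with block_size=9, each
    # 3x3 spatial filter becomes one block; for a 1x1 conv with block_size=4,
    # every 4 consecutive input channels form one block.
    w = weight.reshape(weight.size(0), -1)   # (C_out, C_in * k * k)
    assert w.size(1) % block_size == 0
    return w.reshape(-1, block_size)         # (n_blocks, block_size)

w = torch.randn(64, 64, 3, 3)                # a standard 3x3 conv layer
print(blocks_for_conv(w, 9).shape)           # torch.Size([4096, 9])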

Model (non-compressed top-1) | Compression  | Size ratio | Model size | Top-1 (%)
-----------------------------|--------------|------------|------------|----------
ResNet-18 (69.76%)           | Small blocks | 29x        | 1.54 MB    | 65.81
ResNet-18 (69.76%)           | Large blocks | 43x        | 1.03 MB    | 61.18
ResNet-50 (76.15%)           | Small blocks | 19x        | 5.09 MB    | 73.79
ResNet-50 (76.15%)           | Large blocks | 31x        | 3.19 MB    | 68.21

To evaluate on the standard test set of ImageNet: clone the repo, cd into src/ and run:

python inference.py --model resnet18 --state-dict-compressed models/compressed/resnet18_small_blocks.pth --device cuda --data-path YOUR_IMAGENET_PATH

Semi-supervised ResNet-50

We provide the compressed semi-supervised ResNet-50 trained and open-sourced by Yalniz et al. We use 256 centroids and the small blocks compression regime.

Model (non-compressed top-1)       | Compression  | Size ratio | Model size | Top-1 (%)
-----------------------------------|--------------|------------|------------|----------
Semi-supervised ResNet-50 (79.30%) | Small blocks | 19x        | 5.20 MB    | 76.12

To evaluate on the standard test set of ImageNet: clone the repo, cd into src/ and run:

python inference.py --model resnet50_semisup --state-dict-compressed models/compressed/resnet50_semisup_small_blocks.pth --device cuda --data-path YOUR_IMAGENET_PATH

Mask R-CNN

We provide the compressed Mask R-CNN (backbone ResNet50-FPN) available in the PyTorch Model Zoo. As mentioned in the paper, we use 256 centroids and various block sizes to reach an interesting size/accuracy tradeoff (with a 26x compression factor). Note that you need torchvision 0.3 in order to run this part of the code.

Model          | Size    | Box AP | Mask AP
---------------|---------|--------|--------
Non-compressed | 170 MB  | 37.9   | 34.6
Compressed     | 6.65 MB | 33.9   | 30.8

To evaluate on COCO: clone the repo, run git checkout mask_r_cnn, cd into src/ and run:

python inference.py --model maskrcnn_resnet50_fpn --state-dict-compressed models/compressed/mask_r_cnn.pth --device cuda --data-path YOUR_COCO_PATH

Results

You can also compress the vanilla ResNet models and reproduce the results of our paper by cd-ing into src/ and running the following commands:

  • For the small blocks compression regime:
python quantize.py --model resnet18 --block-size-cv 9 --block-size-pw 4 --n-centroids-cv 256 --n-centroids-pw 256 --n-centroids-fc 2048 --data-path YOUR_IMAGENET_PATH
python quantize.py --model resnet50 --block-size-cv 9 --block-size-pw 4 --n-centroids-cv 256 --n-centroids-pw 256 --n-centroids-fc 1024 --data-path YOUR_IMAGENET_PATH
  • For the large blocks compression regime:
python quantize.py --model resnet18 --block-size-cv 18 --block-size-pw 4 --n-centroids-cv 256 --n-centroids-pw 256 --n-centroids-fc 2048 --data-path YOUR_IMAGENET_PATH
python quantize.py --model resnet50 --block-size-cv 18 --block-size-pw 8 --n-centroids-cv 256 --n-centroids-pw 256 --n-centroids-fc 1024 --data-path YOUR_IMAGENET_PATH

Note that the vanilla ResNet-18 and ResNet-50 teacher (non-compressed) models are taken from the PyTorch model zoo. Note also that we run our code on a single 16GB Volta V100 GPU.

License

This repository is released under Creative Commons Attribution 4.0 International (CC BY 4.0) license, as found in the LICENSE file.

Bibliography

Please consider citing [1] if you found the resources in this repository useful.

[1] Stock, Pierre and Joulin, Armand and Gribonval, Rémi and Graham, Benjamin and Jégou, Hervé. And the bit goes down: Revisiting the quantization of neural networks.

@inproceedings{stock2019killthebits,
  title = {And the bit goes down: Revisiting the quantization of neural networks},
  author = {Stock, Pierre and Joulin, Armand and Gribonval, R{\'e}mi and Graham, Benjamin and J{\'e}gou, Herv{\'e}},
  booktitle = {International Conference on Learning Representations (ICLR)},
  year = {2020}
}

kill-the-bits's People

Contributors

pierrestock, take-cheeze


kill-the-bits's Issues

Why does the second GPU consume memory?

Hi @pierrestock, I would like to ask about something I observed during training. My computer has two GPUs and I use only the first one for training, but the second GPU shows large memory consumption while its utilization is 0. Is this consumption necessary, and how can I avoid it? Thank you.

Memory footprint cost

In the example given in Section 4.1 (metrics), I think the indexing cost is not correctly calculated:
cost to store the centroids: 9 × 16 bits (float16) × 256 / (8 × 1024) = 4.5 KB
cost of indexing: 16 bits × 128 (c_in subvectors) × 128 (c_out vectors) / (8 × 1024) = 32 KB
Can you confirm?
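(For reference, the arithmetic in this issue, converting bits to kilobytes by dividing by 8 × 1024:)

centroid_bits = 9 * 16 * 256        # d=9 values per centroid, float16, k=256
index_bits = 16 * 128 * 128         # the issue assumes 16-bit indices
print(centroid_bits / (8 * 1024))   # 4.5  (KB)
print(index_bits / (8 * 1024))      # 32.0 (KB)
# With k=256 centroids, 8-bit indices would suffice, halving the indexing cost.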

Compression scheduler

Thanks so much for the well-written paper. I want to share these ideas with you.
(1) I see that you compress all the layers with the same factor. Based on other compression techniques I have used, such as pruning (thresholding), Bayesian pruning, and matrix factorization, the accuracy might be much better if you compress the last layers much more than the first layers.

(2) Research question: we design a very complex architecture and, after that, we design even more complex compression techniques to reduce its size. Is it possible to interpret these compression techniques in depth and extract patterns that could help us design good architectures from the beginning?
Thanks for commenting.
DeeperDeeper

Cannot reproduce the results

Hello,
I tried the command:
python quantize.py --model resnet18 --block-size-cv 9 --block-size-pw 4 --n-centroids-cv 256 --n-centroids-pw 256 --n-centroids-fc 2048 --data-path YOUR_IMAGENET_PATH
However, after I obtained state_dict_compressed.pth and ran inference with that file, the accuracy was only 0.1.
I installed the same versions of torch and torchvision as in requirements.txt, and I get the correct accuracy with your file (resnet18_small_blocks.pth).
Could you check it?
Thank you!

TypeError: 'module' object is not callable

Step 1: Quantize network
Quantizing layer: layer1.0.conv1, size: [64, 64, 3, 3], n_blocks: 64, block size: 9, centroids: 256, bits/weight: 0.89, compressed size: 0.01 MB
Traceback (most recent call last):
  File "/home/zhangweiwei/kill_the_bits/src/quantize.py", line 388, in <module>
    main()
  File "/home/zhangweiwei/kill_the_bits/src/quantize.py", line 202, in main
    stride=stride, padding=padding, groups=groups)
TypeError: 'module' object is not callable

Process finished with exit code 1

Testing a compressed model on specific images

Hello author,
I see that your code ships compressed models. Can the compressed ResNet-18 be used to test classification accuracy on my own images? Following the evaluation command you give, python inference.py --model resnet18 --state-dict-compressed models/compressed/resnet18_small_blocks.pth --device cuda --data-path YOUR_IMAGENET_PATH, I created an Image folder, put my images in it, and changed YOUR_IMAGENET_PATH to ./Image, but the test fails. Could you please advise? Thank you!

Processing time

You reach a very good compression rate here, but what about inference time? I think it will be much higher than before, since at every forward step we have to load the weights from the lookup table based on the indices and then run the conv operations. I know that you could, for example, ship the model to a mobile device, decompress it there, and then use it for inference (I read this in another paper). Can you clarify?

About multi-GPU training

Hi, I would like to ask some details about multi-GPU training. I just followed your reply, but I got a deadlock and cannot find the reason.

How does dynamic X determine a fixed Q(W)?

Dear author,
I'm studying your paper and have one question about your work. [image]
As we all know, X is not fixed, and different X might produce a different Q(W). How do you deal with this?

Why is the activation the input?

Hi @pierrestock, I would like to ask you a question about the hook:

def _register_hooks(self):
    # define hook to save output after each layer
    def fwd_hook(module, input, output):
        layer = self.modules_to_layers[module]
        if self._watch:
            # retrieve activations
            activations = input[0].data.cpu()
            # store activations
            self.activations[layer].append(activations)

Why is the activation taken from the input rather than the output here? Thanks in advance.
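(For context, a minimal standalone example of what a PyTorch forward hook receives: input is a tuple of the tensors fed to the module and output is its return value, so reading input[0], as the code above does, stores the layer's input activations.)

import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
seen = {}

def fwd_hook(module, input, output):
    seen["in"] = input[0].detach()    # activations entering the layer
    seen["out"] = output.detach()     # activations leaving the layer

layer.register_forward_hook(fwd_hook)
layer(torch.randn(1, 4))
print(seen["in"].shape, seen["out"].shape)  # torch.Size([1, 4]) torch.Size([1, 2])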

Train accuracy is about 60%, but test accuracy is only 8%

Hi, thanks for your work. I want to reproduce the results in the paper. From the log file, I found that during the training phase the top-1 accuracy reaches 60%, but the validation accuracy is only 8%. It's so weird...

RuntimeError for resnet50 small & large

$ python inference.py --model resnet18 --state-dict-compressed models/compressed/resnet50_large_blocks.pth --device cuda --data-path ../../../ACIQ/ILSVRC2012
Traceback (most recent call last):
  File "inference.py", line 104, in <module>
    main()
  File "inference.py", line 95, in main
    top_1 = evaluate(test_loader, model, criterion, device=device).item()
  File "/home/renpei/ktb/kill-the-bits/src/utils/training.py", line 101, in evaluate
    output = model(input)
  File "/home/renpei/anaconda3/envs/KTBEnv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/renpei/ktb/kill-the-bits/src/models/resnet.py", line 172, in forward
    x = self.layer1(x)
  File "/home/renpei/anaconda3/envs/KTBEnv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/renpei/anaconda3/envs/KTBEnv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/renpei/anaconda3/envs/KTBEnv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/renpei/ktb/kill-the-bits/src/models/resnet.py", line 65, in forward
    out += identity
RuntimeError: The size of tensor a (58) must match the size of tensor b (56) at non-singleton dimension 3

Inference bug

When I compressed my own ResNet-18 model, the parameters of the BN layers were not compressed and were saved in raw format. But in https://github.com/facebookresearch/kill-the-bits/blob/master/src/inference.py, lines 84-87, only weight and bias are loaded; running_mean and running_var are ignored, so I got wrong results.

When I checked your compressed ResNet-18 weights, the BN layers only have weight and bias. But in the official ResNet-18 weights trained on ImageNet, the BN layers have weight, bias, running_mean, and running_var.
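(A minimal sketch of the fix this issue points at, assuming PyTorch's standard BatchNorm2d state_dict layout; this is not the repo's code:)

import torch.nn as nn

def load_batchnorm(bn: nn.BatchNorm2d, state_dict: dict, prefix: str):
    bn.weight.data.copy_(state_dict[prefix + ".weight"])
    bn.bias.data.copy_(state_dict[prefix + ".bias"])
    # Without the two lines below, eval-mode normalization falls back to the
    # default running statistics (mean 0, var 1) and accuracy collapses.
    bn.running_mean.copy_(state_dict[prefix + ".running_mean"])
    bn.running_var.copy_(state_dict[prefix + ".running_var"])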

I got bad performance with "mask_r_cnn.pth". How can I improve it, please?

Hello,
I ran the script "python inference.py --model maskrcnn_resnet50_fpn --state-dict-compressed models/compressed/mask_r_cnn.pth --device cuda --data-path my coco path" and got the result:

IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.400
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.644
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.436
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.239
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.444
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.489
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.324
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.523
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.552
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.366
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.601
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.675
IoU metric: segm
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.356
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.599
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.376
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.169
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.387
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.501
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.301
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.472
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.495
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.308
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.541
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.638

I don't think this is great performance.

I ran the model with 'mask_r_cnn.pth' on some pedestrian images from the web and got bad results: the boxes are not accurate, and the box scores are lower than with the uncompressed Mask R-CNN.
How can I improve the performance of "mask_r_cnn.pth", please?

About training on small datasets

This is a great project. I am a newcomer to vector quantization, and I have encountered the following problem with this project.

Since I just want to test the methods in the paper, I am using the CIFAR-10 dataset here. The model is ResNet-18 and the dataset is CIFAR-10. After fine-tuning the model, I saved it as resnet18-cifar10.pth. Finally, the run command is just like the one in the README.md:

python quantize.py --model resnet18 --block-size-cv 9 --block-size-pw 4 --n-centroids-cv 256 --n-centroids-pw 256 --n-centroids-fc 2048 --data-path ../cifar10

Although the program has not finished running, the log tells me that the result is not good. resnet18-cifar10.pth reaches an accuracy of about 80%, but in the log the accuracy is very low, no more than 10%. For example:

Quantizing time: 3min, Top1 after quantization: 8.51

I changed data-path to the path of CIFAR-10, changed the model loading in quantize.py to resnet18-cifar10.pth, and changed the data loading in data/dataloader.py to CIFAR-10.

In summary, I don't know which step went wrong, or whether I overlooked some aspect of the paper. I look forward to your reply. Thank you very much for your work; I have benefited a lot.

P.S.
Related files have been uploaded to my github.

  1. Data set
  2. Trained model
  3. CIFAR10 training code
  4. Verify resnet18-cifar10.pth
  5. quantize.py

Question about the open code and different accuracy when inferring with the quantized pth file

Thank you for your efforts in achieving a high compression ratio with high accuracy.

I tried to validate your code in my development environment, following your README.md and your paper.

I used the GitHub code at https://github.com/facebookresearch/kill-the-bits.

After quantizing with the args below, I obtained pth files per layer and, finally, state_dict_compressed.pth.

I then ran inference using this compressed pth.

The resulting accuracy is 10%; however, when I used the provided compressed pth ('models/compressed/resnet18_small_blocks.pth'), it showed the accuracy reported in your paper.

The args I used for quantization experiment using your code:

model: resnet18
dataset: imagenet
n-iter: 100 (number of EM iterations)
n-activations: 1024 (size of the batch of activations)
block-size-cv: 9 (quantization block size for 3x3 convolutions)
block-size-pw: 4 (quantization block size for 1x1 convolutions)
block-size-fc: 4 (quantization block size for fully-connected layers)
n-centroids-cv: 256 (number of centroids for 3x3 convolutions)
n-centroids-pw: 256 (number of centroids for 1x1 convolutions)
n-centroids-fc: 2048 (number of centroids for the classifier)
n-centroids-t: 4 (threshold for reducing the number of centroids)
eps: 1e-8 (empty cluster resolution)
n-workers: 20 (number of workers for data loading)
finetune-centroids: 2500 (number of iterations for layer-wise fine-tuning of centroids)
lr-centroids: 0.05 (learning rate for fine-tuning centroids)
momentum-centroids: 0.9 (momentum when using SGD)
weight-decay-centroids: 1e-4 (weight decay)
finetune-whole: 10000 (number of iterations for global fine-tuning of centroids)
lr-whole: 0.01 (learning rate for fine-tuning the classifier)
momentum-whole: 0.9 (momentum when using SGD)
weight-decay-whole: 1e-4 (weight decay)
finetune-whole-epochs: 9 (number of epochs for global fine-tuning of the centroids)
finetune-whole-stepsize: 3 (learning rate schedule for global fine-tuning of the centroids)
batch-size: 128 (batch size for the fine-tuning steps)

The development environment is PyTorch 1.4.0 with two 32 GB V100 GPUs.

Could you kindly explain why I cannot reproduce the accuracy reported in your paper?

Thank you

How to use '_register_hooks'?

Hi @pierrestock, I would like to ask how to use the function '_register_hooks' in watcher.py. Could you provide some example code? In addition, I notice that it may cause errors when saving a model, as torch.save may not support the 'fwd_hook' function. Thank you.

About the results on semi-supervised ResNet-50

Hello @pierrestock,

Congratulations on your amazing work! We have been working on vector quantization of neural networks as well, and recently published our findings at https://arxiv.org/abs/2010.15703.

As we mention in our paper, we have trouble reproducing the accuracy of the semi-supervised uncompressed ResNet-50 model reported in your paper:

[image]

After downloading the model from https://github.com/facebookresearch/semi-supervised-ImageNet1K-models, instead of 79.3% accuracy, we obtain 78.72%.

I know that you are not the author of that paper, but we were wondering if you could please verify whether you actually obtained the reported accuracy with the uncompressed model. If so, could you please share that model with us? If you did not, could you please let us know so we can amend our paper (and you can amend yours)?

Cheers,

Counting ops

Hey,

Thanks for sharing this amazing work! I'm new to network compression and want to know how to compute the FLOPs and number of parameters of your quantized model. I tried a counter toolkit, but it reports the same number of parameters and FLOPs as the regular ResNet-50 (when I use the pre-trained compressed ResNet-50).

Thx,
Shawn

I can't run python inference.py --model maskrcnn_resnet50_fpn --state-dict-compressed models/compressed/mask_r_cnn.pth --device cuda --data-path YOUR_COCO_PATH

I downloaded everything and followed the README. But when running
python inference.py --model maskrcnn_resnet50_fpn --state-dict-compressed models/compressed/mask_r_cnn.pth --device cuda --data-path YOUR_COCO_PATH (with YOUR_COCO_PATH actually replaced), it threw an error:

usage: inference.py [-h] [--model {resnet18,resnet50,resnet50_semisup}] [--state-dict-compressed STATE_DICT_COMPRESSED] [--device {cpu,cuda}] [--data-path DATA_PATH] [--batch-size BATCH_SIZE] [--n-workers N_WORKERS] inference.py: error: argument --model: invalid choice: 'maskrcnn_resnet50_fpn' (choose from 'resnet18', 'resnet50', 'resnet50_semisup')

I tried to add 'maskrcnn_resnet50_fpn' to the model choices, but it still does not run.

Please help me solve this.

Thanks a lot.

Reproducing memory of MaskRCNN and semisupervised ResNet-50

Hi @pierrestock,

Sorry to bother you again; there is one last aspect we've had trouble reproducing from the paper: the memory taken by the Mask R-CNN model, which is reported as 6.51 MB, with a 26x compression factor:
[image]
We also have a smaller discrepancy with the semi-supervised ResNet-50 model, reported as 5.15 MB in the README:
[image]

Following the paper, we counted all the codebooks as being stored in float16 format:
[image]

However, the paper does not say which encoding is used for the batch-norm layers and the other layers ignored for the purpose of compression. To reproduce the ImageNet results, we used float32 for these two cases. This gives us the following results:

resnet50_small_blocks.pth
 bits: 42705152
bytes: 5338144.0
   KB: 5213.03125
   MB: 5.09

resnet50_large_blocks.pth
 bits: 26710272
bytes: 3338784.0
   KB: 3260.53125
   MB: 3.18

resnet18_small_blocks.pth
 bits: 12927232
bytes: 1615904.0
   KB: 1578.03125
   MB: 1.54

resnet50_semisup_small_blocks.pth
 bits: 43655424
bytes: 5456928.0
   KB: 5329.03125
   MB: 5.20

resnet18_large_blocks.pth
 bits: 8634624
bytes: 1079328.0
   KB: 1054.03125
   MB: 1.03

mask_r_cnn.pth
 bits: 55743008
bytes: 6967876.0
   KB: 6804.56640625
   MB: 6.65

There are, however, two small discrepancies:

  1. the semi-supervised model, reported at 5.15 MB while we get 5.20 MB, and
  2. the Mask R-CNN model, reported at 6.51 MB while we get 6.65 MB -- this matches the number that we report in our paper.

We have tried counting the uncompressed and bnorm layers as float16, but that also gives different results than those reported in the paper:

half_codebooks: True, half_weights: True
mask_r_cnn.pth
 bits: 54209296
bytes: 6776162.0
   KB: 6617.345703125
   MB: 6.46

with similar results for the semi-supervised resnet50.

half_codebooks: True, half_weights: True
resnet50_semisup_small_blocks.pth
 bits: 42114688
bytes: 5264336.0
   KB: 5140.953125
   MB: 5.02

So, my question is: could you please explain how you obtained the model sizes for Mask R-CNN and the semi-supervised ResNet-50?

I have put together a gist that makes it easy to see how we computed our numbers: https://gist.github.com/una-dinosauria/e528b91de3ca9ab108cbf00aba3d9c2a.
Please do make sure to run this on the mask_r_cnn branch of the codebase, as the mask_r_cnn.pth model is missing all the compressed biases on master.

Thank you in advance,
Julieta
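(For readers following along, the per-layer accounting described in this issue works out as in the sketch below for a single quantized 3x3 convolution, assuming float16 codebooks and log2(k)-bit assignments; this is not the gist's code:)

import math

k, d = 256, 9                              # centroids, block size
n_blocks = 256 * 256 * (9 // d)            # one block per 3x3 filter of a 256x256x3x3 conv
codebook_bits = k * d * 16                 # float16 centroids
assignment_bits = n_blocks * math.log2(k)  # 8 bits per block here
total_mb = (codebook_bits + assignment_bits) / (8 * 1024 ** 2)
print(f"{total_mb:.3f} MB")                # ~0.067 MB for this layer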

The process of testing Mask R-CNN was killed by the system due to running out of RAM

Hi everyone,

I ran:
python3 inference.py --model maskrcnn_resnet50_fpn --state-dict-compressed models/compressed/mask_r_cnn.pth --device cpu --data-path MY_COCO_PATH (MY_COCO_PATH is val2017 plus annotations)

It stopped at 2/5000 and was killed. I checked RAM usage and realized it had exceeded my RAM (8 GB).

The terminal showed:
Loading data
loading annotations into memory...
Done (t=21.66s)
creating index...
index created!
loading annotations into memory...
Done (t=0.87s)
creating index...
index created!
Creating data loaders
Using [0, 1.0, inf] as bins for aspect ratio quantization
Count of instances per bin: [85308 31958]
Test: [ 0/5000] eta: 8:26:33 model_time: 5.8540 (5.8540) evaluator_time: 0.1168 (0.1168) time: 6.0788 data: 0.1080
Test: [ 1/5000] eta: 6:46:16 model_time: 3.5983 (4.7261) evaluator_time: 0.0064 (0.0616) time: 4.8763 data: 0.0872
Test: [ 2/5000] eta: 7:05:10 model_time: 5.3340 (4.9287) evaluator_time: 0.1168 (0.0997) time: 5.1042 data: 0.0748
Killed

Please help me solve this problem. Thanks a lot.

Empty clusters still remain

Hi there,
I've tried to reproduce this project on EfficientNet, but it seems that the method of re-initializing centroids to avoid empty clusters does not help in this case (depthwise convolutions), even when I set the number of iterations to 1 million and decrease the number of centroids.

Has your team encountered this issue before, and do you have any ideas?

Thank you,
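(For context, the quantizer exposes an eps flag for empty-cluster resolution; a common strategy of this kind, sketched below and not the repo's exact code, replaces an empty centroid with a perturbed copy of a well-populated one:)

import torch

def resolve_empty_clusters(centroids, assignments, eps=1e-8):
    n_centroids = centroids.size(0)
    counts = torch.bincount(assignments, minlength=n_centroids)
    for k in torch.nonzero(counts == 0).flatten():
        donor = counts.argmax()                     # most populated cluster
        noise = torch.randn_like(centroids[donor]) * eps
        centroids[k] = centroids[donor] + noise     # split the big cluster
        counts[donor] = counts[donor] // 2          # rough bookkeeping
    return centroids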

CentroidSGD in centroid_sgd.py

Thanks for your paper and code.
I have a question about CentroidSGD: I am confused about the purpose of the CentroidSGD class. Can I use torch.optim.SGD instead? If not, why?

class CentroidSGD(Optimizer):
    """
    Performs centroids finetuning given the block assignments.

    Args:
        - params: model.parameters()
        - assignments: assignments of each block of size n_blocks
          in the reshaped + unrolled weight matrix of the layers
        - n_centroids: number of centroids used to quantize the layer
        - n_blocks: number of blocks in the reshaped weight matrix
        - lr, momentum, dampening, weight_decay, nesterov: classical
          optimizer parameters, see PyTorch's documentation

    Remarks:
        - After each iteration, the gradients corresponding to the blocks
          assigned to centroid k are averaged and the same update using
          this averaged gradient is applied to all the corresponding blocks
    """

    def __init__(self, params, lr=required, momentum=0, dampening=0, weight_decay=0, nesterov=False):
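(For intuition, and not the repo's actual implementation: all blocks assigned to centroid k must stay identical after the update, so their gradients are averaged and the average is broadcast back, which plain torch.optim.SGD does not do. A sketch of that averaging:)

import torch

def averaged_centroid_grad(weight_grad, assignments, n_centroids, block_size):
    grads = weight_grad.reshape(-1, block_size)             # per-block gradients
    avg = torch.zeros(n_centroids, block_size, dtype=grads.dtype)
    avg.index_add_(0, assignments, grads)                   # sum per centroid
    counts = torch.bincount(assignments, minlength=n_centroids).clamp(min=1)
    avg /= counts.unsqueeze(1).to(avg.dtype)                # mean per centroid
    return avg[assignments].reshape(weight_grad.shape)      # broadcast back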
