magic-research / dataset_quantization


239.0 239.0 17.0 9.24 MB

[ICCV2023] Dataset Quantization

Python 99.99% Shell 0.01% JavaScript 0.01%


dataset_quantization's Issues

Embedding calculation

Dear maintainers, I found your paper really interesting and would like to replicate the results with a different embedding function f, as mentioned in the paper.

However, I came across the piece of code below in submodular.py. I struggle to follow the logic behind this embedding construction and could not find any mention of the procedure in the paper. Could you please explain why it is needed and give some intuition behind it?

bias_parameters_grads = torch.autograd.grad(loss, outputs)[0]
weight_parameters_grads = (
    self.model.embedding_recorder.embedding.view(batch_num, 1, self.embedding_dim)
        .repeat(1, self.args.num_classes, 1)
    * bias_parameters_grads.view(batch_num, self.args.num_classes, 1)
        .repeat(1, 1, self.embedding_dim)
)

gradients.append(
    torch.cat([bias_parameters_grads, weight_parameters_grads.flatten(1)], dim=1).cpu().numpy()
)
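
One plausible reading of the snippet, offered as a sketch rather than the maintainers' explanation: if the recorded embedding is the input to the model's final linear layer, then the gradient of the loss with respect to the logits equals the gradient with respect to that layer's bias, and its outer product with the embedding equals the gradient with respect to that layer's weight. Concatenating the two gives a per-sample "last-layer gradient" feature vector. The NumPy version below assumes a softmax cross-entropy loss summed over the batch (so the logit gradient has the closed form `probs - onehot`), and the function names `softmax` and `last_layer_grad_embedding` are hypothetical, not taken from the repository.

```python
import numpy as np

def softmax(logits):
    # Numerically stable row-wise softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def last_layer_grad_embedding(emb, W, b, labels):
    """Per-sample gradient of cross-entropy w.r.t. (b, W) of a final linear layer.

    emb:    (n, d) inputs to the final layer (assumed to be the recorded embedding)
    W, b:   (c, d) weight and (c,) bias of the final layer
    labels: (n,) integer class labels
    Returns an (n, c + c * d) array: [bias gradient, flattened weight gradient].
    """
    n, d = emb.shape
    c = W.shape[0]
    probs = softmax(emb @ W.T + b)
    onehot = np.eye(c)[labels]
    # dL/dlogits, which is also dL/db because logits = W @ emb + b.
    bias_grads = probs - onehot                                # (n, c)
    # dL/dW is the outer product of dL/dlogits and the embedding.
    weight_grads = bias_grads[:, :, None] * emb[:, None, :]    # (n, c, d)
    return np.concatenate([bias_grads, weight_grads.reshape(n, -1)], axis=1)
```

Under these assumptions the `view(...).repeat(...)` pair in the quoted code plays the role of the broadcasted product above: it builds the same `(batch, num_classes, embedding_dim)` outer product before flattening.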

Using for large 10M or 100M datasets

Thanks for the great work and code!

I was going over the code and realized that the bin creation relies on an N x N similarity matrix, where N is the number of examples (code line).

That would lead to memory issues when scaling to large datasets with 10M or 100M examples, because it would need a matrix of size 10M x 10M or 100M x 100M.

Have you thought about ways to address those use cases?
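
To put numbers on the concern: a dense float32 matrix for N = 10M has 10^14 entries, about 400 TB, and for N = 100M about 40 PB. The sketch below shows one generic workaround, which is not the repository's method and may not be a drop-in replacement for its bin creation: compute cosine similarities one block of rows at a time and keep only the top-k neighbours per example, so that peak memory is `block_size x N` instead of `N x N`. The helper names are hypothetical, and the total compute is still quadratic in N.

```python
import numpy as np

def dense_similarity_bytes(n, bytes_per_entry=4):
    # Memory needed to store a full n x n similarity matrix.
    return n * n * bytes_per_entry

def blockwise_topk_cosine(features, k, block_size=1024):
    """Top-k cosine neighbours per row without materialising the N x N matrix.

    features: (n, d) array; requires k <= n.
    Returns (indices, similarities), each of shape (n, k), sorted by
    decreasing similarity. Each row's own index is normally its first neighbour.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    n = unit.shape[0]
    top_idx = np.empty((n, k), dtype=np.int64)
    top_sim = np.empty((n, k), dtype=unit.dtype)
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        sims = unit[start:stop] @ unit.T                        # (block, n)
        # Unordered indices of the k largest similarities in each row.
        part = np.argpartition(-sims, k - 1, axis=1)[:, :k]
        part_sims = np.take_along_axis(sims, part, axis=1)
        # Order those k candidates by decreasing similarity.
        order = np.argsort(-part_sims, axis=1)
        top_idx[start:stop] = np.take_along_axis(part, order, axis=1)
        top_sim[start:stop] = np.take_along_axis(part_sims, order, axis=1)
    return top_idx, top_sim
```

Whether a sparse top-k graph preserves the behaviour of the selection step would need to be checked against the original dense formulation; approximate nearest-neighbour indexes or per-class processing are other options in the same spirit.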

Requirements Issues

Dear maintainers,

I really liked your paper and am now curious to reproduce it. Unfortunately, following the instructions in the README leads to a sequence of errors:

ModuleNotFoundError: No module named 'torchcam'
ModuleNotFoundError: No module named 'timm'
AttributeError: module 'numpy' has no attribute 'float'.
(raised from line 56 of util/pos_embed.py)
AttributeError: module 'torchvision.models.resnet' has no attribute 'model_urls'

Please consider updating the requirements.
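
The first two errors point to packages that are simply not installed (`torchcam`, `timm`), while the last two look like version mismatches: the `np.float` alias was removed in NumPy 1.24 (the built-in `float` is the usual replacement), and `model_urls` is no longer present in recent torchvision releases. As a small aid for reproducing the setup, the sketch below checks which top-level modules are importable before running anything. The list of module names is an assumption based only on the errors above, not on the repository's actual requirements file.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    # Assumed list, derived from the error messages in this issue.
    required = ["torch", "torchvision", "numpy", "timm", "torchcam"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

This only detects absent packages; the two `AttributeError`s would still need either older NumPy and torchvision versions or small source changes.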

training on Imagenet

Dear maintainers, I found your paper really interesting and would like to replicate the results with a different dataset, but I am facing the following problem.

I have finished pretraining, and when I use timm to evaluate the quantized ImageNet data I run:

sh distributed_train.sh 9 [TRAIN_ROOT] [EVAL_ROOT] --select-indices [INDICES1] [INDICES2] --output [OUTPUT_DIR] --model resnet50 --sched cosine --epochs 260 --lr 0.6 --reprob 0.6 --remode pixel --batch-size 128 --amp --aug-splits 3 -aa rand-m9-mstd0.5-inc1 --resplit --split-bn --jsd --dist-bn reduce

It always fails with: train.py: error: unrecognized arguments: --select-indices select_indices_SOP_0.125.npy -aa v0

Do you know what the problem is? Thanks for your reply.
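
For context, argparse reports "unrecognized arguments" whenever the script being executed does not define the flag it receives. One possible cause here, which the thread does not confirm, is that `distributed_train.sh` launches a `train.py` that lacks a `--select-indices` option (for example an unmodified timm training script). The sketch below reproduces the behaviour with a toy parser; `build_parser` and its option definitions are hypothetical and are not the repository's code.

```python
import argparse

def build_parser(define_select_indices):
    parser = argparse.ArgumentParser(prog="train.py")
    parser.add_argument("--model", default="resnet50")
    if define_select_indices:
        # Hypothetical definition: one or more .npy files of selected sample indices.
        parser.add_argument("--select-indices", nargs="+", default=[])
    return parser

argv = ["--select-indices", "select_indices_SOP_0.125.npy", "--model", "resnet50"]

# Parser without the flag: the option and its value are left over as unknown.
# parse_args() would exit with "unrecognized arguments" in this situation.
_, unknown = build_parser(False).parse_known_args(argv)

# Parser with the flag: the file name is collected into a list.
args, leftover = build_parser(True).parse_known_args(argv)
```

Checking which `train.py` the launch script actually invokes, and whether that file defines `--select-indices`, would be a reasonable first step.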

Typo in instruction finetuning

There is a typo in the instruction finetuning part.

--output_dir ./data/alpaca_data_k5_1k.json
should be
--output_dir ./data/alpaca_data_dq_k5_1k.json
