magic-research / dataset_quantization
[ICCV2023] Dataset Quantization
Dear maintainers, I found your paper really interesting and would like to replicate the results with a different embedding function f, as mentioned in the paper.
However, I came across the piece of code below in submodular.py. I struggle to follow the logic behind this embedding construction and have not found any mention of the procedure in the paper text. Could you please explain why this procedure is needed and the intuition behind it?
```python
bias_parameters_grads = torch.autograd.grad(loss, outputs)[0]
weight_parameters_grads = self.model.embedding_recorder.embedding.view(
    batch_num, 1, self.embedding_dim
).repeat(1, self.args.num_classes, 1) * bias_parameters_grads.view(
    batch_num, self.args.num_classes, 1
).repeat(1, 1, self.embedding_dim)
gradients.append(
    torch.cat([bias_parameters_grads, weight_parameters_grads.flatten(1)], dim=1)
    .cpu().numpy()
)
```
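For what it's worth, here is my own plausible reading (not from the paper, so please correct me if wrong): for a final linear layer z = W h + b, the chain rule gives dL/db = dL/dz and dL/dW = (dL/dz) hᵀ, so the snippet seems to reconstruct the per-sample last-layer gradient from the recorded embedding and the logit gradient, without a second backward pass. A minimal check of that identity:

```python
import torch

torch.manual_seed(0)
num_classes, embedding_dim = 3, 4
h = torch.randn(embedding_dim)                       # recorded embedding
W = torch.randn(num_classes, embedding_dim, requires_grad=True)
b = torch.zeros(num_classes, requires_grad=True)

z = W @ h + b                                        # logits of the last linear layer
loss = torch.nn.functional.cross_entropy(z.unsqueeze(0), torch.tensor([1]))

# Gradient w.r.t. the logits plays the role of the "bias grads" in the snippet.
dz = torch.autograd.grad(loss, z, retain_graph=True)[0]

# Outer product of logit gradient and embedding, as in the repeated views above.
outer = dz.view(num_classes, 1) * h.view(1, embedding_dim)

# Compare against the true autograd gradients of the last layer.
dW, db = torch.autograd.grad(loss, (W, b))
assert torch.allclose(outer, dW, atol=1e-6)
assert torch.allclose(dz, db, atol=1e-6)
```

If this reading is right, the procedure is just a cheap way to get last-layer gradient features for the submodular selection.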
Thanks for the great work and code!
I was going over the code and realized that the bin creation relies on an N x N similarity matrix, where N is the number of examples (code line).
That would lead to memory issues when scaling to large datasets with 10M or 100M examples, because it would require a matrix of size 10M x 10M or 100M x 100M.
Have you thought about how to address that use case?
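To make the concern concrete, a quick back-of-the-envelope sketch (my own arithmetic, assuming dense float32 storage):

```python
# A dense float32 similarity matrix with N x N entries needs 4 * N**2 bytes.
def sim_matrix_bytes(n: int, dtype_bytes: int = 4) -> int:
    return dtype_bytes * n * n

for n in (10_000_000, 100_000_000):
    print(f"N = {n:>11,}: {sim_matrix_bytes(n) / 1e12:,.0f} TB")
# N =  10,000,000: 400 TB
# N = 100,000,000: 40,000 TB
```

So at N = 10M the dense matrix alone is already hundreds of terabytes, before any computation.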
Dear maintainers,
I really liked your paper and am now curious to reproduce it. Unfortunately, following the instructions in the README leads to a sequence of errors.
Please consider updating the requirements.
Dear maintainers, I found your paper really interesting and would like to replicate the results on a different dataset, but I am facing the following problem:
I have pretrained the model, and when I use timm
to evaluate the quantized ImageNet data and run the command:

sh distributed_train.sh 9 [TRAIN_ROOT] [EVAL_ROOT] --select-indices [INDICES1] [INDICES2] --output [OUTPUT_DIR] --model resnet50 --sched cosine --epochs 260 --lr 0.6 --reprob 0.6 --remode pixel --batch-size 128 --amp --aug-splits 3 -aa rand-m9-mstd0.5-inc1 --resplit --split-bn --jsd --dist-bn reduce

it always reports: train.py: error: unrecognized arguments: --select-indices select_indices_SOP_0.125.npy -aa v0
Do you know what the problem is? Thanks for your reply.
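From my own debugging so far (assumptions: the repo's modified train.py registers `--select-indices`, while a stock timm train.py does not, and timm spells the AutoAugment flag `--aa` with two dashes): argparse rejects any flag that was never registered, and a single-dash `-aa` does not match a long option registered as `--aa`, which would explain both parts of the error. A minimal reproduction of that behaviour:

```python
import argparse

# Only flags registered on the parser are accepted, and a single-dash
# "-aa" does not match a long option registered as "--aa".
parser = argparse.ArgumentParser()
parser.add_argument('--aa')  # registered long option

args, unknown = parser.parse_known_args(['--aa', 'rand-m9-mstd0.5-inc1'])
print(args.aa)     # the properly spelled flag is parsed
print(unknown)     # []

args, unknown = parser.parse_known_args(['-aa', 'v0', '--select-indices', 'x.npy'])
print(unknown)     # both the misspelled and the unregistered flag are unrecognized
```

So this may just be a matter of running the repo's own train.py (which defines `--select-indices`) and writing `--aa` instead of `-aa` — but I would appreciate confirmation.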
There is a typo in the instruction-finetuning part:
`--output_dir ./data/alpaca_data_k5_1k.json`
should be
`--output_dir ./data/alpaca_data_dq_k5_1k.json`