maxhalford / pytorch-resample
🎲 Iterable dataset resampling in PyTorch
License: MIT License
Hello,
I have a classification dataset of 32k samples, where 89% of the samples belong to class 0 and the other 11% to class 1. I tried to use pytorch-resample to undersample class 0 towards a 0.5/0.5 distribution. It works well, but I noticed that the resampled dataset differs at each epoch, which makes the training useless for my purposes. I know a random seed is initialized the first time the Undersample class is called, but I couldn't find a way to reset this seed after each epoch, since the dataset is wrapped in the DataLoader.
Have you encountered similar behavior? Do you have an idea for a fix? Thanks in advance.
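One way to get the same subsample every epoch is to create the random generator inside `__iter__` from a fixed seed, rather than once in `__init__`. This assumes the underlying stream yields samples in the same order on every pass and that a single DataLoader worker is used. The class below is a hypothetical, self-contained sketch of rejection-based undersampling written for illustration. It is not pytorch-resample's own class, and `SeededUnderSampler` and its parameters are made up.

```python
import random
from collections import Counter

try:
    from torch.utils.data import IterableDataset
except ImportError:  # fallback so the sketch also runs without PyTorch
    IterableDataset = object


class SeededUnderSampler(IterableDataset):
    """Hypothetical undersampler (not pytorch-resample's API) that rebuilds
    its RNG from a fixed seed on every pass over the stream."""

    def __init__(self, dataset, desired_dist, seed=42):
        self.dataset = dataset            # iterable of (x, y) pairs
        self.desired_dist = desired_dist  # e.g. {0: 0.5, 1: 0.5}
        self.seed = seed

    def __iter__(self):
        # A fresh generator per pass gives identical accept/reject decisions
        # each epoch, provided the stream order does not change.
        rng = random.Random(self.seed)
        counts = Counter()  # running class counts seen so far in this pass
        n = 0
        for x, y in self.dataset:
            counts[y] += 1
            n += 1
            actual = {k: c / n for k, c in counts.items()}
            # Rejection sampling: the largest desired/actual ratio is the bound,
            # so the most under-represented class is accepted with probability 1.
            pivot = max(self.desired_dist[k] / actual[k] for k in actual)
            ratio = self.desired_dist[y] / actual[y]
            if rng.random() < ratio / pivot:
                yield x, y
```

Iterating the same `SeededUnderSampler` twice yields the same samples, so each epoch trains on an identical subset. Conversely, replacing `self.seed` with something like `self.seed + epoch` would give a different but still reproducible subset per epoch.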
Nice work! I was about to use this handy tool until I realized my problem was even trickier.
I'm dealing with a very large (TB-scale) WebDataset, which also inherits from IterableDataset, and the dataset keeps growing. My goal is to do balanced sampling based on some attributes of my samples; in other words, I want N classes, each with equal weight. This would be straightforward with this resampling tool if I knew all the possible classes. However, finding them requires iterating through the whole dataset, which can take hours. That would be fine if I only had to do it once, but I may have to repeat it as the dataset grows and new classes appear. I'm aware that one workaround is to handle this outside the training loop by maintaining an incremental list of classes on disk.
Still, it would be even better if this could be handled during training. I was thinking of building the desired_dist dynamically: initialize it as an empty dict and add unseen classes on the fly with equal constant weights. It seems this might work, but I haven't tested it. Do you think this is worth adding to the repo? And do you see any caveats? Any suggestions are appreciated.
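The idea above can be sketched as follows, assuming each sample is an `(x, y)` pair with a hashable label and that the target is uniform over the classes observed so far. Because all desired weights are equal (1/K), the rejection-sampling acceptance probability `(1/K) / actual[y]` divided by its maximum over classes simplifies to `min_count / counts[y]`, so no explicit `desired_dist` dict has to be maintained. `OnlineBalancedSampler` is a hypothetical name, not part of pytorch-resample.

```python
import random
from collections import Counter

try:
    from torch.utils.data import IterableDataset
except ImportError:  # fallback so the sketch also runs without PyTorch
    IterableDataset = object


class OnlineBalancedSampler(IterableDataset):
    """Hypothetical sketch: undersample a stream towards a uniform class
    distribution over whichever classes have been seen so far."""

    def __init__(self, dataset, seed=None):
        self.dataset = dataset  # iterable of (x, y) pairs; y must be hashable
        self.seed = seed

    def __iter__(self):
        rng = random.Random(self.seed)
        counts = Counter()
        for x, y in self.dataset:
            counts[y] += 1  # an unseen class is registered implicitly here
            # Equal target weights reduce the acceptance probability to
            # min_count / counts[y]; the rarest class so far is always kept.
            accept = min(counts.values()) / counts[y]
            if rng.random() < accept:
                yield x, y
```

One caveat follows directly from the formula: when a new class first appears, its count is 1, so every other class is accepted with probability `1 / counts[y]` until the newcomer catches up. A rare class that shows up late in the stream can therefore throttle throughput sharply and skew the batches that follow.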
Hi
Thanks for the wonderful code.
Can you please recheck your HybridSampler implementation?
I could not get the examples or the benchmark code to run, especially the parts that use the HybridSampler.
The other parts of the code run absolutely fine.
Thanks
Devraj