
pytorch-resample's Issues

Not the same number of samples at each epoch

Hello,

I have a classification dataset of 32k samples, where 89% of the data is class 0 and the other 11% is class 1. I tried to use pytorch-resample to undersample class 0, with a target distribution of 0.5/0.5. It worked well, but I noticed that the set of selected samples differs at each epoch, making the training useless. I know there is a random seed initialized the first time we call the Undersample class, but I didn't manage to reset this seed after each epoch, since the dataset is wrapped inside the DataLoader.

Have you encountered similar behavior? Do you have an idea for a fix? Thanks in advance.
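
One possible direction, sketched under assumptions rather than taken from pytorch-resample itself: if the random generator is re-created from a fixed seed every time iteration starts, each epoch draws the same subset. The class `SeededUnderSampler` below is hypothetical, is written in plain Python, and assumes the class proportions are known up front and that `samples` can be iterated more than once in the same order.

```python
import random


class SeededUnderSampler:
    """Hypothetical wrapper (not part of pytorch-resample).

    Wraps a re-iterable of (x, y) pairs and keeps each pair with a
    per-class probability, using a generator that is re-seeded at the
    start of every pass so all epochs see the same subset.
    """

    def __init__(self, samples, actual_dist, desired_dist, seed=42):
        self.samples = samples
        self.seed = seed
        # Ratio of desired to actual frequency for each class.
        ratios = {c: desired_dist[c] / actual_dist[c] for c in desired_dist}
        top = max(ratios.values())
        # Normalise so the rarest class (relative to its target) is always kept.
        self.keep_prob = {c: r / top for c, r in ratios.items()}

    def __iter__(self):
        # A fresh generator per epoch: same seed, same accept/reject decisions.
        rng = random.Random(self.seed)
        for x, y in self.samples:
            if rng.random() < self.keep_prob[y]:
                yield x, y
```

With an 0.89/0.11 split and a 0.5/0.5 target, class 1 is always kept and class 0 is kept with probability of roughly 0.12. Whether this carries over to a DataLoader with several workers is not covered by this sketch.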

Weighted sampling without knowing all classes beforehand

Nice work! I was about to use this handy tool until I realized my problem was even trickier.

I'm dealing with a very large (TB-scale) WebDataset, which also inherits from IterableDataset, and the dataset can keep growing. My goal is to do balanced sampling based on some attributes of my samples. In other words, I want to have N classes, each with equal weight. This would be straightforward to do with this resample tool if I knew all the possible classes. However, that requires iterating through the whole dataset, which can take hours. That would be fine if I only did it once, but I may have to do it over and over, since my dataset will grow in the future (and new classes will appear). I'm also aware that one workaround is to handle this outside the training loop by maintaining an incremental list of classes on disk.

Still, it would be even better if this could be handled during training. I was thinking of building desired_dist dynamically by initializing it with an empty dict and adding unseen classes to it on the fly, each with the same constant weight. It seems this might work, but I have not tested it. Do you think this is something worth having in the repo? Do you see any caveats in doing so? Any suggestions are appreciated.
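
To make the idea concrete, here is a minimal sketch of on-the-fly balancing, assuming a uniform target over whichever classes have been seen so far. It is not the library's algorithm: the function `balanced_stream` and its `get_class` argument are hypothetical names, and the acceptance rule (smallest running count divided by the running count of the current class) is one simple heuristic among several.

```python
import random
from collections import Counter


def balanced_stream(samples, get_class, seed=0):
    """Hypothetical online balancer (not part of pytorch-resample).

    Yields a subset of `samples` in which classes are roughly equally
    represented, without knowing the set of classes beforehand.
    """
    rng = random.Random(seed)
    seen = Counter()  # running count per class, grows as new classes appear
    for sample in samples:
        label = get_class(sample)
        seen[label] += 1
        # The rarest class so far is always kept; more frequent classes are
        # kept in proportion to how far they are ahead of the rarest one.
        # min() is linear in the number of classes, which is fine for a sketch.
        keep_prob = min(seen.values()) / seen[label]
        if rng.random() < keep_prob:
            yield sample
```

One caveat this makes visible: when a new class shows up late in the stream, its count starts at 1, so every other class is rejected almost always until the newcomer catches up. Samples accepted before that point were also balanced against a smaller set of classes, so the output is only approximately uniform.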

HybridSampler throwing an error

Hi

Thanks for the wonderful code.
Can you please recheck your HybridSampler implementation?
I could not run the examples or the benchmark code, specifically the parts that use HybridSampler.
The other parts of the code run absolutely fine.

Thanks
Devraj
