binitai / betterloader
A better PyTorch data loader capable of custom image operations and image subsets
Home Page: https://binitai.github.io/BetterLoader/
License: MIT License
Adding a sampler param option for the BetterLoader
As we begin moving towards supporting unsupervised learning, one of the first steps will be allowing a user to pass a SubsetRandomSampler object into the BetterLoader, which would then be used to load data arbitrarily.
Let's use #19 to do this.
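For context, here is a minimal sketch of how a SubsetRandomSampler behaves with a plain PyTorch DataLoader. It does not use BetterLoader at all; the toy TensorDataset and the chosen indices are assumptions made purely for illustration.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

# Toy dataset standing in for an image dataset: 10 samples, 3 features each.
features = torch.arange(30, dtype=torch.float32).reshape(10, 3)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

# Only these indices are ever drawn, in a fresh random order on each pass.
subset_indices = [0, 2, 4, 6, 8]
sampler = SubsetRandomSampler(subset_indices)

# DataLoader rejects shuffle=True together with a sampler,
# so shuffle is left at its default here.
loader = DataLoader(dataset, batch_size=2, sampler=sampler)

# Collect every label the loader yields during one pass.
seen = sorted(
    int(label) for _, batch_labels in loader for label in batch_labels
)
```

Because the sampler owns the ordering, a sampler option on the BetterLoader would presumably need to be forwarded to the underlying DataLoader unchanged.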
As we require more and more custom params to be set at the underlying dataloader level, it would be useful to specify all of them in a single object stored as one key-value pair of the metadata object.
We're going to end up adding more and more constructor args to mimic DataLoader args. This deals with that whole problem entirely.
Add a dataloader_params key to the dataset_metadata parameter passed into the BetterLoader. This would contain a dict of key-value pairs that we would want to set on the DataLoader level.
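One way this proposal could work is sketched below. The helper name resolve_dataloader_params and the default values are hypothetical and are not part of BetterLoader; only the dataloader_params and dataset_metadata names come from the proposal itself.

```python
from typing import Any, Dict

# Hypothetical defaults; the real BetterLoader defaults may differ.
DEFAULT_DATALOADER_PARAMS: Dict[str, Any] = {
    "batch_size": 1,
    "shuffle": False,
    "num_workers": 0,
}


def resolve_dataloader_params(dataset_metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user-supplied dataloader_params over the defaults."""
    overrides = dataset_metadata.get("dataloader_params")
    if overrides is None:
        overrides = {}
    if not isinstance(overrides, dict):
        raise TypeError("dataloader_params must be a dict of DataLoader keyword arguments")
    # Later keys win, so user values replace the defaults.
    return {**DEFAULT_DATALOADER_PARAMS, **overrides}


params = resolve_dataloader_params(
    {"dataloader_params": {"batch_size": 32, "pin_memory": True}}
)
```

The merged dict could then be unpacked into the loader, for example DataLoader(dataset, **params), so new DataLoader arguments would not each need their own constructor argument.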
We really need to give the landing page a facelift
You don't always have saved subset and index files available - sometimes you want to generate them dynamically. Being able to pass in objects/arrays is more useful than passing in just filenames, especially since those filenames can just be opened and then accessed anyway.
We may have to tweak the metadata functions that dynamically read these files, but aside from that, I think it would be a useful value-add.
@JamesBollas thoughts?
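A rough sketch of accepting either form follows. It assumes the index is stored as JSON, and the helper name load_index and the sample index contents are hypothetical rather than taken from BetterLoader.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Union


def load_index(source: Union[str, Path, dict, list]) -> Any:
    """Return index data from either a JSON file path or an in-memory object."""
    if isinstance(source, (str, Path)):
        with open(source, "r", encoding="utf-8") as handle:
            return json.load(handle)
    if isinstance(source, (dict, list)):
        # Already loaded or generated dynamically: use it as-is.
        return source
    raise TypeError(f"Expected a path, dict, or list, got {type(source).__name__}")


# The same made-up index, once in memory and once written to disk.
index_object = {"cat": ["cat_1.jpg", "cat_2.jpg"], "dog": ["dog_1.jpg"]}
with tempfile.TemporaryDirectory() as tmp_dir:
    index_path = os.path.join(tmp_dir, "index.json")
    with open(index_path, "w", encoding="utf-8") as handle:
        json.dump(index_object, handle)
    from_file = load_index(index_path)

from_object = load_index(index_object)
```

With a single entry point like this, the metadata functions that read these files would only ever see the loaded object.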
Right now, we just pass a single transform object in. This is inconvenient if we want different transforms for train, test, and val, as mentioned in #25.
Rename the transform parameter to transforms and treat it as a dictionary instead. We can then split it within fetch_segmented_dataloaders. We would also want to update our tests to reflect this change and cover the edge case where the transforms parameter is either None or {}.
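The split, including the None and {} edge cases, might look something like the sketch below. The helper name split_transforms is hypothetical, and plain strings stand in for real torchvision transform objects so the example runs without extra dependencies.

```python
from typing import Any, Dict, Optional

SPLITS = ("train", "test", "val")


def split_transforms(transforms: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return one transform (or None) for each of train, test, and val."""
    if not transforms:
        # Covers both None and {}: no transform is applied to any split.
        return {split: None for split in SPLITS}
    unknown = set(transforms) - set(SPLITS)
    if unknown:
        raise KeyError(f"Unknown split names: {sorted(unknown)}")
    # Splits missing from the dict fall back to None.
    return {split: transforms.get(split) for split in SPLITS}


no_transforms = split_transforms(None)
empty_transforms = split_transforms({})
train_only = split_transforms({"train": "train_transform_placeholder"})
```

Rejecting unknown keys is a design choice in this sketch: a misspelled split name fails loudly rather than being silently ignored.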
We currently have 2 integration tests, literally for the sake of having them. This needs to be fixed
Given BetterLoader's modular nature, a comprehensive integration test suite would be key moving forward. Testing various types of index files, as well as the consequent functions that would handle them would all be an integral part of ensuring that we don't break what we've already got :)
We can probably make the website a little more clear overall
We've taken a fairly risky/bad approach by just resorting to setting function params to arbitrary defaults when they aren't passed in. While this works sometimes, I think we've overused this approach and should really trim parts of it down.
Eliminate optional parameters in any non-public function, and actually throw exceptions when things aren't right, rather than passing None values around.
Again, our usage documentation is extremely minimal. To the point where this is probably unusable until we write usage docs.
Just got to take a few hours out and document how far we've got so far :)
Hi !
Is it possible to use a data transform dictionary?
Like this :
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize([224, 224]),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize([224, 224]),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize([224, 224]),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
And do you have an example for the dataset_metadata ?
Pretty self explanatory - I wrote out a skeleton, but didn't actually implement anything
The README.md for this project could probably be more succinct, and documentation about things like the Makefile could probably be removed.
Maybe we could use something like PyTorch's README as a baseline (ours should be less elaborate, of course). The idea would be to make the README more concise and polished while keeping it to the point.
We currently have no unit tests, literally just some really basic integration tests, to get us off the ground.
We need to write a baseline test suite, looking at things like unit testing the individual custom classes and their helper methods, as well as maybe a few overall integration tests as well
The Dataset Metadata section of the Getting Started docs page definitely needs some details for the callable function parameters passed as key-value pairs.
The current description is confusing and is really difficult to understand without peeping under the hood.
We should probably add a short function example along with a docstring for every callable parameter listed
Less is definitely not more in our case lmao
Hi !
Is it possible to shuffle the data?
I did not find this in the documentation.
BetterLoader should support unsupervised learning tasks too
The first step is definitely to review the data loading process for models like Autoencoders and chart out a gameplan based on that. Updates to this ticket are coming soon