Good questions. Can you reduce the probability matrix to float16? Or, if it's still too big, potentially even to unsigned int (8 bits)?
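Here is a minimal NumPy sketch of both options. It assumes `pred_probs` is a dense `(n_samples, n_classes)` float array. The 8-bit scheme shown (scale to 0..255, round, then rescale and renormalize when loading) is only one illustrative choice, not something cleanlab prescribes. The helper names are hypothetical.

```python
import numpy as np

def quantize_uint8(pred_probs: np.ndarray) -> np.ndarray:
    """Store probabilities in [0, 1] as 8-bit integers (1 byte per entry)."""
    # Scale to 0..255 and round to the nearest integer level.
    return np.rint(np.clip(pred_probs, 0.0, 1.0) * 255).astype(np.uint8)

def dequantize_uint8(q: np.ndarray) -> np.ndarray:
    """Recover approximate probabilities and renormalize each row to sum to 1."""
    p = q.astype(np.float32) / 255.0
    row_sums = p.sum(axis=1, keepdims=True)
    # Avoid dividing by zero for rows that rounded to all zeros (they stay all-zero).
    row_sums[row_sums == 0] = 1.0
    return p / row_sums

# Toy example: 4 samples, 10 classes, softmax probabilities in float32.
rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10)).astype(np.float32)
pred_probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

half = pred_probs.astype(np.float16)  # 2 bytes per entry
q = quantize_uint8(pred_probs)        # 1 byte per entry

print(pred_probs.nbytes, half.nbytes, q.nbytes)  # 160 80 40
```

One caveat: in this 8-bit scheme any probability below about 0.002 (half of 1/255) rounds to 0. With a very large number of classes, most entries therefore become exactly zero, and only the larger probabilities survive.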
from cleanlab.
If you decide to split into batches, try splitting the classes into 100 batches of 1k classes each, such that all the label noise between classes is self-contained within each batch (e.g., types of dogs go in one batch if the label noise is between types of dogs, and all the airplanes go in another batch if your label noise is unlikely to have dogs mislabeled as planes).
If the above is tricky for you, you can also just split it up randomly, but you may want to run it a few times and then combine all the label errors.
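The batching idea can be sketched roughly as follows, using only NumPy. `split_by_class_groups` and the other helpers are hypothetical. For each group of classes, the sketch does four things:

- keeps only the samples whose given label falls in that group;
- slices out the group's columns of `pred_probs`;
- renormalizes each row;
- remaps labels to `0..k-1`.

That is the shape of input a call such as `cleanlab.filter.find_label_issues(labels, pred_probs)` expects (by default it returns a boolean mask). The `detect` argument stands in for such a call. The `argmax_disagrees` placeholder is only for illustration and is not cleanlab's method. For the random variant, the flagged indices from each repeat are combined by taking their union, which is an assumption about what "combine" means here.

```python
import numpy as np

def split_by_class_groups(labels, pred_probs, class_groups):
    """Yield (sample_idx, local_labels, local_pred_probs) for each group of classes.

    Keeps samples whose given label is in the group and only the group's columns,
    and remaps labels to 0..len(group)-1.
    """
    n_classes = pred_probs.shape[1]
    for group in class_groups:
        group = np.asarray(group)
        idx = np.flatnonzero(np.isin(labels, group))
        if idx.size == 0:
            continue
        # Slice rows and columns, then renormalize each row to sum to 1.
        sub = pred_probs[np.ix_(idx, group)].astype(np.float32)
        sub /= np.clip(sub.sum(axis=1, keepdims=True), 1e-12, None)
        # Lookup table mapping a global class id to its position in this group.
        lookup = np.full(n_classes, -1, dtype=np.int64)
        lookup[group] = np.arange(group.size)
        yield idx, lookup[labels[idx]], sub


def find_issues_in_batches(labels, pred_probs, class_groups, detect):
    """Run detect(local_labels, local_pred_probs) -> bool mask per group; return flagged global indices."""
    flagged = set()
    for idx, local_labels, sub in split_by_class_groups(labels, pred_probs, class_groups):
        flagged.update(idx[detect(local_labels, sub)].tolist())
    return flagged


def find_issues_random_repeats(labels, pred_probs, n_groups, n_repeats, detect, seed=0):
    """Repeat with fresh random class partitions and take the union of flagged indices."""
    rng = np.random.default_rng(seed)
    flagged = set()
    for _ in range(n_repeats):
        groups = np.array_split(rng.permutation(pred_probs.shape[1]), n_groups)
        flagged |= find_issues_in_batches(labels, pred_probs, groups, detect)
    return flagged


def argmax_disagrees(labels, pred_probs):
    """Placeholder detector for illustration only (not cleanlab's method)."""
    return pred_probs.argmax(axis=1) != labels


labels = np.array([0, 1, 2, 3])
pred_probs = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.9],  # labeled 2 but looks like class 3
    [0.0, 0.0, 0.1, 0.9],
])
print(find_issues_in_batches(labels, pred_probs, [[0, 1], [2, 3]], argmax_disagrees))  # {2}
```

With a known grouping (for example, the 100 batches of 1k classes suggested above), a single pass is enough. With random groups, noise between two classes is only visible in the repeats where both classes land in the same group. That is why several repeats are combined.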
Thank you for the quick response.
> Good questions. Can you reduce the probability matrix to float16? Or, if it's still too big, potentially even to unsigned int (8 bits)?
I can reduce the precision to 8 bits per element, but it's still going to be almost 1 TB.
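For scale, a dense matrix needs n_samples × n_classes × bytes per entry. The thread does not give both counts directly, so the figures below are assumptions. They take roughly 100k classes (from the "100 batches, each of 1k classes" suggestion) and 10 million samples, which is what "almost 1 TB" at 1 byte per entry would imply. `pred_probs_bytes` is a hypothetical helper.

```python
import numpy as np

def pred_probs_bytes(n_samples: int, n_classes: int, dtype) -> int:
    """Bytes needed for a dense (n_samples, n_classes) matrix of the given dtype."""
    return n_samples * n_classes * np.dtype(dtype).itemsize

# Assumed sizes for illustration only: 10M samples, 100k classes.
n, k = 10_000_000, 100_000
for dt in (np.float32, np.float16, np.uint8):
    print(np.dtype(dt).name, pred_probs_bytes(n, k, dt) / 1e12, "TB")
# float32 4.0 TB
# float16 2.0 TB
# uint8 1.0 TB

# If classes are roughly balanced, one batch of 1k classes holds about n/100 samples
# and only 1k columns, so it is about 10,000x smaller than the full matrix.
print(pred_probs_bytes(n // 100, k // 100, np.uint8) / 1e9, "GB")  # 0.1 GB
```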
> If you decide to split into batches, try splitting the classes into 100 batches of 1k classes each, such that all the label noise between classes is self-contained within each batch (e.g., types of dogs go in one batch if the label noise is between types of dogs, and all the airplanes go in another batch if your label noise is unlikely to have dogs mislabeled as planes).
I've been trying to apply it to a facial recognition task, so I don't think splitting the classes into self-contained batches is possible in my case.
> If the above is tricky for you, you can also just split it up randomly, but you may want to run it a few times and then combine all the label errors.
Is it better to split them by classes or samples?
Got it, thank you.
Related Issues (20)
- Error in null: Ambiguous truth value of a Series
- Add end-to-end tests at the end of Datalab quickstart tutorial
- get rid of warnings in the datalab quickstart tutorial
- Remove Tensorflow version constraint in developer dependencies
- add unit test with all identical dataset
- Difference of object detection confident learning with objectlab paper
- update coveragerc to only skip over specific experimental subfolders that currently are untested
- Null issue check throwing an error
- lab.find_issues(features=features) outputs error for underperforming issue
- Object detection, segmentation k-fold practical issue
- Trying to create Datalab object with label set to a dtype of 'category' but getting 'NotImplementedError'
- test_scores_for_identical_examples unit test fails
- be able to pass in kwargs to plt.show()
- datalab issue guide should better describe the relevant cleanlab columns
- Trying to build docs with a new notebook I have created but getting `AttributeError` from the audio.ipynb tutorial
- Doctests are failing for some functions
- In the “Synthetic Data Quality” part, do we need the same amount of real data and generated data
- image datalab tutorial broken: Getting build error RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [64, 1, 1, 28, 28]
- 3D Cleanlab / DCAI ?
- Follow-Up: Revert macOS CI Environment to Latest Version Once Python Compatibility Is Resolved