Comments (7)
Hi @xfgao!
To help identify the issue, could you share the command you are using to run download_upstream.py? Also, is the above the full error message?
Thanks!
from datacomp.
We were running the following command to download data:
python download_upstream.py --scale datacomp_1b --data_dir DATA_DIR
We were able to download all the metadata and a bunch of tar files at the beginning, but after a certain point we keep getting the error message:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/conda/envs/datacomp/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/opt/conda/envs/datacomp/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/opt/conda/envs/datacomp/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
    self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
from datacomp.
The error seems to come from an OS-level limit on the number of threads you can open.
Either decrease the number of threads you are using, increase the limit, or use a different machine.
If you can share some details about your environment, that would help.
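To check whether such a limit is plausibly in play, the per-process limits can be inspected from Python. This is only a diagnostic sketch, assuming a Unix-like system where the standard `resource` module is available; the helper name `report_limits` is made up for illustration and is not part of datacomp. It does not confirm that a limit is the actual cause of the error above.

```python
import resource


def report_limits():
    """Return (soft, hard) limits for open files and for processes/threads.

    On Linux, RLIMIT_NPROC counts threads as well as processes for the user.
    """
    limits = {}
    for name in ("RLIMIT_NOFILE", "RLIMIT_NPROC"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        limits[name] = (soft, hard)
    return limits


def describe(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    for name, (soft, hard) in report_limits().items():
        print(f"{name}: soft={describe(soft)} hard={describe(hard)}")
```

If the soft limits look low compared with the number of worker processes and threads being launched, raising them (for example with `ulimit -n` in the shell that starts the download) is one thing to try.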
from datacomp.
Thanks for the response. For the data downloading, I'm using Ubuntu 20.04 on an AWS EC2 g5.12xlarge instance (with 48 CPU cores). After reducing processes_count to 8 and thread_count to 8, I'm still getting the same FileNotFoundError.
from datacomp.
Can you try using a virtual env instead of conda?
from datacomp.
Is there a requirements.txt file for setting up the virtual env?
from datacomp.
You can try installing the packages listed under pip in the environment.yml. If I am not mistaken, that should give you something close to the desired environment, provided your system Python is the correct version (although this needs to be verified). You should still train with the original environment to avoid other issues, but for just the data download it should be fine.
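A rough way to turn that pip section into a requirements file is sketched below. It assumes the usual conda layout, where pip packages are listed as items nested under a `- pip:` entry in `dependencies`. The function name `pip_requirements` and the sample package names are hypothetical, and the parser is a simple line-based one, not a full YAML reader.

```python
def pip_requirements(env_yaml_text):
    """Collect the entries nested under the '- pip:' item of a conda env file."""
    requirements = []
    pip_indent = None  # indentation of the '- pip:' line once it is found
    for raw in env_yaml_text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and full-line comments
        indent = len(raw) - len(raw.lstrip())
        if pip_indent is None:
            if stripped == "- pip:":
                pip_indent = indent
            continue
        if indent <= pip_indent:
            break  # left the nested pip list
        if stripped.startswith("- "):
            requirements.append(stripped[2:].strip().strip("'\""))
    return requirements


# Hypothetical sample; the real environment.yml will list different packages.
SAMPLE = """\
name: example
dependencies:
  - python=3.10
  - pip
  - pip:
    - package-a==1.0
    - package-b>=2.0
"""

if __name__ == "__main__":
    print("\n".join(pip_requirements(SAMPLE)))
```

Writing the returned lines to requirements.txt and running `pip install -r requirements.txt` inside a fresh environment created with `python -m venv` would then approximate the pip part of the conda environment. Conda-only dependencies are not covered by this.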
from datacomp.