GeorgiosSmyrnis commented on July 24, 2024

Hi @xfgao!

To help identify the issue, could you share the command with which you are running download_upstream.py? Also, is the above the full error message?

Thanks!

from datacomp.

xfgao commented on July 24, 2024

We were running the following command to download data:
python download_upstream.py --scale datacomp_1b --data_dir DATA_DIR
We were able to download all the metadata and a number of tar files at the beginning, but after a certain point we kept getting this error message:

Traceback (most recent call last):
 File "<string>", line 1, in <module>
 File "/opt/conda/envs/datacomp/lib/python3.10/multiprocessing/spawn.py", line 116, in spawn_main
   exitcode = _main(fd, parent_sentinel)
 File "/opt/conda/envs/datacomp/lib/python3.10/multiprocessing/spawn.py", line 126, in _main
   self = reduction.pickle.load(from_parent)
 File "/opt/conda/envs/datacomp/lib/python3.10/multiprocessing/synchronize.py", line 110, in __setstate__
   self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory

rom1504 commented on July 24, 2024

The error seems to be caused by an OS-level limit on the number of threads you can open.

Either decrease the number of threads you are using, increase the limit, or use a different machine.
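To check which limits might be involved, here is a minimal sketch (not from the thread) that reads the per-user resource limits most often hit when a process cannot start more threads or open more connections. It assumes a Linux host, where RLIMIT_NPROC counts threads for the user and not just processes; the helper names are hypothetical.

```python
from __future__ import annotations

import resource


def describe_limit(value: int) -> str:
    """Render an rlimit value, mapping RLIM_INFINITY to 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def concurrency_limits() -> dict[str, tuple[str, str]]:
    """Soft/hard limits on tasks (processes+threads on Linux) and open file descriptors."""
    limits = {}
    for name in ("RLIMIT_NPROC", "RLIMIT_NOFILE"):
        soft, hard = resource.getrlimit(getattr(resource, name))
        limits[name] = (describe_limit(soft), describe_limit(hard))
    return limits


def kernel_threads_max(path: str = "/proc/sys/kernel/threads-max") -> int | None:
    """System-wide thread cap on Linux; None if the file is missing or unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for name, (soft, hard) in concurrency_limits().items():
        print(f"{name}: soft={soft} hard={hard}")
    print(f"threads-max: {kernel_threads_max()}")
```

From a bash shell, ulimit -u and ulimit -n report the soft values of the same two limits.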

If you provide some info on your environment, it could help.
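One way to gather that info is a short script like the sketch below (the helper name is hypothetical). It also lists the available multiprocessing start methods, since the spawn_main frame at the top of the traceback suggests the failing worker was launched with the spawn start method; multiprocessing.get_all_start_methods() returns the default method first.

```python
from __future__ import annotations

import multiprocessing as mp
import os
import platform
import sys


def environment_report() -> dict[str, str]:
    """Collect basic facts worth pasting into a bug report."""
    return {
        "python": platform.python_version(),
        "executable": sys.executable,
        "prefix": sys.prefix,  # e.g. a conda env dir or a venv dir
        "platform": platform.platform(),
        "cpu_count": str(os.cpu_count()),
        # first entry is the platform default; this call does not fix the method
        "start_methods": ", ".join(mp.get_all_start_methods()),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```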

xfgao commented on July 24, 2024

Thanks for the response. For the data download, I'm using Ubuntu 20.04 on an AWS EC2 g5.12xlarge instance (48 CPU cores). After reducing processes_count to 8 and thread_count to 8, I'm still getting the same FileNotFoundError.
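A reduced-concurrency run like this one can be scripted as in the sketch below. It assumes download_upstream.py accepts --processes_count and --thread_count flags named after the parameters mentioned here (check python download_upstream.py --help before relying on them), and that each worker process runs its own pool of thread_count download threads, so 8 processes × 8 threads gives 64 concurrent downloads. The helper names are hypothetical.

```python
from __future__ import annotations

import shlex
import sys


def build_download_cmd(data_dir: str, scale: str = "datacomp_1b",
                       processes_count: int = 8, thread_count: int = 8) -> list[str]:
    """Argument list for a reduced-concurrency download run.

    --processes_count / --thread_count are assumed flag names.
    """
    return [
        sys.executable, "download_upstream.py",
        "--scale", scale,
        "--data_dir", data_dir,
        "--processes_count", str(processes_count),
        "--thread_count", str(thread_count),
    ]


def total_download_threads(processes_count: int, thread_count: int) -> int:
    """Concurrent download threads, assuming one thread pool per worker process."""
    return processes_count * thread_count


if __name__ == "__main__":
    # Print rather than run, so the command can be reviewed first.
    print(shlex.join(build_download_cmd("DATA_DIR")))
    print(total_download_threads(8, 8))  # 64
```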

rom1504 commented on July 24, 2024

Can you try using a virtual env instead of conda?
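After switching, it is easy to end up running the script with the old interpreter by accident. The sketch below (hypothetical helper) guesses which kind of environment the current interpreter belongs to: venv and modern virtualenv make sys.prefix differ from sys.base_prefix, and conda environments carry a conda-meta directory under their prefix.

```python
import os
import sys


def interpreter_kind() -> str:
    """Guess whether this interpreter runs inside a venv, a conda env, or neither."""
    if sys.prefix != sys.base_prefix:
        # venv/virtualenv point sys.prefix at the env dir; base_prefix stays on the base install
        return "venv"
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        # conda environments (including base) contain a conda-meta/ directory
        return "conda"
    return "system"


if __name__ == "__main__":
    print(f"{interpreter_kind()}: {sys.prefix}")
```

The venv check comes first on purpose: a venv created from a conda Python should still report "venv".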

xfgao commented on July 24, 2024

Do we have a requirements.txt file for setting up a virtual env?

GeorgiosSmyrnis commented on July 24, 2024

You can try installing the packages listed under pip in the environment.yml. If I am not mistaken, that should give you something close to the desired environment, provided your system Python is the correct version (although this needs to be verified). You should still train with the original environment to avoid other issues, but for just the data download it should be fine.
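A stdlib-only sketch of that suggestion: it extracts the entries under the "- pip:" key of a conda environment.yml so they can be written to a requirements.txt and installed with pip inside a virtual env. The helper names are hypothetical, and the parser assumes the usual block layout (one "- package" per line, nested under "- pip:"). It is not a general YAML parser, so a file using flow style such as [a, b] would need PyYAML instead.

```python
from __future__ import annotations

import re
from pathlib import Path


def pip_requirements(env_yml_text: str) -> list[str]:
    """Return the entries listed under '- pip:' in a conda environment.yml."""
    reqs: list[str] = []
    pip_indent = None  # indentation of the '- pip:' line, once found
    for raw in env_yml_text.splitlines():
        stripped = raw.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and full-line comments
        indent = len(raw) - len(raw.lstrip())
        if pip_indent is None:
            if stripped == "- pip:":
                pip_indent = indent
            continue
        if indent <= pip_indent:
            break  # dedented back out of the pip block
        if stripped.startswith("- "):
            # Drop a trailing ' # comment' (YAML needs whitespace before '#',
            # so URL fragments like '#egg=pkg' survive) and surrounding quotes.
            entry = re.sub(r"\s+#.*$", "", stripped[2:]).strip().strip("\"'")
            reqs.append(entry)
    return reqs


def write_requirements(env_path: str = "environment.yml",
                       out_path: str = "requirements.txt") -> list[str]:
    """Write the pip entries of env_path to out_path, one per line."""
    reqs = pip_requirements(Path(env_path).read_text())
    Path(out_path).write_text("\n".join(reqs) + "\n")
    return reqs
```

After calling write_requirements(), creating a venv with python -m venv, activating it, and running pip install -r requirements.txt would approximate the download environment; the Python version still has to match the one pinned in environment.yml.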
