fastgpu

A queue service for quickly developing scripts that use all your GPUs efficiently

fastgpu provides a single command, fastgpu_poll, which polls a directory to check for scripts to run, and then runs them on the first available GPU. If no GPUs are available, it waits until one is. If more than one GPU is available, multiple scripts are run in parallel, one per GPU.

An API is also provided for polling programmatically, which is extensible for assigning other resources to processes besides GPUs. For details on the API, see the docs for core.

Install

pip install fastgpu

How to use

--help provides command help:

$ fastgpu_poll --help
usage: fastgpu_poll [-h] [--path PATH] [--exit EXIT]

Poll `path` for scripts using `ResourcePoolGPU.poll_scripts`

optional arguments:
  -h, --help   show this help message and exit
  --path PATH  Path containing `to_run` directory
  --exit EXIT  Exit when `to_run` is empty

path defaults to the current directory. The path should contain a subdirectory to_run containing executable scripts you wish to run. It should not contain any other files, although it can contain subdirectories (which are ignored).
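As a minimal sketch of that layout, the following Python creates a to_run directory holding one executable script. The helper name make_queue, the script name 01_train.sh, and the script's contents are hypothetical, chosen only for illustration; they are not part of fastgpu.

```python
import stat
import tempfile
from pathlib import Path


def make_queue(root: Path, scripts: dict) -> Path:
    """Create root/to_run and write each script there with the executable bit set."""
    to_run = root / "to_run"
    to_run.mkdir(parents=True, exist_ok=True)
    for name, body in scripts.items():
        script = to_run / name
        script.write_text(body)
        # The scripts need to be executable, so add the user-execute permission bit.
        script.chmod(script.stat().st_mode | stat.S_IXUSR)
    return to_run


# Hypothetical example: a temporary root directory with a single shell script queued.
root = Path(tempfile.mkdtemp())
to_run = make_queue(root, {"01_train.sh": "#!/bin/sh\necho training\n"})
print(sorted(p.name for p in to_run.iterdir()))
```

Pointing fastgpu_poll at root through --path should then pick up the queued script, assuming fastgpu is installed and a GPU is available.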

fastgpu_poll will run each script in to_run in sorted order. Each script will be assigned to one GPU. The CUDA_VISIBLE_DEVICES environment variable will be set to the ID of this GPU in the script's subprocess. The FASTGPU_ID environment variable will be set to the same ID.
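A script placed in to_run can read these two variables to find out which GPU it was given. The sketch below assumes nothing beyond the variable names stated above; the function name assigned_gpu is hypothetical.

```python
import os


def assigned_gpu(environ=os.environ):
    """Return (CUDA_VISIBLE_DEVICES, FASTGPU_ID) from the environment, or None where unset."""
    return environ.get("CUDA_VISIBLE_DEVICES"), environ.get("FASTGPU_ID")


visible, gpu_id = assigned_gpu()
print(f"CUDA_VISIBLE_DEVICES={visible} FASTGPU_ID={gpu_id}")
```

When the script is run outside fastgpu_poll, both values may be None, so a script that depends on them probably wants a fallback.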

Once a script is selected to be run, it is moved into a directory called running. Once it's finished, it's moved into complete or fail as appropriate. stdout and stderr are captured to files with the same name as the script, plus stdout or stderr appended.
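The lifecycle described above can be illustrated with a small standalone sketch. This is not fastgpu's implementation: the function run_one, the out directory, and the .stdout/.stderr suffixes are assumptions made for the example, and the scripts are launched through the Python interpreter only to keep the example portable.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_one(root: Path, script: Path) -> Path:
    """Move script into root/running, run it, then move it to root/complete or root/fail."""
    for name in ("running", "complete", "fail", "out"):
        (root / name).mkdir(exist_ok=True)
    running = script.rename(root / "running" / script.name)
    # Capture output to files named after the script (directory and suffixes are assumed).
    out_path = root / "out" / f"{running.name}.stdout"
    err_path = root / "out" / f"{running.name}.stderr"
    with open(out_path, "w") as out, open(err_path, "w") as err:
        result = subprocess.run([sys.executable, str(running)], stdout=out, stderr=err)
    dest = "complete" if result.returncode == 0 else "fail"
    return running.rename(root / dest / running.name)


# Hypothetical demo: one script that succeeds and one that exits with an error code.
root = Path(tempfile.mkdtemp())
(root / "to_run").mkdir()
ok = root / "to_run" / "ok.py"
ok.write_text("print('done')\n")
bad = root / "to_run" / "bad.py"
bad.write_text("raise SystemExit(1)\n")
ok_final = run_one(root, ok)
bad_final = run_one(root, bad)
print(ok_final.parent.name, bad_final.parent.name)
```

Moving the file between directories makes the state of every script visible from a plain directory listing, which is presumably why fastgpu uses this scheme.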

If exit is 1 (which is the default), then once all scripts are run, fastgpu_poll will exit. If it is 0 then fastgpu_poll will continue running until it is killed; it will keep polling for any new scripts that are added to to_run.

To limit the GPUs available to fastgpu, set CUDA_VISIBLE_DEVICES before polling, e.g.:

CUDA_VISIBLE_DEVICES=2,3 fastgpu_poll --path script_dir

fastgpu's Issues

How to use? Does the fastgpu library still work?

Hello. I wonder whether fastgpu still works normally. Is it possible to use it?

How do I use it, and how are you running it? I would be grateful for some examples of how to run it.

These are the commands I entered.

CUDA_VISIBLE_DEVICES=1,2 fastgpu_poll ~/jupyter-workspace/fastgpu

CUDA_VISIBLE_DEVICES=1,2 fastgpu_poll --path=to_run

CUDA_VISIBLE_DEVICES=1,2 fastgpu_poll hub_test.py

CUDA_VISIBLE_DEVICES=1,2 fastgpu_poll --path=to_run hub_test.py

error message 1: fastgpu_poll: error: unrecognized arguments: /root/jupyter-workspace/fastgpu

error message 2:

Traceback (most recent call last):
  File "/opt/conda/envs/py38/bin/fastgpu_poll", line 8, in <module>
    sys.exit(fastgpu_poll())
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/fastscript/core.py", line 76, in _f
    func(**args.__dict__)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/fastgpu/cli.py", line 12, in fastgpu_poll
    rp.poll_scripts(exit_when_empty=exit)
  File "/opt/conda/envs/py38/lib/python3.8/site-packages/fastgpu/core.py", line 79, in poll_scripts
    sleep(poll_interval)
NameError: name 'sleep' is not defined

sleep not defined

I'm having an issue running fastgpu. I get the following error:

Traceback (most recent call last):
  File "/home/cody/miniconda3/envs/IK/bin/fastgpu_poll", line 10, in <module>
    sys.exit(fastgpu_poll())
  File "/home/cody/miniconda3/envs/IK/lib/python3.8/site-packages/fastscript/core.py", line 76, in _f
    func(**args.__dict__)
  File "/home/cody/miniconda3/envs/IK/lib/python3.8/site-packages/fastgpu/cli.py", line 12, in fastgpu_poll
    rp.poll_scripts(exit_when_empty=exit)
  File "/home/cody/miniconda3/envs/IK/lib/python3.8/site-packages/fastgpu/core.py", line 79, in poll_scripts
    sleep(poll_interval)
NameError: name 'sleep' is not defined

I installed fastcore, fastgpu, and fastscript via conda-forge

fastcore                  1.1.0                      py_0    fastai
fastgpu                   1.0.1              pyh39e3cac_0    fastai
fastscript                1.0.0                         0    fastai

I inspected the source code of core.py and, sure enough, sleep is called but never imported or defined:

    def poll_scripts(self, poll_interval=0.1, exit_when_empty=True):
        while True:
            sleep(poll_interval)
            script = find_next_script(self.path/'to_run')
            if script is None:
                if exit_when_empty: break
                else: continue
            ident = self.lock_next()
            if ident is None: continue
            run_name = safe_rename(script, self.path/'running')
            self.run(run_name, ident)

I assume this was supposed to be implemented in fastcore.all or pynvml ...
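The NameError suggests a missing import rather than a missing feature: a sleep function with this behaviour exists in Python's standard time module. Below is a minimal, self-contained sketch of the same polling pattern with that import in place. The names poll, find_next, and handle are hypothetical stand-ins and not fastgpu's code; whether the installed release intended time.sleep or a re-export from another package is an assumption.

```python
from time import sleep  # assumed to be the import missing from the traceback


def poll(find_next, handle, poll_interval=0.1, exit_when_empty=True):
    """Sleep, fetch the next item, and handle it; stop when none remain if asked to."""
    while True:
        sleep(poll_interval)
        item = find_next()
        if item is None:
            if exit_when_empty:
                break
            continue
        handle(item)


# Hypothetical demo: an in-memory list stands in for the to_run directory.
queue = ["a.sh", "b.sh"]
seen = []
poll(lambda: queue.pop(0) if queue else None, seen.append, poll_interval=0.01)
print(seen)
```

If the installed package shows this error, upgrading fastgpu and fastcore to matching releases is likely a safer route than patching site-packages by hand.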
