Comments (6)

adefossez commented on June 25, 2024

I see. In any case it is a good idea to add, but I wanted to make sure it was worth it. It does require a few non-trivial changes in how things are done, but I should be able to get a first version out sometime next week :)

One thing I was wondering about for the longer term (in particular for the brainmagick evaluation kind of workflows) is how to design a Dora-like generic API for any target function in Python, as long as there is some way to define a reliable job signature from the function arguments. You would then get an API similar to submitit or things like ProcessPool, except that it would handle resumability and job deduplication for you transparently! This would also be much easier to apply to something that is not your main training script. Anyway, just curious whether something like that would be useful to you guys in the long run.
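To make the idea concrete, here is a minimal sketch of what such a signature-based executor could look like. It is not Dora's API: `job_signature` and `SignatureExecutor` are hypothetical names, a thread pool stands in for a real cluster backend, and the signature is simply a hash of the function name and its arguments.

```python
import hashlib
import json
import pickle
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def job_signature(func, args, kwargs):
    """Hypothetical helper: hash the function name and arguments into a short id."""
    payload = json.dumps(
        {"func": f"{func.__module__}.{func.__qualname__}",
         "args": args, "kwargs": kwargs},
        sort_keys=True,
        default=repr,  # fall back to repr() for values JSON cannot encode
    )
    return hashlib.sha1(payload.encode()).hexdigest()[:12]


class SignatureExecutor:
    """Hypothetical executor: one job per signature, results cached on disk."""

    def __init__(self, root, max_workers=2):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = {}  # signature -> Future, for in-process deduplication

    def submit(self, func, *args, **kwargs):
        sig = job_signature(func, args, kwargs)
        if sig not in self._pending:
            self._pending[sig] = self._pool.submit(self._run, sig, func, args, kwargs)
        return self._pending[sig]

    def _run(self, sig, func, args, kwargs):
        result_file = self.root / f"{sig}.pkl"
        if result_file.exists():
            # A previous run already finished this job: reuse its result.
            return pickle.loads(result_file.read_bytes())
        result = func(*args, **kwargs)
        # Write to a temporary file first so a crash never leaves a partial result.
        tmp = result_file.with_suffix(".tmp")
        tmp.write_bytes(pickle.dumps(result))
        tmp.replace(result_file)
        return result
```

Submitting the same function with the same arguments twice returns the same future, and a new executor pointed at the same directory reloads the stored result instead of recomputing it. Resuming a job midway would additionally require the target function to checkpoint its own state.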

from dora.

adefossez commented on June 25, 2024

Yes, this is definitely possible. I can work on that :)

adefossez commented on June 25, 2024

Might I ask, though, whether in your case it wouldn't be more appropriate to schedule those small jobs directly with submitit? Given that they are short, it seems like you might not need all the capabilities of dora for handling long-running jobs (cancellation and resumability). Are those main model training jobs or some kind of post-training evaluation? Are you actually using the dora command line interface to read the results?

kwanUm commented on June 25, 2024

Hey Alex, we are using the dora command line interface at the moment, although it's a bit hard to use because of the many rows in the table.

Our jobs typically run a heavy model 10-20K times in a row to optimize protein sequences. They take anywhere from a few hours to a few days, depending on the model being run.

We do need dora's resuming and cancellation capabilities and are constantly using them.
Does it require a significant change in dora?

robert-verkuil commented on June 25, 2024

Amazing if you think the job_array switch might take effect quickly! For @kwanUm and me, that's our largest blocker for Dora usage currently (more than git_save). We've canceled our latest jobs and are probably going to use hydra directly as a workaround until job_array support is added (losing all the nice Dora benefits in the meantime).

For the more general Dora-like functionality with a concurrent.futures-like interface, this is maybe interesting... Will think about it more. Currently, submitit/Dask with manual checkpointing to the file system solves many problems for the alternative workloads I'm thinking of. IIUC, Dora would additionally bring: multi-node support, easy retries of failures, easy grid-style sweeping and grouping of results, and command-line display of results for quick monitoring?
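For readers unfamiliar with the manual checkpointing pattern mentioned here, a minimal standard-library sketch follows. It does not use submitit or Dask; `run_with_checkpoint`, `step_fn`, and the JSON state layout are hypothetical choices made only for illustration.

```python
import json
import os
from pathlib import Path


def run_with_checkpoint(step_fn, total_steps, ckpt_path, save_every=100):
    """Run step_fn for total_steps iterations, resuming from ckpt_path if it exists.

    step_fn(step, value) returns the new value; value must be JSON-serializable.
    """
    ckpt_path = Path(ckpt_path)
    state = {"step": 0, "value": None}
    if ckpt_path.exists():
        # Resume from the last saved step instead of starting over.
        state = json.loads(ckpt_path.read_text())
    for step in range(state["step"], total_steps):
        state["value"] = step_fn(step, state["value"])
        state["step"] = step + 1
        if state["step"] % save_every == 0 or state["step"] == total_steps:
            # Write then rename, so an interruption never leaves a partial file.
            tmp = ckpt_path.with_name(ckpt_path.name + ".tmp")
            tmp.write_text(json.dumps(state))
            os.replace(tmp, ckpt_path)
    return state["value"]
```

If the process is interrupted, calling the function again with the same `ckpt_path` repeats at most `save_every - 1` steps. Retries, sweeps, and result display still have to be built by hand on top of this, which is the gap the comment above is asking about.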

robert-verkuil commented on June 25, 2024

We've been using this successfully on the development branch and now master. Thanks so much @adefossez! ❤️
This enables us to do at-scale sweeps again, and was put together lightning-fast. 💪
