adefossez commented on June 25, 2024

Exactly! The only thing now is that at some point code_1 might become abandoned, with no XP pointing to it. But that is fine I think, as it will almost always still be cheaper than having one repo per XP (unless you have a single XP you retry 1000 times, but that's typically not the case :) )

from dora.

adefossez commented on June 25, 2024

Do you mean in order to execute only this code, or as an indication of which code was being run?

kingjr commented on June 25, 2024

Here is what I use. I think it'd be great to add it to dora's defaults, because it makes it easier to work with old setups/configs, e.g. to understand whether/why an experiment no longer works.

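import shutil
from pathlib import Path

import git  # GitPython


def save_git(git_path: Path):
    """Backup git repo to path for replicability."""
    # Repository containing this file (walks up parent directories).
    curr_git = git.Repo(__file__, search_parent_directories=True)
    # Timestamp of the current HEAD commit; uncommitted changes are not saved.
    curr_date = curr_git.commit().committed_date
    bckp_date = 0
    if git_path.exists():
        bckp_git = git.Repo(str(git_path))
        bckp_date = bckp_git.commit().committed_date
    # Refresh the backup only if the working repo has a newer commit.
    if bckp_date < curr_date:
        if git_path.exists():
            shutil.rmtree(git_path)
        curr_git.clone(str(git_path))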

kwanUm commented on June 25, 2024

@adefossez I mean both for execution and as an indication.

adefossez commented on June 25, 2024

Execution would be cool, although hard to do reliably.

@kingjr I think your snippet assumes you are always working from a clean commit. This is rarely the case in practice.
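
One way to cover the non-clean case, as a minimal sketch assuming `git` is on the PATH: store the HEAD commit together with the uncommitted diff of tracked files, so the exact code can later be rebuilt with `git checkout <sha>` followed by `git apply uncommitted.diff`. The helper `snapshot_code_state` and the file names `commit.txt` / `uncommitted.diff` are hypothetical, not part of dora.

```python
import subprocess
from pathlib import Path


def snapshot_code_state(repo_dir: Path, out_dir: Path) -> str:
    """Hypothetical helper (not part of dora): record HEAD and uncommitted changes.

    Writes out_dir/commit.txt and out_dir/uncommitted.diff, returns the commit sha.
    Untracked files are not captured by `git diff`.
    """
    def git(*args: str) -> str:
        # Run a git command inside repo_dir and return its stdout.
        result = subprocess.run(
            ["git", *args], cwd=repo_dir, check=True,
            capture_output=True, text=True,
        )
        return result.stdout

    out_dir.mkdir(parents=True, exist_ok=True)
    sha = git("rev-parse", "HEAD").strip()
    (out_dir / "commit.txt").write_text(sha + "\n")
    # Staged and unstaged changes to tracked files, relative to HEAD;
    # --binary keeps the patch usable by `git apply` for binary files too.
    (out_dir / "uncommitted.diff").write_text(
        git("diff", "--no-color", "--binary", "HEAD"))
    return sha
```

Untracked files would still need separate handling, which is part of why reliable re-execution is hard.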

adefossez commented on June 25, 2024

See #9
The clean_git option can be set globally on a project. In that case, before any remote job submission, the repository will be cloned for each XP, and each XP will run from its own copy of the code.

This allows both keeping track of the code used for an XP and protecting a pending or preempted XP from future code changes.

However, if an XP is stopped and rescheduled (rather than preempted), the clone will be updated. I do not try to keep track of the different versions used across the lifetime of an XP, only the last one (as this would be way too complex).
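
The per-XP cloning described above could look roughly like the following sketch. It assumes `git` is on the PATH and a hypothetical layout where each XP folder gets a `code/` subfolder; `clone_for_xp` is an illustrative name, not dora's actual implementation. Only committed changes end up in the clone.

```python
import subprocess
from pathlib import Path


def git(*args: str, cwd: Path) -> str:
    # Run a git command in cwd and return its stdout.
    return subprocess.run(["git", *args], cwd=cwd, check=True,
                          capture_output=True, text=True).stdout


def clone_for_xp(repo_dir: Path, xp_dir: Path) -> str:
    """Hypothetical (not dora's code): give an XP its own copy of repo_dir's HEAD."""
    sha = git("rev-parse", "HEAD", cwd=repo_dir).strip()
    code_dir = (xp_dir / "code").resolve()
    if code_dir.exists():
        # Rescheduled XP: update the existing clone, keeping only the latest version.
        git("fetch", "--quiet", "origin", "HEAD", cwd=code_dir)
    else:
        xp_dir.mkdir(parents=True, exist_ok=True)
        git("clone", "--quiet", str(repo_dir.resolve()), str(code_dir), cwd=xp_dir)
    # Pin the XP's code to the exact commit that was current at submission time.
    git("checkout", "--quiet", "--detach", sha, cwd=code_dir)
    return sha
```

Detaching at a fixed sha means later commits in the main repo never leak into a pending or preempted XP until it is explicitly rescheduled.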

adefossez commented on June 25, 2024

Closing the task now that the option (actually git_save, not git_clean) has been added.

robert-verkuil commented on June 25, 2024

@adefossez

This feature is great! However, there is a blocker for us adopting it: we have a rather large repo (>1 GB, many files) and are looking to launch many small XPs (1k+).

It seems the current git_save solution duplicates the repo for each XP. Would it be difficult to extend it to the case where the repo to clone is large and would ideally be shared, e.g. via a symlink to a single, central copy in grids/code?

Thanks,
Robert

P.S. In the case of amended grids, I suppose more "central copies" would be added in grids/code, and symlinks in the XPs would track which version of the code they ran with?

adefossez commented on June 25, 2024

Definitely, for 1k+ XPs this is going to be painfully slow. I could make a shared clone for all jobs scheduled at the same time. The only limitation in that case is that the clone would be stuck at one commit, because it is almost impossible to properly reference-count how many times it is used and whether it is safe to change its current commit. But this could be solved either with a dora clean command that looks for such abandoned clones, or just ignored for the time being, as there would still be far fewer clones than with one clone per XP.
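
A hypothetical dora clean-style scan for the abandoned clones mentioned above, as a minimal sketch. It assumes a shared-clone layout with copies in `outputs/code/code_N` and each XP pointing to one through a relative symlink `outputs/xps/<xp>/code`; `find_abandoned_code_dirs` is an illustrative name, not an existing dora command.

```python
from pathlib import Path
from typing import List


def find_abandoned_code_dirs(outputs: Path) -> List[Path]:
    """Hypothetical: list code copies that no XP symlink points to anymore."""
    code_root = outputs / "code"
    xps_root = outputs / "xps"
    if not code_root.is_dir():
        return []
    referenced = set()
    if xps_root.is_dir():
        for xp_dir in xps_root.iterdir():
            link = xp_dir / "code"
            if link.is_symlink():
                # resolve() follows the relative link to outputs/code/code_N
                referenced.add(link.resolve())
    # Any code copy not reached by some XP's link is a candidate for deletion.
    return sorted(p for p in code_root.iterdir()
                  if p.is_dir() and p.resolve() not in referenced)
```

Returning the list instead of deleting keeps the scan safe to run; removing the directories would be a separate, explicit step.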

robert-verkuil commented on June 25, 2024

Thanks for the fast response! If I'm understanding correctly, it might look like the following? Let me know if I'm misunderstanding.

  1. dora grid mygrid
  2. Launches XPs: {A, B}
  3. Dir structure looks like:

     outputs/
         grids/
             mygrid/
                 A -> ../../xps/A
                 B -> ../../xps/B
         xps/
             A/
                 code -> ../../code/code_1
             B/
                 code -> ../../code/code_1
         code/
             code_1/

  4. B failed for some reason, C is added to the sweep script, and the sweep script changes are committed to the repo. User runs dora grid mygrid --retry.
  5. Dir structure now looks like:

     outputs/
         grids/
             mygrid/
                 A -> ../../xps/A
                 B -> ../../xps/B
                 C -> ../../xps/C  <- added
         xps/
             A/
                 code -> ../../code/code_1
             B/
                 code -> ../../code/code_2  <- upversioned
             C/
                 code -> ../../code/code_2  <- added
         code/
             code_1/
             code_2/  <- added
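
The layout above can be sketched as follows: each scheduled batch gets a fresh `code/code_N` copy, and every XP in the batch points to it through a relative `code` symlink, while XPs outside the batch keep their old link. This is a hypothetical sketch, not dora's implementation; `share_code_for_batch`, `next_code_dir` and the `populate` callback (which would typically run a `git clone` into the new folder) are illustrative names.

```python
import os
from pathlib import Path
from typing import Callable, Iterable


def next_code_dir(outputs: Path) -> Path:
    """Return outputs/code/code_<N+1>, where N is the highest existing index."""
    code_root = outputs / "code"
    code_root.mkdir(parents=True, exist_ok=True)
    suffixes = [p.name.split("_", 1)[1] for p in code_root.glob("code_*")]
    indices = [int(s) for s in suffixes if s.isdigit()]
    return code_root / f"code_{max(indices, default=0) + 1}"


def share_code_for_batch(outputs: Path, xp_names: Iterable[str],
                         populate: Callable[[Path], None]) -> Path:
    """Hypothetical: one code copy per scheduled batch, symlinked from every XP."""
    code_dir = next_code_dir(outputs)
    populate(code_dir)  # e.g. `git clone <repo> <code_dir>` (assumption)
    for name in xp_names:
        xp_dir = outputs / "xps" / name
        xp_dir.mkdir(parents=True, exist_ok=True)
        link = xp_dir / "code"
        if link.is_symlink():
            link.unlink()  # rescheduled XP: repoint it to the newest copy
        # Relative target, matching the layout code -> ../../code/code_N
        link.symlink_to(os.path.join("..", "..", "code", code_dir.name))
    return code_dir
```

Running it for {A, B} and then for {B, C} reproduces the two trees: A keeps code_1, while B and C share code_2.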

robert-verkuil commented on June 25, 2024

Awesome! Yes, I think that would definitely be OK (and that edge case indeed sounds acceptable, haha). This version of things would be so cool :). Can't wait to use it if you think it's doable!
