[Feature request] copy the source code to xps dir upon launch (only .py and .yaml files is a good start) about dora HOT 11 CLOSED

facebookresearch commented on June 25, 2024

[Feature request] copy the source code to xps dir upon launch (only *.py and *.yaml files is a good start)
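For illustration, here is a minimal sketch of what the request could look like. It uses only the Python standard library and is not dora's implementation; the function name `snapshot_sources`, its parameters, and the `code` sub-directory are all hypothetical.

```python
import shutil
from pathlib import Path


def snapshot_sources(src_root, xp_dir, patterns=("*.py", "*.yaml")):
    """Copy files matching `patterns` from src_root into xp_dir/code.

    Hypothetical helper: relative paths are preserved, everything else is ignored.
    Returns the list of copied paths, relative to src_root.
    """
    src_root = Path(src_root).resolve()
    dest_root = Path(xp_dir).resolve() / "code"
    copied = []
    for pattern in patterns:
        # sorted() materializes the matches before we start writing files.
        for src in sorted(src_root.rglob(pattern)):
            # Skip directories, and skip the snapshot itself in case the
            # XP folder lives somewhere under the source root.
            if not src.is_file() or dest_root in src.parents:
                continue
            rel = src.relative_to(src_root)
            dest = dest_root / rel
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            copied.append(rel)
    return copied
```

A real version would probably also need to exclude the output folder of other XPs and honor something like `.gitignore`, which this sketch does not attempt.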

from dora.

Comments (11)

adefossez commented on June 25, 2024 1

Exactly! The only issue now is that at some point code_1 might become abandoned, with no XP pointing to it. But that is fine I think, as it will almost always still be cheaper than having one repo per XP (unless you have a single XP you retry 1000 times, but that's typically not the case :) )

adefossez commented on June 25, 2024

You mean in order to execute this code only, or as an indication of what code was being run?

kingjr commented on June 25, 2024

Here is what I use. I think it would be great to add it to dora's defaults, because it makes it easier to work with old setups/configs, e.g. to understand whether/why an experiment no longer works.

import shutil

import git

def save_git(git_path):
    """Backup git repo to path for replicability.

    git_path is expected to be a pathlib.Path.
    """
    # Locate the repository that contains this file.
    curr_git = git.Repo(__file__, search_parent_directories=True)
    curr_date = curr_git.commit().committed_date
    bckp_date = 0
    if git_path.exists():
        bckp_git = git.Repo(git_path)
        bckp_date = bckp_git.commit().committed_date
    # Re-clone only when the backup is older than the current HEAD commit.
    if bckp_date < curr_date:
        if git_path.exists():
            shutil.rmtree(git_path)
        curr_git.clone(git_path)

kwanUm commented on June 25, 2024

@adefossez I mean both for execution and as an indication.

adefossez commented on June 25, 2024

Execution would be cool, although hard to make in a reliable manner.

@kingjr I think you always assume you are working from a clean commit in that case. This is rarely the case in practice.

adefossez commented on June 25, 2024

See #9
The clean_git option can be set globally on a project. In that case, before any remote job submission, the repository will be cloned for each XP and each XP will run from its individual code.

This makes it possible both to keep track of the code used for an XP and to protect a pending or preempted XP from future code changes.

However, if an XP is stopped and rescheduled (not preempted), the clone will be updated. I do not try to keep track of the different versions used across the lifetime of an XP, only the last one, as this would be way too complex.

adefossez commented on June 25, 2024

Closing the task now that the option (actually git_save, not git_clean) has been added.

robert-verkuil commented on June 25, 2024

@adefossez

This feature is great! However, we have a blocker for adopting it. We have a rather large repo (>1 GB, many files), and are looking to launch many small XPs (1k+).

It seems the current git_save solution duplicates the repo for each xp. Would it be difficult to extend it for the case where the repo-to-clone is large, and would ideally be shared e.g. via a symlink to a single, central copy in grids/code?

Thanks,
Robert

P.S. in the case of amended grids, I suppose more "central copies" would be added in grids/code and symlinks in xps would track which version of the code they ran with?

adefossez commented on June 25, 2024

Definitely, for 1k+ this is going to be painfully slow. I could make a shared clone for all jobs scheduled at the same time. The only limitation in that case is that the clone would be stuck at one commit, because it is almost impossible to properly reference count how many times it is used and whether it is safe to change its current commit. But this could be solved either with a dora clean command that would look for such abandoned clones, or just ignored for the time being, as there would still be far fewer of those clones than when we have one clone per XP.
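To make the cleanup idea concrete, here is a rough sketch of how abandoned clones could be detected. It is not an existing dora command. It assumes a hypothetical layout where each XP folder contains a `code` symlink pointing into a shared `code/` folder, and the function name `find_abandoned_clones` is made up for this example.

```python
from pathlib import Path


def find_abandoned_clones(code_dir, xps_dir, link_name="code"):
    """Return shared clones under code_dir that no XP symlink points to.

    Assumed (hypothetical) layout: <xps_dir>/<xp>/<link_name> is a symlink
    to one of the folders directly under code_dir.
    """
    code_dir = Path(code_dir)
    xps_dir = Path(xps_dir)
    referenced = set()
    for xp in xps_dir.iterdir():
        link = xp / link_name
        if link.is_symlink():
            # resolve() follows the symlink, including relative targets.
            referenced.add(link.resolve())
    return sorted(
        clone for clone in code_dir.iterdir()
        if clone.is_dir() and clone.resolve() not in referenced
    )
```

Scanning the symlinks at cleanup time sidesteps reference counting entirely: nothing has to be tracked when jobs are scheduled, at the cost of walking every XP folder.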

robert-verkuil commented on June 25, 2024

thanks for the fast response! If I'm understanding correctly, it might look like the following? Let me know if I'm misunderstanding.

dora grid mygrid
Launches xps: {A, B}
Dir structure looks like:

outputs/
    grids/
        mygrid/
            A -> ../../xps/A
            B -> ../../xps/B
    xps/
        A/
            code -> ../../code/code_1
        B/
            code -> ../../code/code_1
    code/
        code_1/
B failed for some reason, C is added to the sweep script, and the sweep script changes are committed to the repo. The user then runs dora grid mygrid --retry.
Dir structure now looks like:

outputs/
    grids/
        mygrid/
            A -> ../../xps/A
            B -> ../../xps/B
            C -> ../../xps/C
    xps/
        A/
            code -> ../../code/code_1
        B/
            code -> ../../code/code_2  <- upversioned
        C/
            code -> ../../code/code_2  <- added
    code/
        code_1/
        code_2/  <- added
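As a rough sketch of how the `code` symlinks in this layout could be created and re-pointed on retry (this is not how dora does it): the helper `link_xp_to_shared_code` is hypothetical, the shared folder is named after a version label passed in by the caller, and the actual cloning step is replaced by an empty directory.

```python
import os
from pathlib import Path


def link_xp_to_shared_code(outputs, xp_name, version):
    """Point outputs/xps/<xp_name>/code at outputs/code/code_<version>.

    Hypothetical helper. `version` could be a counter or a commit hash.
    An existing symlink is replaced, which models the "upversioned" case.
    """
    outputs = Path(outputs)
    shared = outputs / "code" / f"code_{version}"
    # Placeholder for the real work: cloning the repo here, once per version.
    shared.mkdir(parents=True, exist_ok=True)
    xp_dir = outputs / "xps" / xp_name
    xp_dir.mkdir(parents=True, exist_ok=True)
    link = xp_dir / "code"
    if link.is_symlink():
        link.unlink()
    # Relative target so that the outputs folder can be moved as a whole.
    os.symlink(os.path.relpath(shared, xp_dir), link)
    return link
```

Replaying the scenario above would mean calling it for A and B with version 1, then for B and C with version 2. A stays on `code_1` while B is re-pointed to `code_2`.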

robert-verkuil commented on June 25, 2024

awesome! yes I think that would definitely be ok (and that edge case indeed sounds acceptable haha). This version of things would be so cool :). Can't wait to use it if you think it's doable!
