Exactly! The only thing now is that at some point `code_1` might become abandoned, with no XP pointing to it. But I think that is fine, since it will almost always still be cheaper than having one repo per XP (unless you have a single XP that you retry 1000 times, but that's typically not the case :) ).
from dora.
Do you mean in order to execute this code only, or as an indication of what code was being run?
Here is what I use. I think it would be great to add it to Dora by default, because it makes it easier to work with old setups/configs, e.g. to understand whether/why an experiment no longer works.
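```python
import shutil
from pathlib import Path

import git  # GitPython


def save_git(git_path):
    """Backup git repo to path for replicability."""
    git_path = Path(git_path)
    # Repository containing this file, searching parent directories for .git.
    curr_git = git.Repo(Path(__file__).resolve().parent, search_parent_directories=True)
    curr_date = curr_git.commit().committed_date  # HEAD commit timestamp
    bckp_date = 0
    if git_path.exists():
        bckp_git = git.Repo(git_path)
        bckp_date = bckp_git.commit().committed_date
    # Refresh the backup only if HEAD is newer than the backed-up commit.
    if bckp_date < curr_date:
        if git_path.exists():
            shutil.rmtree(git_path)
        # Note: clone() copies committed history only, not uncommitted edits.
        curr_git.clone(git_path)
```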
@adefossez I mean both for execution and as an indication.
Execution would be cool, although it is hard to do reliably.
@kingjr I think your approach assumes you are always working from a clean commit. In practice, this is rarely the case.
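One way to avoid the clean-commit assumption is to snapshot the working tree on disk rather than cloning the last commit, so uncommitted edits are captured too. The sketch below is illustrative only: `snapshot_worktree` is a hypothetical helper, not part of Dora, and the skipped folders (`.git`, `__pycache__`, `*.pyc`) are assumptions chosen to keep the copy small.

```python
import shutil
from pathlib import Path


def snapshot_worktree(src, dest):
    """Copy the working tree of `src` (including uncommitted edits) into `dest`.

    Hypothetical helper: unlike a git clone, this captures whatever is on
    disk. The .git directory and Python caches are skipped.
    """
    src, dest = Path(src), Path(dest)
    if dest.exists():
        shutil.rmtree(dest)  # replace any previous snapshot
    shutil.copytree(
        src,
        dest,
        ignore=shutil.ignore_patterns(".git", "__pycache__", "*.pyc"),
        symlinks=True,  # copy symlinks as links instead of following them
    )
    return dest
```

The trade-off is size: without `.git` there is no history, and for very large repos a full copy per XP is expensive, which is why recording the commit hash plus a diff is a common lighter-weight alternative.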
See #9
The `clean_git` option can be set globally on a project. In that case, before any remote job submission, the repository will be cloned for each XP, and each XP will run from its own copy of the code.
This makes it possible both to keep track of the code used for an XP and to protect a pending or preempted XP from future code changes.
However, if an XP is stopped and rescheduled (not preempted), the clone will be updated. I do not try to keep track of the different versions used across the lifetime of an XP, only the last one, as that would be far too complex.
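The update rule described above can be sketched as follows. This is a hypothetical illustration, not Dora's implementation: `prepare_xp_code` and its `preempted` flag are invented names, and a plain directory copy stands in for `git clone`.

```python
import shutil
from pathlib import Path


def prepare_xp_code(xp_dir, source, preempted):
    """Return the code directory an XP should run from (hypothetical sketch).

    - A preempted XP keeps its existing copy, so later edits cannot leak in.
    - A fresh or manually rescheduled XP gets a new copy of `source`,
      overwriting the previous one (only the latest version is kept).
    """
    code_dir = Path(xp_dir) / "code"
    if preempted and code_dir.exists():
        return code_dir  # resume with exactly the code it started with
    if code_dir.exists():
        shutil.rmtree(code_dir)  # drop the old version; history is not tracked
    # copytree creates missing parent directories (e.g. xps/<sig>/).
    shutil.copytree(source, code_dir, ignore=shutil.ignore_patterns(".git"))
    return code_dir
```

Keeping only the latest copy per XP is what makes this cheap to implement: there is never more than one code directory per XP to manage.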
Closing the task now that the option (actually `git_save`, not `git_clean`) has been added.
This feature is great! However, we have a blocker to adopting it: our repo is rather large (>1 GB, many files), and we are looking to launch many small XPs (1k+).
It seems the current `git_save` solution duplicates the repo for each XP. Would it be difficult to extend it to the case where the repo to clone is large and would ideally be shared, e.g. via a symlink to a single, central copy in `grids/code`?
Thanks,
Robert
P.S. In the case of amended grids, I suppose more "central copies" would be added in `grids/code`, and the symlinks in each XP would track which version of the code it ran with?
For 1k+ XPs this is definitely going to be painfully slow. I could make a shared clone for all jobs scheduled at the same time. The only limitation in that case is that the clone would be stuck at one commit, because it is almost impossible to properly reference-count how many XPs use it and whether it is safe to change its current commit. But this could either be solved with a `dora clean` command that looks for such abandoned clones, or just ignored for the time being, since there would still be far fewer of those clones than with one clone per XP.
Thanks for the fast response! If I understand correctly, it might look like the following. Let me know if I'm misunderstanding.
`dora grid mygrid`
- Launches XPs: {A, B}
- Directory structure looks like:

```
outputs/
  grids/
    mygrid/
      A -> ../../xps/A
      B -> ../../xps/B
  xps/
    A/
      code -> ../../code/code_1
    B/
      code -> ../../code/code_1
  code/
    code_1/
```
- B failed for some reason, and C is added to the sweep script; the sweep script changes are committed to the repo. The user runs `dora grid mygrid --retry`.
- Directory structure now looks like:

```
outputs/
  grids/
    mygrid/
      A -> ../../xps/A
      B -> ../../xps/B
      C -> ../../xps/C
  xps/
    A/
      code -> ../../code/code_1
    B/
      code -> ../../code/code_2   <- upversioned
    C/
      code -> ../../code/code_2   <- added
  code/
    code_1/
    code_2/                       <- added
```
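The scheme in these trees can be sketched as one shared copy per submission plus a relative symlink per XP. This is a hypothetical illustration of the proposal, not Dora's API: `link_batch_to_shared_code` is an invented name, a directory copy stands in for `git clone`, and the `grids/` links are left out.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable


def link_batch_to_shared_code(root, source, xp_sigs: Iterable[str]) -> Path:
    """Create one shared code copy for a batch of XPs and link each XP to it.

    Each submission gets a new `root/code/code_<n>` directory; every XP in
    the batch has its `code` link (re)pointed there, so a retried XP moves
    to the new copy while untouched XPs keep the old one.
    """
    root = Path(root)
    code_root = root / "code"
    code_root.mkdir(parents=True, exist_ok=True)
    n = 1
    while (code_root / f"code_{n}").exists():  # next free code_<n> name
        n += 1
    shared = code_root / f"code_{n}"
    shutil.copytree(source, shared, ignore=shutil.ignore_patterns(".git"))
    for sig in xp_sigs:
        xp_dir = root / "xps" / sig
        xp_dir.mkdir(parents=True, exist_ok=True)
        link = xp_dir / "code"
        if link.is_symlink():
            link.unlink()  # re-point XPs that are being retried
        # Relative target, as in the trees above: ../../code/code_<n>
        os.symlink(os.path.join("..", "..", "code", shared.name), link)
    return shared
```

Relative links keep the whole `outputs/` folder relocatable; the cost, as noted earlier in the thread, is that old copies such as `code_1` can end up with no XP pointing to them.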
Awesome! Yes, I think that would definitely be OK (and that edge case does sound acceptable, haha). This version would be so cool :). Can't wait to use it if you think it's doable!