Comments (5)
So right now, all jobs on SLURM are indexed using `$SLURM_ARRAY_TASK_ID` and execute job-specific setup by sourcing `{staging_dir}/jobs/$SLURM_ARRAY_TASK_ID/setup.sh`. That means one of three things needs to happen to make custom job names work:
- In pipelines with custom names, canine can generate a `{staging_dir}/aliases` file, with one job alias per line. Jobs can read line `$SLURM_ARRAY_TASK_ID` to determine their name, then continue setup by running `{staging_dir}/jobs/$CANINE_JOB_ALIAS/setup.sh`.
- The jobs directory could jointly encode the task id and custom name (i.e. `{staging_dir}/jobs/0_foo/`) so that jobs can source `{staging_dir}/jobs/${SLURM_ARRAY_TASK_ID}_*/setup.sh`. This would allow jobs to jump straight to the correct directory, while still keeping them human-readable.
- In pipelines with custom names, canine could symlink `{staging_dir}/alias/{custom name}` to `{staging_dir}/jobs/{proper job id}`. That way, jobs can continue to launch as normal, and humans can inspect the workspace by browsing the alias directory.
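Option 1's lookup could be sketched as follows. This is a minimal illustration, not canine's actual implementation: the `CANINE_JOB_ALIAS` variable name and paths are assumptions, and the line indexing assumes task IDs map directly to line numbers (a real implementation would need to account for whether array IDs start at 0 or 1).

```shell
# Sketch of option 1: resolve a job's alias from a shared aliases file.
staging_dir=$(mktemp -d)                 # stand-in for {staging_dir}
printf 'foo\nbar\nbaz\n' > "$staging_dir/aliases"

SLURM_ARRAY_TASK_ID=2                    # normally set by SLURM itself
# Read line $SLURM_ARRAY_TASK_ID of the aliases file
CANINE_JOB_ALIAS=$(sed -n "${SLURM_ARRAY_TASK_ID}p" "$staging_dir/aliases")
echo "$CANINE_JOB_ALIAS"                 # bar
# Setup would then continue via:
#   source "$staging_dir/jobs/$CANINE_JOB_ALIAS/setup.sh"
```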
At the moment, I'm leaning towards option 2, because it seems like the simplest change to achieve the desired goal. It also avoids any uniqueness requirements, because the `outputs/` folder could also follow the same `id_alias` naming scheme.
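Option 2's glob-based lookup could look like this sketch. The directory names and the `CANINE_JOB_ALIAS` export are illustrative assumptions, not canine's real layout; the point is that the task-id prefix makes the glob unambiguous (e.g. `1_*` matches `1_bar` but not `10_baz`).

```shell
# Sketch of option 2: job directories named <task id>_<alias>.
staging_dir=$(mktemp -d)
mkdir -p "$staging_dir/jobs/0_foo" "$staging_dir/jobs/1_bar"
echo 'export CANINE_JOB_ALIAS=bar' > "$staging_dir/jobs/1_bar/setup.sh"

SLURM_ARRAY_TASK_ID=1                    # normally set by SLURM itself
# The glob expands to exactly one job directory for this task id
source "$staging_dir"/jobs/${SLURM_ARRAY_TASK_ID}_*/setup.sh
echo "$CANINE_JOB_ALIAS"                 # bar
```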
@hurrialice @julianhess what are your thoughts?
from canine.
I like the first one best - I just want a table to trace my jobs, and this doesn't really need to be reflected in the file structure.
If we will have a table of aliases, is it possible to combine it with #5?
A possible table format could be:
`<job_id> <custom_name> <job_status>`
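A table in that format could be assembled with standard tools; the ids, names, and statuses below are made up purely for illustration.

```shell
# Sketch of the proposed trace table: <job_id> <custom_name> <job_status>
table=$(paste <(printf '0\n1\n') \
              <(printf 'foo\nbar\n') \
              <(printf 'COMPLETED\nRUNNING\n'))
printf '%s\n' "$table"
```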
Okay, so it seems like overall, nobody really needs the `jobs/` directory to be labeled with entity names, so here's my compromise:
- `jobs/` stays numbered by the array task id
- Custom aliases are set within `setup.sh` like other canine variables
- The `output/` folder will use custom aliases (which requires that the aliases all be unique)
- The job alias will be included in the output dataframe from `Orchestrator.run_pipeline()`, a la #5
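The compromise above could be sketched like this. All names and paths here are assumptions for illustration, not canine's actual interface:

```shell
# Sketch of the compromise: jobs/ stays numbered, setup.sh exports the
# alias, and output/ is keyed by the (unique) alias.
staging_dir=$(mktemp -d)
mkdir -p "$staging_dir/jobs/0"
echo 'export CANINE_JOB_ALIAS=sample_A' > "$staging_dir/jobs/0/setup.sh"

SLURM_ARRAY_TASK_ID=0                    # normally set by SLURM itself
# Jobs launch exactly as before, numbered by the array task id
source "$staging_dir/jobs/$SLURM_ARRAY_TASK_ID/setup.sh"
# Outputs land under the alias rather than the numeric id
mkdir -p "$staging_dir/output/$CANINE_JOB_ALIAS"
ls "$staging_dir/output"                 # sample_A
```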
That is beautiful! 👏
closed in 63cc655