Comments (2)
I think seeing an `error.run.1` file on disk takes precedence over seeing the task running in SLURM (or SGE, for that matter).
Now how did we get here? Was this maybe another case of a job being submitted to the queue while another instance of the same job was still running? I imagine that if the first instance errored while the second one was still pending, we would arrive at this situation.
Or was the second instance maybe submitted after the first instance crashed but before the manager found the error file?
from sisyphus.
I can confirm that the manager reports an error as soon as the error file is found, and I also agree that the most likely scenario is that two jobs were submitted and one crashed.
It is unlikely that the manager restarted the job before it found the error file. I would not rule it out, but there are steps in place to prevent this. Otherwise this would probably happen frequently to finished jobs as well.
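To make the suspected scenario concrete, here is a minimal sketch of the precedence rule described above, where a marker file on disk wins over whatever the queue reports. This is not Sisyphus's actual implementation: the function `job_state`, its parameters, the state strings, and the `finished.run.1` file name are assumptions made for illustration. Only the `error.run.1` name comes from this thread.

```python
import os
import tempfile


def job_state(job_dir, task="run", task_id=1, queue_state=None):
    """Hypothetical state lookup: marker files on disk beat the queue state.

    queue_state is whatever the engine (e.g. SLURM or SGE) reports for the
    task, such as "running" or "pending", or None if it is not queued.
    """
    error_file = os.path.join(job_dir, f"error.{task}.{task_id}")
    # Assumed name for the "task finished" marker, for illustration only.
    finished_file = os.path.join(job_dir, f"finished.{task}.{task_id}")

    if os.path.exists(error_file):
        return "error"  # reported even if a second instance is still queued
    if os.path.exists(finished_file):
        return "finished"
    if queue_state in ("running", "pending"):
        return queue_state
    return "unknown"


# Two instances of the same task: the first crashes and writes the error
# file while the second is still visible in the queue.
job_dir = tempfile.mkdtemp()
before = job_state(job_dir, queue_state="pending")
open(os.path.join(job_dir, "error.run.1"), "w").close()
after = job_state(job_dir, queue_state="pending")
```

With this ordering, `before` is taken from the queue, while `after` is an error even though the second instance is still pending, which matches the situation described in this issue.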