Comments (10)
What’s the content of veros_batch.sh?
from veros.
#!/bin/bash -l
#
#SBATCH -p aegir
#SBATCH -A ocean
#SBATCH --job-name=acc_lr
#SBATCH --time=23:59:59
#SBATCH --constraint=v2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --exclusive
export BH_STACK=openmp
module load bohrium/05102018 veros/05102018
veros-resubmit -i acc.lowres -n 50 -l 62208000 -c "python acc.py -b bohrium -v debug" --callback "/usr/bin/sbatch /groups/ocean/nutrik/veros_cases/paper/acc/veros_batch.sh"
How long (in real time) do the 29 day runs take?
11 minutes.
The issue is not that the wall-clock time limit is exceeded. Every model cycle completes successfully and prints its "Timing summary" information, but the case is not resubmitted.
And how do you execute the first run?
sbatch veros_batch.sh
OK. The thing is that resubmission has an inherent race condition. The job submits its successor, waits for a short while, then exits. If the wait time is too short, the job may be killed before the new submission has gone through. Unless there is a reliable way to determine whether a new job has been scheduled, there's not much we can do about that.
One thing you can try is to add a time.sleep(5) after the following line:
If that doesn't help, we'll have to dig deeper.
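As a rough illustration of the idea, here is a minimal sketch of a variation on the fixed sleep: start the callback, then poll it for up to a grace period before letting the parent exit. This is not the actual veros-resubmit code. The function name run_callback and its parameters are hypothetical, and the sketch assumes the submission command exits once the scheduler has accepted the job.

```python
import subprocess
import sys
import time


def run_callback(callback, grace_period=5.0, poll_interval=0.1):
    """Start the callback command and wait briefly for it to finish.

    Hypothetical helper, not part of veros. Returns the callback's exit
    code, or None if it is still running when the grace period ends.
    """
    proc = subprocess.Popen(callback)
    deadline = time.monotonic() + grace_period
    while time.monotonic() < deadline:
        returncode = proc.poll()
        if returncode is not None:
            # The callback finished, so the parent can exit safely.
            return returncode
        time.sleep(poll_interval)
    # Grace period is over; report whatever state the callback is in now.
    return proc.poll()


if __name__ == "__main__":
    # Stand-in for the real callback (for example an sbatch call).
    code = run_callback([sys.executable, "-c", "print('submitted')"])
    print("callback exit code:", code)
```

Compared with an unconditional time.sleep(5), polling returns as soon as the callback has finished, but it still cannot guarantee success if the callback needs longer than the grace period.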
On a side note, the reason why it works without the --callback argument is that the default behavior for the script is to call itself. In this case, it just runs again on the same node, and is never really rescheduled. It will thus time out after 24h.
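The fallback described above can be pictured with a small sketch. The function resolve_callback is hypothetical and only illustrates the behavior as described in this thread; it is not taken from the veros source.

```python
import sys


def resolve_callback(callback=None, argv=None):
    """Return the command to run once a model cycle has finished.

    Hypothetical helper, not part of veros. Without an explicit callback,
    fall back to re-running the current command line, which restarts the
    script in place instead of submitting a new job to the scheduler.
    """
    if argv is None:
        argv = sys.argv
    if callback:
        return list(callback)
    return list(argv)
```

With an explicit callback such as an sbatch command, every cycle becomes a new job with its own time limit. With the fallback, all cycles share the time limit of the first job.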
Yeah, adding time.sleep(5) has solved the issue! Thanks!
But I think the master branch should be fixed for other users.
Let’s keep this open for now. I’ll probably introduce some waiting time, but I doubt we can make the problem go away in every case.