
veros's People

Contributors: dependabot-preview[bot], dependabot[bot], dionhaefner, iuryt, jonasdelacour, jrpedersen, kinow, madsbk, nutrik, rloewe, tomchor


veros's Issues

Linear solver issue when parallelising VEROS

Hey all,

I have run into an issue when trying to parallelize the new version of Veros on a cluster.
The problem arises when using the PETSc linear solver, which is set as the default when running on CPU with more than one process.
The problem is a division by zero in the _petsc_solver function, in the line

rel_residual = residual_norm / rhs_norm

The only way I found around this is to explicitly state which solver to use in the batch file:

export 'VEROS_LINEAR_SOLVER'='scipy'

before running the resubmit command.
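A defensive guard around that division would presumably avoid the crash; here is a minimal sketch of the idea (the helper name and tolerance are illustrative, not Veros API):

```python
def relative_residual(residual_norm, rhs_norm, atol=1e-30):
    """Hypothetical guard: fall back to the absolute residual when the
    right-hand side is numerically zero, instead of dividing by it."""
    if rhs_norm < atol:
        # An all-zero RHS makes the relative residual ill-defined;
        # reporting the absolute norm keeps the convergence check meaningful.
        return residual_norm
    return residual_norm / rhs_norm

# No ZeroDivisionError even for a zero right-hand side:
print(relative_residual(0.0, 0.0))
print(relative_residual(3.0, 2.0))
```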

The setups I have tried that show the same problem are: ACC_channel, ACC_basic, ACC_sector, and the North Atlantic setup.

The cluster uses slurm, and this is what is written in the batch file:

#!/bin/bash -l
#SBATCH -p mycluster
#SBATCH -A myaccount
#SBATCH --job-name=veros_mysetup
#SBATCH --time=23:59:59
#SBATCH --constraint=v1
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --cpus-per-task=1
#SBATCH --threads-per-core=1
#SBATCH --exclusive

export OMP_NUM_THREADS=2

export 'VEROS_LINEAR_SOLVER'='scipy'
veros resubmit -i acc -n 1 -l 31536000 \
    -c "srun --mpi=pmi2 -- veros run acc_sector.py -b jax --float-type float32 -n 4 4" \
    --callback "sbatch veros_batch.sh"

I have also tried changing the number of tasks, with no change.

The two versions of Veros I have tried are:
veros/040422_cpu_py3.9.10
veros/240322_cpu_py3.9.10

Hope this helps and is the information needed,
Cheers,
Rasmus

Failure at runtime on ubuntu 14.04

I've installed Veros on Ubuntu 14.04 as per the instructions. When I try to run the eady.py example, I get a failure that leads to an abort and a core dump:
~/veros/setup/eady$ python eady.py
Setting up everything
Initializing streamfunction method
determining number of land masses

      Land mass and perimeter

0    5   10   15   20   25   30   35

35 111111111111111111111111111111111111
34 111111111111111111111111111111111111
33 ************************************
32 000000000000000000000000000000000000
31 000000000000000000000000000000000000
30 000000000000000000000000000000000000
29 000000000000000000000000000000000000
28 000000000000000000000000000000000000
27 000000000000000000000000000000000000
26 000000000000000000000000000000000000
25 000000000000000000000000000000000000
24 000000000000000000000000000000000000
23 000000000000000000000000000000000000
22 000000000000000000000000000000000000
21 000000000000000000000000000000000000
20 000000000000000000000000000000000000
19 000000000000000000000000000000000000
18 000000000000000000000000000000000000
17 000000000000000000000000000000000000
16 000000000000000000000000000000000000
15 000000000000000000000000000000000000
14 000000000000000000000000000000000000
13 000000000000000000000000000000000000
12 000000000000000000000000000000000000
11 000000000000000000000000000000000000
10 000000000000000000000000000000000000
9 000000000000000000000000000000000000
8 000000000000000000000000000000000000
7 000000000000000000000000000000000000
6 000000000000000000000000000000000000
5 000000000000000000000000000000000000
4 000000000000000000000000000000000000
3 000000000000000000000000000000000000
2 ************************************
1 222222222222222222222222222222222222
0 222222222222222222222222222222222222
0 5 10 15 20 25 30 35

solving for boundary contribution by island 0
solving for boundary contribution by island 1
Cannot load library: /usr/lib/libbh_ve_openmp.so: undefined symbol: _ZN7bohrium4jitk17write_source2fileERKSsRKN5boost10filesystem4pathEmS2_b
terminate called after throwing an instance of 'std::runtime_error'
what(): ConfigParser: Cannot load library
Aborted (core dumped)

Looking at the core dump doesn't seem to help, though I may be doing this wrong:
gdb ~/path_to_python core
[New LWP 3825]
[New LWP 3828]
[New LWP 3830]
[New LWP 3829]
Core was generated by `python eady.py'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f4ca0f4ac37 in ?? ()
(gdb) bt
#0 0x00007f4ca0f4ac37 in ?? ()
#1 0x00007f4ca0f4e028 in ?? ()
#2 0x0000000000000020 in ?? ()
#3 0x0000000000000000 in ?? ()

Any suggestions?

Ensure reproducibility

  • Can we somehow bundle model code with output?
  • Should we pin a setup to a specific version of Veros?
  • How can we attach version information and setup specification as metadata on outputs?
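One way to approach the last point: collect a small metadata dictionary at setup time and attach it to every output file (e.g. as netCDF global attributes). A sketch, with illustrative field names:

```python
import json
import platform
import time

def run_metadata(veros_version, setup_file):
    """Sketch of the provenance metadata one could attach to model output;
    the keys here are illustrative, not an existing Veros convention."""
    return {
        "veros_version": veros_version,       # pinned model version
        "setup_file": setup_file,             # which setup produced the output
        "python_version": platform.python_version(),
        "created": time.strftime("%Y-%m-%dT%H:%M:%S"),
    }

meta = run_metadata("1.0.0", "acc.py")
print(json.dumps(meta, indent=2))
```

Bundling the setup script itself (as a string attribute) would go further and make outputs self-describing.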

Job resubmission with job scheduler doesn't work

I was not able to find the reason behind a resubmission issue with the job scheduler, such as:
veros-resubmit -i acc.lowres -n 50 -l 62208000 -c "python acc.py -b bohrium -v debug" --callback "/usr/bin/sbatch /groups/ocean/nutrik/veros_cases/paper/acc/veros_batch.sh"
Although jobs with a run length of up to 29 days are resubmitted fine, those with a longer run length are not resubmitted, and no errors or messages are reported.

In fact, jobs are successfully resubmitted without the scheduler (--callback "./veros_batch.sh") for any run length.

Put Veros on PyPI

Necessary steps:

  • Move assets from git lfs to external webspace and handle dynamic asset download
  • Brush up setup.py to comply with best practices
  • Make sure scripts work in system-wide installation and cross-platform

Parallel scalability benchmarks?

Hi Dion,

Nice work with Veros! I skimmed through the docs and your slides from AMS. I have a few questions:

  • Can Veros run in distributed memory mode (beyond a single node)?
  • Have you done any parallel scalability benchmarks -- even if only for shared memory?

I am looking at your benchmarks for different problem sizes, and one thing confuses me. For 1e7 elements you get, for example for Veros with numpy, 65 s on a 4-core system and 90 s on a 24-core system + GPU. Am I reading this correctly? It looks like performance gets worse with more cores, but I cannot imagine that would be the case.

Build a better testing suite

So far, we use a home-brewed testing suite with limited flexibility and robustness. Ideally, we would want a testing suite that

  • builds on a robust framework like pytest
  • can be used for debugging / investigating why tests fail
  • keeps the existing tests against PyOM, but adds capabilities for Veros-specific unit and system tests
  • plays nicely with codecov.
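A pytest-based suite would make the PyOM comparison tests plain functions that pytest discovers and reports individually. A minimal sketch (the arrays stand in for model output and reference data, which a real suite would load from files):

```python
import numpy as np

def test_matches_reference():
    """Sketch of a pytest-discoverable regression test: compare a model
    field against reference data within floating-point tolerances."""
    # Placeholders; a real test would load Veros output and PyOM reference data.
    model = np.zeros(10)
    reference = np.zeros(10)
    np.testing.assert_allclose(model, reference, rtol=1e-7, atol=1e-10)

test_matches_reference()
```

Failures then come with pytest's introspection (which elements differ and by how much), which directly helps the "debugging why tests fail" point.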

Lock files while downloading assets

I am trying to run the global_flexible setup from Veros' setup gallery.

Run script:

#!/bin/bash -l
# 
#SBATCH -p aegir
#SBATCH -A ocean
#SBATCH --job-name=flexdeg
#SBATCH --time=23:59:59
#SBATCH --constraint=v1
#SBATCH --nodes=2
#SBATCH --ntasks=32
#SBATCH --cpus-per-task=1
#SBATCH --exclusive
##SBATCH --mail-type=ALL
##SBATCH --mail-user=<REDACTED>
##SBATCH --output=slurm.out

export OMP_NUM_THREADS=1
module load veros/23052019

srun -v --mpi=pmi2 --kill-on-bad-exit python -m mpi4py global_flexible.py -n 8 4 -b bohrium >& veros_run.log

and I am getting an MD5 checksum mismatch on the forcing & bathymetry files.

Veros log file:

srun: defined options for program `srun'
srun: --------------- ---------------------
srun: user           : `nutrik'
srun: uid            : 16001
srun: gid            : 16000
srun: cwd            : /lustre/hpc/ocean/nutrik/veros_cases/global_flexible
srun: ntasks         : 32 (set)
srun: cpus_per_task  : 1
srun: nodes          : 2 (set)
srun: jobid          : 13466356 (default)
srun: partition      : default
srun: profile        : `NotSet'
srun: job name       : `2deg'
srun: reservation    : `(null)'
srun: burst_buffer   : `(null)'
srun: wckey          : `(null)'
srun: cpu_freq_min   : 4294967294
srun: cpu_freq_max   : 4294967294
srun: cpu_freq_gov   : 4294967294
srun: switches       : -1
srun: wait-for-switches : -1
srun: distribution   : unknown
srun: cpu_bind       : default (0)
srun: mem_bind       : default (0)
srun: verbose        : 1
srun: slurmd_debug   : 0
srun: immediate      : false
srun: label output   : false
srun: unbuffered IO  : false
srun: overcommit     : false
srun: threads        : 60
srun: checkpoint_dir : /var/slurm/checkpoint
srun: wait           : 0
srun: nice           : -2
srun: account        : (null)
srun: comment        : (null)
srun: dependency     : (null)
srun: exclusive      : false
srun: bcast          : false
srun: qos            : (null)
srun: constraints    : mincpus-per-node=1 mem-per-cpu=1024M
srun: geometry       : (null)
srun: reboot         : yes
srun: rotate         : no
srun: preserve_env   : false
srun: network        : (null)
srun: propagate      : NONE
srun: prolog         : (null)
srun: epilog         : (null)
srun: mail_type      : NONE
srun: mail_user      : (null)
srun: task_prolog    : (null)
srun: task_epilog    : (null)
srun: multi_prog     : no
srun: sockets-per-node  : -2
srun: cores-per-socket  : -2
srun: threads-per-core  : -2
srun: ntasks-per-node   : -2
srun: ntasks-per-socket : -2
srun: ntasks-per-core   : -2
srun: plane_size        : 4294967294
srun: core-spec         : NA
srun: power             :
srun: remote command    : `python -m mpi4py global_flexible.py -n 8 4 -b bohrium'
srun: launching 13466356.0 on host node172, 16 tasks: [0-15]
srun: launching 13466356.0 on host node173, 16 tasks: [16-31]
srun: route default plugin loaded
srun: Node node172, 16 tasks started
srun: Node node173, 16 tasks started
WARNING: Error in initializing MVAPICH2 ptmalloc library.Continuing without InfiniBand registration cache support.
2019-05-28 16:26:47.708 | INFO     | veros.tools.assets:get_asset:73 - Downloading asset ETOPO5_Ice_g_gmt4.nc ...
[the line above is repeated 32 times, once per MPI rank]
2019-05-28 16:26:48.468 | INFO     | veros.tools.assets:get_asset:73 - Downloading asset forcing_1deg_global_interpolated.nc ...
[the line above is repeated by 6 ranks before the first failure]
Traceback (most recent call last):
  File "/groups/ocean/software/python/gcc/3.6.7/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/groups/ocean/software/python/gcc/3.6.7/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/groups/ocean/software/mpi4py_mvapich231/gcc/3.0.1/lib/python3.6/site-packages/mpi4py/__main__.py", line 7, in <module>
    main()
  File "/groups/ocean/software/mpi4py_mvapich231/gcc/3.0.1/lib/python3.6/site-packages/mpi4py/run.py", line 196, in main
    run_command_line(args)
  File "/groups/ocean/software/mpi4py_mvapich231/gcc/3.0.1/lib/python3.6/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "/groups/ocean/software/python/gcc/3.6.7/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/groups/ocean/software/python/gcc/3.6.7/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/groups/ocean/software/python/gcc/3.6.7/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "global_flexible.py", line 16, in <module>
    DATA_FILES = veros.tools.get_assets('global_flexible', os.path.join(BASE_PATH, 'assets.yml'))
  File "/lustre/hpc/ocean/software/veros/repo23052019/veros/tools/assets.py", line 81, in get_assets
    return {key: get_asset(val['url'], val.get('md5', None)) for key, val in assets.items()}
  File "/lustre/hpc/ocean/software/veros/repo23052019/veros/tools/assets.py", line 81, in <dictcomp>
    return {key: get_asset(val['url'], val.get('md5', None)) for key, val in assets.items()}
  File "/lustre/hpc/ocean/software/veros/repo23052019/veros/tools/assets.py", line 77, in get_asset
    raise AssetError('Mismatching MD5 checksum on asset %s' % target_filename)
veros.tools.assets.AssetError: Mismatching MD5 checksum on asset forcing_1deg_global_interpolated.nc
srun: Complete job step 13466356.0 received
slurmstepd: error: *** STEP 13466356.0 ON node172 CANCELLED AT 2019-05-28T16:26:49 ***
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: Complete job step 13466356.0 received
srun: Received task exit notification for 16 tasks (status=0x0009).
srun: error: node172: tasks 0-15: Killed
srun: Terminating job step 13466356.0
srun: Complete job step 13466356.0 received
srun: Received task exit notification for 16 tasks (status=0x0009).
srun: error: node173: tasks 16-31: Killed

inconsistent licence

The licence stated in the discussion paper is GPL; the LICENSE file in the repository is MIT. If you want the code to be licensed, for instance, under GPL >= 3.0, the LICENSE file should be replaced, and it is best to be precise in the paper, e.g. "... code is available under the GNU General Public License version 3.0 or, at your option, any later version."

By the way, it is fantastic that this code is developed and shared (as free software)—an important contribution to ocean modelling and oceanography!

Changes in grid origin

Hi Dion,
When I change the x_origin of the Veros grid (91. --> 181.) in setup/global_1deg/global_one_degree.py, the resulting land mass and perimeter (see below) contain New Zealand as a lake (or sea) area, and the model returns the following:

/groups/ocean/software/bohrium/gcc/14112017/lib64/python2.7/site-packages/bohrium/array_create.py:167: RuntimeWarning: Encountering an operation not supported by Bohrium. It will be handled by the original NumPy.
  return numpy.array(ary, dtype=dtype, copy=copy, order=order, subok=subok, ndmin=ndmin, fix_biclass=False)
/lustre/hpc/ocean/nutrik/veros/veros/core/external/solve_poisson.py:75: RuntimeWarning: divide by zero encountered in divide
  Z[2:-2, 2:-2] = np.where(Y != 0., 1. / Y, 1.)
Traceback (most recent call last):
  File "global_one_degree.py", line 273, in <module>
    simulation.setup()
  File "/lustre/hpc/ocean/nutrik/veros/veros/veros.py", line 234, in setup
    external.streamfunction_init(self)
  File "/lustre/hpc/ocean/nutrik/veros/veros/decorators.py", line 50, in veros_method_wrapper
    res = function(*args, **kwargs)
  File "/lustre/hpc/ocean/nutrik/veros/veros/core/external/streamfunction_init.py", line 166, in streamfunction_init
    solve_poisson.initialize_solver(vs)
  File "/lustre/hpc/ocean/nutrik/veros/veros/decorators.py", line 50, in veros_method_wrapper
    res = function(*args, **kwargs)
  File "/lustre/hpc/ocean/nutrik/veros/veros/core/external/solve_poisson.py", line 18, in initialize_solver
    preconditioner = _jacobi_preconditioner(vs, matrix)
  File "/lustre/hpc/ocean/nutrik/veros/veros/decorators.py", line 50, in veros_method_wrapper
    res = function(*args, **kwargs)
  File "/lustre/hpc/ocean/nutrik/veros/veros/core/external/solve_poisson.py", line 76, in _jacobi_preconditioner
    return scipy.sparse.dia_matrix((Z.flatten(), 0), shape=(Z.size, Z.size)).tocsr()
  File "/groups/ocean/software/python/gcc/2.7.13/lib/python2.7/site-packages/scipy/sparse/base.py", line 764, in tocsr
    return self.tocoo(copy=copy).tocsr(copy=False)
  File "/groups/ocean/software/python/gcc/2.7.13/lib/python2.7/site-packages/scipy/sparse/dia.py", line 354, in tocoo
    mask &= (self.data != 0)
  File "ufuncs.pyx", line 594, in ufuncs._handle__array_ufunc__ (/groups/ocean/software/tarballs/bohrium/bohrium14112017/build/bridge/npbackend/ufuncs.c:12150)
  File "bhary.pyx", line 105, in bhary.fix_biclass_wrapper.inner (/groups/ocean/software/tarballs/bohrium/bohrium14112017/build/bridge/npbackend/bhary.c:3381)
  File "ufuncs.pyx", line 161, in ufuncs.Ufunc.__call__ (/groups/ocean/software/tarballs/bohrium/bohrium14112017/build/bridge/npbackend/ufuncs.c:5755)
AttributeError: 'tuple' object has no attribute 'shape'

A similar story happens with the setup/wave_propagation case, where the model cannot initialise the barotropic streamfunction solver and returns the following error:

  File "..../veros/core/external/streamfunction_init.py", line 69, in streamfunction_init
    raise RuntimeError("found no starting point for line integral")

Do you know how the algorithms for the "Land mass and perimeter" setup and streamfunction_init work?

                                                      Land mass and perimeter

    0    5   10   15   20   25   30   35   40   45   50   55   60   65   70   75   80   85   90   95  100  105  110  115  120
163 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
162 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
161 ****1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
160 000*******1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
159 000*2********1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
158 *******1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
157 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
156 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
155 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
154 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
153 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
152 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
151 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
150 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
149 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111
148 111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111*1111111111111111111111
147 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111*11111111***111111111111111111111
146 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111*****11******11111111111111111111
145 1111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111**000*****66****111111111111111111
144 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111111111*0000000******1111111111111111111
143 11111111111111111111111111111111111111111111111111111111111111111111111111111111111111*******00000000000*11111111111111111111
142 11111111111111111111111111111111111111111111111111111111111111111111****11111111111****00000000000000000*11111111111111111111
141 1111111111111111111111111111111111111111111111111111111111111111111**0**11111*******00000000000000000000****11111111111111***
140 111111111111111111111111111111111111111111111111111111*****************11111**00000000000000000000000000000***1111111111***00
139 1111111111111111111111111111111111111111111111111111***000000000000**111111**00000000000000000000000000000000****1111111*0000
138 11111111111111111111111111111111111111111111111111***0000000000000**1111111*00000000000000000000000000000000****111**1***0000
137 1111111111111111111111111111111111111111111111111**000000000000000*1111111**00000000000000000000000000000****1111******000000
136 11111111111111111111111111111111111111111111111111******0000000000*1111111*00000000000000000000000000000**1111****00000000000
135 1111111111111111111111111111111111111111111111111111111*0000000000**1111***00000000000000000000000000000*1*****00000000000000
134 1111111111111111111111111111111111111111111111111111111**0000000000*111**0000000000000000000000000000000***000000000000000000
133 11111111111111111111111111111111111111111111111111111111*0000000000*11**00000000000000000000000000000000000000000000000000000
132 11111111111111111111111111111111111111111111111111111*11*0000000000****000000000000000000000000000000000000000000000000000000
131 11111111111111111111111111111111111111111111111111111*11*00000000000000000000000000000000000000000000000000000000000000000000
130 1111111111111111111111111111111111111111111111111111**1**00000000000000000000000000000000000000000000000000000000000000000000
129 111111111111111111111111111111111111111111111111111*****000000000000000000000000000000000000000000000000000000000000000000000
128 111111111111111111111111111111111111111111111111111****0000000000000000000000000000000000000000000000000000000000000000000000
127 11111111111111111111111111111111111111111111111111***8***00000000000000000000000000000000000000000000000000000000000000000000
126 1111111111111111111111111111111111111111111111111**0*888****00000000000000000000000000000000000000000000000000000000000000000
125 11111111111111111111111111111111111111111111111***0**888888*00000000000000000000000000000000000000000000000000000000000000000
124 1111111111111111111111111111111111111111111*****000*8888****00000000000000000000000000000000000000000000000000000000000000000
123 111111111111111111111111111111111111111111**0000000*8****00000000000000000000000000000000000000000000000000000000000000000000
122 11111111111111111111111111111111111111111**00000000*88*0000000000000000000000000000000000000000000000000000000000000000000000
121 1111111111111111111111111111111111111111**00000000**88*0000000000000000000000000000000000000000000000000000000000000000000000
120 1111111111111111111111111111111111111111**00000****888*0000000000000000000000000000000000000000000000000000000000000000000000
119 11111111111111111111111111111111111***111**0000*8*888**0000000000000000000000000000000000000000000000000000000000000000000000
118 1111111111111111111111111111111111**0*1111******88888*00000000000000000000000000000000000000000000000000000000000000000000000
117 111111111111111111111111111111111**00*111***888888888*00000000000000000000000000000000000000000000000000000000000000000000000
116 111111111111111111111111111111111*000*111**888888*****00000000000000000000000000000000000000000000000000000000000000000000000
115 111111111111111111111111111111111**00*1***88888***000000000000000000000000000000000000000000000000000000000000000000000000000
114 1111111111111111111111111111111111*00****888****00000000000000000000000000000000000000000000000000000000000000000000000000000
113 1111111111111111111111111111111111*00000**8**00000000000000000000000000000000000000000000000000000000000000000000000000000000
112 1111111111111111111111111111111111*000000***000000000000000000000000000000000000000000000000000000000000000000000000000000000
111 1111111111111111111111111111111111*000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
110 1111111111111111111111111111111111*000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
109 111111111111111111111111111111111**000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
108 11111111111111111111111111111111**0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
107 11111111111111111111111111111111***000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
106 1111111111111111111111111111111*11*000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
105 111111111111111111111111111111**1**000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
104 111111111111111111111111111*****1*0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
103 1111111111111111111111111***000***0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
102 ****111111111111111**11***00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000***0000000
101 000**1111111111111**111*00000000***00000000000000000000000000000000000000000000000000000000000000000000000000000000*4*0000000
100 0000**111111111111**11**0000000**6*00000000000000000000000000000000000000000000000000000000000000000000000000000000***0000000
 99 00000*1111111111111****00000000*66*000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
 98 00000*11111111111111**000000000*66*000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
 97 00000***1111111111111*000000000*66*000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
[ASCII output omitted: a roughly 125 x 97 character grid map of numbered land masses, as printed by the model.]

Compute initial streamfunction from initial velocity

Not knowing the numerics, and possibly not having read the docs carefully enough, I am unclear on how to initialize the velocities in the model.

For a channel run, re-entrant in x, with an initial velocity of 0.1 m/s everywhere and no forcing, I tried doing in set_initial_conditions: vs.u = update(vs.u, at[...], 0.1 * vs.maskU[..., None]).

The velocity signal only lasts for one time step, and then it is gone. It does create small pressure perturbations that drive internal waves, but the mean flow of 0.1 m/s disappears immediately. Conversely, the initial conditions have psi = 0 everywhere, and then immediately on the next time step there is a streamfunction, but if the units are really m^3/s it is far too small.

Should I have initialized psi at the beginning instead of u, or in addition to u?

Allow MPI-enabled runs from the Python API

Possible through mpi4py.futures, but this will require some prototyping.

The dream:

>>> with MPIContext(n_proc=8) as ctx:
...     sim = MySetup(context=ctx)
...     sim.setup()
...     sim.run()
>>> sim.state
<gathered state>

where workers only live inside the MPIContext and are destroyed afterwards.
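The context-manager mechanics of this idea can be prototyped without MPI; here is a minimal sketch with the standard library's thread pool standing in for mpi4py.futures.MPIPoolExecutor. The MPIContext name and n_proc argument come from the snippet above; the map method and everything else are assumptions.

```python
# Sketch of the proposed MPIContext, with concurrent.futures standing in for
# mpi4py.futures.MPIPoolExecutor so the sketch runs without MPI installed.
from concurrent.futures import ThreadPoolExecutor


class MPIContext:
    def __init__(self, n_proc):
        self.n_proc = n_proc
        self._pool = None

    def __enter__(self):
        # with mpi4py this would be MPIPoolExecutor(max_workers=self.n_proc)
        self._pool = ThreadPoolExecutor(max_workers=self.n_proc)
        return self

    def __exit__(self, *exc_info):
        # workers only live inside the context and are destroyed afterwards
        self._pool.shutdown(wait=True)
        self._pool = None
        return False

    def map(self, fn, *iterables):
        return list(self._pool.map(fn, *iterables))
```

The real version would additionally have to gather the distributed state back to the root process when the context exits.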

Post video!

This sounds awesome, but I (and I'm sure many others) don't want to go through the trouble of installing it just to see what it looks like.

Non-hydrostatic solver

Hi, everyone, I just stumbled into this project after watching this JuliaCon session and let me start by saying that I really enjoy its vision and scope!

I've been reading through the docs and it seems like you don't have the capability of running nonhydrostatic LES, do I understand that correctly?

If so, do you envision them to be implemented soon? From the intro I get the feeling that the nonhydrostatic solver will be ported soon from pyOM2, but I see no mention of LES closures. I ask this because I think that if LES are possible with this package, it'll open up many more research avenues given the extensibility of the model and ability to run on multiple GPUs.

Thanks!

Ensure CF compliance

This mostly means adding canonical names as attributes to all variables (at least those relevant for output).

Memory leaks

Veros is leaking memory, which is particularly noticeable in long-running low-resolution runs. This seems to be caused mostly by pyamg/pyamg#198 and to a lesser degree by bh107/bohrium#360.

As a workaround, it is advisable to re-start simulations every 10,000 time steps or so.

More useful tools to simplify setup definitions

Input field interpolator

E.g. in global_1deg, data is just assumed to be on the same grid as the model setup. Ideally, we would have a smart reader for external data that validates this assumption, and possibly does some interpolation if grids don't match.
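Such a reader could be sketched roughly as below. The function name and signature are hypothetical, and a real implementation would likely delegate to a dedicated 2D interpolation routine rather than this separable 1D version (which also does no extrapolation outside the data range).

```python
# Sketch of a "smart reader" that checks whether external data sits on the
# model grid and interpolates linearly if not. All names are hypothetical.
import numpy as np


def regrid_to_model(data, data_x, data_y, model_x, model_y, rtol=1e-8):
    """Return `data` on the model grid, interpolating only when needed."""
    if (data_x.shape == model_x.shape and data_y.shape == model_y.shape
            and np.allclose(data_x, model_x, rtol=rtol)
            and np.allclose(data_y, model_y, rtol=rtol)):
        return data  # grids already match, nothing to do

    # separable linear interpolation: first along x, then along y
    tmp = np.empty((model_x.size, data_y.size))
    for j in range(data_y.size):
        tmp[:, j] = np.interp(model_x, data_x, data[:, j])
    out = np.empty((model_x.size, model_y.size))
    for i in range(model_x.size):
        out[i, :] = np.interp(model_y, data_y, tmp[i, :])
    return out
```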

Domain and bathymetry

Hello,
1. I have the boundary (domain) and bathymetry of my study area as ".txt" files. How can I add them to Veros? I also have sea level and wind time series data in ".txt" format. How can I feed those into Veros?
2. Unfortunately, I could not find an explanation of the model output on the website. How can I plot the U and V components of the currents and the sea level?
3. Another point I am confused about: which scripts (please name them) should I change to reproduce the physical conditions of my region?

Thank you so much
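Regarding question 1, reading a plain-text bathymetry file for use in a setup's set_topography method could look roughly like this; the file layout (whitespace-separated depths, one row per longitude index) and all names here are assumptions.

```python
# Hypothetical helper for loading a plain-text bathymetry field and checking
# it against the model grid before use in set_topography.
import numpy as np


def load_bathymetry(path, nx, ny):
    """Read an (nx, ny) depth field from a whitespace-separated text file."""
    depth = np.loadtxt(path)
    if depth.shape != (nx, ny):
        raise ValueError("bathymetry shape {} does not match model grid ({}, {})"
                         .format(depth.shape, nx, ny))
    return depth
```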

Work on flexible setup

  • Erodes landmasses a bit too aggressively
  • Clearly mark as unvalidated
  • Grid refinement might be a bit too steep

RuntimeError: Two parallel Veros+Bohrium runs on two GPUs on the same node

Hi Dion,

It seems impossible to run two parallel/standalone Veros instances on two GPUs on the same node. The error message is below. Is it something to do with OpenCL keys?
Do you think it would work if I had two separate installations of Veros and/or Bohrium and used one for each run?

Time step took 4.14e+00s
Current iteration: 17
build program: binary cache hit (key: 219cf206d832af2614aabaa6095b2a6d)
build program: start
build program: completed, success
pyopencl-invoker-cache-v1: in mem cache hit [key=1fc007e39493fed6f5e0c45672144578c269a48fd12e4bc28dc2ce3b7c1dc753]
build program: binary cache hit (key: 219cf206d832af2614aabaa6095b2a6d)
build program: start
build program: completed, success
pyopencl-invoker-cache-v1: in mem cache hit [key=1fc007e39493fed6f5e0c45672144578c269a48fd12e4bc28dc2ce3b7c1dc753]
Error code: -4
terminate called after throwing an instance of 'cl::Error'
  what():  clEnqueueNDRangeKernel
Traceback (most recent call last):
  File "/groups/ocean/software/veros/inst06032018/bin/veros-resubmit", line 11, in <module>
    load_entry_point('veros', 'console_scripts', 'veros-resubmit')()
  File "/groups/ocean/software/python/gcc/2.7.14/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/groups/ocean/software/python/gcc/2.7.14/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/groups/ocean/software/python/gcc/2.7.14/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/groups/ocean/software/python/gcc/2.7.14/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/lustre/hpc/ocean/software/veros/repo06032018/veros/cli/veros_resubmit.py", line 81, in cli
    resubmit(*args, **kwargs)
  File "/lustre/hpc/ocean/software/veros/repo06032018/veros/cli/veros_resubmit.py", line 61, in resubmit
    call_veros(veros_cmd, identifier, current_n, length_per_run)
  File "/lustre/hpc/ocean/software/veros/repo06032018/veros/cli/veros_resubmit.py", line 46, in call_veros
    raise RuntimeError("Run {} failed, exiting".format(n))
RuntimeError: Run 0 failed, exiting

Distributed memory support

This is going to be me talking to myself for a while to explore the feasibility of introducing distributed memory support to Veros.

Building blocks

Communication primitives from PyOM

  • global barrier
  • broadcast an array (used in streamfunction init)
  • global max / min / sum
  • exchange overlap
  • enforce cyclic boundaries
  • gather an array
  • zonal sum (used in overturning diagnostic)

All of these should be easy to implement with MPI.
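For illustration, the "enforce cyclic boundaries" primitive can be sketched on a single process with a two-cell halo; in a distributed run the slice copies would become sends and receives between neighbouring ranks.

```python
# Single-process sketch of the cyclic-boundary primitive for a two-cell halo.
# In a distributed run these slice copies become MPI Sendrecv calls.
import numpy as np


def enforce_cyclic_x(arr):
    """Fill the two-cell ghost zones in x for a re-entrant domain (in place)."""
    arr[-2:, ...] = arr[2:4, ...]   # right ghost cells <- left interior
    arr[:2, ...] = arr[-4:-2, ...]  # left ghost cells <- right interior
    return arr
```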

Special routines

  • CG solver: Naïve CG as user kernel? Could even be main process only since we are only in 2D.
  • TDMA solver: Can run in chunks
  • Disk I/O: Main process only or truly parallel?
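For reference, the serial TDMA (Thomas algorithm) behind the second bullet can be sketched as follows; the chunked version would apply it independently to every water column.

```python
# Reference Thomas algorithm for one tridiagonal system: a is the
# sub-diagonal, b the diagonal, c the super-diagonal, d the right-hand side.
import numpy as np


def solve_tridiag(a, b, c, d):
    n = len(d)
    cp = np.empty(n)
    dp = np.empty(n)
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward sweep
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```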

Open questions

Should it be possible to write routines without distributed memory support?

I am worried that requiring all routines to support distributed execution would be too daunting for less experienced programmers. On top of that, we lose some interoperability with 3rd party libraries (e.g. SciPy).

If somewhat feasible, we could introduce a parameter to the @veros_method decorator:

@veros_method(dist_safe=False)
def mylocalmethod(vs):
    ...

Methods of this type could be executed on the main process only, but there are some challenges:

  • We would need some way to detect which arrays have to be communicated before and after the function. Could be done magically via introspection.
  • How should we deal with cases where syncing to the main process consumes too much memory?
  • Is this use case important enough?

An alternative would be to simply throw an error when such a function is executed in a distributed context.
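That alternative is cheap to prototype. A minimal sketch, assuming a global runtime-state object rs that knows the process count (all names hypothetical):

```python
# Sketch of the fallback proposed above: raise instead of silently running a
# non-dist-safe routine in a distributed context.
import functools


class rs:  # stand-in for a global runtime-state object
    n_proc = 1


def veros_method(dist_safe=True):
    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            if not dist_safe and rs.n_proc > 1:
                raise RuntimeError(
                    "{} does not support distributed execution".format(function.__name__)
                )
            return function(*args, **kwargs)
        return wrapper
    return decorator
```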

How much should we abstract?

E.g., instead of introducing separate functions for global max, min, sum, zonal mean etc., we could have a generic reduction operator.

Which MPI abstraction should we use?

I only know of mpi4py, but there might be others. How difficult are they to install?

Can we somehow support multi-GPU systems without excessive overhead?

Should setup take place on workers?

Missing net short-wave radiation forcing in global_1deg setup?

It seems like qnet reading & assignment statements are missing from the set_initial_conditions routine of the global_1deg setup.
Two extra lines are needed

        qnet_data = self._read_forcing("q_net")
        vs.qnet = update(vs.qnet, at[2:-2, 2:-2, :], -qnet_data * vs.maskT[2:-2, 2:-2, -1, npx.newaxis])

according to pyOM.

Overhaul backend handling

The goal is to get rid of the np injection magic in veros_methods, and to allow for specialized implementations based on the backend.

Idea

Introduce a backend module that implements all necessary functions, and dynamically dispatch the right function from the backend. Example:

def sum(arr, axis=None):
    if rs.backend == 'bohrium':
        return bh.sum(arr, axis=axis)
    elif rs.backend == 'numpy':
        return np.sum(arr, axis=axis)
    ...

Usage:

import veros.backend as vb

@veros_method
def my_parameterization(vs):
    temp_sum = vb.sum(vs.temp)

pressure solver

Just opening this issue to advocate for a pressure solver as an option alongside the streamfunction solver...

For the problems I do, I often want to specify the initial velocity (see #271), and apply a momentum source to the velocity (body forcing). The stream function formulation appears to need the initial conditions and the forcing to be explicitly separated into baroclinic and barotropic components. I think with some work and looking at the pyOM manual I could figure out how to do that, but it might be more natural for the model to simply figure it out for me by solving the pressure equation.

Thanks!
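For what it's worth, the split the streamfunction formulation needs can be sketched for a single column: the thickness-weighted depth average (barotropic part) plus the zero-mean residual (baroclinic part). dzt follows the Veros name for layer thicknesses, but the helper itself is hypothetical.

```python
# Sketch of splitting a velocity profile into barotropic and baroclinic parts.
import numpy as np


def split_velocity(u, dzt):
    """Split u(z) into its thickness-weighted depth average and the residual."""
    u_baro = np.sum(u * dzt) / np.sum(dzt)  # barotropic (depth-averaged) part
    u_clin = u - u_baro                     # baroclinic residual, zero weighted mean
    return u_baro, u_clin
```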

More structure and abstraction in core routines

Currently, all core routines are modeled very closely after the corresponding PyOM routines, including variable names and code structure. Additionally, all gradients are calculated via index shifting and slicing. This makes the code hard to read and understand, but has a performance impact, too, since temporary arrays cannot be freed until the (often overly long) routine has finished.

From the top of my head, possible enhancements include:

  • An xarray-style shifting / gradient function instead of explicit index shifts
  • Re-working of some particularly problematic variables like flux_east / flux_north / ...
  • Giving meaningful names to all variables (especially everything currently named fxa, temp, ...)
  • More (inline?) functions instead of docstring-separated code blocks
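The first bullet could be sketched like this; purely illustrative, not part of the Veros API:

```python
# Sketch of an xarray-style shift helper: named, dimension-aware shifts
# instead of raw index slicing scattered through the core routines.
import numpy as np


def shift(arr, axis, n):
    """Return a copy of arr shifted by n cells along axis (edge-padded)."""
    src = [slice(None)] * arr.ndim
    dst = [slice(None)] * arr.ndim
    if n > 0:
        src[axis], dst[axis] = slice(None, -n), slice(n, None)
    else:
        src[axis], dst[axis] = slice(-n, None), slice(None, n)
    out = arr.copy()
    out[tuple(dst)] = arr[tuple(src)]
    return out


def diff_east(arr):
    """Readable forward difference along x: arr[i+1] - arr[i]."""
    return shift(arr, axis=0, n=-1) - arr
```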

only acc and eady example setups work for me

Hi there

Firstly, thanks for sharing this code!

I can run the acc and eady models out of the box which is great but when I come to run any of global_1deg, global_4deg, north_atlantic or wave_propagation models, I get the following type of error. Is this something that you have seen before?

(my_root) clim01|Tue Jan 30|00:08:52|veros-run> cd global_1deg/
(my_root) clim01|Tue Jan 30|00:08:56|global_1deg> python global_one_degree.py
/scale_akl_persistent/filesets/home/williamsjh/veros/veros/core/numerics.py:10: UserWarning: Special OpenCL implementations could not be imported
warnings.warn("Special OpenCL implementations could not be imported")
Traceback (most recent call last):
  File "global_one_degree.py", line 10, in <module>
    DATA_FILES = veros.tools.get_assets("global_1deg", os.path.join(BASE_PATH, "assets.yml"))
  File "/scale_akl_persistent/filesets/home/williamsjh/veros/veros/tools/assets.py", line 43, in get_assets
    return {key: get_asset(val["url"], val.get("md5", None)) for key, val in assets.items()}
  File "/scale_akl_persistent/filesets/home/williamsjh/veros/veros/tools/assets.py", line 43, in <dictcomp>
    return {key: get_asset(val["url"], val.get("md5", None)) for key, val in assets.items()}
  File "/scale_akl_persistent/filesets/home/williamsjh/veros/veros/tools/assets.py", line 36, in get_asset
    _download_file(url, target_path)
  File "/scale_akl_persistent/filesets/home/williamsjh/veros/veros/tools/assets.py", line 48, in _download_file
    with requests.get(url, stream=True, timeout=timeout) as response:
AttributeError: __exit__
(my_root) clim01|Tue Jan 30|00:09:02|global_1deg>

I have run wget on the source file for this example and it is there.

Thanks for any ideas, I'm a bit stuck!

Jonny

Tone it down a bit

The introduction might be a bit too tacky / salesman-like. I think a slightly more factual tone might help.

Document overturning variables

Hello everyone! I have been trying to run one of the basic setups available in Veros, i.e. the global_1deg setup, on my PC, but each time I try, the terminal just shows a "killed" message. My PC has 8 GB of RAM; is that too little to run this setup?

Also, I would like to know what the output variable names actually mean. For example, the variables "bolus_depth" and "bolus_iso" from the global_4deg setup represent different things despite having the same coordinates and attributes (i.e., meridional transport). I looked into the official documentation, but it only describes the model variables.
Thanks for any kind of help!

Veros + Bohrium on GPU - RuntimeError

Hi Dion,

I cannot get Veros to run on GPU.
It seems to be a problem with the compilation of the TDMA solver.
Veros and Bohrium versions are up to date, i.e. cloned 1 day and 6 days ago, respectively.

Run script:

#!/bin/bash -l

export BH_STACK=opencl
veros-resubmit -i wp.05deg.cah -n 50 -l 31104000 -c "python wave_propagation.py -b bohrium -v debug" --callback "veros_gpu_run.sh"

Command line output:

Current iteration: 1
stopping integration at iteration 1
Waiting for lock wp.05deg.cah.0000.restart.h5 to be released
Timing summary:
 setup time               = 74.21s
 main loop time           = 1.24s
     momentum             = 0.67s
       pressure           = 0.00s
       friction           = 0.40s
     thermodynamics       = 0.00s
       lateral mixing     = 0.00s
       vertical mixing    = 0.00s
       equation of state  = 0.00s
     EKE                  = 0.04s
     IDEMIX               = 0.00s
     TKE                  = 0.51s
 diagnostics and I/O      = 0.00s
Traceback (most recent call last):
  File "wave_propagation.py", line 413, in <module>
    run()
  File "/groups/ocean/software/python/gcc/2.7.14/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/groups/ocean/software/python/gcc/2.7.14/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/groups/ocean/software/python/gcc/2.7.14/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/groups/ocean/software/python/gcc/2.7.14/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/tools/cli.py", line 49, in wrapped
    run(*args, **kwargs)
  File "wave_propagation.py", line 409, in run
    simulation.run()
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/veros.py", line 267, in run
    momentum.momentum(self)
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/decorators.py", line 50, in veros_method_wrapper
    res = function(*args, **kwargs)
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/core/momentum.py", line 76, in momentum
    friction.implicit_vert_friction(vs)
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/decorators.py", line 50, in veros_method_wrapper
    res = function(*args, **kwargs)
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/core/friction.py", line 82, in implicit_vert_friction
    res, mask = utilities.solve_implicit(vs, kss, a_tri, b_tri, c_tri, d_tri, b_edge=b_tri_edge)
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/decorators.py", line 50, in veros_method_wrapper
    res = function(*args, **kwargs)
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/core/utilities.py", line 52, in solve_implicit
    return solve_tridiag(vs, a_tri, b_tri, c_tri, d_tri), water_mask
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/decorators.py", line 50, in veros_method_wrapper
    res = function(*args, **kwargs)
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/core/numerics.py", line 257, in solve_tridiag
    return tdma_opencl.tdma(a, b, c, d)
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/core/special/tdma_opencl.py", line 58, in tdma
    prg = compile_tdma(ret.shape[-1], bh.interop_pyopencl.type_np2opencl_str(a.dtype))
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/core/special/tdma_opencl.py", line 35, in compile_tdma
    """.format(sys_depth=sys_depth, dtype=dtype)
ValueError: Unknown format code 'd' for object of type 'str'
Traceback (most recent call last):
  File "/groups/ocean/software/veros/inst26012018/bin/veros-resubmit", line 11, in <module>
    load_entry_point('veros', 'console_scripts', 'veros-resubmit')()
  File "/groups/ocean/software/python/gcc/2.7.14/lib/python2.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/groups/ocean/software/python/gcc/2.7.14/lib/python2.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/groups/ocean/software/python/gcc/2.7.14/lib/python2.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/groups/ocean/software/python/gcc/2.7.14/lib/python2.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/cli/veros_resubmit.py", line 80, in cli
    resubmit(*args, **kwargs)
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/cli/veros_resubmit.py", line 60, in resubmit
    call_veros(veros_cmd, identifier, current_n, length_per_run)
  File "/lustre/hpc/ocean/software/veros/repo26012018/veros/cli/veros_resubmit.py", line 46, in call_veros
    raise RuntimeError("Run {} failed, exiting".format(n))
RuntimeError: Run 0 failed, exiting

Throughput diagnostic

Implement a simple diagnostic that prints the current throughput (ratio of simulated time to real time) at regular intervals. This makes it easier to (i) estimate how long a simulation is going to take and (ii) compare performance with other packages.
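A minimal sketch of such a diagnostic (all names hypothetical):

```python
# Sketch of the proposed diagnostic: simulated time elapsed per wall-clock
# second since the monitor was created.
import time


class ThroughputMonitor:
    def __init__(self, sim_start=0.0):
        self._wall_start = time.perf_counter()
        self._sim_start = sim_start

    def throughput(self, sim_time):
        """Return the ratio of simulated time to real time so far."""
        wall_elapsed = time.perf_counter() - self._wall_start
        return (sim_time - self._sim_start) / max(wall_elapsed, 1e-12)
```

The diagnostic itself would then just call throughput(vs.time) and log the result every few time steps.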

Protect settings and variables

Currently, all settings and variables can be overridden at any time. This might cause subtle bugs, both from inside Veros core routines (e.g. when assigning directly to an array instead of updating its values), setups, or even higher-level code.

It may be worth considering protecting variables from being set to entirely new objects (e.g. by overloading __setitem__ in the Veros class). An equivalent check could be made for settings that cannot be safely changed after setup (this should at least warrant a warning).
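One possible protection scheme, here via __setattr__ so that registered variables can only be updated in place, never re-bound to a new object. The real Veros state object would be more involved; names are illustrative.

```python
# Sketch: once a protected variable is registered, re-assignment raises,
# while in-place updates remain possible.


class ProtectedState:
    _protected = ("u", "v", "temp")

    def __setattr__(self, name, value):
        if name in self._protected and name in self.__dict__:
            raise AttributeError(
                "variable '{}' must be updated in place, not re-assigned".format(name)
            )
        super().__setattr__(name, value)
```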

Use proper logging

Veros is using the top-level logger provided by logging. To play nicely with other packages, we should instead register a custom logger that offers more fine-grained control via the getLogger(__name__) pattern.
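The pattern in question looks like this in a module; the NullHandler is the standard library convention so that applications stay in control of output:

```python
# Per-module logger following the getLogger(__name__) pattern.
import logging

logger = logging.getLogger(__name__)
logger.addHandler(logging.NullHandler())  # silent unless the application opts in


def report_progress(iteration):
    logger.info("Current iteration: %s", iteration)
```

Downstream code can then enable or silence Veros output selectively, e.g. via logging.getLogger("veros").setLevel(...).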

Re-work grid creation process

Currently, set_grid is expected to set the grid origin and spacings. In my experience it is usually more practical to set the grid points directly. We will have to figure out a clean way to do so without breaking compliance with PyOM, though.
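A rough sketch of the reverse mapping, deriving the origin and spacings that set_grid currently expects from directly specified cell centres; the exact pyOM grid convention would need checking, so treat this as an assumption.

```python
# Hypothetical helper: grid points in, (origin, spacings) out.
import numpy as np


def grid_from_points(xt):
    """Return (origin, spacings) for the given 1D array of cell centres."""
    dxt = np.diff(xt)
    dxt = np.append(dxt, dxt[-1])  # repeat the last spacing to keep len(xt) values
    return float(xt[0]), dxt
```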

Streamfunction solver on GPU

Since the Poisson solver uses scipy.sparse.linalg.bicgstab, all stream function data is copied to CPU and back once per time step. Additionally, this solver is not parallelized. This causes the streamfunction solver to be the most expensive routine on high-end GPU systems.

Ideally, an implementation will be available through Bohrium soon; if that should not be the case, we could implement a solver through Bohrium's PyOpenCL interoperability, or wrap a library.
