ncar / dart

Data Assimilation Research Testbed

Home Page: https://dart.ucar.edu/

License: Apache License 2.0

Fortran 70.10% HTML 2.55% MATLAB 8.74% Perl 0.88% Shell 14.05% C++ 0.07% C 0.88% Makefile 0.04% Roff 0.84% Awk 0.01% Python 1.41% Slice 0.01% Tcl 0.20% Brainfuck 0.01% Forth 0.01% GAP 0.01% Mercury 0.04% OpenEdge ABL 0.05% q 0.05% Rebol 0.07%
data-assimilation


dart's Issues

obs_converter run_tests.csh upgrade

The obs_converter/run_tests.csh script essentially runs preprocess twice: once in quickbuild.csh and again during the 'run' portion of the script.

We also need separate input.nml.testing files for some converters, because some converters have hardcoded output names that do not match the input.nml examples in the converters' work directories.

TIEGCM posterior spread

Ed note: This issue was originally opened 9 Dec 2016 and is being manually ported over to GitHub

Hi Tim,

My SSUSI OSSE completed, and it seems the observations are having almost no effect on the RMSE and bias. Is that normal? I expected to see an improvement in RMSE and bias.

The observation errors are typically about 10% of the observed values.
tiegcm_ssusi.zip

generate CESM cpl forcing files

Ed note: This issue was originally opened 3 March 2015 and is being manually ported over to GitHub

The NCAR Research Data Archives (RDA) dataset DS 199.1 (DOI: 10.5065/38ED-RZ08) is an 80-member ensemble of 'Data Atmosphere' forcing files from an atmospheric reanalysis produced by an assimilation with CAM and DART. A description of the data set is available here. The spatial resolution is 2.5° x 1.9° from 0.0E to 357.5E and 90.0S to 90.0N (144 x 96 Longitude/Latitude), and the data span the timeframe 1997-12-01 06:00:00 to 2010-12-31 18:00:00.

The RDA dataset DS 345.0 is nearly complete (as of 22 June 2020) and does not yet have a DOI.
DS 345.0 contains much more than the DATM files, which are the focus of this Issue.
A usage guide for all of the data types is available here.
The data will span the timeframe 2011-01-01 00:00:00 to 2020-01-01 00:00:00. Here is the description of DS 345.0:

These CAM6 Data Assimilation Research Testbed (DART) Reanalysis data products are designed to facilitate a broad variety of research using NCAR's CESM2 models, ranging from model evaluation to (ensemble) hindcasting, data assimilation experiments, and sensitivity studies. They come from an 80 member ensemble reanalysis of the global troposphere and stratosphere using DART and CAM6 from CESM2.1. The data products represent the actual states of the atmosphere ... at a 1 degree horizontal resolution and 6 hourly frequency. Each ensemble member is an equally likely description of the atmosphere, and is also consistent with dynamics and physics of CAM6.

Example 'stream text files' in the models/clm/shell_scripts/cesm2_0/ directory show how to use these products to provide atmospheric forcing for CESM model components like CLM or POP.

ROMS interfaces requested

Ed note: This issue was originally reported 29 September 2015 and is being manually ported over to GitHub.

My name is <...> and I'm an ocean modeller using ROMS. Currently I am using 4DVAR, and my research domain covers the North Pacific basin. My final aim for data assimilation is to predict the ocean state and provide initial/boundary conditions for regional ocean modellers. I'm now testing 4DVAR, but I also want to test an Ensemble Kalman filter based on the DART system. I would deeply appreciate it if anyone could give me some guidance or tips for using DART with ROMS.

29/Sep/15 11:23 AM: The developers of the ROMS interfaces for DART are still working on them and feel they are not at the point they can release the code for testing. If this changes, I will certainly be happy to tell you how to obtain their test code.

Please know that the DART developers and support team are available to answer questions and provide guidance if you choose to pursue a DART & ROMS system.

WOD dataset

Mon, Jun 22, 8:32 PM

I'm new to using DART, and I want to know where I can find the corresponding WOD data, after quality control, that are used in DART.

assimilating satellite solar-induced fluorescence (SIF) into CLM4.5

Ed note: This issue was originally reported 26 Nov 2014 and is being manually ported over to GitHub.

Dear Dr. Hoar,

My name is Xi Yang. I am currently working with Dr. Jung-Eun Lee on assimilating satellite solar-induced fluorescence (SIF) into CLM4.5 to see if we can improve the estimation of terrestrial carbon and water fluxes. We have already written a module that is embedded in CLM to produce SIF (thus SIF is produced as a state variable, like photosynthesis). We are wondering if we can get your help with the data assimilation. Our goals are:

  1. Perform an OSSE. Simply run the model to harvest some 'truth', and see if we can improve the model estimation using DART.

  2. Assimilate real observations of SIF from satellite (in netcdf format).

I am in the process of step 1. I have gone through the code in DART/lanai/models/clm. However, I still have one question: according to section 17 of the tutorial, I need to use create_obs_sequence and create_fixed_network_sequence to create obs_seq.in for the assimilation process. Do you have suggestions on how to directly take the SIF values produced by CLM (the 'truth') and put them into the obs_seq.in file? Thank you!

Have a nice Thanksgiving!

Best,
-Xi

Querying model information from a forward operator.

Ed note: This issue was submitted by email.

Is there an established way to get model information (like number of vertical and horizontal grid points) into a forward operator?
What about a whole variable, like temperature in all its dimensions? What about a vertical profile at a specific grid point?

Background: DART is designed so that forward operators work with any model. As such, there can be no reliance on knowing the geometry of the model. However: it is frequently very convenient (if not necessary) for a forward operator to know something about the model geometry. The most common case is when the forward operator is a vertical integration - one needs to know how many vertical layers and the layer thicknesses in the model grid.

The common strategy is to keep asking the model to interpolate some known quantity at each vertical level
and keep track of how many successes there are. This is an example from the obs_def_COSMOS_mod.f90:

!=================================================================================
! Determine the number of soil layers and their depths
! using only the standard DART interfaces to model
! set the locations for each of the model levels
!=================================================================================

loc_array = get_location(location) ! loc is in DEGREES
loc_lon   = loc_array(1)
loc_lat   = loc_array(2)

nlevels = 0
COUNTLEVELS : do i = 1,maxlayers
   loc = set_location(loc_lon, loc_lat, real(i,r8), VERTISLEVEL)
   call interpolate(state_handle, ens_size, loc, QTY_GEOPOTENTIAL_HEIGHT, loc_value, loc_istatus)
   if ( any(loc_istatus /= 0 ) ) exit COUNTLEVELS
   nlevels = nlevels + 1
enddo COUNTLEVELS

if ((nlevels == maxlayers) .or. (nlevels == 0)) then
   write(string1,*) 'FAILED to determine number of soil layers in model.'
   if (debug) call error_handler(E_MSG,'get_expected_neutron_intensity',string1,source,revision,revdate)
   istatus = 1
   val     = MISSING_R8
   return
else
!   write(*,*)'get_expected_neutron_intensity: we have ',nlevels,' soil layers'
endif

In this case, the model has to know how to interpolate QTY_GEOPOTENTIAL_HEIGHT. If your forward operator requires some other quantity, you could interpolate that at every model level. Search the forward operator modules for VERTISLEVEL for examples.

Many models support this capability; and for quantities that have a fixed number of layers at all locations, determining the number of layers in the model can be done once and the value stored in the module for use in subsequent operations. But - as noted before - the forward operator should be constructed to avoid assumptions about model structure. Some ocean and land models have a different number of layers at each horizontal gridpoint because of bathymetry or soil depth, so a single value may not be appropriate. For a really sinister thought experiment, consider the case when different ensemble members have a different number of snow layers.

As for knowing the horizontal limits - no - I know of no way to query the horizontal extent of an arbitrary model. Before you try to implement something specific for your model and your forward operator, you might want to recast the question and think about a solution that applies to any model.

CLM vegetation transmissivity - netCDF write

Ed note: This issue was originally opened 18 March 2016 and is being manually ported over to GitHub

Dear Tim,

How are you?

I have tried to add some code lines in 'model_mod.f90' to output the vegetation transmissivity. The vegetation transmissivity outputs are created, but it looks like only the first ensemble member has transmissivity values. All other ensemble members show '0' for all grid cells. Could you please help me correct the code lines?

I am attaching the modified 'model_mod.f90', in which the lines commented with 'kyh03092016' are the added parts.
model_mod.txt

Batch submission script for gen_sampling_err_table

The batch submission script for gen_sampling_err_table in the repo, which can be found here:
${DARTROOT}/assimilation_code/programs/gen_sampling_err_table/work/runme.lsf
was created for the LSF system used by Yellowstone/Caldera and is thus named runme.lsf.

I am using gen_sampling_err_table for the Kilo-CAM experiment. While this experiment is being conducted on Shaheen, the queues will be full for the next 24 hours, so I plan to run gen_sampling_err_table on Casper instead, which uses SLURM.

Questions

  1. Should I update the runme.lsf script to use the SLURM SBATCH directives instead of the LSF BSUB directives and make a pull request?
  2. If so, should I change the file suffix and git mv the file to the new suffix?
  3. Or, should I just make a new script for submitting a SLURM job?

Note well: I could not figure out a way to keep both the LSF BSUB and SLURM SBATCH directives in the same file because SLURM throws an error saying that the "#BSUB -P" line is not acceptable:

$ sbatch runme.lsf
sbatch: invalid option -- 'P'
sbatch: error: Unrecognized command line parameter ?

io routines, unused and legacy

pnetcdf version of aread_state_restart - is this used?

several models have:
use assim_model_mod, only : aread_state_restart, open_restart_read, close_restart

WRF init_ensemble_var.csh perturbations

Ed note: This issue was originally an email and is being manually ported over to GitHub.

I'm struggling a bit with the DART-WRF tutorial (https://www.image.ucar.edu/wrfdart/tutorial/). In Step 2: Initial conditions, I call

$ ./init_ensemble_var.csh 2017042700

and it returns

$ ncks: ERROR file ../filter_restart_d01.0001 not found. It does not exist on the local filesystem, nor does it match remote filename patterns (e.g., http://foo or foo.bar.edu:file).

Where is the file filter_restart_d01.${icnum} written? I don’t seem to have the file and I can’t find any script producing this file.

Regarding the inquiry on the WRF/DART tutorial earlier today, I'm
looking through the latest set of scripts. One thing that seems to be
off is some changes that were made to the init_ensemble_var.csh script.
The block of interest is this:

       if ( -z add_perts.err ) then
          echo "Perts added to member ${n}"
       else
          echo "ERROR! Non-zero status returned from add_bank_perts.ncl. Check ${RUN_DIR}/advance_temp${n}/add_perts.err."
          cat add_perts.err
          exit
       endif

Note that nothing generates an add_perts.err file, so the script ends up
exiting. There is a file generated called add_perts.out, but its file
size is not zero. I think this block was added as a way to make sure the
perturbations were successfully added, but maybe this is a placeholder?

CLM column of values for RTM support

Ed note: This issue was originally reported 2 March 2015 and is being manually ported over to GitHub.

Thanks for your answer. However, in my understanding, the 'get_grid_vertval' subroutine does not support above-ground levels (e.g., snow layers). You have used the following code lines to get the soil temperature at the top two layers. I am not sure whether I can use the same approach by just replacing LEVGRND with another variable (for the snow layers).

mylocation = set_location( loc_lon, loc_lat, LEVGRND(1), VERTISHEIGHT)
call get_grid_vertval(x, mylocation, 'T_SOISNO', interp_temp_1  , istatus_temp_1   )
mylocation = set_location( loc_lon, loc_lat, LEVGRND(2), VERTISHEIGHT)
call get_grid_vertval(x, mylocation, 'T_SOISNO', interp_temp_2  , istatus_temp_2   )

Tim: I am having trouble modifying the 'model_mod.f90' code to get the posterior TB. Do you have any thoughts on my question? If I can use 'get_grid_vertval' for snow layer information, which variable should I use instead of LEVGRND?

Remove shell_exit.sh in $DARTROOT/models/POP/shell_scripts?

I'm updating CESM_DART_config for POP and CESM2.

The script references shell_exit.sh, which provides the script with an exit code if a serial section of the batch script exits on LSF.

There seems to be a maintained version of the shell_exit.sh script here:
$DARTROOT/assimilation_code/scripts/shell_exit.sh

For some reason, there's a not-exactly-duplicate version of the script in:
$DARTROOT/models/POP/shell_scripts/shell_exit.sh

As I'm updating the shell scripts in $DARTROOT/models/POP/shell_scripts/cesm2_1, should I remove shell_exit.sh from the directory, along with the references to it in CESM_DART_config?

forward_operator mod checking for MISSING_R8 values rather than looking at the istatus.

Ed note: This issue was originally opened 27 January 2016 and is being manually ported over to GitHub

Line 406: we can't rely on checking for missing_r8 values. For example, CAM passes non-missing_r8 values along with a non-zero istatus.

In commit 9605 (rma_cam_fv branch) I have jammed in a fix using existing code so I could work on cam_fv. However, there needs to be a function in the QC module that reproduces the good_forward_op test in obs_space_diag on the trunk.
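For illustration only, here is a minimal Fortran sketch of the kind of istatus-based test being asked for; the names (good_forward_op, istatus) and the module wrapper are assumptions, not the actual DART forward_operator_mod or QC-module code:

! Sketch only: decide whether a forward operator value is usable from the
! return codes alone, never by comparing values against MISSING_R8.
! Everything here is illustrative; it is not copied from DART.
module good_forward_op_sketch
  implicit none
contains
  pure function good_forward_op(istatus) result(ok)
    integer, intent(in) :: istatus(:)  ! one forward-operator return code per ensemble member
    logical :: ok
    ! usable only if every ensemble member reported success (istatus == 0)
    ok = all(istatus == 0)
  end function good_forward_op
end module good_forward_op_sketch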

GSI2DART build/run question

While testing the DART software, I compile and then try to run the gsi_to_dart program. Apparently it is designed to require more mpi tasks than ensemble members, but the quickbuild.csh and mkmf_gsi_to_dart did not force an mpi build, so I am making that the default. That leads me to another observation ... the code issues the following run-time errors:

0[1125] cisl-minnetonka:~/<3>obs_converters/GSI2DART/work % mpirun -np 4 ./gsi_to_dart 
 running on            4  processors ...
 FATAL ERROR in find_namelist_in_file
    initialize_utilities() or initialize_mpi_utilities()
    must be called before calling find_namelist_in_file().
   utilities_mod.f90
   
   
    stopping.
 FATAL ERROR in find_namelist_in_file
    initialize_utilities() or initialize_mpi_utilities()
    must be called before calling find_namelist_in_file().
   utilities_mod.f90
   
   
    stopping.
 FATAL ERROR in find_namelist_in_file
    initialize_utilities() or initialize_mpi_utilities()
    must be called before calling find_namelist_in_file().
   utilities_mod.f90
   
   
    stopping.

 --------------------------------------
 Starting ... at YYYY MM DD HH MM SS = 
                 2020  8 18 10 41  6
 Program gsi2dart
 --------------------------------------

  set_nml_output Echo NML values to log file only
 total number of mpi tasks must be >= ens_size+1
 tasks, ens_size+1 =            4          81
 ****STOP2****  ABORTING EXECUTION w/code=          19
 ****STOP2****  ABORTING EXECUTION w/code=          19
application called MPI_Abort(MPI_COMM_WORLD, 19) - process 0

So - I know people are using this, but I don't see how they are using it unless they make modifications. Specifically,

! Print out some info
if ( nproc == 0 ) call initialize_utilities('gsi2dart')

This doesn't seem like the right course of action. Any advice from the people who are using this? How are you running it?
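If the problem is simply that only MPI task 0 ever initializes the utilities module, one hedged possibility (assuming gsi_to_dart can use the standard DART MPI utilities named in the error message) would be to initialize on every task rather than guarding the call:

! Sketch only, not the actual gsi_to_dart code: initialize the MPI-aware
! utilities on every task before any namelist is read, instead of
! "if ( nproc == 0 ) call initialize_utilities('gsi2dart')".
call initialize_mpi_utilities('gsi2dart')

! ... read namelists with find_namelist_in_file, do the conversion ...

call finalize_mpi_utilities()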

append_to_netcdf = .false. doesn't work for the program obs_seq_to_netcdf

Hi

I found a problem with the obs_seq_to_netcdf program. I set append_to_netcdf to false in the obs_seq_to_netcdf_nml section of the input.nml file. Here are the related settings in input.nml:

&schedule_nml
calendar = 'Gregorian'
first_bin_start = 2000, 1, 9, 12, 0, 0
first_bin_end = 2000, 1, 22, 12, 0, 0
last_bin_end = 2001, 1, 1, 12, 0, 0
bin_interval_days = 15
bin_interval_seconds = 0
max_num_bins = 1000
print_table = .true.
/
&obs_seq_to_netcdf_nml
obs_sequence_name = ''
obs_sequence_list = 'file_list_clm5.0.06_f09_assim_inf_updatehistlai_LAI_global_e60_2000year.txt'
append_to_netcdf = .false.
lonlim1 = 0.0
lonlim2 = 360.0
latlim1 = -90.0
latlim2 = 90.0
verbose = .true.
/
The settings above are expected to generate 24 obs_epoch files. Before I execute obs_seq_to_netcdf, 24 obs_epoch* files already exist.

When I execute the obs_seq_to_netcdf program, it only rewrites the first obs_epoch file, obs_epoch_001.nc, and appends values to the following 23 obs_epoch* files.
So append_to_netcdf = .false. doesn't work.

Could anybody help resolve this problem?

TIE-GCM 2.0 issues with DART Lanai

Dear DART Team,

I have recently been attempting to get started using TIE-GCM 2 with DART-8.4.1. The main changes I have made are incorporating new namelist variables and ensuring TIE-GCM is executed with mpirun (as required by the new version, even in the single-processor case).

I’ve been attempting to run the script run_filter.csh (modified version attached) without success. This test is done with obs_seq.out from run_perfect_model_obs.csh and one processor. The first sign that anything is going wrong in the attached output is when I get:

mbind: Invalid argument

The program filter (which propagates an ensemble of TIE-GCM runs) does continue to run after this point, though it ultimately seems to fail to run TIE-GCM. See L529 onward of the attached file, where the code notices that it has failed to advance the model and produces an error.

run_tiegcm_assim_gadi.o6872291.txt
run_filter.csh.txt
input.nml.txt
advance_model.csh.txt

Is this an error that you are familiar with and do you have any recommendations for fixing it?

Example of finding the source of a CESM compile+scripts error

Ed note: This issue was originally reported 25 May 2016 and is being manually ported over to GitHub.

Here's an example of how I typically try to track a CESM compile+script error
back to its source. I use several augmented unix commands:

alias findfrag grep -iI !* `find . -type f`
alias findfile find . -name "!**" -print

ls_frag.csh (maybe there's a better Unix tool for this, but this does what I want):

#!/bin/csh

# Script to look for a fragment of a filename in the directory tree below
# the execution directory.

if ($#argv == 0) then
   echo "Usage: ls_frag.csh frag_to_find #_levels_down_to_search"
   exit
endif

set roo = '*'$1'*'
ls -d $roo
set lvl = 1
# :x keeps the *s from being expanded before they should be
while ($lvl <= $2)
  set roo = '*/'$roo:x
  ls -d $roo
  @ lvl++
end

exit

The error I'll chase is

error #5082: Syntax error, found ',' when expecting one of:
( <CHAR_CON_KIND_PARAM> <CHAR_NAM_KIND_PARAM>
<CHARACTER_CONSTANT> <INTEGER_CONSTANT> ...
block_size_x = , & ! size of block in first horiz dimension
ice_domain_size.F90 (line 52)

The error message tells me that block_size_x is being assigned the value ' ',
which must be coming from a script. I go to the subroutine where that
statement lives, and I see that block_size_x gets a value from environment variable BLCKX.
I use

   findfrag BLCKX

to see that BLCKX is not assigned in the cice source code, so I look next
in the ice log file. Searching in there reveals that the program
was compiled without a BLCKX value being supplied:

   ...' -DBLCKX= -DBLCKY= '...

Then I look in the $CASEROOT directory for where BLCKX should be defined.

   grep BLCKX *
   env_build.xml: <entry id="CICE_BLCKX" value="">

So then I look in env_build.xml for why it's not set there.
It tells me that it shouldn't be set there if CICE_AUTO_DECOMP is TRUE, which it is.
So now the question is "why is CICE_AUTO_DECOMP not being used to set BLCKX?"
It's not anywhere in the CASEROOT directory (checked using 'findfrag CICE_AUTO_DECOMP'), so I go back to the source code, in the directories that control building CICE.
It's not set in ...components/cice/...
so I go to .../cesm1_5_beta06/cime and search in there.
The most promising result is in doc/usersguide/faq.xml:

"The scripts that generate the decompositions are
$CASEROOT/Buildconf/generate_cice_decomp.pl. Those tools leverage
decompositions stored in xml files,
$CIMEROOT/../components/cice/bld/cice_decomp.xml"

Then I use findfile generate_cice_decomp.pl to see that generate_....
doesn't exist in the Buildconf directory, so I search for it.
I don't find it, or anything with 'generate_cice' in the name
(ls_frag.csh generate_cice 4), so apparently the documentation is out of date.
This may be related to the switch from perl to python in cesm1_5_beta06.
At this point it would be reasonable to ask CSEG where to find the place where BLCKX
and CICE_AUTO_DECOMP are set. Look in the ChangeLog files for people to pester,
or submit a bugzilla issue.

Or I look in an older CESM, to see the contents of generate_cice_decomp.pl,
so that I can search in beta06 for something similar.
I find cesm1_5_beta03/components/cice/bld/generate_cice_decomp.pl.
It is called by other scripts, so my interest shifts to those:

./bld/check_decomp.csh: set config = \
    ` ./generate_cice_decomp.pl -res $res[$grid] -nproc $tasks -thrds $thrd -output all`
./bld/configure: my $cice_decomp_gen = \
        "$cesmroot/components/cice/bld/generate_cice_decomp.pl";
./cime_config/buildnml: my $config_args = \
        `./generate_cice_decomp.pl -ccsmroot $SRCROOT -res $hgrid \
        -nx $ICE_NX -ny $ICE_NY -nproc $ntasks -thrds $NTHRDS_ICE -output all `;

I suspect that bld/configure may be used in some kind of stand-alone context,
and cime is the environment I'm working in, so I pursue cime_config/buildnml first.
Where is that used? I don't find it in the beta03 tree, so I look in $CASEROOT.
Buildnml is called by preview_namelists for every component.
I confirm that the buildnml being used is the one I was looking for by tracking back
the CONFIG_ICE_... and find that

env_case.xml: <entry id="CONFIG_ICE_FILE"
                value="/glade/scratch/bitz/darttest/cesm1_5_beta06c/components/cice/\
                       cime_config/config_component.xml">

So cime_config/buildnml is the one to focus on.
Case.build calls preview_namelists, and gets CONFIG_ICE_FILE from env_case.xml too.
Comparing cime_config/buildnml in beta03 and beta06 using xxdiff I see that
the lines defining BLCKX, etc. have been commented out in beta06.
So we know where the crime was committed, but not who did it, or why.

To track down who did it I look in cesm1_5_beta06/components/cice/docs/ChangeLog.
There are other ChangeLogs higher up in the directory tree, but this one tells me

Originator: dbailey
Date: 09 Mar 2016
One-line Summary: New CIME bld changes.
M cime_config/buildnml

Farther down is

Originator: klindsay
Date: 13 Nov, 2015

but that's older, and seems not directly related.

So I would start with Dave Bailey to ask about why those lines are commented out.

Converting sea ice freeboard observations

Ed note: This issue came as an email to the DART team.

I’m a graduate student at UW working with CC Bitz (if she has already reached out to you about this, I apologize for doubling down). I am interested in using DART to assimilate sea ice freeboard measurements from ICESat-2. I have the data (currently as .h5 files) and, from what I understand, I will need to create a new observation TYPE and/or QUANTITY and then write/adapt an observation converter and forward operator. The DART website kindly encouraged me to reach out for pointers and advice regarding how best to go about these steps. I’ve found the website and the tutorials extremely helpful, but I am new to DART and would like to make sure that I set up the converter and forward operator correctly.

radar data assimilation

Ed note: This issue was originally an email and is being manually ported over to GitHub.

Dear DART Team;

I have a query related to radar data assimilation. From the DART documentation, it appears that only NEXRAD data can be assimilated using create_obs_radar_sequence. But I want to use our own Doppler Weather Radar data. I have converted the binary data to "lat, long, Reflectivity, Radial Velocity, Elevation" format. Now, what do I need to do so that the data can be read by the DART module?

Your suggestions will be highly appreciated.

Individual Member CESM hindcast development

Giovani Conti (CMCC) wants to know about the possibility
of running the hindcast ensemble members one at a time,
to avoid the MPI (or other resource?) problem they're seeing
with larger (~40 member) ensembles.

Background:

Multi-instance means that CESM runs all the instances at the same time.
CESM can run multi-instance in two ways: multi-coupler and single-coupler.
Single-coupler is very inefficient (all the instances communicate
through a single coupler), so we never use that.
We've been using multi-coupler for years; each instance has its own coupler.
That was developed initially by Raffaele Montuoro.
We and CESM don't have a way to run each instance separately (either in succession
or as separate, parallel jobs), which would be useful for people

  1. on small clusters
  2. having problems with large ensembles on big clusters,
  3. with very large model states (e.g. 1/10th degree POP).

That's not many people, but they pop up occasionally,
so we need to discuss how to handle it.

George Modica developed sequential ensemble hindcasting for WACCM in Lanai.
Kevin has found his scripts, which should be added to DART somewhere.
They're not consistent with Manhattan or CESM2, though,
so how should that be handled?

Tim wonders about identifying the actual command that CESM uses to run jobs.
The command and/or script was hidden in CESM1, but Kevin found it at one point.
Now (CESM2) the command and/or script is generated by python scripting
and will take some unravelling to identify. That code is mostly in
${CESM}/cime/scripts/lib/CIME/case:"case.run" and ${CASEROOT}/.case.run (and ...?).
This might be useful for running job arrays.

WRF runs its member hindcasts as completely separate jobs (not even job arrays)
and keeps track of how they finish. Then it runs a separate assimilation job.
That makes for complicated scripting, but if any members fail, they can be
re-run individually, which saves core hours.

Strategy Possibilities:

  1. Don't bother with this, but help CMCC get their assimilation working.
  2. Adapt Modica's sequential hindcasts to Manhattan and CESM2.
  3. Figure out a job array architecture to do the hindcasts separately but in parallel.
  4. Like WRF; parallel, completely independent member hindcasts.
  5. ?

(KDR: Github (in Firefox) won't let me assign more than 1 person to this,
which I've successfully done in PRs. Also, I don't see a place to add watchers.)

advancing the model inside PMO

Ed note: This issue was submitted by email.

Now I am trying to run the OSSE, namely the perfect_model_obs experiments. When I get to step 2, running ./perfect_model_obs gives the following errors:

PE 0: filter:
routine: filter:
message: advancing the model inside PMO and multiple file output not currently supported
message: ... support will be added in subsequent releases
message: ... set "single_file_out=.true" for PMO to advance the model, or advance the model outside PMO
application called MPI_Abort(comm=0x84000000, 99) - process 0
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=99
:
system msg for write_line failure : Bad file descriptor
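Per the error text, one way forward is to let PMO write a single output file. A minimal namelist sketch, assuming the option lives in &perfect_model_obs_nml as in the stock input.nml files (check your own input.nml for the exact group and the other entries you need):

&perfect_model_obs_nml
   single_file_out = .true.   ! leave your other entries unchanged
   /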

Manhattan branch ... merging new content

On August 14th, 2020 - I mistakenly merged a pull request against Manhattan instead of master. (This was PR#44)
Those changes were intended for master, so I immediately reverted the merge (PR#45) and hoped for the best.
[Aside: I used a GitHub facility to revert the merge]
All seemed well - until I wanted to update Manhattan for a new (minor) release.

git now thinks the merge-base between Manhattan and master is b4ce252
which was the last commit before the bad merge. This seems like the right commit - however, the commits to
revert the bad merge (change/add a bunch of files, remove/revert those same files) are still interfering with attempts to
merge the current master onto Manhattan. All attempts to update Manhattan seem to ignore the files from the original
bad merge - so those files (lots of them) remain unchanged or simply do not exist.

6b6c084c70c63ec05ba06a280ecb16e83b1e950d    bad commit (add new files, modify others)
72d905da819325bcb8c442f0bdcb6474cf2d32e8    commit to fix (i.e. delete, revert a bunch of files)
2a66909778450bcc2f585d20ee84eac944233de7    what I think should be the new merge-base

Until we figure out a way to fix the situation, I see no good path forward for clean upgrades from Manhattan to future releases.
My apologies ... this has been quite an illuminating mistake.

If there is a git expert who can help me out of this predicament, I will be very grateful.

gendocs script in DART/docs/doxygen does not work as is

This is a small fix (changing CHANGELOG to CHANGELOG.md).
However, the output produced is uninspiring and, I would argue, counter-productive since it is not complete.
I propose removing the DART/docs/doxygen directory.

side note: I am now able to fry eggs on my laptop after running dot on every DART program.

is the TPW operator correct as-is? in CAM, WRF, MPAS

Ed note: This issue was originally reported 12 June 2015 and is being manually ported over to GitHub.

I have code which uses surface fields as part of integrating a total column value. Is this actually the wrong thing to do?

Glen Romine added a comment - 11/Jul/15 9:50 AM
I don't think the DART forward operator is correct. Below I review code differences, but this needs some more rigorous testing and a proposed revision to the code.

I took a look at the TPW operator with the UPP code. There, TPW is the integrated cloud water and water vapor in the column multiplied by a correction factor for the mass in the column:

DO L = 1,LM

  DO J = JSTA,JEND
    DO I = 1,IM
      Qdum(I,J) = Q(I,J,L)   ! here, Q is the sum of the cloud water and water vapor mixing ratios
    ENDDO
  ENDDO

  DO J = JSTA,JEND
    DO I = 1,IM
      DP = PINT(I,J,L+1) - PINT(I,J,L)               ! PINT is the full (dry + water vapor) pressure at the layer
                                                     ! edges, so DP is the pressure thickness of each layer
      PW(I,J) = PW(I,J) + Qdum(I,J)*DP*GI*HTM(I,J,L) ! GI is 1/g (gravitational acceleration, 9.81 m/s^2),
                                                     ! and HTM is always 1 for sigma coordinates
    ENDDO
  ENDDO

ENDDO ! level loop

=================================================================================

Here is the TPW operator in DART, which has some interesting differences:
First, there is an option to use surface state variables (separate_surface_level) instead of the lowest model level. I'm not sure why you would ever want to do that. This is the default, so at a minimum it should instead be false.

Next, there is an option to construct a pressure column by dividing the pressure difference between the model top (fixed at 200 mb) and the surface pressure by a fixed number of levels (fixed at 40). This should also be removed.

Using the 'model_levels' option (default), the pressure column is constructed by calling model interpolate, same as for constructing the (only) specific humidity column:

lastk = 2
LEVELS: do k=first_non_surface_level, 10000 ! something unreasonably large

   ! call the model_mod to get the pressure and specific humidity at each level
   ! from the model and fill out the pressure and qv arrays. the model must
   ! support a vertical type of level number.

   which_vert = VERTISLEVEL
   location2 = set_location(lon, lat, real(k, r8), which_vert)

   call interpolate(state_vector, location2, KIND_PRESSURE, pressure(lastk), istatus)
   if (istatus /= 0 .or. pressure(lastk) < pressure_top) exit LEVELS

   call interpolate(state_vector, location2, KIND_SPECIFIC_HUMIDITY, qv(lastk), istatus)
   if (istatus /= 0) return

   lastk = lastk + 1
end do LEVELS

In the above - you should end up with a column of pressures that is 1 index larger than specific humidity values.

....
tpw = 0.0
do k=1, lastk - 1
   tpw = tpw + 0.5 * (qv(k) + qv(k+1)) * (pressure(k) - pressure(k+1))
enddo

! convert to centimeters of water and return
tpw = 100.0 * tpw /(density*gravity) ! -> cm

Note above that the vertically averaged QV (the mean value between model levels) is multiplied by the pressure difference of a model layer. I think it should be:

do k=1, lastk-1
   dp = pressure(k) - pressure(k+1)
   tpw = tpw + qv(k)*dp
enddo

Further, instead of just specific humidity, we probably should also add cloud water. I've also seen rain water included in TPW, but just adding cloud water is probably sufficient. Notably, the pressure returned from model interpolate is the DRY pressure, while the pressure column in UPP includes water vapor pressure.
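Putting those two suggestions together, here is a hedged sketch of what the revised integration might look like; qc(:) stands for a cloud water column that would have to be filled by additional interpolate() calls, and the exact DART quantity name for cloud water is an assumption that depends on the model and DART version:

! Sketch only, combining the dp-weighted sum with the suggestion to include
! cloud water. qc(:) is assumed to hold cloud water interpolated at the same
! levels as qv(:); the quantity used to fill it is not guaranteed to exist
! in every model_mod.
tpw = 0.0
do k = 1, lastk - 1
   dp  = pressure(k) - pressure(k+1)    ! pressure thickness of layer k
   tpw = tpw + (qv(k) + qc(k)) * dp     ! moisture (vapor + cloud water) in the layer
enddo

! convert to centimeters of water, as in the original operator
tpw = 100.0 * tpw / (density * gravity)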

wrf_hydro model_mod.html seems to be out of date

e.g.

There are several scripts in the DART/models/noah/shell_scripts

I'm not sure how much model_mod.html is relevant to the wrf_hydro directory. There isn't a model_mod.html in models/noah

CICE parameter estimation support

Ed note: This issue was originally reported 28 June 2020 and is being manually ported over to GitHub.

I also had to do a bit of hacking on dart_to_cice.f90 to make the code run without parameter estimation; it was hard-wired to do it. I made it possible to turn parameter estimation on/off with a namelist parameter. It would be good to update this in the code base someday.

I agree ... volunteers?
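For anyone making the same change, the modification described above amounts to something like the following namelist-controlled switch; the group and variable names are purely illustrative and are not the actual dart_to_cice.f90 code:

! Illustrative sketch only: guard the parameter-estimation step with a
! logical read from the namelist instead of hard-wiring it on.
logical :: do_parameter_estimation = .false.
namelist /dart_to_cice_nml/ do_parameter_estimation

! ... after the namelist has been read ...
if (do_parameter_estimation) then
   ! existing parameter-estimation code path goes here
endif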

Rename Lorenz '04 to Lorenz '05

The Lorenz '04 model in DART is called that because Jim Hansen was working on the model mod in 2004, while Lorenz's manuscript was still in review in the Journal of the Atmospheric Sciences, where it was subsequently published in 2005.

The year discrepancy in Lorenz '96 makes sense, since even though Lorenz presented the model at ECMWF in Autumn of 1995, the seminar proceedings didn't get published until 1996 and thus everyone refers to it as Lorenz '96.

For Lorenz '04, it seems that the collection of models is referred to as '04 only within DART; it is commonly referred to as Lorenz '05 in the community.

After discussing with @timhoar and @jlaucar it seems appropriate to rename the model as Lorenz '05.

I've rewritten the model's README; however, completing this rename entails several additional steps:

  • renaming the directory, source files, any references in the source file
  • changing the path_names & mkmf files
  • changing the references to lorenz_04 in the matlab files
  • changing the references in the test_dart world
  • possibly changing the references in the tutorial - which will also need some tweaks based on the last implementation at AMS
  • changing some of the docs/pages/* material
  • Doing an exhaustive recursive case-insensitive grep through the repository

Multi-cplr (CESM) ensemble hindcast development

Ed note: This issue was originally reported 28 April 2016 and is being manually ported over to GitHub.

running 1/10th degree pop in multi-instance may be infeasible because:

  1. communication bottlenecks in the coupler
  2. sequential initialization time
  3. number of nodes needed to run all instances at the same time

This means we will have to script running N separate instances of CESM, like in the old days, and then move the files into place so filter can run.

Observations to run DART

Some observations (most frequently atmospheric) have restrictions placed upon them by the data providers. As such, DAReS chooses to provide the necessary conversion tools and instructions (in the observations/obs_converters directory) on how to download and create observation sequence files necessary for your particular experiment. https://dart.ucar.edu/pages/Observations.html

Creating synthetic observations can be a quick way to get started with DART. Become familiar with the material on 'Perfect Model Experiments' or 'Observation System Simulation Experiments' (OSSEs) in the DART documentation. Be aware that the method for manually creating synthetic observations explained in the DART material should not be extended to convert real observations. There is a whole directory of observation converters for a multitude of input formats.

pop_assimilate.csh inflation file trouble

I am currently working with DART for ocean data assimilation with fully coupled CESM. I have made some modifications to pop_assimilate.csh so that the code works correctly with Manhattan DART. It works well when I disable the time evolution of inflation (&filter_nml :: inf_initial_from_restart = .false.).
Now I am trying to set (&filter_nml :: inf_initial_from_restart = .true.) to enable time evolution. I have noticed that some namelist entries, such as inf_in_file_name, inf_out_file_name, and inf_diag_file_name, seem not to be supported in Manhattan. And my code always fails to find the restart file for inflation_ics. The instructions in pop_assimilate.csh are a little confusing; it seems like some old features are involved.
I wonder if you can give me some suggestions about the time evolution of inflation in the current version. Thanks.

p.s. In case you have many different copies of the pop_assimilate.csh, I attached the one I am using with (&filter_nml :: inf_initial_from_restart = .false.) currently.

spatial plot of bias

Ed note: This issue was originally reported 7 May 2015 and is being manually ported over to GitHub.

I am trying to make a spatial diagnostics map from which we can see the spatial distribution of the MBE
(mean bias error) of brightness temperature for a certain period. I want to do it using obs_diag_output.nc.
So far I have used the matlab code plot_evolution.m to calculate time series of the RMSE for a certain region.

Can I also draw a spatial map of the MBE for the simulation period?
If so, which matlab diagnostics code do you recommend for my purpose?

Since we have observations and the expected values of the observations, computing the bias is easy. Under certain circumstances, the tricky part is making them available as a spatial plot. Since the observation network is treated as 'randomly located' (even if it is not), the simplest thing to do is to convert the observation sequence files to netCDF and plot them as a scatterplot. Given the density of observations and the timescales, this may be sub-optimal.

obs_diag.f90 can handle an arbitrary number of regions, but the input is tedious and Ryan Torn indicated years ago that it is horribly slow for lots of regions. I have not tried to deduce why, although obs_diag has been revised several times since then. obs_diag.f90 calculates the mean bias over regions and times ... but was never intended to work on 'gridcell' sized areas - which is what I think is the desire here.

If you are performing a perfect model experiment, you have the True_State.nc and Prior_Diag.nc to work with. That would work
for any model. The question was specifically about CLM, which has a state-space representation of the gridcell constituent pieces that must be reconstituted into gridcell values. There is a matlab script clm/matlab/clm_get_var.m that will create 'gridcell-based' quantities from CLM and DART restart files. There is also a clm_plot_var.m function.

MPAS "bus error", "shepherd terminated"

I'm in trouble running filter now, with the updated model_mod and obs_seq.out (w/ modified obs error variance in some obs types).
It worked well with ens_size = 3, but it now fails with ens_size = 100.
I checked the different obs types separately, and the filter seemed to run fine with each individual obs type.
But when I try to run them all together with the full ensemble, the filter fails.
At first I got a bus error (no idea what that is); then I increased the number of nodes (select) from 16 to 36, and I got the error below.

MPT ERROR: Rank 334(g:334) received signal SIGBUS(7).
        Process ID: 49730, Host: r5i2n9, Program: /glade/work/xxxx/DART/models/mpas_atm/work/filter
        MPT Version: HPE MPT 2.19  02/23/19 05:30:09

MPT: --------stack traceback-------

MPT: -----stack traceback ends-----
MPT: On host r5i2n9, Program /glade/work/xxxx/DART/models/mpas_atm/work/filter, Rank 334, Process 49730: Dumping core on signal SIGBUS(7) into directory /glade/scratch/xxxx/MPAS_DART/MPASg
MPT: shepherd terminated: r5i2n9.ib0.cheyenne.ucar.edu - job aborting
duration_secs = 143

Tim, I put "#PBS -k eod" as you suggested.
And let me attach input.nml, FYI.

Just to be clear, before I submitted my last changes to the branch, it worked just fine in my test with 100 members. So I really think nothing was wrong with the updates.
But then I changed obs_seq.out with different error values, and now the filter doesn't work any more.
Can you find any clues in the error message shown above? I'm stuck...

In filter_mod.f90, state_ens_handle is in the call to adaptive_inflate_init before the state ensemble is initialized

In filter_mod.f90, state_ens_handle is not used in adaptive_inflate_init, but is in the arguments.

408 call adaptive_inflate_init(post_inflate, inf_flavor(2), inf_initial_from_restart(2),  &
   inf_sd_initial_from_restart(2), output_inflation, inf_deterministic(2),            &
   inf_initial(2),  inf_sd_initial(2), inf_lower_bound(2), inf_upper_bound(2),        &
   inf_sd_lower_bound(2), inf_sd_max_change(2), state_ens_handle,                     &
   allow_missing, 'Posterior')

This is before the call to

 497   call init_ensemble_manager(state_ens_handle, num_state_ens_copies, model_size)

Not causing problems at the moment, since state_ens_handle is not used, but state_ens_handle should be removed from adaptive_inflate_init.

avoid updating coastal regions in POP

Ed note: This issue was originally reported 28 Jan 2015 and is being manually ported over to GitHub.

... getting crashes in POP which appear to be made worse by assimilating obs near coastal regions.

Unfortunately, in get_close_obs() in the POP model_mod there is no easy way to tell what the
corresponding KMT/KMU array is -- we only have access to the location lat/lon/depth and not the column index.

So she's going to try, in the sv_to_restart_file() routine, to add code that reads back in the original field
values before doing the update, and for coastal regions keeps the original state values instead of
overwriting them with the assimilated values.

24 Feb 2015
I just touched base and she is not yet happy with her proposed solution, because she is still getting some
intermittent crashes in POP in shallow areas. So this is still an open issue.

wrf_hydro python code

How much of the wrf_hydro directory's python code should be shipped with DART vs. kept in its own repo?

If all of the wrf_hydro python code is distributed with DART, it needs review and documentation on how to test it.

README description of assimilation code seems incomplete

Directory Purpose
assimilation_code/ Low-level library and fonts required by NCAR Graphics and NCL

I think "Low-level library and fonts required by NCAR Graphics and NCL" is a bit misleading; I suggest "assimilation tools and programs".

obs_diag rmse normalization

obs_diag computes the root-mean-squared error (rmse), which raises the question: 'Error from what?'
The observation is considered to be the truth. This has subtle ramifications when normalizing the rmse ... do you normalize by 'N' or 'N-1'? The reason statisticians normalize by N-1 is to account for the loss of a degree of freedom when you have to use the data to estimate the 'truth' used to compute the error. Since obs_diag presumes the observation is the truth, it has been normalizing by N.
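In symbols, with o_i the observations (treated as the truth) and h_i the corresponding expected observations from the forward operator, obs_diag therefore uses the N-normalized form:

% rmse as computed by obs_diag: the observation is the assumed truth,
% so the sum of squared differences is divided by N rather than N-1.
\mathrm{rmse} = \sqrt{ \frac{1}{N} \sum_{i=1}^{N} \left( o_i - h_i \right)^2 }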

create_ensemble_from_single_file allocating a 'model_size' variable

filter_mod:create_ensemble_from_single_file() has a bug when creating an ensemble from a single file that is allowed to have MISSING values. The logic is supposed to be:

  1. record the location of the missing values
  2. copy the single member to all the other members
  3. add perturbations
  4. ensure the new members have missing values in the same place.

At present, the array recording the locations of the missing values is allocated to the wrong length.
Allocating it to the "correct" 'model_size' is the wrong approach - we work with some models that are too big. Another approach is needed ... working on one variable at a time, perhaps ...
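A very rough sketch of the one-variable-at-a-time idea, only to illustrate the shape of a possible fix; the accessor routines (get_variable, put_variable) and all names here are hypothetical, not existing filter_mod code:

! Hypothetical sketch: handle the MISSING mask per state variable, so the
! mask never has to be allocated to the full model_size.
do ivar = 1, num_state_variables
   call get_variable(member(1), ivar, var_vals)        ! hypothetical accessor
   is_missing = (var_vals == missing_r8)               ! mask for this variable only

   do imem = 2, ens_size
      call get_variable(member(imem), ivar, pert_vals) ! already-perturbed copy
      where (is_missing) pert_vals = missing_r8        ! restore MISSING in the same places
      call put_variable(member(imem), ivar, pert_vals) ! hypothetical accessor
   enddo
enddo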

observation location roundoff errors

DART encodes the locations of the observations as radians, not degrees. Furthermore, you can write observations as ASCII files or binary files, and binary files can be 32-bit or 64-bit. All of this makes for a good situation to encounter roundoff errors. DART writes observation locations with a 'fixed' precision to help compare observation sequence files, because different compilers wrote out free-format representations of the same number with a different number of digits.

Looking for "equality" given the representation variations is a problem. Even using relational operators (<, >) may lead to unexpected behavior. Consider the following code that is in the threed_sphere/location_mod.f90:

if(lat < -90.0_r8 .or. lat > 90.0_r8) then
   write(msgstring,*)'latitude (',lat,') is not within range [-90,90]'
   call error_handler(E_ERR, 'set_location', msgstring, source, revision, revdate)
endif

We have encountered a situation where assimilating radiosonde observations at the South Pole triggers this error.

The original ASCII file has the latitude (in radians) as -1.570796326794879.
The file is combined with other observations using a program (mpas_dart_obs_preprocess) that does some preprocessing, but not on the radiosonde observations. mpas_dart_obs_preprocess is written correctly and fundamentally just reads the ASCII into a real variable and then writes out ASCII to the output file.

When you compile mpas_dart_obs_preprocess with 32-bit reals (obs_seq.r4) and 64-bit reals (obs_seq.r8) and just write out the same observations as ASCII using the core DART routines, the locations you get are (longitude, latitude, vertical, vertical coordinate system code):

compiled precision   longitude           latitude             vertical            code
32 bit               0.000000000000000   -1.570796370506287   69600.00000000000   2
64 bit               0.000000000000000   -1.570796326794879   69600.00000000000   2

Notice the difference in the last few digits of the latitude.
When you read those files into a program (obs_loop) compiled with 32bit or 64bit reals and print using a free-format print statement, you get:

obs_loop compiled with 64 bit reals
input precision   fixed precision ASCII radians   free-format degrees
32 bit            -1.570796370506287              -90.0000025044782
64 bit            -1.570796326794879              -89.9999999999990

obs_loop compiled with 32 bit reals
input precision   fixed precision ASCII radians   free-format degrees
32 bit            -1.570796370506287              -90.00000
64 bit            -1.570796326794879              -90.00000

Here is the scoop: There is a loss of precision reading ASCII files written from 64 bit reals into a 32 bit real and then writing out ASCII files with what amounts to false precision. You don't have to do any math on them whatsoever.

If you compile the preprocessing program with r8=r4 (i.e., all reals are 32-bit), the fixed-precision low-order digits are different. When you read them back into a program that can represent all those digits (like my obs_loop test), you get a slightly different value.

I think the bottom line is that it is safest to process the observations using programs compiled to use 64bit reals.
Those ASCII observation sequence files get read correctly by programs using 32bit or 64bit reals.

If you know you are going to run filter or perfect_model_obs with 32bit reals, you can process the observations with programs compiled to use 32 bit reals - just be aware that if you ever use those observation sequence files with 64bit reals, you can suffer the consequences.
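Here is a tiny, self-contained Fortran program (no DART code involved) that reproduces the effect described above: the same ASCII latitude is read into a 32-bit and a 64-bit real and printed back with full precision.

! Demonstrates the precision loss described above. The 32-bit value, when
! printed with 64-bit-style 'false precision', no longer matches the
! original digits, while the 64-bit value does.
program roundoff_demo
   implicit none
   integer, parameter :: r4 = selected_real_kind(6)
   integer, parameter :: r8 = selected_real_kind(12)
   character(len=32)  :: ascii
   real(r4) :: lat32
   real(r8) :: lat64

   ascii = '-1.570796326794879'
   read(ascii, *) lat32
   read(ascii, *) lat64

   write(*,'(A,F20.15)') 'read into 32-bit real: ', real(lat32, r8)
   write(*,'(A,F20.15)') 'read into 64-bit real: ', lat64
end program roundoff_demo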

MADIS converter examples

The developer_tests for the observation converters need work in general, and the MADIS
converters have hardwired (generic) input filenames that do not match the provided data files.
It might be useful to implement namelists to specify the filenames.
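A hedged sketch of what such a namelist might look like; the group and variable names below are hypothetical, not existing MADIS converter code:

! Hypothetical namelist so the test harness can point each MADIS converter
! at the provided data files instead of relying on hardwired generic names.
character(len=256) :: input_file  = 'madis_input.nc'
character(len=256) :: output_file = 'obs_seq.madis'
namelist /convert_madis_nml/ input_file, output_file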

OBS_TEMP is not allocated

27 May 2020:

... we get the following (the full log is attached):

...
 Before computing prior observation values TIME: 2020/05/27 14:44:55
 After  computing prior observation values TIME: 2020/05/27 14:45:54
 PE 0:  filter trace: After  computing prior observation values
 PE 0:  filter trace: Before observation space diagnostics
forrtl: severe (408): fort: (8): Attempt to fetch from allocatable variable OBS_TEMP when it is not allocated

Image              PC                Routine            Line        Source
filter             000000000079F1F6  Unknown               Unknown  Unknown
filter             00000000006A7805  filter_mod_mp_obs        1680  filter_mod.f90
filter             00000000006A0412  filter_mod_mp_fil         877  filter_mod.f90
filter             0000000000698BBF  MAIN__                     20  filter.f90
filter             000000000040E2A2  Unknown               Unknown  Unknown
libc-2.17.so       00002B9EABACE3D5  __libc_start_main     Unknown  Unknown
filter             000000000040E1A9  Unknown               Unknown  Unknown

Have you ever met this error? We use these flags to compile DART:

 FFLAGS = -g -C -check noarg_temp_created -fpe0 \
          -fp-model precise  -ftrapuv -traceback -convert big_endian -assume buffered_io -assume realloc_lhs \
          -warn declarations,uncalled,unused  $(INCS)

DART QC = 7 and a failed posterior forward operator - obs_diag bug

There is a bug that involves the rare situation when the prior observation is rejected by the outlier threshold and the posterior forward operator fails. DART does not have an error code that covers that situation. The obs_diag program correctly records the prior observation rejection as a DART QC of 7, but if the posterior forward operator failed, the obs_diag program (incorrectly) assigned the posterior a DART QC of 4. The correct error code for a failed posterior forward operator is 2, not 4.

Worse, the logic to check for a failed posterior forward operator when the prior was rejected was not implemented correctly in all parts of the code. Let me repeat that the statistics for the priors are and have been correct. Only in rare situations were the posterior statistics incorrect. Since we all know that posterior statistics are not the best metric to begin with, I hope this has not caused you any problems.

Even before 2015, the run-time output and log messages of obs_diag correctly report the number of observations in this situation - see the last line below:

 # observations used  :         9600
 Count summary over all regions - obs may count for multiple regions:
 # identity           :            0
 # bad Innov  (ratio) :            0
 # bad UV (wind pairs):            0
 # bad Level          :            0
 # big (original) QC  :            0
 # bad DART QC prior  :            0
 # bad DART QC post   :            0
 # priorQC 7 postQC 4 :            0

I am in the process of correcting the logic. Stay tuned for a GitHub pull request to address the issue.
