geoschem / integrated_methane_inversion

Integrated Methane Inversion workflow repository.

Home Page: https://imi.readthedocs.org

License: MIT License

Shell 21.00% Python 50.04% Jupyter Notebook 27.24% Perl 1.73%
atmospheric-chemistry atmospheric-composition atmospheric-modeling aws climate-change climate-modeling cloud-computing geos-chem greenhouse-gases inverse-modeling

integrated_methane_inversion's Introduction

Integrated Methane Inversion (IMI) Workflow

Overview:

This directory contains the source code for setting up and running the Integrated Methane Inversion with GEOS-Chem.

Documentation:

Please see the IMI readthedocs site

Reference:

Varon, D. J., Jacob, D. J., Sulprizio, M., Estrada, L. A., Downs, W. B., Shen, L., Hancock, S. E., Nesser, H., Qu, Z., Penn, E., Chen, Z., Lu, X., Lorente, A., Tewari, A., and Randles, C. A.: Integrated Methane Inversion (IMI 1.0): a user-friendly, cloud-based facility for inferring high-resolution methane emissions from TROPOMI satellite observations, Geosci. Model Dev., 15, 5787–5805, https://doi.org/10.5194/gmd-15-5787-2022, 2022.

integrated_methane_inversion's People

Contributors

djvaron, jiaweizhuang, laestrada, msulprizio, nicholasbalasus, sabourbaray, williamdowns, yantosca


integrated_methane_inversion's Issues

Default hyperthreading setting in AMI

What should we use for the default hyperthreading setting in the AMI? By default sbatch submits as many jobs as there are vCPUs, but this might be slower or only marginally faster than submitting jobs equal to the number of physical CPUs (#vCPU/2).

I think Will tested this. Right now I believe the default setting is to do hyperthreading, so num_jobs = num_vCPU.

[FEATURE REQUEST] Incorporate information on point sources

Include information on known point sources in region of interest where available.

Possibilities:

  • Display known point sources as part of the preview
  • Increase prior error for grid cells with known point sources
  • Increase prior estimate for grid cells with known point sources
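
A minimal sketch of the second option above, assuming a gridded prior-error DataArray and a hypothetical list of point-source coordinates (neither exists in the IMI today):

import xarray as xr

def inflate_prior_error(prior_error: xr.DataArray, point_sources, factor=2.0):
    """Scale up the prior error in grid cells that contain a known point source."""
    inflated = prior_error.copy()
    for lat, lon in point_sources:
        # Locate the grid cell nearest to the reported point-source coordinates
        cell = inflated.sel(lat=lat, lon=lon, method="nearest")
        # Multiply that cell's prior error by the inflation factor
        inflated.loc[dict(lat=float(cell.lat), lon=float(cell.lon))] = float(cell) * factor
    return inflated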

[FEATURE REQUEST] Average TROPOMI observations for each grid cell in GC

From Daniel:

Presently we use individual TROPOMI observations in the observation vector y, and compare to the Kx from GEOS-Chem with the TROPOMI operator applied. But there are typically many TROPOMI observations per GEOS-Chem grid cell per day, and we could average them to reduce the size of y and the resulting SO. We haven't done it this way because the TROPOMI averaging kernel (and hence the TROPOMI operator) is different for each observation, but that's not a good reason. Averaging would help reduce the dimensions of y and SO, and decrease the error correlation within SO that partly contributes to our need for the regularization coefficient gamma. Here's how to do it:

  1. Continue to apply the TROPOMI operator to the GEOS-Chem fields for each observation – no change here.
  2. Average the observations for each grid cell and day to populate y, and correspondingly average the simulated TROPOMI observations to populate Kx.
  3. Keep track of the number of observations being averaged so that we can adjust SO – we don't have a formula for that yet, but Zhen's current work on error characterization will give us that.
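
A minimal sketch of steps 2 and 3, assuming the paired TROPOMI observations and operator-applied model values already sit in a DataFrame; the column names (i_gc, j_gc, date, y_obs, y_sim) are hypothetical, not the IMI's current variable names:

import pandas as pd

def average_obs_per_cell(df: pd.DataFrame) -> pd.DataFrame:
    """Average observed and simulated values per GEOS-Chem grid cell per day."""
    grouped = df.groupby(["i_gc", "j_gc", "date"])
    out = grouped.agg(
        y_obs=("y_obs", "mean"),   # averaged TROPOMI observations -> y
        y_sim=("y_sim", "mean"),   # averaged simulated observations -> Kx
        n_obs=("y_obs", "size"),   # number of obs averaged, needed later to adjust SO
    ).reset_index()
    return out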

Adjust recommended vCPU configuration for default user

Followed the documentation to set up a first EC2 instance. In step 3 of the EC2 setup instructions, the documentation suggests a c5.9xlarge instance (36 vCPUs); however, as a new AWS user there appears to be a 32-vCPU limit, which results in the following error when attempting to launch the instance:

(screenshot of the vCPU limit error when launching the instance)

Suggested resolution: alter the documentation to recommend a c5a.8xlarge instance (32 vCPUs) for first-time users to avoid this limitation.

Can we avoid user needing to set slurm resources when submitting with sbatch?

Submitting simulations with sbatch, the user needs to know how many cores to request. Can we automate this for them?

Otherwise we need to tell them to edit all the run scripts to use the number of cores their instance has. E.g. on c5.xlarge I had to change num cores from 8 to 4.

At a minimum we should set the default num cores to 1 instead of 8, which is easier to explain.

Users also need to know:

  • how much memory to request
  • how much time to request — can this be set to infinite?

If we can't automate the memory/cores to request, then we need to provide instructions on choosing values in the Readthedocs. That would add more steps to the workflow. Avoidable?
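
One possible approach, sketched here, is for the setup script to detect the instance's CPU count and generate the sbatch headers itself; the memory and wall-time defaults below are placeholders, not IMI settings, and the physical-core estimate assumes hyperthreading is enabled:

import os

def sbatch_header(mem_mb=8000, wall_time="07-00:00"):
    """Build sbatch header lines sized to the current instance."""
    logical = os.cpu_count() or 1
    physical = max(logical // 2, 1)  # assumes 2 hyperthreads per physical core
    return "\n".join([
        "#!/bin/bash",
        f"#SBATCH -c {physical}",   # cores to request
        f"#SBATCH --mem {mem_mb}",  # placeholder default, in MB
        f"#SBATCH -t {wall_time}",  # placeholder default wall time
    ])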

[FEATURE REQUEST] Add capability to optimize offshore emissions

The current state vector creation scripts only consider regions over land. Ideally we would also include offshore emissions; however that requires emissions output from a prior simulation (perhaps even just a HEMCO standalone simulation). Hannah Nesser has developed this capability in her work and we may be able to bring that into the IMI as an option.

[BUG/ISSUE] Grid centers, edges for state vector/HEMCO

Minor grid differences between state-vector and HEMCO diagnostics emission fields:

Presently input.geos uses the minimum/maximum grid-cell centers from the gridded state vector as the domain boundaries (grid-cell edges) for the simulation. As a result, the GEOS-Chem/HEMCO domain is slightly smaller than the gridded state vector domain (0-2 grid cells along lat/lon).

Currently we are using a dimension-matching function to ensure total emissions are still properly computed across the region of interest (match_size() in src/inversion_scripts/utils.py).

It's possible this could lead to incomplete removal of the GEOS-Chem buffer zone (the "3 3 3 3" setting, i.e. 3 grid cells on each edge) from the TROPOMI analysis in some cases, but this hasn't been observed yet in test inversions.

We should consider revamping the domain/grid definitions across the IMI so that HEMCO and the state vector both produce an identical domain as close as possible to the user's settings for lat/lon boundaries, obviating the need for the match_size() function.
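
One possible approach, sketched here, is to derive the simulation domain edges from the gridded state vector by padding the outermost cell centers by half a grid spacing. This assumes a regular lat/lon grid and is not the IMI's current logic:

import xarray as xr

def domain_edges(statevector_path):
    """Return (lat_min, lat_max, lon_min, lon_max) edges matching the state vector grid."""
    ds = xr.open_dataset(statevector_path)
    dlat = float(ds.lat[1] - ds.lat[0])
    dlon = float(ds.lon[1] - ds.lon[0])
    lat_min = float(ds.lat.min()) - dlat / 2
    lat_max = float(ds.lat.max()) + dlat / 2
    lon_min = float(ds.lon.min()) - dlon / 2
    lon_max = float(ds.lon.max()) + dlon / 2
    return lat_min, lat_max, lon_min, lon_max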

[FEATURE REQUEST] Diagnostic for overfitting

Elise Penn provided the following feedback:

The Gamma regularization factor should depend on the state vector and how many observations are used. Users might unwittingly overfit to observations with a fixed Gamma value. It could be helpful to output J_A/n and J_O/m in post-processing as a diagnostic for overfitting. If there is overfitting, users could then choose a new Gamma value using Xiao Lu’s method.

This could go into the visualization notebook in an update.
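
A minimal sketch of the proposed diagnostic, with hypothetical variable names for the prior/posterior state vectors, observations, and inverse error covariances; the expectation that both ratios should be near 1 for a well-chosen Gamma follows the Xiao Lu method mentioned above:

import numpy as np

def overfit_diagnostic(xhat, xa, y, Kxhat, Sa_inv, So_inv):
    """Return (J_A/n, J_O/m); values far from 1 suggest a poorly chosen Gamma."""
    dx = xhat - xa
    dy = y - Kxhat
    Ja = dx @ Sa_inv @ dx  # prior term of the cost function
    Jo = dy @ So_inv @ dy  # observation term of the cost function
    return Ja / dx.size, Jo / dy.size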

[BUG/ISSUE] StateVector.nc fails to build for very narrow region of interest

Using a very small latitude range for the region of interest:

LonMin: 3
LonMax: 7
LatMin: 42
LatMax: 43
REGION: "EU"

BufferDeg: 5
nBufferClusters: 8
LandThreshold: 0.25
CreateStateVectorFile: true

Res: "0.5x0.625"
Met: "merra2"

IMI throws the following error when building and then attempting to read the state vector file:

/home/ubuntu/CH4_Workflow/Test_France_3days/make_state_vector_file.py:110: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result.
  statevector.values[land.isnull()] = 0
Traceback (most recent call last):
  File "/home/ubuntu/CH4_Workflow/Test_France_3days/make_state_vector_file.py", line 183, in <module>
    make_state_vector_file(land_cover_pth, save_pth, lat_min, lat_max, lon_min, lon_max, buffer_deg, land_threshold, k_buffer_clust)
  File "/home/ubuntu/CH4_Workflow/Test_France_3days/make_state_vector_file.py", line 110, in make_state_vector_file
    statevector.values[land.isnull()] = 0
IndexError: too many indices for array: array is 2-dimensional, but 19 were indexed

Works normally when using a larger region of interest, e.g. setting LatMax: 53.
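
A likely (untested) fix is to index the underlying numpy array with a plain boolean array, or to stay within xarray, rather than indexing with the DataArray itself; statevector and land here are the variables from make_state_vector_file.py:

# Index with a plain boolean numpy mask instead of the DataArray
statevector.values[land.isnull().values] = 0

# or equivalently, staying in xarray:
statevector = statevector.where(land.notnull(), 0)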

Add environment files for spack environment and conda environment on aws

Up to this point we have added the dependencies manually on the AMI, but going forward we should use environment files to track, load, and install the dependencies needed for the IMI workflow. This will help improve our documentation of dependencies and make future updates of the AMI easier and more predictable.

Permian/Cannon relics in the AMI

We left some relics from previous Permian/Cannon testing in the AMI. Here are two:

  1. The AMI comes with a backup_files/input_data_permian/ directory. Some of the contents may not be needed, and "permian" should not appear in the standard directory name; maybe backup_files/input_data instead.
  2. The sbatch header of run_inversion.sh contains huce_intel.

[FEATURE REQUEST] Automatic inversion ensembles

Currently users need to manually re-run the IMI with different inversion parameters to generate an inversion ensemble for better error characterization.

Add feature to automatically generate an inversion ensemble.

  • An example would be to vectorize the config input for Gamma and have the IMI run the inversion once for each value. This would necessitate a new prior/posterior run directory for each Gamma value.
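
A minimal sketch of this idea, assuming a hypothetical GammaValues list in config.yml and that run_inversion.sh could read the Gamma value from an environment variable (neither mechanism exists today):

import os
import subprocess
import yaml

with open("config.yml") as f:
    config = yaml.safe_load(f)

for gamma in config.get("GammaValues", [0.25]):
    member_dir = f"inversion_gamma_{gamma}"
    os.makedirs(member_dir, exist_ok=True)
    # Hypothetical: run_inversion.sh would read GAMMA from the environment
    subprocess.run(
        ["sbatch", "run_inversion.sh"],
        cwd=member_dir,
        env={**os.environ, "GAMMA": str(gamma)},
        check=True,
    )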

[FEATURE REQUEST] Additional regional domains offered by the IMI

The IMI originally included the following regions:

  • China/SE Asia
  • Europe
  • North America

It will also be expanded to offer:

  • South America
  • Middle East
  • Oceania
  • Africa

This primarily involves processing the global 0.25x0.3125 meteorology fields available to the IMI. The global fields are cropped to the regions above to reduce file size and speed up file I/O in the GEOS-Chem simulations run within the IMI.
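
For illustration, the cropping step amounts to an xarray subset like the following; the filename and the South America bounds are examples only, and ascending lat/lon coordinates are assumed:

import xarray as xr

ds = xr.open_dataset("GEOSFP.20180501.A3dyn.025x03125.nc")  # example filename
south_america = ds.sel(lat=slice(-59, 16), lon=slice(-88, -31))  # illustrative bounds
south_america.to_netcdf("GEOSFP.20180501.A3dyn.025x03125.SA.nc")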

[BUG/ISSUE] s3 cp error for boundary condition file

There appears to be a seemingly innocuous s3 cp (or sync) error in the IMI output file:

fatal error: An error occurred (404) when calling the HeadObject operation: Key "HEMCO/../BoundaryConditions/GEOSChem.BoundaryConditions.20180401_0000z.nc4" does not exist

The config and output files are attached to reproduce the error, but it has been noted in other runs as well.

config.yml.txt
imi_output.log

HEMCO and automating the cluster file

Melissa is working on automating the generation of the cluster file.

Just a quick note that HEMCO currently points to an irrelevant default cluster file. So we will need to make sure HEMCO knows where to look for the automatically generated one.


Bad boundary conditions directory in HEMCO_Config.rc

The UMI template for HEMCO_Config.rc has a bad target directory for the boundary condition files. Currently, HEMCO looks for the boundary conditions in:

/home/ubuntu/ExtData/HEMCO/SAMPLE_BCs/v2019-05/CH4/

but this doesn't exist in the AMI. As a result our GEOS-Chem simulations show XCH4 levels that are systematically far too low.

Instead, HEMCO should look here for the boundary conditions:

/home/ubuntu/ExtData/BoundaryConditions/

Add flexibility to choose between running IMI on AWS or on local cluster

The IMI was originally developed for running locally on Harvard's Cannon Cluster (currently in main branch). Will Downs added the capability to run the IMI on AWS (currently in the add_download branch). The updates specific to AWS should be merged into main and an option will be added for users to select whether they are running on AWS or on a local cluster. Based on the user selection, the setup script, GEOS-Chem configuration, and scripts in CH4_TROPOMI_INV will automatically choose the correct settings for that system.

AMI does not exist

The docs do not actually specify an AMI. This makes setting up an EC2 instance much more difficult.


Setup script requires pre-built ExtData folder to run successfully

Because it tries to build the state vector before running the dry-runs that build ExtData.

Running the setup script with no pre-existing ExtData folder throws two errors:

(1) can't find the landcover file for the state vector
(2) can't find the BC file for the first day of the spinup simulation

Solution may be to perform dry-run(s) earlier in the script.

Problem switching between EC2 instance types

To save money, I used a c5.xlarge instance to set up the UMI and run the spin-up simulation, and then switched to a c5.9xlarge instance to run the Jacobian simulations.

Everything works normally when I do this, but switching back to c5.xlarge from c5.9xlarge introduces a problem with slurm.

For some reason, slurm shows the node's resources as drained, and jobs are no longer initiated after submission.

Will found a fix for this:

sudo scontrol update nodename=ip-172-31-20-68 state=idle

which resets the slurm node state, if I understand correctly.

If we can't avoid this problem, then we should document it on the Readthedocs.

[BUG/ISSUE] Multiple inversions create a file-path error in the visualization notebook

The visualization notebook reads from the config.yml, which is edited when new inversions are run. If you have an instance with multiple inversions and you run the visualization notebook of an inversion that wasn't the most recently run, the prior_pth variable will refer to the most recently run inversion and you get an error. It could be worth storing a copy of config.yml somewhere after submitting a run, for future reference in the vis notebook.
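
A minimal sketch of that workaround, with an illustrative archive path:

import shutil

def archive_config(run_dir, config_path="config.yml"):
    """Snapshot the config used for this inversion into its run directory."""
    shutil.copy(config_path, f"{run_dir}/config_archive.yml")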

Add end-to-end script

This issue specifically tracks the progress of adding an end-to-end script, as originally requested in #10.

An initial (rough draft) version of this script has been pushed to the feature/EndToEndScript branch, but a lot of work is still needed.

[BUG/ISSUE] STATE "drained" after restarting instance

Using c5.9xlarge instance.

Closed the instance after running an inversion with sbatch. After restarting the instance, sbatch can no longer schedule new jobs, which sit pending with reason (Resources).

sinfo shows drained STATE:

sinfo --Node --long

Thu Feb  3 22:23:32 2022
NODELIST         NODES PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON              
ip-172-31-79-91      1    debug*     drained   36   1:36:1  70235        0      1   (null) Low RealMemory

Not sure why this is happening, but to correct it I needed to run:

sudo scontrol update nodename=ip-172-31-79-91 state=idle

Extract user settings from setup script and put in YML file instead

Users can currently modify the settings for their inversion at the top of setup_ch4_inversion.sh, but there are many options now and it can be confusing which settings users should modify. We should think about moving the settings to a separate file to clean up the setup script and make things more user friendly.

[BUG/ISSUE] Error setting up preview GC simulation

The dry run for the 1-day preview simulation throws this error:

fatal error: An error occurred (404) when calling the HeadObject operation: Key "HEMCO/../BoundaryConditions/GEOSChem.BoundaryConditions.20180401_0000z.nc4" does not exist

This path would work:

../../ExtData/HEMCO/../BoundaryConditions/GEOSChem.BoundaryConditions.20180401_0000z.nc4

Unclear why ../../ExtData is missing for this file.

All that said, the error seems to be irrelevant -- the needed boundary condition file is present and the simulation runs correctly.

Only save out StateMet and LevelEdgeDiags for base run to save on disk space

Zichong Chen wrote:

I am thinking about the Jacobian runs and the large amount of storage they consume. Assume we will run 1000 perturbation runs; we actually need the StateMet and LevelEdgeDiags output only once, since the dry-air density and pressure levels are exactly the same in each perturbation run. With that said, we could select one run to output StateMet and LevelEdgeDiags, and for the other runs turn them off in HISTORY.rc. This will save a lot of storage and will likely also make the model run faster (by avoiding the large I/O of writing that output).

I already did that, and I think I am almost done with the perturbation runs. (1) After the modification, each perturbation run consumes 33 GB instead of 195 GB (as before) in my case (quarter-degree resolution over China). (2) I chatted with Hannah Nesser and Elise Penn the other day: to calculate the observation operator, the only StateMet variable we may need is Met_AD, and for LevelEdgeDiags we may only need Met_PEDGE. But if we only need this pressure/air-density output once, it does not matter much even if we want more variables. (3) I am now using 12 TB, but will soon go down to 1 TB when my operator scripts finish running.
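
A minimal sketch of the HISTORY.rc toggle, assuming the Jacobian run directories follow a hypothetical CH4_Jacobian_* naming pattern and that a collection is disabled by commenting out its entry in the COLLECTIONS list:

import glob

def disable_met_diags(keep_dir, pattern="CH4_Jacobian_*"):
    """Comment out StateMet and LevelEdgeDiags in HISTORY.rc for all but one run."""
    for run_dir in glob.glob(pattern):
        if run_dir == keep_dir:
            continue  # leave one run directory writing StateMet/LevelEdgeDiags
        path = f"{run_dir}/HISTORY.rc"
        with open(path) as f:
            lines = f.readlines()
        with open(path, "w") as f:
            for line in lines:
                if (("'StateMet'" in line or "'LevelEdgeDiags'" in line)
                        and not line.lstrip().startswith("#")):
                    line = "#" + line  # comment out the collection entry
                f.write(line)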

Problem setting IAM roles

The UMI requires the user to set up an AWS IAM role to allow EC2 to access S3.

Will provided instructions for how to do this on the Readthedocs, but when I tried, something went wrong. The role I made was not automatically applied to the instance, and when I applied it manually it still didn’t work.

Luckily I already had an equivalent IAM role set up, but fresh AWS users may get stuck here.

[FEATURE REQUEST] Include capability to optimize annual OH

For global inversions, we should also include optimization of OH. This will require updates to the GEOS-Chem simulations to perturb OH. These perturbations may be applied globally or separately to the Northern and Southern Hemispheres, but for most applications the global perturbation is sufficient.

Comments and suggestions from Daniel Jacob

In comments on the manuscript, Daniel made some suggestions for improving the UMI user experience.

  1. Right now the user needs to think about the buffer clusters when they define their inversion domain. They also need to think about the GEOS-Chem "buffer zone" (3 pixels on all domain sides). Daniel thinks the user should only have to tell the UMI what inversion domain they're interested in, and the UMI should automate everything else -- i.e. it should add pixels around the edges for the buffer clusters and handle the buffer zone automatically.
  2. Daniel also thinks the user should have to execute just one script to make everything work end-to-end. Can we do this? For the inversion part, run_inversion.sh does the trick. For the spin-up and other simulations, can we make a bash script that runs everything in series, with wait statements between each step, and then combine this with run_inversion.sh (a rough sketch follows this list)? This seems a bit painful to me, but maybe it's worth it. Otherwise we need to defend the multi-step UMI (run setup script -> run spin-up simulation -> run Jacobian simulations -> run inversion -> visualize results).
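
The second item asks about a bash driver; as a rough illustration of the same idea in Python, stages can be serialized by submitting each one with sbatch --wait. The stage script names other than run_inversion.sh are illustrative only:

import subprocess

stages = [
    "run_setup.sh",      # illustrative name
    "run_spinup.sh",     # illustrative name
    "run_jacobian.sh",   # illustrative name
    "run_inversion.sh",  # existing IMI inversion script
]

for script in stages:
    # --wait blocks until the submitted job finishes, so the stages run in series
    subprocess.run(["sbatch", "--wait", script], check=True)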
