
openghg_inversions's Introduction


OpenGHG - a cloud platform for greenhouse gas data analysis and collaboration


OpenGHG is a project, based on the prototype HUGS platform, that aims to provide a platform for collaboration on and analysis of greenhouse gas (GHG) data.

The platform will be built on open-source technologies and will allow researchers to collaborate on large datasets by harnessing the power and scalability of the cloud.

For more information please see our documentation.

Install locally

To run OpenGHG locally you'll need Python 3.8 or later on Linux or macOS; we don't currently support Windows.

You can install OpenGHG using pip or conda, though only conda provides the complete functionality out of the box.

Using pip

To use pip, first create a virtual environment

python -m venv openghg_env

Then activate the environment

source openghg_env/bin/activate

It's best to make sure you have the most up-to-date versions of the packages that pip uses behind the scenes when installing OpenGHG.

pip install --upgrade pip wheel setuptools

Then we can install OpenGHG itself

pip install openghg

Each time you use OpenGHG please make sure to activate the environment using the source step above.

NOTE: Some functionality is not completely accessible when OpenGHG is installed with pip. This only affects some map regridding functionality. See the Additional Functionality section below for more information.

Using conda

To get OpenGHG installed using conda we'll first create a new environment

conda create --name openghg_env

Then activate the environment

conda activate openghg_env

Then install OpenGHG and its dependencies from our conda channel and conda-forge.

conda install --channel conda-forge --channel openghg openghg

Note: the xesmf library is already incorporated into the conda install from vx.x onwards and so does not need to be installed separately.

Create the configuration file

OpenGHG stores the object store path and user data in a configuration file in the user's home directory at ~/.config/openghg/openghg.conf. As this file sets the path of the object store, the user must create it in one of two ways:

Command line

Using the openghg command line tool

openghg --quickstart

OpenGHG configuration
---------------------

Enter path for object store (default /home/gareth/openghg_store):
INFO:openghg.util:Creating config at /home/gareth/.config/openghg/openghg.conf

INFO:openghg.util:Configuration written to /home/gareth/.config/openghg/openghg.conf

Python

Using the create_config function from the openghg.util submodule.

from openghg.util import create_config

create_config()

OpenGHG configuration
---------------------

Enter path for object store (default /home/gareth/openghg_store):
INFO:openghg.util:Creating config at /home/gareth/.config/openghg/openghg.conf

INFO:openghg.util:Configuration written to /home/gareth/.config/openghg/openghg.conf

You will be prompted to enter the path to the object store; leaving the prompt empty tells OpenGHG to use the default path in the user's home directory, ~/openghg_store.

Additional functionality

Some optional functionality is available within OpenGHG to allow for multi-dimensional regridding of map data (the openghg.transform submodule). This makes use of the xesmf package, a Python library built upon underlying Fortran and C libraries (ESMF) which cannot be installed directly within a Python virtual environment.

To use this functionality these libraries must be installed separately. One suggestion for how to do this is as follows.

If you are still within the virtual environment created above, exit it using

deactivate

We will need to create a conda environment to contain just the additional C and FORTRAN libraries necessary for the xesmf module (and dependencies) to run. This can be done by installing the esmf package using conda

conda create --name openghg_add esmf -c conda-forge

Then activate the Python virtual environment in the same way as above:

source openghg_env/bin/activate

Run the following lines to link the Python virtual environment to the installed dependencies by installing the esmpy Python wrapper (a dependency of xesmf):

ESMFVERSION='v'$(conda list -n openghg_add esmf | tail -n1 | awk '{print $2}')
export ESMFMKFILE="$(conda env list | grep openghg_add | awk '{print $2}')/lib/esmf.mk"
pip install "git+https://github.com/esmf-org/esmf.git@${ESMFVERSION}#subdirectory=src/addon/ESMPy/"

Note: The pip install command above for the esmpy module may produce an AttributeError. At present (19/07/2022) an error of this type is expected and does not necessarily mean that the xesmf module cannot be installed. This error will be fixed if PR #49 is merged.

Now that the dependencies have all been installed, the xesmf library can be installed within the virtual environment

pip install xesmf

Developers

If you'd like to contribute to OpenGHG please see the contributing section of our documentation. If you'd like to take a look at the source and run the tests follow the steps below.

Clone

git clone https://github.com/openghg/openghg.git

Install dependencies

We recommend you create a virtual environment first

python -m venv openghg_env

Then activate the environment

source openghg_env/bin/activate

Then install the dependencies

cd openghg
pip install --upgrade pip wheel setuptools
pip install -r requirements.txt -r requirements-dev.txt

Next you can install OpenGHG in editable mode using the -e flag. This installs the package from the local path and means any changes you make to the code will be immediately available when using the package.

pip install -e .

OpenGHG should now be installed in your virtual environment.

See above for additional steps to install the xesmf library as required.

Run the tests

To run the tests

pytest -v tests/

NOTE: Some of the tests require the udunits2 library to be installed.

The udunits package is not pip-installable, so we've added a separate flag to specifically run these tests. If you're on Debian / Ubuntu you can do

sudo apt-get install libudunits2-0

Running CF Checker tests

You can then run the cfchecks marked tests using

pytest -v --run-cfchecks tests/

Running ICOS tests

Some of our tests retrieve data from the ICOS Carbon Portal, so to avoid load on the ICOS servers they should not be run frequently. They should only be run when working on this functionality or before a release. These tests are marked with pytest.mark.icos.

pytest -v --run-icos tests/

If all the tests pass then you're good to go. If they don't please open an issue and let us know some details about your setup.

Documentation

For further documentation and tutorials please visit our documentation.

Community

If you'd like further help or would like to talk to one of the developers of this project, please join our Gitter at gitter.im/openghg/lobby.

openghg_inversions's People

Contributors

ericsaboya, gareth-j


openghg_inversions's Issues

Specifying bc_basis_directory as an input from the .ini file

The utils.bc_sensitivity function accepts bc_basis_directory as an optional input, but the main code is not currently written to accept bc_basis_directory as an input from the .ini file. This would be useful for non-Bristol users, who have to specify a path for this.

Error caused by os.path.join command in utils.py

In utils.py:

            #name.append(''.os.path.join(name_temp[ii]))
            name.append(os.path.join(name_temp[ii]))

The commented-out line gave an error:

AttributeError: 'str' object has no attribute 'os'

I'm not quite sure what that line is trying to do. It seems to work OK with the line I changed it to (the uncommented line above), though.
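For context, the AttributeError comes from chaining attribute access onto the empty string literal, and os.path.join with a single argument simply returns that argument unchanged, so the corrected line is effectively a plain copy. A quick check (the file name below is a made-up example):

```python
import os

# The broken line chained attribute access on a str literal:
# ''.os.path.join(...) tries to look up an attribute "os" on "",
# which reproduces the reported AttributeError.
try:
    "".os
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'os'

# os.path.join with one argument returns it unchanged, so the fixed
# line just appends name_temp[ii] as-is.
print(os.path.join("file_a.nc"))  # -> file_a.nc
```

Given that, the `os.path.join` call could probably be dropped entirely in favour of `name.append(name_temp[ii])`.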

Post-processing output sometimes fails due to "Filter error"

Sometimes the "Post-processing output" step in inversion_pymc.inferpymc_postprocessouts fails due to a "Filter error":

Post-processing output
Traceback (most recent call last):
File "/user/home/bm13805/openghg_inversions/openghg_inversions/hbmcmc/run_hbmcmc.py", line 208, in
mcmc_function(**param)
File "/user/home/bm13805/openghg_inversions/openghg_inversions/hbmcmc/hbmcmc.py", line 413, in fixedbasisMCMC
mcmc.inferpymc_postprocessouts(xouts,bcouts, sigouts, convergence,
File "/user/home/bm13805/openghg_inversions/openghg_inversions/hbmcmc/inversion_pymc.py", line 604, in inferpymc_postprocessouts
outds.to_netcdf(output_filename, encoding=encoding, mode="w")
File "/user/home/bm13805/.venv/lib/python3.10/site-packages/xarray/core/dataset.py", line 1946, in to_netcdf
return to_netcdf( # type: ignore # mypy cannot resolve the overloads:(
File "/user/home/bm13805/.venv/lib/python3.10/site-packages/xarray/backends/api.py", line 1272, in to_netcdf
dump_to_store(
File "/user/home/bm13805/.venv/lib/python3.10/site-packages/xarray/backends/api.py", line 1319, in dump_to_store
store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
File "/user/home/bm13805/.venv/lib/python3.10/site-packages/xarray/backends/common.py", line 278, in store
self.set_variables(
File "/user/home/bm13805/.venv/lib/python3.10/site-packages/xarray/backends/common.py", line 316, in set_variables
target, source = self.prepare_variable(
File "/user/home/bm13805/.venv/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 495, in prepare_variable
nc4_var = self.ds.createVariable(
File "src/netCDF4/_netCDF4.pyx", line 2962, in netCDF4._netCDF4.Dataset.createVariable
File "src/netCDF4/_netCDF4.pyx", line 4202, in netCDF4._netCDF4.Variable.__init__
File "src/netCDF4/_netCDF4.pyx", line 2029, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Filter error: bad id or parameters or duplicate filter

Steps to recreate

The following .ini file was used to run a small inversion on the login and compute notes of Blue Pebble.
The obs/ancillary data is from the object store in /group/chemistry/acrg/ES_store.

; Configuration file for HBMCMC code                                                                                                         

[INPUT.MEASUREMENTS]
; Input values for extracting observations                                                                                                   

species     = 'ch4'   ; (required)                                                                                                           
sites       = ["MHD", "TAC"] ; (required)                                                                                                    
meas_period = ["4H","4H"]  ; (required)                                                                                                      
start_date  = '2019-01-01'      ; (required - but can be specified on command line instead)                                                  
end_date    = '2019-02-01'      ; (required - but can be specified on command line instead)                                                  

inlet         = ["10m", "185m"]
instrument    = ["gcmd", "picarro"]
filters = ["local_influence"]

[INPUT.PRIORS]
; Input values for extracting footprints, emissions and boundary conditions files (also uses values from INPUT.MEASUREMENTS)                 

domain = "EUROPE"       ; (required)                                                                                                         
met_model = 'UKV'
fp_model = "NAME"
fp_height = ["10m", "185m"]
emissions_name = ["total-ukghg-edgar7"]

[INPUT.BASIS_CASE]
bc_basis_case  = "NESW"
bc_basis_directory = "/group/chemistry/acrg/LPDM/bc_basis_functions/"
fp_basis_case  = None ;'16x16_EUROPE_2012.nc'                                                                                                
quadtree_basis = True
nbasis = 50
basis_directory = "/group/chemistry/acrg/LPDM/basis_functions/"
country_file = None

[MCMC.TYPE]
; Which MCMC setup to use. This defines the function which will be called and the expected inputs.                                           

mcmc_type = "fixed_basis"

[MCMC.PDF]
; Definitions of PDF shape and parameters for inputs                                                                                         

xprior   = {"pdf":"lognormal", "mu":0.15005, "sigma":0.51083}
bcprior  = {"pdf":"lognormal", "mu":0.15005, "sigma":0.51083}
sigprior = {"pdf":"lognormal", "mu":-0.98704, "sigma":0.587787}

[MCMC.BC_SPLIT]
bc_freq    = "monthly"
sigma_freq = None
sigma_per_site = True


[MCMC.ITERATIONS]
; Iteration parameters                                                                                                                       

nit  = 25
burn = 5
tune = 12


[MCMC.NCHAIN]
; Number of chains to run simultaneously. Must be >=2 to allow convergence to be checked.                                                    

nchain = 2

[MCMC.ADD_ERROR]
; Add variability in averaging period to the measurement error                                                                               

averagingerror = True

[MCMC.OUTPUT]
; Details of where to write the output

outputpath = '/user/home/bm13805/my_inversions'  ; (required)                                                                                
outputname = 'mhd_tac_test_out'  ; (required)                                                                                                
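As a side note on the priors above: assuming the lognormal pdf parameters are the mu and sigma of the underlying normal (as in PyMC's Lognormal parameterisation; this is an assumption, not confirmed by the config itself), the implied central values of the xprior scaling can be sketched as:

```python
import math

# xprior parameters from the .ini above, interpreted as the
# log-space mu and sigma of a lognormal distribution (assumption)
mu, sigma = 0.15005, 0.51083

median = math.exp(mu)                  # median of a lognormal
mean = math.exp(mu + 0.5 * sigma**2)   # mean of a lognormal
print(median, mean)  # roughly 1.162 and 1.324
```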

Attempted fixes

Changing the encoding option to None in to_netcdf (around line 600 in inversion_pymc.py) seems to fix the problem. This might not be a satisfactory fix, since it turns off compression.

This issue Unidata/netcdf4-python#1175 suggests that the problem might be caused by the zlib=True option being used on "variable length data", but I'm not sure exactly where this occurs in inferpymc_postprocessouts.
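A possible middle ground, rather than disabling encoding entirely, would be to apply compression only to numeric variables and skip variable-length (string/object) ones. A minimal sketch, where the variable names are hypothetical stand-ins rather than the actual contents of outds:

```python
import numpy as np

# Hypothetical stand-ins for variables in the output dataset
variables = {
    "flux": np.arange(3.0),                               # numeric
    "sitenames": np.array(["MHD", "TAC"], dtype=object),  # variable-length strings
}

# Build an encoding dict that compresses only numeric variables,
# since zlib filters applied to variable-length data can raise
# "NetCDF: Filter error: bad id or parameters or duplicate filter"
encoding = {
    name: {"zlib": True, "complevel": 5}
    for name, arr in variables.items()
    if np.issubdtype(arr.dtype, np.number)
}
print(encoding)  # only "flux" gets a compression entry
```

This dict could then be passed as the encoding argument to outds.to_netcdf, keeping compression for the bulk of the output.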

Speed improvements for HBMCMC - extra info

The table below contains info on the runtimes for various HBMCMC setups, from the code at commit 8d2ecf1.

No filtering was applied to the 4-hourly averaged observations.

Trace parameters were set at:
nit = 250000
burn = 50000
tune = 125000

Ideally, the largest model runs (20 sites, 500 basis functions) should be brought down to 8 hours, but this target can definitely be adjusted if needed.

| Inputs | Num. obs | Run ID | Run time | Memory usage |
| --- | --- | --- | --- | --- |
| n_basis = 50, n_sites = 10 | 1741 | 5411818 | 2H 15mins | 16 GB |
| n_basis = 100, n_sites = 10 | 1741 | 5411819 | 3H 15mins | 16 GB |
| n_basis = 500, n_sites = 10 | 1741 | 5418801 | 5H 50mins | 20 GB |
| n_basis = 100, n_sites = 15 | 2626 | 5411820 | 4H 45mins | 23 GB |
| n_basis = 500, n_sites = 15 | 2626 | 5418813 | 12H 50mins | 26 GB |
| n_basis = 100, n_sites = 20 | 3377 | 5411821 | 6H 30mins | 30 GB |
| n_basis = 200, n_sites = 20 | 3377 | 5460436 | 11H 30mins | 30 GB |
| n_basis = 500, n_sites = 20 | 3377 | 5418855 | 16H 50mins | 31 GB |

This links to a larger task in the project overview: https://github.com/ACRG-Bristol/projects/issues/30

Remove site but continue inversion

The ability to remove a site without data from the inversion, and to continue the inversion without that site, is being added to the acrg repo. This could be added here too.

Error: Initial Evaluation of model

An 'Initial Evaluation of model' error is raised when using openghg_inversions, and arrays of all mu values appear from bc_prior.
I think the problem lies with one of the following input variables, rather than the choice of bc_prior: Hx, Hbc, Y, error, siteindicator, sigma_freq_index.
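One way to narrow this down is to check each suspect input for non-finite values and zero uncertainties, which commonly make the initial log-probability non-finite in PyMC. A sketch, not from the codebase; the arrays below are hypothetical stand-ins for the inputs named above:

```python
import numpy as np

# Hypothetical stand-ins for the inversion inputs listed above;
# shapes and values are illustrative only.
inputs = {
    "Hx": np.array([[1.0, 2.0], [3.0, np.nan]]),
    "Hbc": np.ones((2, 2)),
    "Y": np.array([1.9, 2.1]),
    "error": np.array([0.1, 0.0]),
}

# Non-finite values in the matrices commonly break the initial
# evaluation of the model at its starting point.
for name, arr in inputs.items():
    if not np.all(np.isfinite(arr)):
        print(f"{name} contains non-finite values")

# Zero measurement errors give infinite likelihood precision,
# another common cause of a non-finite starting log-probability.
if np.any(inputs["error"] == 0):
    print("error contains zeros")
```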
