
openghg_inversions's Introduction


OpenGHG - a cloud platform for greenhouse gas data analysis and collaboration


OpenGHG is a project, based on the prototype HUGS platform, that aims to provide a platform for collaboration on and analysis of greenhouse gas (GHG) data.

The platform will be built on open-source technologies and will allow researchers to collaborate on large datasets by harnessing the power and scalability of the cloud.

For more information please see our documentation.

Install locally

To run OpenGHG locally you'll need Python 3.8 or later on Linux or macOS; we don't currently support Windows.

You can install OpenGHG using pip or conda, though only conda provides the complete functionality out of the box.

Using pip

To use pip, first create a virtual environment

python -m venv openghg_env

Then activate the environment

source openghg_env/bin/activate

It's best to make sure you have the most up-to-date versions of the packages that pip uses behind the scenes when installing OpenGHG.

pip install --upgrade pip wheel setuptools

Then we can install OpenGHG itself

pip install openghg

Each time you use OpenGHG please make sure to activate the environment using the source step above.

NOTE: Some functionality is not completely accessible when OpenGHG is installed with pip. This only affects some map regridding functionality. See the Additional Functionality section below for more information.

Using conda

To get OpenGHG installed using conda we'll first create a new environment

conda create --name openghg_env

Then activate the environment

conda activate openghg_env

Then install OpenGHG and its dependencies from our conda channel and conda-forge.

conda install --channel conda-forge --channel openghg openghg

Note: the xesmf library is already incorporated into the conda install from vx.x onwards and so does not need to be installed separately.

Create the configuration file

OpenGHG stores the object store path and user data in a configuration file in the user's home directory at ~/.config/openghg/openghg.conf. As this file sets the path of the object store, the user must create it in one of two ways:

Command line

Using the openghg command line tool

openghg --quickstart

OpenGHG configuration
---------------------

Enter path for object store (default /home/gareth/openghg_store):
INFO:openghg.util:Creating config at /home/gareth/.config/openghg/openghg.conf

INFO:openghg.util:Configuration written to /home/gareth/.config/openghg/openghg.conf

Python

Using the create_config function from the openghg.util submodule.

from openghg.util import create_config

create_config()

OpenGHG configuration
---------------------

Enter path for object store (default /home/gareth/openghg_store):
INFO:openghg.util:Creating config at /home/gareth/.config/openghg/openghg.conf

INFO:openghg.util:Configuration written to /home/gareth/.config/openghg/openghg.conf

You will be prompted to enter the path to the object store; leaving the prompt empty tells OpenGHG to use the default path in the user's home directory, ~/openghg_store.

Additional functionality

Some optional functionality is available within OpenGHG to allow for multi-dimensional regridding of map data (the openghg.transform submodule). This makes use of the xesmf package, a Python library built upon underlying Fortran and C libraries (ESMF) which cannot be installed directly within a Python virtual environment.

To use this functionality these libraries must be installed separately. One suggestion for how to do this is as follows.

If you are still within the virtual environment created above, exit it using

deactivate

We will need to create a conda environment to contain just the additional C and FORTRAN libraries necessary for the xesmf module (and dependencies) to run. This can be done by installing the esmf package using conda

conda create --name openghg_add esmf -c conda-forge

Then activate the Python virtual environment in the same way as above:

source openghg_env/bin/activate

Run the following lines to link the Python virtual environment to the installed dependencies by installing the esmpy Python wrapper (a dependency of xesmf):

ESMFVERSION='v'$(conda list -n openghg_add esmf | tail -n1 | awk '{print $2}')
export ESMFMKFILE="$(conda env list | grep openghg_add | awk '{print $2}')/lib/esmf.mk"
pip install "git+https://github.com/esmf-org/esmf.git@${ESMFVERSION}#subdirectory=src/addon/ESMPy/"

Note: The pip install command above for the esmpy module may produce an AttributeError. At present (19/07/2022) an error of this type is expected and does not necessarily mean that the xesmf module cannot be installed. This error will be fixed if PR #49 is merged.

Now that the dependencies have all been installed, the xesmf library can be installed within the virtual environment

pip install xesmf

Developers

If you'd like to contribute to OpenGHG please see the contributing section of our documentation. If you'd like to take a look at the source and run the tests follow the steps below.

Clone

git clone https://github.com/openghg/openghg.git

Install dependencies

We recommend you create a virtual environment first

python -m venv openghg_env

Then activate the environment

source openghg_env/bin/activate

Then install the dependencies

cd openghg
pip install --upgrade pip wheel setuptools
pip install -r requirements.txt -r requirements-dev.txt

Next you can install OpenGHG in editable mode using the -e flag. This installs the package from the local path and means any changes you make to the code will be immediately available when using the package.

pip install -e .

OpenGHG should now be installed in your virtual environment.

See above for additional steps to install the xesmf library as required.

Run the tests

To run the tests

pytest -v tests/

NOTE: Some of the tests require the udunits2 library to be installed.

The udunits package is not pip-installable, so we've added a separate flag to specifically run these tests. If you're on Debian / Ubuntu you can do

sudo apt-get install libudunits2-0

Running CF Checker tests

You can then run the cfchecks marked tests using

pytest -v --run-cfchecks tests/

Running ICOS tests

Some of our tests retrieve data from the ICOS Carbon Portal, so to avoid load on the ICOS servers they should not be run frequently. They should only be run when working on this functionality or before a release. These tests are marked with pytest.mark.icos.

pytest -v --run-icos tests/

If all the tests pass then you're good to go. If they don't please open an issue and let us know some details about your setup.

Documentation

For further documentation and tutorials please visit our documentation.

Community

If you'd like further help or would like to talk to one of the developers of this project, please join our Gitter at gitter.im/openghg/lobby.

openghg_inversions's People

Contributors

ericsaboya, gareth-j


openghg_inversions's Issues

Specifying bc_basis_directory as an input from the .ini file

The utils.bc_sensitivity function accepts bc_basis_directory as an optional input, but the main code is not currently written to accept bc_basis_directory as an input from the .ini file. This would be useful for non-Bristol users, who have to specify a path for this.

Error caused by os.path.join command in utils.py

In utils.py:

            #name.append(''.os.path.join(name_temp[ii]))
            name.append(os.path.join(name_temp[ii]))

The commented-out line gave an error:

AttributeError: 'str' object has no attribute 'os'

I'm not quite sure what that line is trying to do. It seems to work OK with the line I changed it to (the uncommented line above), though.
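For context, the AttributeError comes from chaining attribute access onto the empty string literal, and os.path.join with a single argument simply returns that argument unchanged, so the corrected line is effectively a plain copy. A quick check (the file name below is a made-up example):

```python
import os

# The broken line chained attribute access on a str literal:
# ''.os.path.join(...) tries to look up an attribute "os" on "",
# which reproduces the reported AttributeError.
try:
    "".os
except AttributeError as exc:
    print(exc)  # 'str' object has no attribute 'os'

# os.path.join with one argument returns it unchanged, so the fixed
# line just appends name_temp[ii] as-is.
print(os.path.join("file_a.nc"))  # -> file_a.nc
```

Given that, the `os.path.join` call could probably be dropped entirely in favour of `name.append(name_temp[ii])`.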

Post-processing output sometimes fails due to "Filter error"

Sometimes the "Post-processing output" step in inversion_pymc.inferpymc_postprocessouts fails due to a "Filter error":

Post-processing output
Traceback (most recent call last):
File "/user/home/bm13805/openghg_inversions/openghg_inversions/hbmcmc/run_hbmcmc.py", line 208, in
mcmc_function(**param)
File "/user/home/bm13805/openghg_inversions/openghg_inversions/hbmcmc/hbmcmc.py", line 413, in fixedbasisMCMC
mcmc.inferpymc_postprocessouts(xouts,bcouts, sigouts, convergence,
File "/user/home/bm13805/openghg_inversions/openghg_inversions/hbmcmc/inversion_pymc.py", line 604, in inferpymc_postprocessouts
outds.to_netcdf(output_filename, encoding=encoding, mode="w")
File "/user/home/bm13805/.venv/lib/python3.10/site-packages/xarray/core/dataset.py", line 1946, in to_netcdf
return to_netcdf( # type: ignore # mypy cannot resolve the overloads:(
File "/user/home/bm13805/.venv/lib/python3.10/site-packages/xarray/backends/api.py", line 1272, in to_netcdf
dump_to_store(
File "/user/home/bm13805/.venv/lib/python3.10/site-packages/xarray/backends/api.py", line 1319, in dump_to_store
store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
File "/user/home/bm13805/.venv/lib/python3.10/site-packages/xarray/backends/common.py", line 278, in store
self.set_variables(
File "/user/home/bm13805/.venv/lib/python3.10/site-packages/xarray/backends/common.py", line 316, in set_variables
target, source = self.prepare_variable(
File "/user/home/bm13805/.venv/lib/python3.10/site-packages/xarray/backends/netCDF4_.py", line 495, in prepare_variable
nc4_var = self.ds.createVariable(
File "src/netCDF4/_netCDF4.pyx", line 2962, in netCDF4._netCDF4.Dataset.createVariable
File "src/netCDF4/_netCDF4.pyx", line 4202, in netCDF4._netCDF4.Variable.__init__
File "src/netCDF4/_netCDF4.pyx", line 2029, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: Filter error: bad id or parameters or duplicate filter

Steps to recreate

The following .ini file was used to run a small inversion on the login and compute notes of Blue Pebble.
The obs/ancillary data is from the object store in /group/chemistry/acrg/ES_store.

; Configuration file for HBMCMC code                                                                                                         

[INPUT.MEASUREMENTS]
; Input values for extracting observations                                                                                                   

species     = 'ch4'   ; (required)                                                                                                           
sites       = ["MHD", "TAC"] ; (required)                                                                                                    
meas_period = ["4H","4H"]  ; (required)                                                                                                      
start_date  = '2019-01-01'      ; (required - but can be specified on command line instead)                                                  
end_date    = '2019-02-01'      ; (required - but can be specified on command line instead)                                                  

inlet         = ["10m", "185m"]
instrument    = ["gcmd", "picarro"]
filters = ["local_influence"]

[INPUT.PRIORS]
; Input values for extracting footprints, emissions and boundary conditions files (also uses values from INPUT.MEASUREMENTS)                 

domain = "EUROPE"       ; (required)                                                                                                         
met_model = 'UKV'
fp_model = "NAME"
fp_height = ["10m", "185m"]
emissions_name = ["total-ukghg-edgar7"]

[INPUT.BASIS_CASE]
bc_basis_case  = "NESW"
bc_basis_directory = "/group/chemistry/acrg/LPDM/bc_basis_functions/"
fp_basis_case  = None ;'16x16_EUROPE_2012.nc'                                                                                                
quadtree_basis = True
nbasis = 50
basis_directory = "/group/chemistry/acrg/LPDM/basis_functions/"
country_file = None

[MCMC.TYPE]
; Which MCMC setup to use. This defines the function which will be called and the expected inputs.                                           

mcmc_type = "fixed_basis"

[MCMC.PDF]
; Definitions of PDF shape and parameters for inputs                                                                                         

xprior   = {"pdf":"lognormal", "mu":0.15005, "sigma":0.51083}
bcprior  = {"pdf":"lognormal", "mu":0.15005, "sigma":0.51083}
sigprior = {"pdf":"lognormal", "mu":-0.98704, "sigma":0.587787}

[MCMC.BC_SPLIT]
bc_freq    = "monthly"
sigma_freq = None
sigma_per_site = True


[MCMC.ITERATIONS]
; Iteration parameters                                                                                                                       

nit  = 25
burn = 5
tune = 12


[MCMC.NCHAIN]
; Number of chains to run simultaneously. Must be >=2 to allow convergence to be checked.                                                    

nchain = 2

[MCMC.ADD_ERROR]
; Add variability in averaging period to the measurement error                                                                               

averagingerror = True

[MCMC.OUTPUT]
; Details of where to write the output

outputpath = '/user/home/bm13805/my_inversions'  ; (required)                                                                                
outputname = 'mhd_tac_test_out'  ; (required)                                                                                                
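As a side note on the priors above: assuming the lognormal pdf parameters are the mu and sigma of the underlying normal (as in PyMC's Lognormal parameterisation; this is an assumption, not confirmed by the config itself), the implied central values of the xprior scaling can be sketched as:

```python
import math

# xprior parameters from the .ini above, interpreted as the
# log-space mu and sigma of a lognormal distribution (assumption)
mu, sigma = 0.15005, 0.51083

median = math.exp(mu)                  # median of a lognormal
mean = math.exp(mu + 0.5 * sigma**2)   # mean of a lognormal
print(median, mean)  # roughly 1.162 and 1.324
```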

Attempted fixes

Changing the encoding option to None in to_netcdf (around line 600 in inversion_pymc.py) seems to fix the problem. This might not be a satisfactory fix, since it turns off compression.

This issue Unidata/netcdf4-python#1175 suggests that the problem might be caused by the zlib=True option being used on "variable length data", but I'm not sure exactly where this occurs in inferpymc_postprocessouts.
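A possible middle ground, rather than disabling encoding entirely, would be to apply compression only to numeric variables and skip variable-length (string/object) ones. A minimal sketch, where the variable names are hypothetical stand-ins rather than the actual contents of outds:

```python
import numpy as np

# Hypothetical stand-ins for variables in the output dataset
variables = {
    "flux": np.arange(3.0),                               # numeric
    "sitenames": np.array(["MHD", "TAC"], dtype=object),  # variable-length strings
}

# Build an encoding dict that compresses only numeric variables,
# since zlib filters applied to variable-length data can raise
# "NetCDF: Filter error: bad id or parameters or duplicate filter"
encoding = {
    name: {"zlib": True, "complevel": 5}
    for name, arr in variables.items()
    if np.issubdtype(arr.dtype, np.number)
}
print(encoding)  # only "flux" gets a compression entry
```

This dict could then be passed as the encoding argument to outds.to_netcdf, keeping compression for the bulk of the output.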

Speed improvements for HBMCMC - extra info

The table below contains info on the runtimes for various HBMCMC setups, from the code at commit 8d2ecf1.

No filtering was applied to the 4-hourly averaged observations.

Trace parameters were set at:
nit = 250000
burn = 50000
tune = 125000

Ideally, the largest model runs (20 sites, 500 basis functions) should be brought down to 8 hours, but this target can definitely be adjusted if needed.

| Inputs | Num. obs | Run ID | Run time | Memory usage |
| --- | --- | --- | --- | --- |
| n_basis = 50, n_sites = 10 | 1741 | 5411818 | 2H 15mins | 16 GB |
| n_basis = 100, n_sites = 10 | 1741 | 5411819 | 3H 15mins | 16 GB |
| n_basis = 500, n_sites = 10 | 1741 | 5418801 | 5H 50mins | 20 GB |
| n_basis = 100, n_sites = 15 | 2626 | 5411820 | 4H 45mins | 23 GB |
| n_basis = 500, n_sites = 15 | 2626 | 5418813 | 12H 50mins | 26 GB |
| n_basis = 100, n_sites = 20 | 3377 | 5411821 | 6H 30mins | 30 GB |
| n_basis = 200, n_sites = 20 | 3377 | 5460436 | 11H 30mins | 30 GB |
| n_basis = 500, n_sites = 20 | 3377 | 5418855 | 16H 50mins | 31 GB |

This links to a larger task in the project overview: https://github.com/ACRG-Bristol/projects/issues/30

Remove site but continue inversion

The ability to remove a site without data from the inversion, and to continue the inversion without that site, is being added to the acrg repo. This could be added here too.

Error: Initial Evaluation of model

An 'Initial Evaluation of model' error is raised when using openghg_inversions, and arrays of all mu values appear from bc_prior.
I think the problem lies with one of the following input variables, rather than the choice of bc_prior: Hx, Hbc, Y, error, siteindicator, sigma_freq_index.
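One way to narrow this down is to check each suspect input for non-finite values and zero uncertainties, which commonly make the initial log-probability non-finite in PyMC. A sketch, not from the codebase; the arrays below are hypothetical stand-ins for the inputs named above:

```python
import numpy as np

# Hypothetical stand-ins for the inversion inputs listed above;
# shapes and values are illustrative only.
inputs = {
    "Hx": np.array([[1.0, 2.0], [3.0, np.nan]]),
    "Hbc": np.ones((2, 2)),
    "Y": np.array([1.9, 2.1]),
    "error": np.array([0.1, 0.0]),
}

# Non-finite values in the matrices commonly break the initial
# evaluation of the model at its starting point.
for name, arr in inputs.items():
    if not np.all(np.isfinite(arr)):
        print(f"{name} contains non-finite values")

# Zero measurement errors give infinite likelihood precision,
# another common cause of a non-finite starting log-probability.
if np.any(inputs["error"] == 0):
    print("error contains zeros")
```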
