znicholls / netcdf-scm
Simple wrappers for processing netcdf files for use in simple climate models
Home Page: https://netcdf-scm.readthedocs.io/en/latest/
Is your feature request related to a problem? Please describe.
The units of the wrangled files always match the source files. This isn't always ideal.
Describe the solution you'd like
Add the ability to specify units in wrangling. If the source file's units are "/m2" then the wrangler should automatically take the area sum of the input variable.
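As a rough sketch of the area-sum behaviour (the helper name is hypothetical; this assumes iris cubes plus a separately loaded areacella cube):
import iris
import iris.analysis
import iris.util

def take_area_sum_if_per_m2(cube, areacella):
    # hypothetical helper: only act when the variable is per unit area
    if "m-2" not in str(cube.units):
        return cube
    # broadcast the 2D cell areas over the cube's remaining dimensions
    weights = iris.util.broadcast_to_shape(
        areacella.data,
        cube.shape,
        cube.coord_dims("latitude") + cube.coord_dims("longitude"),
    )
    total = cube.collapsed(
        ["latitude", "longitude"], iris.analysis.SUM, weights=weights
    )
    total.units = cube.units * areacella.units  # e.g. W m-2 * m2 -> W
    return total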
Is your feature request related to a problem? Please describe.
The dataframes returned at the moment aren't in the openscm style.
Describe the solution you'd like
Update to use openscm dataframe style.
Describe alternatives you've considered
Leave it as is. This will lead to annoying format conversions being necessary all the time.
Additional context
The openscm project aims to unify region naming, units handling etc. to make running simple climate models much simpler for all involved. Matching this standard is a good idea.
Things are pretty messy and definitely need to be split up and tidied
Builds on #9. At the moment our conda packaging process is quite manual and it's easy to get dependencies wrong. Improving this so our conda recipe is automatically written/checked would be good.
Is your feature request related to a problem? Please describe.
At the moment we don't have a way to test our plots.
Describe the solution you'd like
A way to automatically test plots. I think copying what pyam, iris or someone similar have done is the best idea (see the sketch below).
Describe alternatives you've considered
Always running the notebooks. I think if we can actually compare plots it's a better check though.
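For example, pytest-mpl compares a figure returned by a test against a stored baseline image; a minimal sketch:
import matplotlib.pyplot as plt
import pytest

@pytest.mark.mpl_image_compare  # compared against a stored baseline image
def test_timeseries_plot():
    fig, ax = plt.subplots()
    ax.plot([2000, 2001, 2002], [14.0, 14.1, 14.2])  # placeholder data
    return fig
Baselines are generated once with pytest --mpl-generate-path=baseline and then checked on every run with pytest --mpl.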
Available here https://docs.readthedocs.io/en/latest/conda.html
Not sure if the extra build time is worth it just to be able to undo the import hacks...
The convert_scmdf_to_tuningstruc routine writes the .mat "data" field with the dimensions around the wrong way.
In the final .mat structure, the field "data" should consist of two columns (years,data), but instead it is two rows. Should be an easy transpose fix :)
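A minimal sketch of the intended layout (field name taken from the description above; the values are placeholders):
import numpy as np
import scipy.io

years = np.arange(2000, 2005)
values = np.linspace(14.0, 14.4, years.size)  # placeholder data
# column_stack gives an (n, 2) array: one row per year, columns (year, value),
# rather than the transposed two-row layout currently written
scipy.io.savemat("tuningstruc.mat", {"data": np.column_stack([years, values])})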
Describe the bug
When crunching the ACCESS1-0 CMIP5 data, the following error is thrown
'real_datetime' object has no attribute 'nanosecond'
Failing Test
def read_access_data(access_data_path):
    cube = MarbleCMIP5Cube()
    cube.load_data_from_path(TEST_ACCESS_CMIP5_FILE)
    # raises: AttributeError: 'real_datetime' object has no attribute 'nanosecond'
Expected behavior
NetCDF-SCM should be able to read this file.
System:
$ python --version
Python 3.6.6
$ conda list --export
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: osx-64
asn1crypto=0.24.0=py36_1003
beautifulsoup4=4.6.3=py36_1000
blas=1.0=mkl
bokeh=1.0.1=py36_1000
bzip2=1.0.6=1
ca-certificates=2018.10.15=ha4d7672_0
cartopy=0.16.0=py36h81b52dc_2
certifi=2018.10.15=py36_1000
cf_units=2.0.1=py36h7eb728f_2
cffi=1.11.5=py36h5e8e0c9_1
cftime=1.0.2.1=py36h7eb728f_0
chardet=3.0.4=py36_1003
click=7.0=py_0
cloudpickle=0.6.1=py_0
conda=4.5.11=py36_1000
conda-build=3.16.2=py36_0
conda-env=2.6.0=1
cryptography=2.3.1=py36hdffb7b8_0
cryptography-vectors=2.3.1=py36_1000
curl=7.62.0=h74213dd_0
cycler=0.10.0=py_1
cytoolz=0.9.0.1=py36h470a237_1
dask=0.20.1=py_0
dask-core=0.20.1=py_0
distributed=1.24.1=py36_1000
expat=2.2.5=hfc679d8_2
filelock=3.0.10=py_0
freetype=2.9.1=h6debe1e_4
geos=3.6.2=hfc679d8_3
glob2=0.6=py_0
hdf4=4.2.13=h951d187_2
hdf5=1.10.3=hc401514_2
heapdict=1.0.0=py36_1000
icu=58.2=hfc679d8_0
idna=2.7=py36_1002
intel-openmp=2019.0=118
iris=2.2.0=py36_1
jinja2=2.10=py_1
jpeg=9c=h470a237_1
kiwisolver=1.0.1=py36h2d50403_2
krb5=1.16.2=hbb41f41_0
libarchive=3.3.3=h823be47_0
libcurl=7.62.0=hbdb9355_0
libcxx=7.0.0=h2d50403_2
libedit=3.1.20170329=haf1bffa_1
libffi=3.2.1=hfc679d8_5
libgfortran=3.0.1=h93005f0_2
libiconv=1.15=h470a237_3
libnetcdf=4.6.1=h350cafa_11
libpng=1.6.35=ha92aebf_2
libssh2=1.8.0=h5b517e9_2
libtiff=4.0.9=he6b73bb_2
libxml2=2.9.8=h422b904_5
libxslt=1.1.32=h88dbc4e_2
llvm-meta=7.0.0=0
locket=0.2.0=py_2
lxml=4.2.5=py36hc9114bc_0
markupsafe=1.1.0=py36h470a237_0
matplotlib=3.0.1=1
matplotlib-base=3.0.1=py36h45c993b_1
mkl=2018.0.3=1
mkl_fft=1.0.6=py36_0
mkl_random=1.0.2=py36_0
msgpack-python=0.5.6=py36h2d50403_3
nc-time-axis=1.1.0=py_0
ncurses=6.1=hfc679d8_1
netcdf4=1.4.2=py36hac939d9_0
numpy=1.15.4=py36h6a91979_0
numpy-base=1.15.4=py36h8a80b8c_0
olefile=0.46=py_0
openssl=1.0.2p=h470a237_1
owslib=0.17.0=py_0
packaging=18.0=py_0
pandas=0.23.4=py36hf8a1672_0
partd=0.3.9=py_0
pillow=5.3.0=py36hc736899_0
pip=18.1=py36_1000
pkginfo=1.4.2=py_1
proj4=4.9.3=h470a237_8
psutil=5.4.8=py36h470a237_0
pycosat=0.6.3=py36h470a237_1
pycparser=2.19=py_0
pyepsg=0.3.2=py_1
pyke=1.1.1=py36_1000
pyopenssl=18.0.0=py36_1000
pyparsing=2.3.0=py_0
pyproj=1.9.5.1=py36h508ed2a_6
pyshp=2.0.0=py_0
pysocks=1.6.8=py36_1002
python=3.6.6=h5001a0f_0
python-dateutil=2.7.5=py_0
python-libarchive-c=2.8=py36_1004
pytz=2018.7=py_0
pyyaml=3.13=py36h470a237_1
readline=7.0=haf1bffa_1
requests=2.20.0=py36_1000
ruamel_yaml=0.15.71=py36h470a237_0
scipy=1.1.0=py36h28f7352_1
setuptools=40.5.0=py36_0
shapely=1.6.4=py36h164cb2d_1
six=1.11.0=py36_1001
sortedcontainers=2.0.5=py_0
sqlite=3.25.3=hb1c47c0_0
tblib=1.3.2=py_1
tk=8.6.8=ha92aebf_0
toolz=0.9.0=py_1
tornado=5.1.1=py36h470a237_0
tqdm=4.28.1=py_0
udunits2=2.2.27.6=h3a4f0e9_1
urllib3=1.23=py36_1001
wheel=0.32.2=py36_0
xz=5.2.4=h470a237_1
yaml=0.1.7=h470a237_1
zict=0.1.3=py_0
zlib=1.2.11=h470a237_3
$ pip freeze
alabaster==0.7.12
appdirs==1.4.3
appnope==0.1.0
asn1crypto==0.24.0
atomicwrites==1.2.1
attrs==18.2.0
Babel==2.6.0
backcall==0.1.0
beautifulsoup4==4.6.3
black==18.9b0
bleach==3.0.2
bokeh==1.0.1
Cartopy==0.16.0
certifi==2018.10.15
cf-units==2.0.1
cffi==1.11.5
cftime==1.0.2.1
chardet==3.0.4
Click==7.0
cloudpickle==0.6.1
codecov==2.0.15
conda==4.5.11
conda-build==3.16.2
coverage==4.5.1
cryptography==2.3.1
cryptography-vectors==2.3.1
cycler==0.10.0
cytoolz==0.9.0.1
dask==0.20.1
decorator==4.3.0
defusedxml==0.5.0
distributed==1.24.1
docutils==0.14
entrypoints==0.2.3
ExpectException==0.1.1
f90nml==1.0.2
filelock==3.0.10
flake8==3.6.0
glob2==0.6
heapdict==1.0.0
idna==2.7
imagesize==1.1.0
ipykernel==5.1.0
ipython==7.1.1
ipython-genutils==0.2.0
jedi==0.13.1
Jinja2==2.10
jsonschema==2.6.0
jupyter-client==5.2.3
jupyter-core==4.4.0
kiwisolver==1.0.1
libarchive-c==2.8
locket==0.2.0
lxml==4.2.5
MarkupSafe==1.1.0
matplotlib==3.0.1
mccabe==0.6.1
mistune==0.8.4
mkl-fft==1.0.6
mkl-random==1.0.2
more-itertools==4.3.0
msgpack==0.5.6
nbconvert==5.4.0
nbformat==4.4.0
nbresuse==0.3.0
nbval==0.9.1
nc-time-axis==1.1.0
-e [email protected]:znicholls/netcdf-scm.git@7968219efcec5b4cff0d7473166b07b758ae58ae#egg=netcdf_scm
netCDF4==1.4.2
notebook==5.7.0
numpy==1.15.4
olefile==0.46
OWSLib==0.17.0
packaging==18.0
pandas==0.23.4
pandas-datapackage-reader==0.11.1
pandocfilters==1.4.2
parso==0.3.1
partd==0.3.9
pexpect==4.6.0
pickleshare==0.7.5
Pillow==5.3.0
pkginfo==1.4.2
pluggy==0.8.0
progressbar2==3.38.0
prometheus-client==0.4.2
prompt-toolkit==2.0.7
psutil==5.4.8
ptyprocess==0.6.0
py==1.7.0
pyam-iamc==0.1.1
pycodestyle==2.4.0
pycosat==0.6.3
pycparser==2.19
pyepsg==0.3.2
pyflakes==2.0.0
Pygments==2.2.0
pyke==1.1.1
pymagicc==2.0.0a0
pyOpenSSL==18.0.0
pyparsing==2.3.0
pyproj==1.9.5.1
pyshp==2.0.0
PySocks==1.6.8
pytest==3.10.0
pytest-cov==2.6.0
python-dateutil==2.7.5
python-utils==2.3.0
pytz==2018.7
PyYAML==3.13
pyzmq==17.1.2
readme-renderer==24.0
requests==2.20.0
requests-toolbelt==0.8.0
ruamel-yaml==0.15.71
scipy==1.1.0
scitools-iris==2.2.0
seaborn==0.9.0
Send2Trash==1.5.0
Shapely==1.6.4.post1
six==1.11.0
snowballstemmer==1.2.1
sortedcontainers==2.0.5
Sphinx==1.8.1
sphinx-rtd-theme==0.4.2
sphinxcontrib-websupport==1.1.0
tblib==1.3.2
terminado==0.8.1
testpath==0.4.2
toml==0.10.0
toolz==0.9.0
tornado==5.1.1
tqdm==4.28.1
traitlets==4.3.2
twine==1.12.1
urllib3==1.23
versioneer==0.18
wcwidth==0.1.7
webencodings==0.5.1
xlrd==1.1.0
XlsxWriter==1.1.2
zict==0.1.3
- .nc format instead (#60 and #62)
- _convert_scm_timeseries_cubes_to_openscmdata to use cf name as variable rather than long_name (#63)
- *.IN style files
- .MAG writer in pymagicc (openscm/pymagicc#250)
- .nc to .MAG files with all data
- .IN files with scope to extend to user choice about variable mapping, rebasing and timeseries averaging/cutting
- .IN files in pymagicc (openscm/pymagicc#252)
- source files attribute of crunched nc files so version is captured too
- drs and cube-type flags
- @lewisjared add whatever you want
Yearly tas timeseries are available in the appendix of the AR5 WGI report:
http://www.climatechange2013.org/report/full-report/
Atlas of Global and Regional Climate Projections 44.7MB, EndNote/BibTeX
Supplementary Material: RCP2.6, RCP4.5, RCP6.0, RCP8.5, datafiles
IPCC WGI AR5 Atlas datasets included in KNMI Climate Change Atlas
ZIP Download:
http://www.climatechange2013.org/images/report/WGIAR5_AnnexI_all.zip
From the PDF:
Data and Processing:
The figures have been constructed using the CMIP5 model output available at the time of the AR5 cut-off for accepted papers (15 March 2013). This data set comprises 32/42/25/39
scenario experiments for RCP2.6/4.5/6.0/8.5 from 42 climate models (Table AI.SM2.6.1). Only concentration-driven experiments are used (i.e., those in which concentrations rather than emissions of green-house gases are prescribed) and only one ensemble member from each model is selected, even if multiple realizations exist with different initial conditions and different realizations of natural variability. Hence each model is given equal weight.
It might be possible to get the used model/run combo from this table
Table AI.SM2.6.1: The CMIP5 models used in this Annex for each of the historical and RCP scenario experiments. A number in each column is the identifier of the single ensemble member from that model that is used. A blank indicates no run was used, usually because that scenario run was not available.
If you have ideas on how to include this as a verification of your code, let me know @znicholls !
Is your feature request related to a problem? Please describe.
If we've crunched a bunch of files, there's no easy way to tell if they're up to date. For example, did they make assumptions about cell areas which are no longer necessary as we now have the cell area file.
Describe the solution you'd like
We should be able to do something like
netcdf-scm-crunch --check-up-to-date <src> <crunch-path> --database-file <database-file>
and get output which tells us which files are not up to date and why (new data file is available, new metadata file is available etc.). We might also want to make the --force flag more flexible so it's not just force or not, but rather force, update if outdated, or don't force (with the default still being don't force).
Describe alternatives you've considered
Force crunch repeatedly, super slow...
Additional context
Required to make our CMIP6 crunching sane.
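A rough sketch of the check (pure stdlib; the database layout is an assumption, mapping each source path to the hash it had when last crunched):
import hashlib
import json
import os

def outdated_files(database_file):
    with open(database_file) as f:
        database = json.load(f)  # assumed layout: {source_path: sha256}
    report = {}
    for path, crunched_hash in database.items():
        if not os.path.exists(path):
            report[path] = "source file no longer exists"
            continue
        with open(path, "rb") as f:
            current = hashlib.sha256(f.read()).hexdigest()
        if current != crunched_hash:
            report[path] = "source file has changed since crunching"
    return report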
Describe the bug
At the moment, when crunching with e.g. get_scm_timeseries, land_mask_threshold expects a percentage (i.e. 0-100) value. Sometimes data is provided with a fraction (i.e. 0-1) value. get_scm_timeseries should do some sort of sensible test.
Failing Test
tas.get_scm_timeseries(
    sftlf_cube=sftlf,
    land_mask_threshold=50,
    areacella_scmcube=None,
)
on a cube which has sftlf values from 0-1
Expected behavior
A warning that the threshold looks wrong.
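A minimal sketch of such a test (the helper name is hypothetical):
import warnings
import numpy as np

def check_land_mask_threshold(sftlf_data, land_mask_threshold):
    # hypothetical check: fractional data with a percentage-like threshold
    if land_mask_threshold > 1 and np.max(sftlf_data) <= 1:
        warnings.warn(
            "sftlf values are in [0, 1] but land_mask_threshold={} looks "
            "like a percentage".format(land_mask_threshold)
        )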
Is your feature request related to a problem? Please describe.
At the moment the setup instructions are in three places: Makefile, .travis.yml and README.rst/docs/source/development.rst (depending on whether the install is minimal or for development). This sort of duplication is extremely error prone.
Describe the solution you'd like
Some way to write these setup instructions in a 'master' file, from which the other files are automatically generated. I'm not sure how exactly this would work/look yet.
Describe alternatives you've considered
Continue to try to maintain these three places concurrently. It feels somehow dangerous...
Is your feature request related to a problem? Please describe.
To take area sums of variables, we need to know the effective area of the region in the model.
Describe the solution you'd like
Keep the effective area metadata when crunching.
We've hit our bandwidth limit on git lfs storage. This means we're locked out. GitLab seems to have nowhere near as strict limits, so I'm going to move there instead (it also has way better CI support for Docker containers etc., so that will speed things up a lot too).
pymagicc.io
Is your feature request related to a problem? Please describe.
If we don't have an sftlf file, we only get NH/SH split and no ocean/land.
Describe the solution you'd like
We should be able to guess land fractions in the absence of an sftlf file (the output tracker can tell us whether we used one or not).
Additional context
Becomes very fail-safe once #51 is merged.
@lewisjared what do you think of something like the following for writing?
import cf_units
import iris
import netcdf_scm
from iris.pandas import as_cube

ts = test_cube.get_scm_timeseries()

def to_cube(series, calendars={1: cf_units.CALENDAR_GREGORIAN}):
    cube = as_cube(series, calendars=calendars)
    cube.coord("index").rename("time")
    return cube

def get_cubes(df, calendars={1: cf_units.CALENDAR_GREGORIAN}):
    cube_list = []
    for i, (v, c) in enumerate(df.iterrows()):  # iterate the passed df, not ts
        cube = to_cube(c, calendars=calendars)
        cube.units = df.index.get_level_values("unit")[i]
        cube.long_name = df.index.get_level_values("variable")[i]
        metadata = {
            level: df.index.get_level_values(level)[i]
            for level in df.index.names
            if level not in ["unit", "variable"]
        }
        metadata["netcdf_scm_info"] = "netcdf-scm v{}".format(netcdf_scm.__version__)
        metadata.update(test_cube.cube.attributes)  # relies on outer test_cube
        cube.attributes = metadata
        # we can add parameters like so although it might be smarter
        # to add them as variables with dimension `run_no` so the mapping
        # is simpler later...
        cube.add_aux_coord(iris.coords.AuxCoord(
            1.3,
            long_name="example_scalar",
            units="K"
        ))
        cube.add_aux_coord(iris.coords.AuxCoord(
            "string here",
            long_name="example_generic",
            units="no_unit"
        ))
        cube_list.append(cube)
    return cube_list

cube_list = get_cubes(ts.timeseries())
iris.save(
    cube_list,
    "tmp.nc",
    local_keys=ts.timeseries().index.names  # this lets us separate timeseries metadata from 'info'
)
It gives something like the below. It would be horrible to read without our readers, but I think reading should be pretty trivial because we know what to expect (loop over variables, store data and metadata for timeseries, get parameters, get info, make an ScmDataFrame instance). The thing I'm really not sure about is how to save e.g. parameters vs. just metadata: should we just put all the parameters in as variables and add an extra dimension, run_no, which we can use to save them?
$ ncdump -h tmp.nc
netcdf tmp {
dimensions:
time = 120 ;
string11 = 11 ;
variables:
double toa_outgoing_longwave_flux(time) ;
toa_outgoing_longwave_flux:long_name = "toa_outgoing_longwave_flux" ;
toa_outgoing_longwave_flux:units = "W m^-2" ;
toa_outgoing_longwave_flux:activity_id = "CMIP" ;
toa_outgoing_longwave_flux:climate_model = "BCC-CSM2-MR" ;
toa_outgoing_longwave_flux:member_id = "r1i1p1f1" ;
toa_outgoing_longwave_flux:model = "unspecified" ;
toa_outgoing_longwave_flux:region = "World" ;
toa_outgoing_longwave_flux:scenario = "1pctCO2" ;
toa_outgoing_longwave_flux:coordinates = "example_generic example_scalar" ;
double time(time) ;
time:axis = "T" ;
time:units = "hours since 1970-01-01 00:00:00" ;
time:standard_name = "time" ;
time:calendar = "gregorian" ;
char example_generic(string11) ;
example_generic:units = "no_unit" ;
example_generic:long_name = "example_generic" ;
double example_scalar ;
example_scalar:units = "K" ;
example_scalar:long_name = "example_scalar" ;
double toa_outgoing_longwave_flux_0(time) ;
toa_outgoing_longwave_flux_0:long_name = "toa_outgoing_longwave_flux" ;
toa_outgoing_longwave_flux_0:units = "W m^-2" ;
toa_outgoing_longwave_flux_0:activity_id = "CMIP" ;
toa_outgoing_longwave_flux_0:climate_model = "BCC-CSM2-MR" ;
toa_outgoing_longwave_flux_0:member_id = "r1i1p1f1" ;
toa_outgoing_longwave_flux_0:model = "unspecified" ;
toa_outgoing_longwave_flux_0:region = "World|Northern Hemisphere" ;
toa_outgoing_longwave_flux_0:scenario = "1pctCO2" ;
toa_outgoing_longwave_flux_0:coordinates = "example_generic example_scalar" ;
double toa_outgoing_longwave_flux_1(time) ;
toa_outgoing_longwave_flux_1:long_name = "toa_outgoing_longwave_flux" ;
toa_outgoing_longwave_flux_1:units = "W m^-2" ;
toa_outgoing_longwave_flux_1:activity_id = "CMIP" ;
toa_outgoing_longwave_flux_1:climate_model = "BCC-CSM2-MR" ;
toa_outgoing_longwave_flux_1:member_id = "r1i1p1f1" ;
toa_outgoing_longwave_flux_1:model = "unspecified" ;
toa_outgoing_longwave_flux_1:region = "World|Southern Hemisphere" ;
toa_outgoing_longwave_flux_1:scenario = "1pctCO2" ;
toa_outgoing_longwave_flux_1:coordinates = "example_generic example_scalar" ;
// global attributes:
:branch_method = "branch" ;
:branch_time_in_child = 0. ;
:branch_time_in_parent = 0. ;
:cmor_version = "3.3.2" ;
:comment = "at the top of the atmosphere (to be compared with satellite measurements)" ;
:contact = "Dr. Tongwen Wu ([email protected])" ;
:creation_date = "2018-10-15T06:27:37Z" ;
:data_specs_version = "01.00.27" ;
:description = "DECK: 1pctCO2" ;
:experiment = "1 percent per year increase in CO2" ;
:experiment_id = "1pctCO2" ;
:external_variables = "areacella" ;
:forcing_index = 1 ;
:frequency = "mon" ;
:further_info_url = "https://furtherinfo.es-doc.org/CMIP6.BCC.BCC-CSM2-MR.1pctCO2.none.r1i1p1f1" ;
:grid = "T106" ;
:grid_label = "gn" ;
:history = "2018-10-15T06:27:35Z ; CMOR rewrote data to be consistent with CMIP6, CF-1.7 CMIP-6.2 and CF standards." ;
:initialization_index = 1 ;
:institution = "Beijing Climate Center, Beijing 100081, China" ;
:institution_id = "BCC" ;
:license = "CMIP6 model data produced by BCC is licensed under a Creative Commons Attribution ShareAlike 4.0 International License (https://creativecommons.org/licenses). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing CMIP6 output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file) and at https:///pcmdi.llnl.gov/. The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law." ;
:mip_era = "CMIP6" ;
:netcdf_scm_info = "netcdf-scm v1.0.0+41.gbb3285d.dirty" ;
:nominal_resolution = "100 km" ;
:original_name = "FLUT" ;
:parent_activity_id = "CMIP" ;
:parent_experiment_id = "piControl" ;
:parent_mip_era = "CMIP6" ;
:parent_source_id = "BCC-CSM2-MR" ;
:parent_time_units = "days since 1850-01-01" ;
:parent_variant_label = "r1i1p1f1" ;
:physics_index = 1 ;
:product = "model-output" ;
:realization_index = 1 ;
:realm = "atmos" ;
:references = "Model described by Tongwen Wu et al. (JGR 2013; JMR 2014; submmitted to GMD,2018). Also see http://forecast.bcccsm.ncc-cma.net/htm" ;
:run_variant = "forcing: GHG" ;
:source = "BCC-CSM 2 MR (2017): aerosol: none atmos: BCC_AGCM3_MR (T106; 320 x 160 longitude/latitude; 46 levels; top level 1.46 hPa) atmosChem: none land: BCC_AVIM2 landIce: none ocean: MOM4 (1/3 deg 10S-10N, 1/3-1 deg 10-30 N/S, and 1 deg in high latitudes; 360 x 232 longitude/latitude; 40 levels; top grid cell 0-10 m) ocnBgchem: none seaIce: SIS2" ;
:source_id = "BCC-CSM2-MR" ;
:source_type = "AOGCM" ;
:sub_experiment = "none" ;
:sub_experiment_id = "none" ;
:table_id = "Amon" ;
:table_info = "Creation Date:(30 July 2018) MD5:e53ff52009d0b97d9d867dc12b6096c7" ;
:title = "BCC-CSM2-MR output prepared for CMIP6" ;
:tracking_id = "hdl:21.14100/4f3fdd6c-bef7-4ec0-a692-5ca29f51e1ba" ;
:variable_id = "rlut" ;
:variant_label = "r1i1p1f1" ;
:Conventions = "CF-1.5" ;
}
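For the reading side, a minimal sketch of that loop (plain netCDF4; the helper is hypothetical and an ScmDataFrame would then be built from its output):
import netCDF4

def read_crunched_nc(path):
    timeseries = {}
    with netCDF4.Dataset(path) as ds:
        time = ds.variables["time"][:]
        for name, var in ds.variables.items():
            # skip the time coordinate and scalar parameter variables
            if name == "time" or "time" not in var.dimensions:
                continue
            metadata = {key: var.getncattr(key) for key in var.ncattrs()}
            timeseries[name] = (time, var[:], metadata)
        info = {key: ds.getncattr(key) for key in ds.ncattrs()}
    return timeseries, info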
Is your feature request related to a problem? Please describe.
We need a way to keep track of the files used when crunching to CSV. This provides traceability and the ability to perform smart updating in future when new land area masks become available.
These files will replace the *-failures-and-warnings.txt files written in the root of a dataset.
paper.md
Is your feature request related to a problem? Please describe.
Always running this code through a Python session or a notebook is overly slow; a command line interface would be super nice.
Describe the solution you'd like
Use the packages described in this blog post and also add in a few other tricks they recommend (making sure to update docs as well as CI etc. when doing this).
Describe alternatives you've considered
No command line options...
Is your feature request related to a problem? Please describe.
When making CMIP data reference syntax compliant files, you need to make the file, set the filename and set the filepath all correctly. However, you can actually deduce the filename and path simply from the file's attributes. Adding this feature would eliminate duplication in the file production process and make generating compliant, properly named files much easier.
Describe the solution you'd like
It would be great to just be able to make a netCDF file, then have it be named and put in a directory automatically based on the file attributes.
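A rough sketch of the deduction for CMIP6-style attributes (simplified: it omits the version and time-range components of the real data reference syntax, and the helper name is hypothetical):
import os

def drs_path_from_attributes(attrs, root="."):
    # attribute names follow the CMIP6 controlled vocabulary
    directory = os.path.join(
        root,
        attrs["mip_era"],
        attrs["activity_id"],
        attrs["institution_id"],
        attrs["source_id"],
        attrs["experiment_id"],
        attrs["variant_label"],
        attrs["table_id"],
        attrs["variable_id"],
        attrs["grid_label"],
    )
    filename = "{}_{}_{}_{}_{}_{}.nc".format(
        attrs["variable_id"],
        attrs["table_id"],
        attrs["source_id"],
        attrs["experiment_id"],
        attrs["variant_label"],
        attrs["grid_label"],
    )
    return os.path.join(directory, filename)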
Is your feature request related to a problem? Please describe.
The wrangling hasn’t been optimised so is unnecessarily slow.
Describe the solution you'd like
Profile the wrangling functions then optimise based on the results. From brief experience I think parallelisation needs to be done in separate processes as most time is spent interpolating and filtering, not reading data off disk.
Describe alternatives you've considered
Put up with it, relative to crunching it’s still quick.
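As a starting point, a profiling sketch (the wrangling entry point named here is hypothetical):
import cProfile
import pstats

# profile one wrangling call, then inspect where the cumulative time goes
cProfile.run("wrangle_files(src_dir, out_dir)", "wrangle.prof")
pstats.Stats("wrangle.prof").sort_stats("cumulative").print_stats(20)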
Is your feature request related to a problem? Please describe.
Adding new cubes requires writing a bunch of new tests and unreasonable amounts of duplication.
Describe the solution you'd like
To be able to define a new cube simply by giving it a name and giving an example of its directory structure and file name construction. A series of basic tests should be automatically built.
Describe alternatives you've considered
Continue with adding lots of tests by hand - seems cumbersome and error prone.
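A rough sketch of how the generated tests could look (pytest parametrisation; the example paths, and the idea that loading the example path is the right basic check, are assumptions):
import os
import pytest

# each new cube only needs a name plus an example directory structure and
# filename; the same basic tests then run for every entry
CUBE_SPECS = [
    (
        MarbleCMIP5Cube,
        "cmip5/1pctCO2/Amon/tas/CanESM2/r1i1p1",
        "tas_Amon_CanESM2_1pctCO2_r1i1p1.nc",
    ),
]

@pytest.mark.parametrize("cube_cls,example_dir,example_filename", CUBE_SPECS)
def test_example_path_loads(cube_cls, example_dir, example_filename):
    cube = cube_cls()
    # hypothetical round-trip: the cube should accept its own example path
    cube.load_data_from_path(os.path.join(example_dir, example_filename))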
We can't yet crunch 3D ocean data. We should be able to (wrangling is another question).
The weights come back with a range of normalisations and it's a pain. The weights should really all come back in the range [0, 1] and then there should be separate methods for working out areas etc.
I have no idea about this, @rgieseke or @lewisjared do you have an idea about what's best?
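One option, sketched below: always normalise weights to [0, 1] first and make areas a separate, explicit step:
import numpy as np

def normalise_weights(weights):
    # scale any weights array into [0, 1] so everything shares one convention
    weights = np.asarray(weights, dtype=float)
    return weights / weights.max()

def area_from_weights(weights, cell_areas):
    # effective area of a region, computed from the normalised weights
    return float(np.sum(normalise_weights(weights) * cell_areas))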
Is your feature request related to a problem? Please describe.
ScmCube.get_scm_timeseries_cubes crunches each of the regional timeseries in serial. On machines with multiple cores, this is a slow way to do this.
Describe the solution you'd like
ScmCube.get_scm_timeseries_cubes should crunch each of the regional timeseries cubes in parallel if the resources are available. The implementation needs to be clever if this is to actually provide a performance boost. In particular, we can't pickle self in order to do the parallelisation, so this will require a solution like this or this.
Describe alternatives you've considered
Just leave things being serial.
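A minimal sketch of the process-based approach (the worker is hypothetical and sidesteps pickling self by re-loading the cube inside each child process; the regions keyword is also an assumption):
from concurrent.futures import ProcessPoolExecutor

def _crunch_region(data_path, region):
    # rebuild the cube in the child process so that self never has to
    # cross the process boundary
    cube = MarbleCMIP5Cube()
    cube.load_data_from_path(data_path)
    return region, cube.get_scm_timeseries_cubes(regions=[region])

def crunch_regions_parallel(data_path, regions, max_workers=None):
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(_crunch_region, data_path, r) for r in regions]
        return dict(f.result() for f in futures)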
Is your feature request related to a problem? Please describe.
At the moment we don't have any docs or a way to automatically build them.
Describe the solution you'd like
I'd like to use sphinx, but I just have no idea how, so if anyone wants to take that on it would be very welcome. It would be great if we could include automatic testing of examples in docstrings.
Describe alternatives you've considered
N/A
Is your feature request related to a problem? Please describe.
Installing purely with pip is super hard because of iris' dependencies.
Describe the solution you'd like
Add ability to install with conda.
Describe alternatives you've considered
Not installing with conda and forcing users to install with pip. Given I didn't do this, bad idea.
The entire masking scheme needs to be re-thought. The masks should not be based on allocating cells to boxes, but rather on applying area * surface land fraction (or area * (1 - surface land fraction) where ocean boxes are sought) weights to the raw data when taking means. This requires some realm awareness too, i.e. when the realm is ocean, the land boxes shouldn't be available; when the realm is land, the ocean boxes shouldn't be available; when the realm is atmos, the weightings should work such that an area-weighted mean of e.g. World|Land and World|Ocean gives the same as World.
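A sketch of that weighting scheme (assuming sftlf is given in 0-100): the World|Land and World|Ocean weights sum to the World weights, so the consistency property above holds by construction:
import numpy as np

def region_weights(cell_areas, sftlf_percent, region):
    landfrac = np.asarray(sftlf_percent, dtype=float) / 100.0
    area = np.asarray(cell_areas, dtype=float)
    if region == "World":
        return area
    if region == "World|Land":
        return area * landfrac
    if region == "World|Ocean":
        return area * (1.0 - landfrac)
    raise ValueError("unknown region: {}".format(region))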
https://github.com/git-lfs/git-lfs/wiki/Tutorial#migrating-existing-repository-data-to-lfs
Follow the advice here (git-lfs/git-lfs#3238 (comment)) to ensure that the .gitattributes file is consistent across all history before doing the migration.