gribscan
Create indices for GRIB files and provide an xarray interface.
Home Page: https://gribscan.readthedocs.io/
License: MIT License
I'm leaving this as an issue since I haven't yet found time to test this extensively. I'm wondering whether #26 also just works with our final IFS-FESOM output for the March hackathon. This includes HEALPix output for IFS and also for FESOM (new o2d and o3d .data files).
Example data for 30 November 2026. For several reasons this had to use ecCodes 2.32.5, by the way, and I am wondering whether this will lead to issues when trying to load it in Python with older versions.
HEALPix grid:
/work/bm1235/b382776/cycle4/fdb/healpix/root/d1:climate-dt:ScenarioMIP:SSP3-7.0:1:IFS-FESOM:1:hz9o:clte:20261130/
Regular 0.25 grid:
/work/bm1235/b382776/cycle4/fdb/latlon/root/d1:climate-dt:ScenarioMIP:SSP3-7.0:1:IFS-FESOM:1:hz9o:clte:20261130/
Native resolution grid:
/work/bm1235/b382776/cycle4/fdb/native/root/d1:climate-dt:ScenarioMIP:SSP3-7.0:1:IFS-FESOM:1:hz9o:clte:20261130/
I created fsspec/kerchunk#198, with some of the simpler ideas from here. I am not suggesting that my version should replace this repo, but I think it's good to have something simple in the main kerchunk repo too. Specifically, it creates one output reference set per GRIB message and does not do any combine, so no magician is required. The user can pass such a list (or set of lists from multiple files) to MultiZarrToZarr and do the combine that way. The example shows this idea in action across several HRRR forecast files. Note that the coordinates are not "simple" but curved, so we don't attempt to generate them (which is a shame!), but rely on ecCodes to fetch them from the first valid message.
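A sketch of what one such per-message reference set could look like (the variable name, URL, byte offsets, and array metadata below are invented for illustration; the layout follows the fsspec ReferenceFileSystem JSON format):

```python
import json

# Hypothetical reference set for a single GRIB message: one variable whose
# single chunk points at a byte range inside the original GRIB file.
msg_refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "t2m/.zarray": json.dumps({
            "chunks": [1059, 1799], "shape": [1059, 1799], "dtype": "<f8",
            "compressor": {"id": "grib"}, "filters": None,
            "fill_value": None, "order": "C", "zarr_format": 2,
        }),
        # [url, byte offset, byte length] of the raw GRIB message
        "t2m/0.0": ["s3://bucket/hrrr.t00z.wrfsfcf01.grib2", 0, 1048576],
    },
}
```

One such dict per message (possibly across many files) could then be passed to kerchunk's MultiZarrToZarr to combine along, e.g., the forecast step dimension.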
During the preparation of the NextGEMS hackathon, several fixes for IFS output were tested that haven't found their way back into gribscan yet. Some of these fixes seem to be useful for other use cases as well.
regular_ll (implement in a way consistent with #22)
Add a contribution guideline that, among other things, should cover:
CITATION.cff
Issue
Any attempt to read a GRIB1 or GRIB2 file with write_index returns the following error:
gs.write_index("rtma_2011022404.grib2")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/gribscan/gribscan.py", line 275, in write_index
for record in gen:
File "/usr/local/lib/python3.8/dist-packages/gribscan/gribscan.py", line 232, in scan_gribfile
global_attrs = {k: m[k] for k in cfgrib.dataset.GLOBAL_ATTRIBUTES_KEYS}
File "/usr/local/lib/python3.8/dist-packages/gribscan/gribscan.py", line 232, in <dictcomp>
global_attrs = {k: m[k] for k in cfgrib.dataset.GLOBAL_ATTRIBUTES_KEYS}
File "/usr/local/lib/python3.8/dist-packages/cfgrib/messages.py", line 209, in __getitem__
return super(ComputedKeysMessage, self).__getitem__(item)
File "/usr/local/lib/python3.8/dist-packages/cfgrib/messages.py", line 161, in __getitem__
return self.message_get(item)
File "/usr/local/lib/python3.8/dist-packages/cfgrib/messages.py", line 123, in message_get
values = eccodes.codes_get_array(self.codes_id, item, key_type)
File "/usr/local/lib/python3.8/dist-packages/cfgrib/bindings.py", line 366, in codes_get_array
key_type = codes_get_native_type(handle, key)
File "/usr/local/lib/python3.8/dist-packages/cfgrib/bindings.py", line 359, in codes_get_native_type
_codes_get_native_type(handle, key.encode(ENC), grib_type)
File "/usr/local/lib/python3.8/dist-packages/cfgrib/bindings.py", line 163, in wrapper
code = func(*args)
TypeError: initializer for ctype 'grib_handle *' must be a cdata pointer, not int
Inputs
GRIB1 files from ERA5 Land (temp and precipitation) and GRIB2 files from RTMA (https://www.nco.ncep.noaa.gov/pmb/products/rtma/)
These are saved here for reference.
Setup
Ubuntu 20.04.3 LTS
Python 3.8.3
ecCodes 2.26
cfgrib 0.9.8.5
Latest released version of gribscan (installed via python -m pip install gribscan)
Hi there,
while analyzing the most recent data from IFS/FESOM we stumbled upon one variable which is not decoded as expected - or at least as I expect - by gribscan.
We have an experiment with 6-hour output from IFS, currently available on Levante (I can provide the path), for which we found that one variable, namely litota1 (averaged total lightning flash density in the last hour), is incorrectly decoded by gribscan: it is stored in the JSON (but I imagine also in the original index file) as having the time at hours 5-11-17-23 instead of 6-12-18-24 like all the other variables.
This does NOT happen if we open the GRIB file directly using cfgrib. Further digging into the file shows that this inconsistency is due to the fact that the original GRIB record has a stepRange which is "multiple", i.e. it covers two time steps (no clue why this happens for this specific variable) instead of one. For some reason, gribscan took the first one, while the real time is of course the second. See the attached screenshot; litota1 is the last one.
Do you think this is expected behaviour of gribscan, or something that should be fixed? It leads to an irregular time axis in the final xarray, with times 0-5-6-11-12-17-18-23 etc. which are NaN every other step. We can certainly filter this a posteriori, but perhaps it can be tackled at the source.
Many thanks, please let me know if you need further details
transferred issue from gitlab
Currently the RawGribCodec is registered both through numcodecs.register_codec() and through the recently implemented entry points. After the next numcodecs release we can remove the functional approach and only use the entry points.
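For reference, a sketch of what the entry-point declaration could look like (the module path is illustrative, and the "numcodecs.codecs" entry-point group name is an assumption based on numcodecs' plugin support):

```toml
# pyproject.toml - illustrative sketch, not gribscan's actual configuration
[project.entry-points."numcodecs.codecs"]
rawgrib = "gribscan:RawGribCodec"
```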
I was just giving this a try.
I have 3 files coming from the ECMWF ensemble with all the members inside.
I just tried to
gribscan-index *.grib2 -n 8
gribscan-build *.index -o dataset.json --prefix /home/ekman/ssd/guido/ecmwf-ens/
But I received:
Traceback (most recent call last):
File "/home/ekman/miniconda3/envs/models/bin/gribscan-build", line 8, in <module>
sys.exit(build_dataset())
File "/home/ekman/miniconda3/envs/models/lib/python3.10/site-packages/gribscan/tools.py", line 29, in build_dataset
refs = gribscan.grib_magic(args.indices, global_prefix=args.prefix)
File "/home/ekman/miniconda3/envs/models/lib/python3.10/site-packages/gribscan/gribscan.py", line 424, in grib_magic
global_attrs, coords, varinfo = inspect_grib_indices(messages, magician)
File "/home/ekman/miniconda3/envs/models/lib/python3.10/site-packages/gribscan/gribscan.py", line 323, in inspect_grib_indices
dims, dim_id, shape = map(tuple, zip(*((dim, i, len(coords))
ValueError: not enough values to unpack (expected 3, got 0)
I don't know whether it's because of the files' dimensions or whether I'm doing something wrong.
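The failure mode in the traceback can be reproduced in isolation: unpacking zip(*...) over an empty sequence yields nothing, which suggests inspect_grib_indices produced no (dim, index, size) triples for some variable (possibly because the ensemble-member dimension is not handled by the default magician - an assumption, not confirmed):

```python
# Minimal, self-contained reproduction of the failure mode seen in the
# traceback: zip(*[]) is an empty iterator, so unpacking into three
# names raises the same ValueError.
dims_found = []  # imagine no (dim, index, size) triples were produced
try:
    dims, dim_id, shape = map(tuple, zip(*dims_found))
    result = (dims, dim_id, shape)
except ValueError as e:
    result = str(e)
print(result)  # not enough values to unpack (expected 3, got 0)
```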
transferred issue from gitlab
This is a proposal to add a GitLab CI pipeline that automates PyPI releases. Of course, the CI could/should also cover basic installation (and eventually tests) for normal commits.
Currently gribscan uses .read in combination with .seek to read some header bytes without advancing the data stream (e.g. in _split_file). It might be better to use .peek instead (hoping this will be nicer on the underlying filesystem).
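The two approaches could be sketched as follows (a minimal illustration on an in-memory buffer; note that .peek() is only available on buffered streams, may return more or fewer bytes than requested, and remote fsspec file objects may not implement it):

```python
import io

def read_header_via_seek(f, n):
    """Current approach: read n bytes, then seek back to where we were."""
    pos = f.tell()
    header = f.read(n)
    f.seek(pos)
    return header

def read_header_via_peek(f, n):
    """Proposed: look at buffered bytes without moving the stream position.

    peek() is not guaranteed to return exactly n bytes, hence the slice.
    """
    return f.peek(n)[:n]

f = io.BufferedReader(io.BytesIO(b"GRIB\x00\x00\x00\x01payload"))
assert read_header_via_seek(f, 4) == b"GRIB" and f.tell() == 0
assert read_header_via_peek(f, 4) == b"GRIB" and f.tell() == 0
```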
We should enable some sort of testing to ensure gribscan builds at all.
As pointed out by @martindurant in this comment, (some of) the functionality of the Magician workflow might be possible to implement using MultiZarrToZarr. We might want to investigate this and try to simplify the code if possible.
Do we want to archive gribscan on Zenodo? We'd be able to get a DOI for this project then, but do we want this?
If we do, we could do this similar to eurec4a/eurec4a-intake#147, where the details for filling out the required forms are stored in .zenodo.json.
Currently, the contributors are @lkluft, @trackow and @d70-t. In case we do want to publish this on Zenodo, everyone would have to fill out their information in the creators object. See docs here.