gribscan
Create indices for GRIB files and provide an xarray interface.
Home Page: https://gribscan.readthedocs.io/
License: MIT License
I'm leaving this as an issue since I haven't yet found time to test this extensively. I'm wondering whether #26 also just works with our final IFS-FESOM output for the March hackathon. This includes HEALPix output for IFS and also for FESOM (new o2d and o3d .data files).
Example data for 30 November 2026. For several reasons this had to use ecCodes 2.32.5, by the way, and I am wondering whether this will lead to issues when trying to load it in Python with older versions.
HEALPix grid:
/work/bm1235/b382776/cycle4/fdb/healpix/root/d1:climate-dt:ScenarioMIP:SSP3-7.0:1:IFS-FESOM:1:hz9o:clte:20261130/
Regular 0.25 grid:
/work/bm1235/b382776/cycle4/fdb/latlon/root/d1:climate-dt:ScenarioMIP:SSP3-7.0:1:IFS-FESOM:1:hz9o:clte:20261130/
Native resolution grid:
/work/bm1235/b382776/cycle4/fdb/native/root/d1:climate-dt:ScenarioMIP:SSP3-7.0:1:IFS-FESOM:1:hz9o:clte:20261130/
I created fsspec/kerchunk#198, with some of the simpler ideas from here. I am not suggesting that my version should replace this repo, but I think it's good to have something simple in the main kerchunk repo too. Specifically, it creates one output reference set per GRIB message and does not do any combine, so no magician is required. The user can pass such a list (or set of lists from multiple files) to MultiZarrToZarr and do the combine that way. The example shows this idea in action across several HRRR forecast files. Note that the coordinates are not "simple" but curved, so we don't attempt to generate them (which is a shame!), but rely on ecCodes to fetch them from the first valid message.
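A sketch of what one such per-message reference set could look like (the variable name, URL, byte offsets, and array metadata below are invented for illustration; the layout follows the fsspec ReferenceFileSystem JSON format):

```python
import json

# Hypothetical reference set for a single GRIB message: one variable whose
# single chunk points at a byte range inside the original GRIB file.
msg_refs = {
    "version": 1,
    "refs": {
        ".zgroup": json.dumps({"zarr_format": 2}),
        "t2m/.zarray": json.dumps({
            "chunks": [1059, 1799], "shape": [1059, 1799], "dtype": "<f8",
            "compressor": {"id": "grib"}, "filters": None,
            "fill_value": None, "order": "C", "zarr_format": 2,
        }),
        # [url, byte offset, byte length] of the raw GRIB message
        "t2m/0.0": ["s3://bucket/hrrr.t00z.wrfsfcf01.grib2", 0, 1048576],
    },
}
```

One such dict per message (possibly across many files) could then be passed to kerchunk's MultiZarrToZarr to combine along, e.g., the forecast step dimension.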
During the preparation of the NextGEMS hackathon, several fixes for IFS output were tested that haven't found their way back into gribscan yet. Some of these fixes seem to be useful for other use cases as well.
regular_ll (implement in a way consistent with #22)
Add a contribution guideline that, among other things, should cover:
CITATION.cff
Issue
Any attempt to read a GRIB1 or GRIB2 file with write_index returns the following error:
gs.write_index("rtma_2011022404.grib2")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.8/dist-packages/gribscan/gribscan.py", line 275, in write_index
for record in gen:
File "/usr/local/lib/python3.8/dist-packages/gribscan/gribscan.py", line 232, in scan_gribfile
global_attrs = {k: m[k] for k in cfgrib.dataset.GLOBAL_ATTRIBUTES_KEYS}
File "/usr/local/lib/python3.8/dist-packages/gribscan/gribscan.py", line 232, in <dictcomp>
global_attrs = {k: m[k] for k in cfgrib.dataset.GLOBAL_ATTRIBUTES_KEYS}
File "/usr/local/lib/python3.8/dist-packages/cfgrib/messages.py", line 209, in __getitem__
return super(ComputedKeysMessage, self).__getitem__(item)
File "/usr/local/lib/python3.8/dist-packages/cfgrib/messages.py", line 161, in __getitem__
return self.message_get(item)
File "/usr/local/lib/python3.8/dist-packages/cfgrib/messages.py", line 123, in message_get
values = eccodes.codes_get_array(self.codes_id, item, key_type)
File "/usr/local/lib/python3.8/dist-packages/cfgrib/bindings.py", line 366, in codes_get_array
key_type = codes_get_native_type(handle, key)
File "/usr/local/lib/python3.8/dist-packages/cfgrib/bindings.py", line 359, in codes_get_native_type
_codes_get_native_type(handle, key.encode(ENC), grib_type)
File "/usr/local/lib/python3.8/dist-packages/cfgrib/bindings.py", line 163, in wrapper
code = func(*args)
TypeError: initializer for ctype 'grib_handle *' must be a cdata pointer, not int
Inputs
GRIB1 files from ERA5 Land (temp and precipitation) and GRIB2 files from RTMA (https://www.nco.ncep.noaa.gov/pmb/products/rtma/)
These are saved here for reference.
Setup
Ubuntu 20.04.3 LTS
Python 3.8.3
ecCodes 2.26
cfgrib 0.9.8.5
Latest released version of gribscan (installed via python -m pip install gribscan)
Hi there,
while analyzing the most recent data from IFS/FESOM we stumbled upon one variable which is not decoded as expected - or at least as I expect - by gribscan.
We have an experiment with 6-hour output from IFS, currently available on Levante (I can provide the path), for which we found that one variable, namely litota1 (averaged total lightning flash density in the last hour), is incorrectly decoded by gribscan: it is stored in the JSON (but I imagine also in the original index file) as having the time at hours 5-11-17-23 instead of 6-12-18-24 like all the other variables.
This does NOT happen if we open the GRIB file directly using cfgrib. Further digging into the file shows that this inconsistency is due to the fact that the original GRIB record has a stepRange which is "multiple", i.e. it covers two time steps (no clue why this happens for this specific variable) instead of one. For some reason, gribscan took the first one, while the real time is of course the second. See the attached screenshot; litota1 is the last one.
Do you think this is expected behaviour of gribscan, or something that should be fixed? It leads to an irregular time axis in the final xarray, with times 0-5-6-11-12-17-18-23 etc. which are NaN every other step. We can certainly filter this a posteriori, but perhaps it can be tackled at the source.
Many thanks, please let me know if you need further details
transferred issue from gitlab
Currently the RawGribCodec is registered both through numcodecs.register_codec() and through the recently implemented entry points. After the next numcodecs release we can remove the functional approach and only use the entry points.
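For reference, a sketch of what the entry-point declaration could look like (the module path is illustrative, and the "numcodecs.codecs" entry-point group name is an assumption based on numcodecs' plugin support):

```toml
# pyproject.toml - illustrative sketch, not gribscan's actual configuration
[project.entry-points."numcodecs.codecs"]
rawgrib = "gribscan:RawGribCodec"
```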
I was just giving this a try.
I have 3 files coming from the ECMWF ensemble with all the members inside.
I just tried to
gribscan-index *.grib2 -n 8
gribscan-build *.index -o dataset.json --prefix /home/ekman/ssd/guido/ecmwf-ens/
But I received:
Traceback (most recent call last):
File "/home/ekman/miniconda3/envs/models/bin/gribscan-build", line 8, in <module>
sys.exit(build_dataset())
File "/home/ekman/miniconda3/envs/models/lib/python3.10/site-packages/gribscan/tools.py", line 29, in build_dataset
refs = gribscan.grib_magic(args.indices, global_prefix=args.prefix)
File "/home/ekman/miniconda3/envs/models/lib/python3.10/site-packages/gribscan/gribscan.py", line 424, in grib_magic
global_attrs, coords, varinfo = inspect_grib_indices(messages, magician)
File "/home/ekman/miniconda3/envs/models/lib/python3.10/site-packages/gribscan/gribscan.py", line 323, in inspect_grib_indices
dims, dim_id, shape = map(tuple, zip(*((dim, i, len(coords))
ValueError: not enough values to unpack (expected 3, got 0)
I don't know whether it's because of the files' dimensions or whether I'm doing something wrong.
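The failure mode in the traceback can be reproduced in isolation: unpacking zip(*...) over an empty sequence yields nothing, which suggests inspect_grib_indices produced no (dim, index, size) triples for some variable (possibly because the ensemble-member dimension is not handled by the default magician - an assumption, not confirmed):

```python
# Minimal, self-contained reproduction of the failure mode seen in the
# traceback: zip(*[]) is an empty iterator, so unpacking into three
# names raises the same ValueError.
dims_found = []  # imagine no (dim, index, size) triples were produced
try:
    dims, dim_id, shape = map(tuple, zip(*dims_found))
    result = (dims, dim_id, shape)
except ValueError as e:
    result = str(e)
print(result)  # not enough values to unpack (expected 3, got 0)
```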
transferred issue from gitlab
This is a proposal to add a GitLab CI pipeline that automates PyPI releases. Of course, the CI could/should also cover basic installation (and eventually tests) for normal commits.
Currently gribscan uses .read in combination with .seek to read some header bytes without advancing the data stream (e.g. in _split_file). It might be better to use .peek instead (hoping this will be nicer on the underlying filesystem).
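The two approaches could be sketched as follows (a minimal illustration on an in-memory buffer; note that .peek() is only available on buffered streams, may return more or fewer bytes than requested, and remote fsspec file objects may not implement it):

```python
import io

def read_header_via_seek(f, n):
    """Current approach: read n bytes, then seek back to where we were."""
    pos = f.tell()
    header = f.read(n)
    f.seek(pos)
    return header

def read_header_via_peek(f, n):
    """Proposed: look at buffered bytes without moving the stream position.

    peek() is not guaranteed to return exactly n bytes, hence the slice.
    """
    return f.peek(n)[:n]

f = io.BufferedReader(io.BytesIO(b"GRIB\x00\x00\x00\x01payload"))
assert read_header_via_seek(f, 4) == b"GRIB" and f.tell() == 0
assert read_header_via_peek(f, 4) == b"GRIB" and f.tell() == 0
```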
We should enable some sort of testing to ensure gribscan builds at all.
As pointed out by @martindurant in this comment, (some of) the functionality of the Magician workflow might be possible to implement using MultiZarrToZarr. We might want to investigate this and try to simplify the code if possible.
Do we want to archive gribscan on Zenodo? We'd be able to get a DOI for this project then, but do we want this?
If we do, we could do this similar to eurec4a/eurec4a-intake#147, where the details for filling out the required forms are stored in .zenodo.json.
Currently, the contributors are @lkluft, @trackow and @d70-t. In case we do want to publish this on Zenodo, everyone would have to fill out their information in the creators object. See docs here.