try2code / cdo-bindings
Ruby/Python bindings for CDO
Hi Brian @Try2Code,
it seems I have no way to access the stdout of operators, which makes it hard to use verifygrid or cmor.
Could we implement something to enable that? Maybe a logfile?
Best,
Fabi
I get an error (FileNotFoundError: [WinError 2]) when I run:
from cdo import *
cdo = Cdo()
with Python 3.8, 3.9 or 2.7 on Windows 10. I don't know how to resolve it, and I don't understand what the CDO binary is or where I can get it. Please help!
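A FileNotFoundError at this point most likely means that Python could not start the cdo executable itself; this is an assumption based on the subprocess.Popen calls visible in the tracebacks elsewhere on this page. A minimal sketch for checking whether an executable named cdo is on the PATH (the helper name find_cdo is hypothetical, not part of the bindings):

```python
import shutil


def find_cdo(name="cdo"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_cdo()
if path is None:
    # The Python package only wraps CDO; the CDO program must be installed separately.
    print("No 'cdo' executable found on PATH.")
else:
    print("Found CDO at", path)
```

If this prints that nothing was found, the bindings have no binary to call, regardless of the Python version in use.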
I am working with the Python CDO bindings in a Docker container that runs a Debian bullseye distribution (python:3.9.15-slim-bullseye). I installed the cdo binary using apt-get install cdo, which installs CDO version 1.9.10 (https://packages.debian.org/bullseye/cdo).
When I call Cdo() I get an IndexError with a traceback that ends as follows:
File "/usr/local/lib/python3.9/site-packages/cdo.py", line 190, in __init__
self.operators = self.__getOperators()
File "/usr/local/lib/python3.9/site-packages/cdo.py", line 341, in __getOperators
operators[op] = int(ios[i][1:len(ios[i]) - 1].split('|')[1])
IndexError: list index out of range
This is caused by unexpected behaviour of cdo --operators. When running cdo --operators on the command line, the first line of the output is unexpected:
PRE-MAIN-DEBUG Registering library [eckit] with address [0x7fa77f2ad9a0]
abs Absolute value (1|1)
acos Arc cosine (1|1)
add Add two fields (2|1)
addc Add a constant (1|1)
addtrend Add trend (3|1)
...
So the first line of the output is not actually an operator. I can monkey-patch this error with the following code, which skips the first line of the output of cdo --operators:
import os
import subprocess
from cdo import Cdo

class CdoMonkeyPatch(Cdo):
    # Named _Cdo__getOperators so that it overrides the name-mangled private
    # method that Cdo.__init__ calls as self.__getOperators().
    def _Cdo__getOperators(self):
        operators = {}
        proc = subprocess.Popen([self.CDO, '--operators'],
                                stderr=subprocess.PIPE, stdout=subprocess.PIPE)
        ret = proc.communicate()
        lines = ret[0].decode("utf-8")[0:-1].split(os.linesep)
        ops = [line.split(' ')[0] for line in lines]   # operator names
        ios = [line.split(' ')[-1] for line in lines]  # "(inputs|outputs)" fields
        for i, op in enumerate(ops):
            if i != 0:  # skip the first line, which is not an operator
                operators[op] = int(ios[i][1:len(ios[i]) - 1].split('|')[1])
        return operators
Is there something wrong with my CDO installation?
BTW: This error did not appear when I was using the python:3.8.13-slim-buster image, for which CDO version 1.9.6 is installed by apt-get install cdo (https://packages.debian.org/buster/cdo).
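Rather than skipping a fixed number of lines, one option is to keep only the lines that end in an "(inputs|outputs)" field. The following is a sketch under that assumption, not the parser used by the bindings; the names OPERATOR_LINE and parse_operators are hypothetical, and the sample text is taken from the output quoted above.

```python
import re

# An operator line ends with "(<inputs>|<outputs>)", e.g. "abs Absolute value (1|1)".
OPERATOR_LINE = re.compile(r"^(\S+)\s.*\((-?\d+)\|(-?\d+)\)\s*$")


def parse_operators(text):
    """Map operator name -> number of output streams, ignoring other lines."""
    operators = {}
    for line in text.splitlines():
        match = OPERATOR_LINE.match(line)
        if match:
            operators[match.group(1)] = int(match.group(3))
    return operators


sample = """PRE-MAIN-DEBUG Registering library [eckit] with address [0x7fa77f2ad9a0]
abs Absolute value (1|1)
add Add two fields (2|1)
addtrend Add trend (3|1)"""

print(parse_operators(sample))
```

With this approach the debug line is dropped because it does not match the pattern, wherever in the output it appears.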
In the conda version, muldpm multiplies by the days of the year. The correct behaviour would be to multiply by the days of the month.
Any idea how to fix this error?
cdo -f nc4 -z zip_1 -muldpm infile outfile
It's convenient to convert a dat file to an nc file on the command line:
cdo -f nc import_binary ${ctl_file} ${ncname}
How can this be done in Python?
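One way to approach this from Python without relying on any particular binding feature is to run the same command through subprocess. This is a sketch only; the helper names build_import_command and import_binary are hypothetical. Other posts on this page call operators as methods with input, output and options keywords, so something like Cdo().import_binary(input=ctl_file, output=ncname, options='-f nc') may be the equivalent with the bindings, but that is an assumption and is not shown in this thread.

```python
import shutil
import subprocess


def build_import_command(ctl_file, nc_file, cdo="cdo"):
    """Build the argument list for: cdo -f nc import_binary <ctl_file> <nc_file>."""
    return [cdo, "-f", "nc", "import_binary", ctl_file, nc_file]


def import_binary(ctl_file, nc_file, cdo="cdo"):
    """Run the conversion; raises if the cdo executable cannot be found."""
    if shutil.which(cdo) is None:
        raise FileNotFoundError("cdo executable not found: " + cdo)
    subprocess.run(build_import_command(ctl_file, nc_file, cdo), check=True)


print(build_import_command("data.ctl", "data.nc"))
```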
When cdo is not on the user's path, or when a different cdo should be used, version 1.6.0 will not work. This was not a problem with v1.5.x. To reproduce the error, see the example failure below. The fix is surprisingly simple and is described first.
Should I fork and make a PR?
I think I have traced it down to differences in the Operator class.[1,2] In the new code, the init explicitly uses *args and **kwargs, which are not provided. As a result, the cdo argument is lost to the child operators. This can be resolved by explicitly passing the CDO property as the argument to Operator.
580c580
< setattr(self.__class__, method_name, Operator())
---
> setattr(self.__class__, method_name, Operator(self.CDO))
[1] https://github.com/Try2Code/cdo-bindings/blob/master/python/cdo/cdo.py#L570-L581
[2] https://github.com/Try2Code/cdo-bindings/blob/maintenance-1.5.x/python/cdo.py#L608-L614
For example, I have a cdo binary at /usr/local/apps/cdo-2.2.1/intel-21.4/bin/cdo, which is not in my path.
which cdo
/usr/bin/which: no cdo in ...
But it is fully functional
/usr/local/apps/cdo-2.2.1/intel-21.4/bin/cdo sinfo -stdatm,0
cdo(1) stdatm: Process started
File format : GRIB
-1 : Institut Source T Steptype Levels Num Points Num Dtype : Parameter ID
1 : unknown unknown c instant 1 1 1 1 : 1
2 : unknown unknown c instant 1 1 1 1 : 130.128
Grid coordinates :
1 : lonlat : points=1 (1x1)
lon : 0 degrees_east
lat : 0 degrees_north
Vertical coordinates :
1 : height : levels=1
level : 0 m
Time coordinate :
time : 1 step
RefTime = 0000-00-00 00:00:00 Units = hours Calendar = proleptic_gregorian
YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss YYYY-MM-DD hh:mm:ss
0001-01-01 00:00:00
cdo(1) stdatm:
cdo sinfo: Processed 2 variables over 1 timestep [0.00s 18MB]
If I use the Python bindings with a specified path, I get the error FileNotFoundError: [Errno 2] No such file or directory: 'cdo'
import cdo
cpath = '/usr/local/apps/cdo-2.2.1/intel-21.4/bin/cdo'
cdoo = cdo.Cdo(cpath)
f = cdoo.stdatm(0, returnCdf=True)
Error:
Traceback (most recent call last):
File "/home/bhenders/temp/test_cdo.py", line 5, in <module>
f = cdoo.stdatm(0, returnCdf=True)
File "/home/bhenders/temp/cdopy/lib/python3.9/site-packages/cdo/cdo.py", line 580, in __getattr__
setattr(self.__class__, method_name, Operator())
File "/home/bhenders/temp/cdopy/lib/python3.9/site-packages/cdo/cdo.py", line 577, in __init__
super().__init__(*args, **kwargs)
File "/home/bhenders/temp/cdopy/lib/python3.9/site-packages/cdo/cdo.py", line 173, in __init__
self.operators = self.__getOperators()
File "/home/bhenders/temp/cdopy/lib/python3.9/site-packages/cdo/cdo.py", line 246, in __getOperators
version = parse_version(getCdoVersion(self.CDO))
File "/home/bhenders/temp/cdopy/lib/python3.9/site-packages/cdo/cdo.py", line 70, in getCdoVersion
proc = subprocess.Popen(
File "/usr/local/apps/oneapi/intelpython/python3.9/lib/python3.9/subprocess.py", line 951, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/usr/local/apps/oneapi/intelpython/python3.9/lib/python3.9/subprocess.py", line 1821, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'cdo'
Using pdb, I can go up to line 580, where self.CDO is as expected, and calling Operator() raises the error.
> /home/bhenders/temp/cdopy/lib/python3.9/site-packages/cdo/cdo.py(580)__getattr__()
-> setattr(self.__class__, method_name, Operator())
(Pdb) self.CDO
'/usr/local/apps/cdo-2.2.1/intel-21.4/bin/cdo'
(Pdb) Operator()
*** FileNotFoundError: [Errno 2] No such file or directory: 'cdo'
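The failure mode described here can be reproduced with a toy class that has nothing to do with the real bindings: a child object created without arguments falls back to the default binary name, while one created with the parent's path keeps it. ToyCdo and its methods are hypothetical names used only for illustration.

```python
class ToyCdo:
    """Toy stand-in for a wrapper that remembers which binary to call."""

    def __init__(self, cdo="cdo"):
        self.CDO = cdo

    def child_without_path(self):
        # Mirrors Operator(): the parent's path is not forwarded.
        return ToyCdo()

    def child_with_path(self):
        # Mirrors Operator(self.CDO): the parent's path is forwarded.
        return ToyCdo(self.CDO)


parent = ToyCdo("/usr/local/apps/cdo-2.2.1/intel-21.4/bin/cdo")
print(parent.child_without_path().CDO)
print(parent.child_with_path().CDO)
```

The first child would look for a plain cdo on the PATH, which matches the FileNotFoundError reported above.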
For Windows there are Cygwin64 CDO binaries available. When trying to use these, I have the problem that the getOperators() function doesn't work properly. The culprit seems to be that the output of cdo --operators is split at os.linesep (see, e.g., https://github.com/Try2Code/cdo-bindings/blob/master/python/cdo.py#L305). However, os.linesep on Windows is \r\n, while the output of the cdo --operators call is separated only by \n.
I guess it should be possible to use something like (pseudo-code)
if on_cygwin:
    cdosep = '\n'
else:
    cdosep = os.linesep
and then use cdosep instead of os.linesep.
Would you be open to a PR implementing this?
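An alternative to choosing a separator per platform is str.splitlines(), which accepts both \n and \r\n. This is a sketch of that idea, not what the bindings currently do; split_cdo_output is a hypothetical helper name.

```python
def split_cdo_output(raw):
    """Decode subprocess output and split it into lines, whatever the line ending."""
    return raw.decode("utf-8").splitlines()


unix_output = b"abs Absolute value (1|1)\nacos Arc cosine (1|1)\n"
windows_output = b"abs Absolute value (1|1)\r\nacos Arc cosine (1|1)\r\n"

print(split_cdo_output(unix_output))
print(split_cdo_output(windows_output))
```

Both calls give the same list of two lines, so no Cygwin detection would be needed.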
It would be useful if commands like showname and showtimestamp returned a list of results, not a list containing a single string.
When I run
In[3]: cdo.showdate(input='/some/file/')
# return like this
Out[3]: ['2017-01-19 2017-01-20 2017-01-21 2017-01-22 2017-01-23 2017-01-24 2017-01-25 2017-01-26']
Then I run
In[4]: cdo.showdate(input='/some/file/')[0].split(' ')
Out[4]: ['2017-01-19',
'2017-01-20',
'2017-01-21',
'2017-01-22',
'2017-01-23',
'2017-01-24',
'2017-01-25',
'2017-01-26']
I believe it would be better if this were built in. Could this be improved?
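Until something like this is built in, a small helper can do the splitting for any operator that returns a list of whitespace-separated strings. This is a sketch; split_items is a hypothetical name, and the sample value is the output quoted above.

```python
def split_items(result):
    """Flatten a list of whitespace-separated strings into a list of items."""
    return [item for line in result for item in line.split()]


raw = ['2017-01-19 2017-01-20 2017-01-21 2017-01-22 '
       '2017-01-23 2017-01-24 2017-01-25 2017-01-26']
print(split_items(raw))
```

Using split() without an argument also copes with repeated spaces between entries.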
When using the force=False option to make use of the cache instead of recalculating it seems that it is no longer possible to return the result as an xarray Dataset. I would expect that the dataset is still returned based on the cached netcdf.
This results in ds_regridded_cdo_python being an empty list, while in the /tmp/ directory the netcdf can be found.
import cdo
gridfile = "r360x180"
cdo = cdo.Cdo()
ds_regridded_cdo_python = cdo.remapbil(gridfile, input=ds, returnXDataset=True, force=False)
Without force=False (the default is True), an xr.Dataset is returned. Ideally, I would expect to get an xr.Dataset regardless of whether the cache is used or the calculation is repeated. Could the cached netCDF file be reloaded into an xr.Dataset if it exists and the force=False option is active?
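The behaviour requested here amounts to "compute only when needed, but always load the result". A generic sketch of that pattern follows; it is not the bindings' implementation. The names cached_or_computed, compute and load are hypothetical, and in the real case load would presumably be something like xarray.open_dataset. A plain text file stands in for the netCDF file so that the example runs without CDO or xarray.

```python
import os
import tempfile


def cached_or_computed(path, force, compute, load):
    """Run compute(path) only if forced or the file is missing; always return load(path)."""
    if force or not os.path.exists(path):
        compute(path)
    return load(path)


calls = []


def compute(path):
    # Stand-in for the expensive CDO call that writes the output file.
    calls.append(path)
    with open(path, "w") as handle:
        handle.write("result")


def load(path):
    # Stand-in for opening the output file as a dataset.
    with open(path) as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.txt")
    first = cached_or_computed(target, False, compute, load)
    second = cached_or_computed(target, False, compute, load)

print(first, second, len(calls))
```

The second call reuses the cached file but still returns the loaded result instead of an empty value.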
Would you be interested in including support for python 3? Having looked at the code, it doesn't seem like it would be much work?
Currently the tempfile generation cannot work for more than one output file.
Up to cdo-1.9.1, the cdo binary does not report how many streams each operator needs. That will be available from 1.9.2 onwards.
Running a process with a Cdo() interface instantiated in a Python module does not allow for a Ctrl-C exit. This is likely a problem with the signal handling. It looks like the issue may be here, where some cleanup occurs but the signal is ignored (using signal.raise may solve the issue).
In the current release 1.3.0, the Ruby version does not support assigning environment variables per call, only at object setup, i.e.
cdo.env = {"PLANET_RADIUS" => "6379000"}
works, but
cdo.fldmean(input: ..., env: {"PLANET_RADIUS" => "6379000"})
doesn't.
The test (test_env) does not cover this feature, because it only tests the default parameter values.
Dear CDO-Python developers,
As of CDO version 1.7.2, the parsing of the version number no longer works. If I understand the code correctly, the string "Climate Data Operators" is searched for:
match = re.search("Climate Data Operators version (\d.*) .*",cdo_help)
However, the help string now only says CDO, not Climate Data Operators, and the parsing thereby fails.
The fix should be trivial...
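A pattern that accepts either prefix could look like the following sketch. The exact wording of the newer help string is an assumption here (only the change from "Climate Data Operators" to "CDO" is stated above), and parse_cdo_version is a hypothetical helper that returns None instead of failing when nothing matches.

```python
import re

# Accept both the older "Climate Data Operators version X" and an assumed "CDO version X".
VERSION_PATTERN = re.compile(r"(?:Climate Data Operators|CDO) version (\d+(?:\.\d+)*)")


def parse_cdo_version(text):
    """Return the version string found in the text, or None if there is none."""
    match = VERSION_PATTERN.search(text)
    return match.group(1) if match else None


old_style = "Climate Data Operators version 1.6.3 (http://code.zmaw.de/projects/cdo)"
new_style = "CDO version 1.7.2"  # assumed wording, for illustration only

print(parse_cdo_version(old_style))
print(parse_cdo_version(new_style))
```

Returning None lets the caller raise a clear error message rather than hitting an AttributeError on a failed match.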
I have successfully installed CDO and have confirmed that it is working from the command line. But I'm having trouble getting the python bindings to work. The problem seems to be with the netCDF4 module, which is installed in the same directory as cdo.
Here is the error I get when I try to import the netCDF4 module in python:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "netCDF4/__init__.py", line 3, in <module>
    from ._netCDF4 import *
ImportError: dlopen(netCDF4/_netCDF4.so, 2): Symbol not found: _clock_gettime
  Referenced from: netCDF4/.dylibs/libcurl.4.dylib (which was built for Mac OS X 10.13)
  Expected in: /usr/lib/libSystem.B.dylib
 in netCDF4/.dylibs/libcurl.4.dylib
And when I try to initialize cdo:
>>> cdo = Cdo()
Could not load netCDF4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "cdo.py", line 135, in __init__
    self.loadCdf() # load netcdf lib if possible and set self.cdf }}}
  File "cdo.py", line 431, in loadCdf
    from netCDF4 import Dataset as cdf
  File "netCDF4/__init__.py", line 3, in <module>
    from ._netCDF4 import *
ImportError: dlopen(netCDF4/_netCDF4.so, 2): Symbol not found: _clock_gettime
  Referenced from: netCDF4/.dylibs/libcurl.4.dylib (which was built for Mac OS X 10.13)
  Expected in: /usr/lib/libSystem.B.dylib
 in netCDF4/.dylibs/libcurl.4.dylib
Any suggestions for how to fix?
I had to search the CDO website and forum to learn that the Python bindings can now be installed with conda install -c conda-forge python-cdo, and that the conda-forge cdo package only provides the executable.
Could this very useful information be added to the cdo-bindings and the "Using CDO from python or ruby" pages?
Thanks!
Attempting to interrupt a process with Ctrl-C after initializing the Python binding with cdo = Cdo() often fails and prints weird messages, even when interrupting Python processes unrelated to the cdo commands. The nco Python bindings (originally forked from this repo) have the same issue.
Here's an example before initializing the binding:
In [1]: import time
   ...: while True:
   ...:     print('hi')
   ...:     time.sleep(1)
hi
hi
^[^C---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-2-0bf96da52d4a> in <module>
2 while True:
3 print('hi')
----> 4 time.sleep(1)
KeyboardInterrupt:
And here's an example after initializing the binding and holding down Ctrl+C:
In [3]: import time
   ...: from cdo import Cdo
   ...: cdo = Cdo()
   ...: while True:
   ...:     print('hi')
   ...:     time.sleep(1)
hi
hi
^Ccaught signal <cdo.Cdo object at 0x7f596c07dee0> 2 <frame at 0x7f5932049d40, file '<ipython-input-3-742fa89f637e>', line 6, code <module>>
hi
^Ccaught signal <cdo.Cdo object at 0x7f596c07dee0> 2 <frame at 0x7f5932049d40, file '<ipython-input-3-742fa89f637e>', line 6, code <module>>
hi
^Ccaught signal <cdo.Cdo object at 0x7f596c07dee0> 2 <frame at 0x7f5932049d40, file '<ipython-input-3-742fa89f637e>', line 6, code <module>>
hi
^Ccaught signal <cdo.Cdo object at 0x7f596c07dee0> 2 <frame at 0x7f5932049d40, file '<ipython-input-3-742fa89f637e>', line 6, code <module>>
hi
It's impossible to kill the while-loop without killing the parent shell. Even sending the ipython session to the background with Ctrl+Z and trying to kill the session with kill %1 fails, because cdo seems to capture that signal as well:
$ kill %1
caught signal <cdo.Cdo object at 0x7f596c07dee0> 15 <frame at 0x7f5932049d40, file '<ipython-input-3-742fa89f637e>', line 6, code <module>>
The only way to stop the while loop is to kill the parent shell running the python process. Not sure how the binding is implemented but it seems to be changing something persistent about the python state.
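One way a library can avoid leaving a persistent change behind is to install its handler only for the duration of its own work and then restore whatever was there before. The following is a sketch of that pattern using the standard signal module, not a description of how the bindings are implemented; temporary_handler and note_signal are hypothetical names. It has to run in the main thread, because Python only allows signal.signal there.

```python
import signal
from contextlib import contextmanager


@contextmanager
def temporary_handler(signum, handler):
    """Install handler for signum and restore the previous handler on exit."""
    previous = signal.getsignal(signum)
    signal.signal(signum, handler)
    try:
        yield
    finally:
        # getsignal returns None for handlers that were not installed from Python.
        if previous is not None:
            signal.signal(signum, previous)


def note_signal(signum, frame):
    print("caught signal", signum)


before = signal.getsignal(signal.SIGINT)
with temporary_handler(signal.SIGINT, note_signal):
    inside = signal.getsignal(signal.SIGINT)
after = signal.getsignal(signal.SIGINT)

print(inside is note_signal, after == before)
```

Outside the with block, Ctrl-C raises KeyboardInterrupt again as usual.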
Hello,
I have a bit of a "bizarre" feature request. It is sometimes nice to use pycdo on your own to build nice command chains. However, I often have colleagues who are not so good at Python. It might therefore be nice to have an option to write out whatever the current command chain would look like as a shell script. Since pycdo calls subprocesses in the background anyway, it should be as simple as grabbing whatever you are about to send to the shell and printing it to a file instead.
Might also be nice for debugging.
On the other side (yet this is much more tricky from a gut feeling) it might be nice to sometimes be able to take some sort of shell script, read it in, and generate an equivalent CDO 'object' to keep using afterward. Tricks here might involve recognizing multiple cdo commands, since given any user input, there will likely be more than just one command in there.
What do you think?
Cheers
Paul
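The first half of this request can be sketched independently of the bindings: given the argument lists that would be passed to subprocess, quote them and write them to a file. write_shell_script is a hypothetical helper, and the example command is made up for illustration; shlex.join requires Python 3.8 or newer.

```python
import os
import shlex
import stat
import tempfile


def write_shell_script(path, commands):
    """Write each argument list as one properly quoted line of a shell script."""
    lines = ["#!/bin/sh", "set -e"]
    lines += [shlex.join(command) for command in commands]
    with open(path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    # Make the script executable for its owner.
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)


commands = [["cdo", "-f", "nc", "-fldmean", "-topo,global_0.2", "out file.nc"]]

with tempfile.TemporaryDirectory() as tmp:
    script_path = os.path.join(tmp, "chain.sh")
    write_shell_script(script_path, commands)
    with open(script_path) as handle:
        script_text = handle.read()

print(script_text)
```

Quoting with shlex keeps arguments that contain spaces intact when the script is run later.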
With the new syntax cdo.op1().op2().op3().run() the question comes up: how should operators with multiple inputs be handled?
Here is one idea I'd like to discuss @Chilipp
cdo -add -fldmean -topo,global_0.2 -fldmax -topo,'global_1' /tmp/CdoTemp234.grb
generated from
cdo = Cdo()
cdo.add(input=[cdo.fldmean.topo('global_0.2'),cdo.fldmax.topo('global_1')]).run()
Any other idea how to write this down?
When using the python bindings for cdo I get a segmentation fault that I don't get when running the same command on the command line.
In python I have the following chained command:
cdo.setreftime("1970-01-01,00:00:00", input="-sellonlatbox,{lon0},{lon1},-90,90 -settunits,hours -setcalendar,standard -masklonlatbox,0,360,-90,90 -remapnn,custom_grid.txt {input}".format(lon0=center-180, lon1=center+180.1, input=path_to_input), output=path_to_output, options = "-P 8 -f nc4 -z zip")
When running this script I get the following output:
>>> cdo -O -P 8 -f nc4 -z zip -setreftime,1970-01-01,00:00:00 -sellonlatbox,-180.0,180.1,-90,90 -settunits,hours -setcalendar,standard -masklonlatbox,0,360,-90,90 -remapnn,custom_grid.txt data/gfas_frpfire_Aug2019.nc data/gfas_frpfire_Aug2019_processed.nc<<<
[...]
Segmentation fault (core dumped)
When I now run the command that the Python bindings probably executed (cdo -O ...) on the command line (without Python), I do not get a segmentation fault. The Python program does nothing else, so the error really comes from running the command above.
Any idea why this is happening? I am not sure if I built all libraries (hdf, netcdf4) threadsafe but this should not be the problem as there is no SegFault when running the command on the command line.
Collecting package metadata (current_repodata.json): done
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
Current channels:
To search for alternate channels that may provide the conda package you're
looking for, navigate to
https://anaconda.org
and use the search bar at the top of the page.
Note: you may need to restart the kernel to use updated packages.
It's not clear to me which license applies to cdo-bindings. The conda feedstock says:
about:
home: https://code.zmaw.de/projects/cdo/wiki/Cdo%7Brbpy%7D
license: GPL-2.0-or-later
license_file: COPYING
summary: Use CDO in the context of Python as if it would be a native library
while in the README:
https://github.com/Try2Code/cdo-bindings/blob/master/README.md#license
Cdo.{rb,py} makes use of the BSD-3-clause license
GPL is copyleft while BSD is not, so that makes quite a difference.
Hello,
I noticed that the version on PyPI is not up to date with the GitHub version. Can we update this please?
Thanks!
Paul
I just realized this yesterday when my CI pipeline failed. The latest cdo version on conda-forge has been v2.0.0 since yesterday. I had to pin cdo==1.9.9 explicitly to get my CI pipeline to run. There is obviously a problem with parsing the cdo version in the bindings.
conda create -n cdo_test python=3 cdo python-cdo
from cdo import Cdo
cdo = Cdo()
fails with
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/anaconda3/envs/cdo_test/lib/python3.10/site-packages/cdo.py", line 156, in __init__
self.operators = self.__getOperators()
File "/opt/anaconda3/envs/cdo_test/lib/python3.10/site-packages/cdo.py", line 236, in __getOperators
version = parse_version(getCdoVersion(self.CDO))
File "/opt/anaconda3/envs/cdo_test/lib/python3.10/site-packages/cdo.py", line 78, in getCdoVersion
return match.group(1)
AttributeError: 'NoneType' object has no attribute 'group'
works with
conda create -n cdo_test python=3 cdo==1.9.9 python-cdo
Since version 1.3.4 it looks like the initialization of the cdo wrapper has changed. I'm installing both cdo and python-cdo via conda, but the cdo wrapper cannot find the cdo tool in the PATH. I need to set the PATH when I initialize the cdo wrapper:
import os
from cdo import Cdo
cdo = Cdo(env=os.environ)
Running the test suite aborts with a segmentation fault in test_returnArray on line 252.
Also, doing something like
temperature = cdo.stdatm(0,options = '-f nc', returnCdf = True).variables['T'][:]
print temperature
ends in a segmentation fault when temperature is accessed by print. Everything works as expected when cdfMod is "netcdf4". Maybe netcdf4 should be used as the default.
$ cdo -V
Climate Data Operators version 1.6.3 (http://code.zmaw.de/projects/cdo)
Compiled: by mclaus on ares.geomar.de (x86_64-unknown-linux-gnu) Feb 25 2014 10:44:28
Compiler: gcc -std=gnu99 -g -O2 -pthread
version: gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Features: PTHREADS NC4 OPeNDAP Z JASPER UDUNITS2 PROJ.4 XML2 MAGICS CURL
Libraries: proj/4.8 xml2/2.7.8 curl/7.35.0(h7.32.0)
Filetypes: srv ext ieg grb grb2 nc nc2 nc4 nc4c
CDI library version : 1.6.3 of Feb 25 2014 10:44:20
CGRIBEX library version : 1.6.3 of Jan 8 2014 19:55:18
GRIB_API library version : 1.11.0
netCDF library version : 4.3.0 of Sep 20 2013 12:21:08 $
HDF5 library version : 1.8.10
SERVICE library version : 1.3.1 of Feb 25 2014 10:44:06
EXTRA library version : 1.3.1 of Feb 25 2014 10:44:01
IEG library version : 1.3.1 of Feb 25 2014 10:44:04
FILE library version : 1.8.2 of Feb 25 2014 10:44:01
python -c "import scipy; print scipy.__version__"
0.14.0
python -c "import netCDF4; print netCDF4.__version__"
1.0.8
Just realized this yesterday. It might be that the new interface (i.e. cdo is callable and returns another Cdo object) interferes with the definition of help().
Found in cdo.py 1.5.4.
hi @Try2Code ,
thanks again for this, I like to use cdo through these bindings. I have an issue that came up recently when I try to use dask.distributed to run the rather heavy sp2gpl operator on ERA5 data. I would like to delay the execution and then run it on several timesteps in parallel. I'll try to make a reproducible case here, e.g.,
setup dask client
from dask.distributed import Client, progress
client = Client()
client
test dataset
import xarray as xr
ds = xr.tutorial.open_dataset("air_temperature")
ds.sel(time='2013-01-01')
some operator that could be embarrassingly parallel:
from cdo import Cdo
def test_op(input):
    return Cdo().setgridtype('curvilinear', options='-f nc4', input=input)
lazy evaluation with dask: write out each timestep into a single file and run the operator on that.
def apply(ds, use_dask=False):
    if use_dask:
        from dask import delayed
    else:
        def delayed(x):
            return x
    output = []
    for ti in ds.time:
        # expand_dims so that cdo's to_netcdf call keeps time coordinate in nc file.
        ti_res = delayed(test_op)(ds.sel(time = ti).expand_dims(dim='time'))
        output.append(ti_res)
    return output
Now apply our operator onto the dataset (just the first 10 timesteps for testing...)
out = apply(ds.isel(time=range(0,10)), use_dask=True)
out
This all works fine, however, when i actually trigger the computation, i get this signal error:
import dask
%time out_ = dask.compute(out)
The problem seems to be with signal handling in threads. I could actually avoid the error if the signal handling only runs in the main thread, e.g., something like this in cdo.py
if threading.current_thread() is threading.main_thread():
    signal.signal(signal.SIGINT, self.__catch__)
    signal.signal(signal.SIGTERM, self.__catch__)
    signal.signal(signal.SIGSEGV, self.__catch__)
    signal.siginterrupt(signal.SIGINT, False)
    signal.siginterrupt(signal.SIGTERM, False)
    signal.siginterrupt(signal.SIGSEGV, False)
It runs fine then and actually scales quite nicely, but I am not sure about correctly handling signals in threads. My solution might be naive. It would be really nice to be able to delay cdo computations with dask to integrate them with other workflows... I hope I could make myself clear :) thanks!!!
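The guard proposed above can be tried out in isolation. Python only permits signal.signal in the main thread, so a worker thread that skips the call avoids the error. This is a standalone sketch, not code from cdo.py; install_handlers_if_main_thread is a hypothetical helper, and the demonstration only calls it from a worker thread so that no handler is actually changed.

```python
import signal
import threading


def install_handlers_if_main_thread(handler, signums=(signal.SIGINT, signal.SIGTERM)):
    """Install handler for the given signals only when called from the main thread."""
    if threading.current_thread() is not threading.main_thread():
        return False  # signal.signal would raise ValueError here
    for signum in signums:
        signal.signal(signum, handler)
    return True


results = []
worker = threading.Thread(
    target=lambda: results.append(
        install_handlers_if_main_thread(signal.default_int_handler)
    )
)
worker.start()
worker.join()

print(results)
```

In a worker thread the function simply reports that nothing was installed, which is what lets the dask tasks proceed.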
It appears that the __del__ method of MyTempFile is never called, causing the /tmp folder on my system to fill up. A reboot would solve this problem, but how can this be handled on computers that are rarely rebooted (e.g. servers)?
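Because __del__ is not guaranteed to run, one common alternative is weakref.finalize, which also fires at interpreter exit by default. The following is a sketch of that approach and not the MyTempFile class from the bindings; ManagedTempFile and _remove_quietly are hypothetical names.

```python
import os
import tempfile
import weakref


def _remove_quietly(path):
    """Delete the file, ignoring the case where it is already gone."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


class ManagedTempFile:
    """Temporary file that is removed on close(), garbage collection or exit."""

    def __init__(self, suffix=""):
        handle, self.path = tempfile.mkstemp(suffix=suffix)
        os.close(handle)
        # The finalizer runs at most once, whichever trigger comes first.
        self._finalizer = weakref.finalize(self, _remove_quietly, self.path)

    def close(self):
        self._finalizer()


tmp = ManagedTempFile(suffix=".nc")
existed = os.path.exists(tmp.path)
tmp.close()
gone = not os.path.exists(tmp.path)

print(existed, gone)
```

Calling close() explicitly is still the most predictable option on long-running servers; the finalizer is a safety net.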