
argopy's People

Contributors

arnetr, damienirving, dependabot[bot], dhruvbalwada, docotak, gmaze, ocefpaf, quai20, rabernat, schwehr, tylertucker202


argopy's Issues

Filename length in cache folders

When querying a certain number of cycles and storing the result in a cache file:
argo_loader.profile(6902746, np.arange(12,45)).to_xarray()

the filename created in tmp/ is too long (on my Linux system at least), resulting in an Errno 13.

Here are some filename length limits in popular file systems:

BTRFS   255 bytes
exFAT   255 UTF-16 characters
ext2    255 bytes
ext3    255 bytes
ext3cow 255 bytes
ext4    255 bytes
FAT32   8.3 (255 UCS-2 code units with VFAT LFNs)
NTFS    255 characters
XFS     255 bytes
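A possible mitigation, not an existing argopy option, would be to derive the cache filename from a hash of the request rather than from the request itself, so its length stays fixed. A minimal sketch, assuming the cache key is the request URL:

import hashlib

def cache_filename(url: str) -> str:
    # the sha256 hex digest is 64 characters, well under the 255-byte limits listed above
    return hashlib.sha256(url.encode("utf-8")).hexdigest() + ".nc"

print(cache_filename("https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc?" + "&".join(1000 * ["cycle_number"])))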

Binder link on README should point to the latest released version, not the master branch

I'm writing this while waiting for the Binder to load. :)

Binder images take a while to build—minutes, maybe 10 minutes for me right now—but previously built images are saved for future users. If the Binder link points to the master branch, then it would require a new image for every push to master, and the probability that a user (like me) is first can be rather high. However, if the link points to the latest stable release, or even just any stable release, then they don't have to be built as often. It's even possible to trigger the Binder build as part of a deployment, like this:

https://github.com/scikit-hep/uproot/blob/master/binder/trigger_binder.sh

which would ensure that no users are first.

I don't know of a good way to keep the link pointing at the latest tag, though. But any release that's good enough to run the notebook will do.

How to get support?

One of the criteria for the JOSS review (openjournals/joss-reviews#2425) reads as follows:

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

You've covered 1) and 2) in contributing.rst, but people might not realise that the gitter logo in your README file is the place to go for support. It might be worth explicitly mentioning in the documentation on the Read the Docs site that people can go to https://gitter.im/Argo-floats/argopy for support.

Coordinates attributes disappear when using argo.point2profile


To reproduce the error:

import xarray as xr
import numpy as np

import argopy
from argopy import DataFetcher as ArgoDataFetcher
argo_loader = ArgoDataFetcher()

# get data
ds = argo_loader.region([-6, 35, 30, 46, 0, 1000, '2012', '2013']).to_xarray()

dsp = ds.argo.point2profile()

then compare ds and dsp attributes of coordinate longitude:

>> ds['LONGITUDE'].attrs
{'_CoordinateAxisType': 'Lon',
 'actual_range': array([-5.921, 34.991]),
 'axis': 'X',
 'colorBarMaximum': 180.0,
 'colorBarMinimum': -180.0,
 'ioos_category': 'Location',
 'long_name': 'Longitude of the station, best estimate',
 'standard_name': 'longitude',
 'units': 'degrees_east',
 'valid_max': 180.0,
 'valid_min': -180.0}
>> dsp['LONGITUDE'].attrs
{}

Expected Output

We should have the same DataArray attributes in dsp and ds, but we don't.

Problem Description

Coordinate attributes must be retained after using the argo xarray accessor.

Note that this has also been reported on the pyxpcm package here.

So, this is probably related to how the xarray accessor is implemented in these libraries.
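Until that is fixed, a minimal workaround sketch, continuing from the code above, is to copy the coordinate attributes from ds back onto dsp by hand:

# copy the coordinate attributes lost by point2profile (user-side workaround, not the accessor fix)
for coord in ds.coords:
    if coord in dsp.coords:
        dsp[coord].attrs = dict(ds[coord].attrs)

dsp['LONGITUDE'].attrs  # now identical to ds['LONGITUDE'].attrs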

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
libhdf5: 1.10.5
libnetcdf: 4.7.3

argopy: 0.1.5
xarray: 0.15.1
pandas: 1.1.1
numpy: 1.18.4
scipy: 1.4.1
netCDF4: 1.5.3
pydap: installed
h5netcdf: 0.8.0
h5py: 2.10.0
Nio: 1.5.5
zarr: 2.3.2
cftime: 1.1.2
nc_time_axis: None
PseudoNetCDF: installed
rasterio: 1.1.0
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.24.0
distributed: 2.16.0
matplotlib: 3.1.3
cartopy: 0.17.0
seaborn: 0.10.1
numbagg: None
setuptools: 46.4.0.post20200518
pip: 20.0.2
conda: None
pytest: 5.4.2
IPython: 7.13.0
sphinx: 3.1.1

ValueError: Got more bytes so far (>#) than requested (#)

I'm trying to fetch a large area over a long time period.

MCVE Code Sample

from argopy import DataFetcher as ArgoDataFetcher

fetcher = ArgoDataFetcher()

trange = ('1993-01-01','2015-04-23')
argo_loader = fetcher.region([-30,15,-34,-18,0,2000., *trange])
da = argo_loader.to_xarray()

Expected Output

Get the xarray object for the specified region and time.

Problem Description

ValueError: Got more bytes so far (>#) than requested (#).

I found a reference to this error here, but couldn't find a way to pass these arguments using argopy.
I couldn't find any reference to this problem specific to argopy, which is why I thought it would be helpful to post it here.
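As a workaround sketch, one can bypass fsspec's ranged HTTP reads entirely by downloading the erddap response to a local file first and opening that; this is not an argopy option, just an illustration, and the URL below is truncated:

import requests
import xarray as xr

url = "https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc?..."  # full query string omitted
resp = requests.get(url)
resp.raise_for_status()
with open("argo_subset.nc", "wb") as f:
    f.write(resp.content)   # write the complete response at once, no range requests involved
ds = xr.open_dataset("argo_subset.nc")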

Traceback
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-167-e08ac0609783> in <module>
      5 trange = ('1993-01-01','2015-04-23')
      6 argo_loader = fetcher.region([-52,-30,-34,-18,0,2000., *trange])
----> 7 argo = argo_loader.to_xarray()

~/programs/anaconda3/envs/py36/lib/python3.6/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
    214         if self._AccessPoint not in self.valid_access_points:
    215             raise InvalidFetcherAccessPoint(" Initialize an access point (%s) first." % ",".join(self.Fetchers.keys()))
--> 216         xds = self.fetcher.to_xarray(**kwargs)
    217         xds = self.postproccessor(xds)
    218         return xds

~/programs/anaconda3/envs/py36/lib/python3.6/site-packages/argopy/data_fetchers/erddap_data.py in to_xarray(self)
    335 
    336         # Download data
--> 337         ds = self.fs.open_dataset(self.url)
    338         ds = ds.rename({'row': 'N_POINTS'})
    339 

~/programs/anaconda3/envs/py36/lib/python3.6/site-packages/argopy/stores/fsspec_wrappers.py in open_dataset(self, url, **kwargs)
    249         try:
    250             with self.fs.open(url) as of:
--> 251                 ds = xr.open_dataset(of, **kwargs)
    252             self.register(url)
    253             return ds

~/programs/anaconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
    534                 "with engine='scipy' or 'h5netcdf'"
    535             )
--> 536         engine = _get_engine_from_magic_number(filename_or_obj)
    537         if engine == "scipy":
    538             store = backends.ScipyDataStore(filename_or_obj, **backend_kwargs)

~/programs/anaconda3/envs/py36/lib/python3.6/site-packages/xarray/backends/api.py in _get_engine_from_magic_number(filename_or_obj)
    119                 "manager"
    120             )
--> 121         magic_number = filename_or_obj.read(8)
    122         filename_or_obj.seek(0)
    123 

~/programs/anaconda3/envs/py36/lib/python3.6/site-packages/fsspec/implementations/http.py in read(self, length)
    276         else:
    277             length = min(self.size - self.loc, length)
--> 278         return super().read(length)
    279 
    280     def _fetch_all(self):

~/programs/anaconda3/envs/py36/lib/python3.6/site-packages/fsspec/spec.py in read(self, length)
   1237             # don't even bother calling fetch
   1238             return b""
-> 1239         out = self.cache._fetch(self.loc, self.loc + length)
   1240         self.loc += len(out)
   1241         return out

~/programs/anaconda3/envs/py36/lib/python3.6/site-packages/fsspec/caching.py in _fetch(self, start, end)
    337         ):
    338             # First read, or extending both before and after
--> 339             self.cache = self.fetcher(start, bend)
    340             self.start = start
    341         elif start < self.start:

~/programs/anaconda3/envs/py36/lib/python3.6/site-packages/fsspec/implementations/http.py in _fetch_range(self, start, end)
    330                         raise ValueError(
    331                             "Got more bytes so far (>%i) than requested (%i)"
--> 332                             % (cl, end - start)
    333                         )
    334                 else:

ValueError: Got more bytes so far (>5245944) than requested (5242888)

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.10 |Anaconda, Inc.| (default, May 8 2020, 02:54:21)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 3.10.0-862.14.4.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

argopy: 0.1.4
xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.1
scipy: 1.5.0
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.21.0
distributed: 2.21.0
matplotlib: 3.2.2
cartopy: 0.17.0
seaborn: 0.10.1
numbagg: None
setuptools: 49.2.0.post20200714
pip: 20.1.1
conda: None
pytest: 6.0.1
IPython: 7.16.1
sphinx: 3.1.2

Change in matplotlib settings

Hi, I have noticed that when I import the argopy fetcher:

from argopy import DataFetcher as ArgoDataFetcher

there is a change in the style of the matplotlib plots. It seems that importing the fetcher modifies the default settings.
It is probably not a real issue, but it took me a while to understand what was going on, and it may have some side effects.
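A minimal workaround sketch, assuming the style change comes from the import itself (or from one of argopy's optional plotting dependencies such as seaborn): reset matplotlib's defaults right after importing.

import matplotlib as mpl
from argopy import DataFetcher as ArgoDataFetcher

mpl.rcParams.update(mpl.rcParamsDefault)  # restore matplotlib default settings
# or equivalently:
# import matplotlib.pyplot as plt
# plt.style.use("default")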

ValueError: Got more bytes so far (>2602512) than requested (2594828) when using `argo_loader`

I'm running a simple extraction of data over a rather small domain. The code works, but for some combinations of parameters I get the ValueError mentioned in the title.

MCVE Code Sample

import argopy
from argopy import DataFetcher as ArgoDataFetcher
argo_loader = ArgoDataFetcher()

domain_corse = (8.14516, 9.9408, 40.716401, 43.31488)
ds = argo_loader.region([domain_corse[0], domain_corse[1], domain_corse[2], domain_corse[3], 
                         0., 2000., '2019-01-01', '2019-11-30']).to_xarray()

ValueError Traceback (most recent call last)
in
----> 1 ds2c = argo_loader.region([domain_corse400[0], domain_corse400[1], domain_corse400[2], domain_corse400[3],
2 0., 2000., '2019-01-01','2019-11-30']).to_xarray()

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
220 if self._AccessPoint not in self.valid_access_points:
221 raise InvalidFetcherAccessPoint(" Initialize an access point (%s) first." % ",".join(self.Fetchers.keys()))
--> 222 xds = self.fetcher.to_xarray(**kwargs)
223 xds = self.postproccessor(xds)
224 return xds

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py in to_xarray(self)
319
320 # Download data
--> 321 ds = self.fs.open_dataset(self.url)
322 ds = ds.rename({'row': 'N_POINTS'})
323

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/argopy/stores/fsspec_wrappers.py in open_dataset(self, url, **kwargs)
251 try:
252 with self.fs.open(url) as of:
--> 253 ds = xr.open_dataset(of, **kwargs)
254 self.register(url)
255 return ds

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime, decode_timedelta)
536 engine = _get_engine_from_magic_number(filename_or_obj)
537 if engine == "scipy":
--> 538 store = backends.ScipyDataStore(filename_or_obj, **backend_kwargs)
539 elif engine == "h5netcdf":
540 store = backends.H5NetCDFStore.open(

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/xarray/backends/scipy_.py in __init__(self, filename_or_obj, mode, format, group, mmap, lock)
135 )
136 else:
--> 137 scipy_dataset = _open_scipy_netcdf(
138 filename_or_obj, mode=mode, mmap=mmap, version=version
139 )

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/xarray/backends/scipy_.py in _open_scipy_netcdf(filename, mode, mmap, version)
81
82 try:
---> 83 return scipy.io.netcdf_file(filename, mode=mode, mmap=mmap, version=version)
84 except TypeError as e: # netcdf3 message is obscure in this case
85 errmsg = e.args[0]

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/scipy/io/netcdf.py in __init__(self, filename, mode, mmap, version, maskandscale)
279
280 if mode in 'ra':
--> 281 self._read()
282
283 def __setattr__(self, attr, value):

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/scipy/io/netcdf.py in _read(self)
608 self._read_dim_array()
609 self._read_gatt_array()
--> 610 self._read_var_array()
611
612 def _read_numrecs(self):

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/scipy/io/netcdf.py in _read_var_array(self)
696 pos = self.fp.tell()
697 self.fp.seek(begin_)
--> 698 data = frombuffer(self.fp.read(a_size), dtype=dtype_
699 ).copy()
700 data.shape = shape

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/implementations/http.py in read(self, length)
276 else:
277 length = min(self.size - self.loc, length)
--> 278 return super().read(length)
279
280 def _fetch_all(self):

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/spec.py in read(self, length)
1237 # don't even bother calling fetch
1238 return b""
-> 1239 out = self.cache._fetch(self.loc, self.loc + length)
1240 self.loc += len(out)
1241 return out

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/caching.py in _fetch(self, start, end)
354 self.start = start
355 else:
--> 356 new = self.fetcher(self.end, bend)
357 self.cache = self.cache + new
358

~/Software/PythonEnvs/Argo/lib/python3.8/site-packages/fsspec/implementations/http.py in _fetch_range(self, start, end)
328 cl += len(chunk)
329 if cl > end - start:
--> 330 raise ValueError(
331 "Got more bytes so far (>%i) than requested (%i)"
332 % (cl, end - start)

ValueError: Got more bytes so far (>2602512) than requested (2594828)

Expected Output

When I'm running the same code, except that I change the end date:

ds = argo_loader.region([domain_corse[0], domain_corse[1], domain_corse[2], domain_corse[3], 
                         0., 2000., '2019-01-01', '2019-11-29']).to_xarray()

then it works.

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.8.0 (default, Dec 10 2019, 10:35:48)
[GCC 5.4.0 20160609]
python-bits: 64
OS: Linux
OS-release: 4.15.0-120-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.3

argopy: 0.1.5
xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.1
scipy: 1.5.1
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.21.0
distributed: 2.27.0
matplotlib: 3.3.2
cartopy: None
seaborn: 0.11.0
numbagg: None
setuptools: 49.2.0
pip: 20.2.4
conda: None
pytest: None
IPython: 7.16.1
sphinx: 3.1.2

"Try it" example on Binder is broken; a fresh installation is broken in the same way

TEOS-10 Accessor Improvement Suggestions

Hi @gmaze

I was poking though the xarray accessor code and noticed some things in the teos10() method:

  • There is a pressure-to-depth conversion before calculating conservative temperature, but the last parameter of the CT_from_t function is pressure, not depth.
  • If the "standard_name" attribute is meant to be CF compliant (the GDAC Argo data themselves claim to be CF compliant), its value needs to come from the CF Standard Name table. The buoyancy frequency and potential vorticity variables do not have a standard name, but there is a proposal process for getting names added to the list if we want to.
  • I think it might be a good idea to match the "variable names" in the GSW libraries (including case):
    • SA
    • CT
    • SIG0 -> sigma0
    • N2 -> Nsquared
    • PV (maybe IPV? I don't know if this is the same thing, the TEOS-10 manual talks a lot of about isopycnal potential vorticity)
    • PTEMP -> pt
  • It should be documented that the Nsquared has been shifted back to the CT pressure levels rather than midpoints.
  • The PV value is not computed directly by the gsw toolbox. I'm not too familiar with these calculations, but this should probably also be noted in the docstring.

The first point about pressure and depth in CT is an actual bug.
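A minimal sketch of the expected call, using the gsw library directly; the scalar values are illustrative only:

import gsw

SA = gsw.SA_from_SP(35.0, 100.0, -30.0, 45.0)  # absolute salinity from practical salinity, pressure (dbar), lon, lat
CT = gsw.CT_from_t(SA, 10.0, 100.0)            # last argument is sea pressure in dbar, not depth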

I can prepare a PR to address some/all of these.

One other comment: the recent gsw 3.4 release has support for directly operating on xarray objects, including Datasets/arrays that are backed by some sort of chunked storage (e.g. dask). I think the calls to .values would need to be removed if support for this was desired.

Error fetching data.

Hello, I'm using argopy for the first time and I'm having trouble fetching data. The error seems to be with the ERDDAP data fetcher. I'm not an experienced coder, so I'm not sure what I'm doing wrong. I am working in a Jupyter notebook.

MCVE Code Sample

from argopy import DataFetcher as ArgoDataFetcher
ds = ArgoDataFetcher().region([-75, -45, 20, 30, 0, 100, '2011', '2012']).to_xarray()

Expected Output

Expected to be able to print(ds) and see the data array.

Problem Description

The code fails on the second line and I get the following error:
InvalidFetcher: Requested data fetcher 'erddap' not available ! Please try again with any of: localftp
argovis

Versions

Output of `argopy.show_versions()`

lazy loading?

I just tried out the argopy binder and am very impressed! I think this is really fantastic, it will be a game changer for the field, making it much easier to access Argo data.

My question is about lazy loading. In the binder, one of the first lines is:

ds = argo_loader.region([-85,-45,10.,20.,0,10.]).to_xarray()

This takes about a minute to run, and it downloads all of the data eagerly. The whole dataset is loaded into memory.

Might it instead be possible to use lazy loading, and have this function return an xarray dataset backed by dask arrays? In this case, the data would be loaded "on demand," i.e. whenever they are actually needed for computation, and could potentially be processed using parallel processing. (You could always recreate the existing behavior by calling .load() on the dask data.)
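For illustration only, here is what that lazy behaviour looks like in plain xarray, assuming the fetched region were available as a netCDF file (the file name and chunk size are hypothetical):

import xarray as xr

ds = xr.open_dataset("argo_region.nc", chunks={"N_POINTS": 100_000})  # dask-backed, nothing read yet
mean_temp = ds["TEMP"].mean()   # still lazy, only builds a task graph
mean_temp.compute()             # data are read and reduced only now, possibly in parallel
ds_in_memory = ds.load()        # recreates the current eager behaviour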

Perhaps this is not really a major issue, since argo data is not that big, but I thought I would raise it for discussion. I imagine it would require some refactoring of the backend data loaders.

Improve Binder experience by separating argopy content from its executing environment

#57 raised by @jpivarski points to a very bad user experience when simply trying argopy with the frontpage/README Binder link.

There are 2 points to address:

1- On which branch should the Binder/Try-it link point to?
This should be the latest release, since this is what most users would be installing if they are happy with the Binder testing.

2- Efficiency of the Binder set-up.
I found this post: https://discourse.jupyter.org/t/tip-speed-up-binder-launches-by-pulling-github-content-in-a-binder-link-with-nbgitpuller/922
If we separate the environment from the content, basically into two repos, we could in fact start working with a euroargodev repo dedicated to setting up a common environment for euroargodev software. I like this idea and will give it a try.

cannot fetch data

from argopy import DataFetcher as ArgoDataFetcher
ds = ArgoDataFetcher(src='erddap').region([-75, -45, 20, 30, 0, 100, '2011-01', '2011-02']).to_xarray()

leads to:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-4-342a12ae1e91> in <module>
----> 1 ds = ArgoDataFetcher(src='erddap').region([-75, -45, 20, 30, 0, 100, '2011-01', '2011-02']).to_xarray()

~/.miniconda3/envs/cognac/lib/python3.8/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
    270             raise InvalidFetcher(" Initialize an access point (%s) first." %
    271                                  ",".join(self.Fetchers.keys()))
--> 272         xds = self.fetcher.to_xarray(**kwargs)
    273         xds = self.postproccessor(xds)
    274         return xds

~/.miniconda3/envs/cognac/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py in to_xarray(self, errors)
    423         if not self.parallel:
    424             if len(self.uri) == 1:
--> 425                 ds = self.fs.open_dataset(self.uri[0])
    426             else:
    427                 ds = self.fs.open_mfdataset(

~/.miniconda3/envs/cognac/lib/python3.8/site-packages/argopy/stores/filesystems.py in open_dataset(self, url, *args, **kwargs)
    465         #with self.fs.open(url) as of:
    466         #    ds = xr.open_dataset(of, *args, **kwargs)
--> 467         data = self.fs.cat_file(url)
    468         ds = xr.open_dataset(data, *args, **kwargs)
    469         if "source" not in ds.encoding:

AttributeError: 'HTTPFileSystem' object has no attribute 'cat_file'

Reverting to this code fixes the issue:

with self.fs.open(url) as of:
    ds = xr.open_dataset(of, *args, **kwargs)

even though I run into other errors that will be reported later on.
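A hedged compatibility sketch for that store method: keep the new cat_file path when the installed fsspec provides it, and fall back to the previous file-object code otherwise (this is an illustration, not the actual argopy fix):

import xarray as xr

def open_dataset(self, url, *args, **kwargs):
    if hasattr(self.fs, "cat_file"):
        data = self.fs.cat_file(url)                # newer fsspec: read the remote file as bytes
        ds = xr.open_dataset(data, *args, **kwargs)
    else:
        with self.fs.open(url) as of:               # older fsspec: open a file-like object
            ds = xr.open_dataset(of, *args, **kwargs)
    return ds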

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.8.5 | packaged by conda-forge | (default, Jul 24 2020, 01:06:20)
[Clang 10.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: None.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

argopy: 0.1.7
xarray: 0.16.0
pandas: 1.0.5
numpy: 1.19.1
scipy: 1.5.2
fsspec: 0.7.4
erddapy: 0.9.0
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.1.5
cfgrib: None
iris: None
bottleneck: None
dask: 2.20.0
distributed: 2.20.0
matplotlib: 3.2.2
cartopy: 0.18.0
seaborn: None
numbagg: None
gsw: 3.3.1
setuptools: 49.2.0.post20200712
pip: 20.1.1
conda: 4.8.3
pytest: 5.4.3
IPython: 7.16.1
sphinx: None

Long response from the erddap raising internal error

When fetching data with the erddap backend, it happens that if the server takes too much time to answer (presumably because the request is large and it takes some time to prepare), typically more than 1 minute, then I see an "Internal Server Error":

For instance, when running the following the 1st time:

box = [-120., -85., -10, 10, 0, 500, '2019-01-01','2019-12-31']
ds = argo_loader.region(box).to_xarray()

an error is raised like:

Internal Server Error
The server encountered an internal error or misconfiguration and was unable to complete your request.
Please contact the server administrator at [email protected] to inform them of the time this error occurred, and the actions you performed just before this error.
More information about this error may be available in the server error log.

but running it a 2nd time will provide the result:

<xarray.Dataset>
Dimensions:              (N_POINTS: 919174)
Coordinates:
    LONGITUDE            (N_POINTS) float64 -104.0 -104.0 ... -104.3 -104.3
    LATITUDE             (N_POINTS) float64 -2.84 -2.84 -2.84 ... 6.981 6.981
    TIME                 (N_POINTS) datetime64[ns] 2019-01-01T03:52:37 ... 2019-12-30T17:28:14
  * N_POINTS             (N_POINTS) int64 0 1 2 3 ... 919171 919172 919173

I don't know exactly where that 1 minute delay is enforced, but argopy needs to handle this.

NOTE: For requests generating more than 9308 MB, a real 413 error is raised.
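As a user-side mitigation sketch, the api_timeout global option (visible in the show_versions/options output elsewhere on this page) could be raised before fetching; whether it covers this particular delay is an assumption, and the value is illustrative:

import argopy
from argopy import DataFetcher as ArgoDataFetcher

argopy.set_options(api_timeout=300)  # give the erddap server more time to prepare large requests
box = [-120., -85., -10, 10, 0, 500, '2019-01-01', '2019-12-31']
ds = ArgoDataFetcher().region(box).to_xarray()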

FileNotFoundError error

MCVE Code Sample

ds = argo_loader.region([8.14516, 9.9408, 40.716401, 43.31488, 
                         0., 2000., '2015-04-01', '2015-04-30'])
ds.to_xarray()

FileNotFoundError: https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc?data_mode,latitude,longitude,position_qc,time,time_qc,direction,platform_number,cycle_number,config_mission_number,vertical_sampling_scheme,pres,temp,psal,pres_qc,temp_qc,psal_qc,pres_adjusted,temp_adjusted,psal_adjusted,pres_adjusted_qc,temp_adjusted_qc,psal_adjusted_qc,pres_adjusted_error,temp_adjusted_error,psal_adjusted_error&longitude>=8.14516&longitude<=9.9408&latitude>=40.716401&latitude<=43.31488&pres>=0.0&pres<=2000.0&time>=1427846400.0&time<=1430352000.0&distinct()&orderBy("time,pres")

Expected Output

I'm not sure here, but if there is no file for the period/domain selected, shouldn't I get a ValueError instead?
Here the error reports the path of a file, but that file is not found.

Of course I can use a try... except to avoid the problem.
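A minimal sketch of that try/except workaround, assuming FileNotFoundError is the exception surfaced to the caller:

try:
    ds = argo_loader.region([8.14516, 9.9408, 40.716401, 43.31488,
                             0., 2000., '2015-04-01', '2015-04-30']).to_xarray()
except FileNotFoundError:
    print("No data available for this region/period")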

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.8.0 (default, Dec 10 2019, 10:35:48)
[GCC 5.4.0 20160609]
python-bits: 64
OS: Linux
OS-release: 4.15.0-123-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.3

argopy: 0.1.6
xarray: 0.16.1
pandas: 1.1.4
numpy: 1.19.4
scipy: 1.5.4
fsspec: 0.8.3
erddapy: 0.7.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.30.0
distributed: 2.30.0
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: 0.11.0
numbagg: None
gsw: 3.4.0
setuptools: 49.6.0
pip: 20.2.3
conda: None
pytest: 6.1.1
IPython: 7.18.1
sphinx: 3.3.1

Move contributing.rst to root directory

This comment is part of the JOSS review at openjournals/joss-reviews#2425.

It's common for software repositories to have a CONTRIBUTING file in the root directory (along with LICENSE, README, CODE_OF_CONDUCT etc).

You've already got that information in docs/contributing.rst, so it might be a good idea to move that file to the root directory. You'll then just need to update docs/index.rst with something like .. include:: ../contributing.rst so that information still appears on your Read the Docs page.

Missing dependency: aiohttp is required

Following the usage documentation,

>>> from argopy import DataFetcher as ArgoDataFetcher
/home/jpivarski/miniconda3/lib/python3.8/site-packages/argopy/plotters.py:33:
 UserWarning: argopy requires cartopy installed for full map plotting functionality
  warnings.warn("argopy requires cartopy installed for full map plotting functionality")
/home/jpivarski/miniconda3/lib/python3.8/site-packages/argopy/plotters.py:41:
 UserWarning: argopy requires seaborn installed for full plotting functionality
  warnings.warn("argopy requires seaborn installed for full plotting functionality")

>>> data = ArgoDataFetcher().region([-75, -45, 20, 30, 0, 100, '2011', '2012'])

Expected Output

No error. :)

Problem Description

The first traceback is:

  File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/fsspec/registry.py", line 186, in get_filesystem_class
    register_implementation(protocol, _import_class(bit["class"]))
  File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/fsspec/registry.py", line 201, in _import_class
    mod = importlib.import_module(mod)
  File "/home/jpivarski/miniconda3/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 783, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/fsspec/implementations/http.py", line 3, in <module>
    import aiohttp
ModuleNotFoundError: No module named 'aiohttp'

but then the exception chained to:

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/argopy/fetchers.py", line 195, in region
    self.fetcher = self.Fetchers['region'](box=box, **self.fetcher_options)
  File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py", line 85, in __init__
    self.fs = httpstore(cache=cache, cachedir=cachedir, timeout=120)
  File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/argopy/stores/fsspec_wrappers.py", line 33, in __init__
    self.fs = fsspec.filesystem(self.protocol, **kw)
  File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/fsspec/registry.py", line 218, in filesystem
    cls = get_filesystem_class(protocol)
  File "/home/jpivarski/miniconda3/lib/python3.8/site-packages/fsspec/registry.py", line 188, in get_filesystem_class
    raise ImportError(bit["err"]) from e
ImportError: HTTPFileSystem requires "requests" to be installed

which makes it look like "requests" is not installed. Actually, I do have "requests." What I didn't have was "aiohttp." This library is not included in the list of dependencies, nor was it automatically installed when I installed argopy from conda. It needs to go in a requirements list. I don't know whether it's correctly set up for pip; I only tested installation through conda. It's also possible that it's one of your dependencies that's not listing its dependencies.

With "aiohttp" installed, I can now run the above code and get a data object. (I haven't tried anything beyond that, yet.)

Versions

>>> import argopy
>>> argopy.__version__
'0.1.6'

Cannot access localftp data

This code, accessing profile data locally, fails with a "Data not found" error even though the data is available at the DAC-compliant path. The same code works fine with src='erddap' or src='argovis'.

argo_loader = ArgoDataFetcher(mode='expert', src='localftp', local_ftp='/Users/ericrehm/ftp/outgoing/argo')
wmoId = 6903550
profileNumber = 34
ds = argo_loader.profile(wmoId, profileNumber).to_xarray().to_dataframe()
ds.head()

Error Traceback


DataNotFound Traceback (most recent call last)
in
7 wmoId = 6903550
8 profileNumber = 34
----> 9 ds = argo_loader.profile(wmoId, profileNumber).to_xarray().to_dataframe()
10 ds.head()

~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
270 raise InvalidFetcher(" Initialize an access point (%s) first." %
271 ",".join(self.Fetchers.keys()))
--> 272 xds = self.fetcher.to_xarray(**kwargs)
273 xds = self.postproccessor(xds)
274 return xds

~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/data_fetchers/localftp_data.py in to_xarray(self, errors)
326 else:
327 method = self.parallel_method
--> 328 ds = self.fs.open_mfdataset(self.uri,
329 method=method,
330 concat_dim='N_POINTS',

~/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/stores/filesystems.py in open_mfdataset(self, urls, concat_dim, max_workers, method, progress, concat, preprocess, errors, *args, **kwargs)
307 return results
308 else:
--> 309 raise DataNotFound(urls)
310
311 def read_csv(self, url, **kwargs):

DataNotFound: ['/Users/ericrehm/ftp/outgoing/argo/dac/coriolis/6903550/profiles/BD6903550_034.nc']

However, the data does exist and is accessible:

ERICs-MBP-2:Downloads ericrehm$ ncdump -h /Users/ericrehm/ftp/outgoing/argo/dac/coriolis/6903550/profiles/BD6903550_034.nc | head
netcdf BD6903550_034 {
dimensions:
DATE_TIME = 14 ;
STRING256 = 256 ;
STRING64 = 64 ;
STRING32 = 32 ;
STRING16 = 16 ;
STRING8 = 8 ;
STRING4 = 4 ;
STRING2 = 2 ;

Show_versions output

argopy: 0.1.7
src: /Users/ericrehm/opt/anaconda2/envs/py3/lib/python3.8/site-packages/argopy/__init__.py
options: {'src': 'erddap', 'local_ftp': '.', 'dataset': 'phy', 'cachedir': '/Users/ericrehm/.cache/argopy', 'mode': 'standard', 'api_timeout': 60}

INSTALLED VERSIONS

commit: None
python: 3.8.5 (default, Sep 4 2020, 02:22:02)
[Clang 10.0.0 ]
python-bits: 64
OS: Darwin
OS-release: 18.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3

argopy: 0.1.7
xarray: 0.16.2
pandas: 1.2.1
numpy: 1.19.2
scipy: 1.5.2
fsspec: 0.8.3
erddapy: 0.9.0
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.3.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2021.02.0
distributed: None
matplotlib: 3.1.3
cartopy: 0.18.0
seaborn: 0.11.1
numbagg: None
gsw: 3.4.0
setuptools: 52.0.0.post20210125
pip: 20.3.3
conda: None
pytest: None
IPython: 7.20.0
sphinx: None

Method to clear cache folders

There is no method to clear the content of cache folders. It would be nice to provide such a method to users.

For instance, once we have fetched data (using the cache):

import argopy
from argopy import DataFetcher as ArgoDataFetcher

loader = ArgoDataFetcher(cache=True).region([-75, -45, 20, 30, 0, 10, '2011-01-01', '2011-06'])
ds = loader.to_xarray()

Clearing the cache folder could be done with:

loader.clearcache()
# or
argopy.utilities.clearcache()

Both methods would be required because the data fetcher can have its own cache folder, different from the one set with the global option "cachedir".
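A minimal sketch of what such a helper could do, assuming the cache folder path is known (e.g. the value given to the "cachedir" option); the function name and default path are hypothetical:

import shutil
from pathlib import Path

def clearcache(cachedir="~/.cache/argopy"):
    cachedir = Path(cachedir).expanduser()
    if cachedir.exists():
        shutil.rmtree(cachedir)                   # delete all cached files
    cachedir.mkdir(parents=True, exist_ok=True)   # leave an empty cache folder behind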

ValueError: Cannot seek strteaming HTTP file

After a successful installation of argopy, I tried to fetch data using the code given in "https://argopy.readthedocs.io/en/latest/data_fetching.html". While fetching data I consistently get "ValueError: Cannot seek strteaming HTTP file".

# Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from argopy import DataFetcher as ArgoDataFetcher
>>> argo_loader = ArgoDataFetcher()
>>> argo_loader
<datafetcher 'Not initialised'>
Backend: erddap
Fetchers: profile, float, region
User mode: standard
>>> ds = argo_loader.region([-75, -45, 20, 30, 0, 10, '2011-01-01', '2011-06']).to_xarray()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/mayur/anaconda3/lib/python3.8/site-packages/argopy/fetchers.py", line 222, in to_xarray
    xds = self.fetcher.to_xarray(**kwargs)
  File "/home/mayur/anaconda3/lib/python3.8/site-packages/argopy/data_fetchers/erddap_data.py", line 321, in to_xarray
    ds = self.fs.open_dataset(self.url)
  File "/home/mayur/anaconda3/lib/python3.8/site-packages/argopy/stores/fsspec_wrappers.py", line 253, in open_dataset
    ds = xr.open_dataset(of, **kwargs)
  File "/home/mayur/anaconda3/lib/python3.8/site-packages/xarray/backends/api.py", line 529, in open_dataset
    engine = _get_engine_from_magic_number(filename_or_obj)
  File "/home/mayur/anaconda3/lib/python3.8/site-packages/xarray/backends/api.py", line 124, in _get_engine_from_magic_number
    filename_or_obj.seek(0)
  File "/home/mayur/anaconda3/lib/python3.8/site-packages/fsspec/implementations/http.py", line 440, in seek
    raise ValueError("Cannot seek strteaming HTTP file")
ValueError: Cannot seek strteaming HTTP file
>>> 

Versions

commit: None
python: 3.8.5 (default, Sep 4 2020, 07:30:14)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-52-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.7.3

argopy: 0.1.6
xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.3
pydap: installed
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.5.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: installed
rasterio: 1.1.7
cfgrib: None
iris: 2.4.0
bottleneck: 1.3.2
dask: 2.30.0
distributed: 2.30.0
matplotlib: 3.3.1
cartopy: 0.18.0
seaborn: 0.11.0
numbagg: installed
setuptools: 50.3.0.post20201006
pip: 20.2.3
conda: 4.9.0
pytest: 6.1.1
IPython: 7.18.1
sphinx: 3.2.1

argovis not respecting region definition

It seems that the argovis data source fetcher for a region returns more data than it should

Code Sample

Load data:

from argopy import DataFetcher as ArgoDataFetcher
box = [-75, -45, 20, 30, 0, 100, '2011-01', '2011-02']
ds = ArgoDataFetcher(src='argovis').region(box).to_xarray()

Problem

We would expect all profiles to be inside the box domain, and no profiles outside the boundaries.

Instead, we see profiles to the north of the domain.

Then plot the profile positions:

import matplotlib.pyplot as plt
def plot_box(b, ax, c='k', lw=1):
    return ax.plot([b[0], b[1], b[1], b[0], b[0]],[b[2], b[2], b[3], b[3], b[2]], color=c, linewidth=lw)

fig, ax = plt.subplots(ncols=1, nrows=1, dpi=120)
plot_box(box, ax, lw=2)
ax.plot(ds['LONGITUDE'], ds['LATITUDE'], 'k+')    

[Figure: map of profile positions with the requested box outlined; several profiles appear north of the box's latitudinal boundary]

This seems to affect only the latitudinal boundaries. Nothing along the longitude bounds.

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.11 | packaged by conda-forge | (default, Aug 5 2020, 20:09:42)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.19.112+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.3

argopy: 0.1.5
xarray: 0.16.0
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.5.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.24.0
distributed: 2.24.0
matplotlib: 3.3.1
cartopy: None
seaborn: None
numbagg: None
setuptools: 45.1.0.post20200119
pip: 20.2.2
conda: None
pytest: None
IPython: 7.11.1
sphinx: 3.2.1

test against xarray master

It would be a good idea to test against the git master versions of all dependencies, especially xarray. This would've caught #65 early, and helps the xarray team fix things before release.

PS: I love argopy!

Visualization/plot API design

Dear colleagues,

Since data and metadata fetchers are on track (eg: #6, #5), we will very soon need an API for visualization and plotting of fetched Argo information.

This issue is meant to design that API.
If you want to contribute or shoot some ideas or requests please reply below.
Be specific about the fetched object to be used.

Example:

Plotting meta-data object (#6):

from argopy import ArgoIndexFetcher
idx = ArgoIndexFetcher.region([-75, -65, 10, 20]).to_dataframe()

idx.argopy.plot.trajectories() # Create a map with all float trajectories
idx.argopy.plot.dac() # Create a histogram of nb of floats by DAC
idx.argopy.plot.datamode(x='time') # Create a histogram of Data Mode ('R','D','A') over time

Note that here the API is idx.argopy.plot.<> because idx is a Pandas dataframe that already comes with a plot method, so we can't customize idx.plot.<>.

Plotting data object (eg: #5):

from argopy import ArgoDataFetcher
ds = ArgoDataFetcher.region([-75, -65, 10, 20]).to_xarray()

ds.argopy.plot.trajectories() # Create a map with all float trajectories
ds.argopy.plot.profiles() # Create a plot with superimposed vertical profiles
ds.argopy.plot.profiles(hue='time') # Create a plot with superimposed vertical profiles, colored by time
ds.argopy.plot.tsdiagram() # Create a T/S diagram 
ds['TEMP'].argopy.plot.profiles(hue='QC') # Create a plot of Temperature profiles colored by QC flags

ds = ArgoDataFetcher.float(6902746).to_xarray()
ds.argopy.plot.waterfall(x='TEMP') # Create a water fall plot of temperature

Note that here the API is ds.argopy.plot.<> because ds is an xarray dataset that already comes with a plot method, so we can't customize ds.plot.<>.
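For illustration, a minimal sketch of how such an idx.argopy.plot.<> entry point could be wired up with a registered pandas accessor; the class, method, and column names ("wmo", "longitude", "latitude", "date") are hypothetical:

import matplotlib.pyplot as plt
import pandas as pd

class ArgopyPlotter:
    def __init__(self, df):
        self._df = df

    def trajectories(self, ax=None):
        # one line per float: positions ordered by date and grouped by WMO number
        ax = ax or plt.gca()
        for _, traj in self._df.sort_values("date").groupby("wmo"):
            ax.plot(traj["longitude"], traj["latitude"])
        return ax

@pd.api.extensions.register_dataframe_accessor("argopy")
class ArgopyAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    @property
    def plot(self):
        return ArgopyPlotter(self._obj)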

erddapy update to v0.8.0 breaking argopy functionality

erddapy updated to v0.8.0 which includes a major refactor that removes the utilities module (#145, #148). This breaks the import of argopy.

MCVE Code Sample

from argopy import DataFetcher as ArgoDataFetcher

Expected Output

No expected output. Should import module without error.

Problem Description

Setting up the environment, erddapy is restricted to erddapy>=0.6 . This results in conda installing erddapy=0.8.0. Version 0.8.0 features a major refactor, removing the utilities module. This causes the import error in argopy. This was a very recent change (3 days ago). Downgrading erddapy to version 0.7.0 restores normal functionality.

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 18:22:52) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 85 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: English_United States.1252
libhdf5: 1.10.6
libnetcdf: 4.7.4

argopy: 0.1.6
xarray: 0.15.1
pandas: 1.1.4
numpy: 1.19.4
scipy: 1.5.3
fsspec: 0.8.4
erddapy: 0.8.0
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.30.0
distributed: 2.30.1
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
gsw: 3.4.0
setuptools: 49.6.0.post20201009
pip: 20.2.4
conda: None
pytest: 6.1.2
IPython: 7.19.0
sphinx: 3.3.0

Dtype warnings with erddap data fetcher

When using the erddap data fetcher, data are downloaded from the server in csv format, which directly feeds a pandas reader.
Because the csv column data types are not defined, the fetcher returns a bunch of warnings like:

DtypeWarning: Columns (18,19,20) have mixed types.Specify dtype option on import or set low_memory=False.
    self.__test_float('erddap')

this is well explained here: https://stackoverflow.com/questions/24251219/pandas-read-csv-low-memory-and-dtype-options

So we need to define column types, for efficiency and reliability, and to suppress these warnings.
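A minimal sketch of the fix with pandas, passing an explicit dtype mapping to read_csv; the column subset is illustrative (names taken from the erddap request variables quoted elsewhere on this page), and the local file name is hypothetical:

import pandas as pd

dtypes = {
    "platform_number": "int64",
    "cycle_number": "int64",
    "pres": "float64",
    "temp": "float64",
    "psal": "float64",
    "temp_qc": "str",
}
df = pd.read_csv("erddap_response.csv", dtype=dtypes)  # no DtypeWarning, no low_memory heuristics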

Problem accessing data with src ='localftp' in 0.1.4

Hi, I'm not sure if this is a bug or something I am doing wrong...

MCVE Code Sample

import argopy
from argopy import DataFetcher as ArgoDataFetcher

argopy.set_options(mode='expert',src='localftp',local_ftp='/Users/dnicholson/Documents/gdac1/')
argo_loader = ArgoDataFetcher(ds='phy')
test_wmo = 4903261
ds = argo_loader.float(test_wmo).to_xarray()
ds

Expected Output

This works fine if I downgrade to 0.1.3. I get a similar error with 0.1.4 both for ds='phy' and for ds='bgc'

Problem Description


NetCDF4FileNotFoundError Traceback (most recent call last)
in
1 test_wmo = 4903261
----> 2 ds = argo_loader.float(test_wmo).to_xarray()
3 ds

~/opt/anaconda3/envs/argopy/lib/python3.8/site-packages/argopy/fetchers.py in to_xarray(self, **kwargs)
214 if self._AccessPoint not in self.valid_access_points:
215 raise InvalidFetcherAccessPoint(" Initialize an access point (%s) first." % ",".join(self.Fetchers.keys()))
--> 216 xds = self.fetcher.to_xarray(**kwargs)
217 xds = self.postproccessor(xds)
218 return xds

~/opt/anaconda3/envs/argopy/lib/python3.8/site-packages/argopy/data_fetchers/localftp_data.py in to_xarray(self, errors, client)
342 """
343 # Set internal list of files to load:
--> 344 self.list_argo_files(errors=errors)
345
346 # Load data (will raise an error if no data found):

~/opt/anaconda3/envs/argopy/lib/python3.8/site-packages/argopy/data_fetchers/localftp_data.py in list_argo_files(self, errors)
436 for wmo in self.WMO:
437 if self.CYC is None:
--> 438 self._list_of_argo_files.append(self._absfilepath(wmo, errors=errors))
439 else:
440 for cyc in self.CYC:

~/opt/anaconda3/envs/argopy/lib/python3.8/site-packages/argopy/data_fetchers/localftp_data.py in _absfilepath(self, wmo, cyc, errors)
195 elif len(lst) == 0:
196 if errors == 'raise':
--> 197 raise NetCDF4FileNotFoundError(pattern)
198 else:
199 # Otherwise remain silent/ignore

NetCDF4FileNotFoundError: "Couldn't find NetCDF4 file: /Users/dnicholson/Documents/gdac1/dac/*/4903261/4903261_prof.nc"

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.8.3 | packaged by conda-forge | (default, Jun 1 2020, 17:21:09)
[Clang 9.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 19.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3

argopy: 0.1.4
xarray: 0.15.1
pandas: 1.0.5
numpy: 1.18.5
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2.19.0
distributed: 2.19.0
matplotlib: 3.2.2
cartopy: 0.18.0
seaborn: 0.10.1
numbagg: None
setuptools: 47.3.1.post20200616
pip: 20.1.1
conda: None
pytest: None
IPython: 7.15.0
sphinx: 3.0.3

xarray version 0.16.1 breaking argopy

It seems that since xarray moved to 0.16.1, argopy is broken !

MCVE Code Sample

import xarray  # Loading version 0.16.1
import argopy  # Loading master version
l = argopy.fetchers.ArgoDataFetcher(src='erddap', cache=0).region([-60, -55, 40., 45., 0., 10., '2007-08-01', '2007-09-01'])
# l = argopy.fetchers.ArgoDataFetcher(src='erddap', cache=0).profile(6902746, 34)
l.to_xarray()

the same code with an xarray version strictly lower than 0.16.1 will work (eg. 0.15.1, 0.16.0)

This will throw an error like those found in unit testing here before we locked xarray version to 0.15.1.
This is the reason we currently have an "outdated" message for argopy requirements.

Full error stack

ValueError Traceback (most recent call last)
in
1 l = argopy.fetchers.ArgoDataFetcher(src='erddap', cache=0).region([-60, -55, 40., 45., 0., 10., '2007-08-01', '2007-09-01'])
2 # l = argopy.fetchers.ArgoDataFetcher(src='erddap', cache=0).profile(6902746, 34)
----> 3 l.to_xarray()

~/git/github/euroargodev/argopy/argopy/fetchers.py in to_xarray(self, **kwargs)
271 ",".join(self.Fetchers.keys()))
272 xds = self.fetcher.to_xarray(**kwargs)
--> 273 xds = self.postproccessor(xds)
274 return xds
275

~/git/github/euroargodev/argopy/argopy/fetchers.py in postprocessing(xds)
251
252 def postprocessing(xds):
--> 253 xds = self.fetcher.filter_data_mode(xds)
254 xds = self.fetcher.filter_qc(xds)
255 xds = self.fetcher.filter_variables(xds, self._mode)

~/git/github/euroargodev/argopy/argopy/data_fetchers/erddap_data.py in filter_data_mode(self, ds, **kwargs)
467
468 def filter_data_mode(self, ds, **kwargs):
--> 469 ds = ds.argo.filter_data_mode(errors="ignore", **kwargs)
470 if ds.argo._type == "point":
471 ds["N_POINTS"] = np.arange(0, len(ds["N_POINTS"]))

~/git/github/euroargodev/argopy/argopy/xarray.py in filter_data_mode(self, keep_error, errors)
308
309 # Create one dataset for each of the data_mode:
--> 310 argo_r, argo_a, argo_d = ds_split_datamode(ds)
311
312 # Fill in the adjusted field with the non-adjusted wherever it is NaN

~/git/github/euroargodev/argopy/argopy/xarray.py in ds_split_datamode(xds)
221 """
222 # Real-time:
--> 223 argo_r = ds.where(ds['DATA_MODE'] == 'R', drop=True)
224 for v in plist:
225 vname = v.upper() + '_ADJUSTED'

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/common.py in where(self, cond, other, drop)
1266 cond = cond.isel(**indexers)
1267
-> 1268 return ops.where_method(self, cond, other)
1269
1270 def close(self: Any) -> None:

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/ops.py in where_method(self, cond, other)
191 # alignment for three arguments is complicated, so don't support it yet
192 join = "inner" if other is dtypes.NA else "exact"
--> 193 return apply_ufunc(
194 duck_array_ops.where_method,
195 self,

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, meta, dask_gufunc_kwargs, *args)
1090 # feed datasets apply_variable_ufunc through apply_dataset_vfunc
1091 elif any(is_dict_like(a) for a in args):
-> 1092 return apply_dataset_vfunc(
1093 variables_vfunc,
1094 *args,

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/computation.py in apply_dataset_vfunc(func, signature, join, dataset_join, fill_value, exclude_dims, keep_attrs, *args)
405 )
406
--> 407 list_of_coords = build_output_coords(args, signature, exclude_dims)
408 args = [getattr(arg, "data_vars", arg) for arg in args]
409

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/computation.py in build_output_coords(args, signature, exclude_dims)
222 else:
223 # TODO: save these merged indexes, instead of re-computing them later
--> 224 merged_vars, unused_indexes = merge_coordinates_without_align(
225 coords_list, exclude_dims=exclude_dims
226 )

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/merge.py in merge_coordinates_without_align(objects, prioritized, exclude_dims)
327 filtered = collected
328
--> 329 return merge_collected(filtered, prioritized)
330
331

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/merge.py in merge_collected(grouped, prioritized, compat)
227 variables = [variable for variable, _ in elements_list]
228 try:
--> 229 merged_vars[name] = unique_variable(name, variables, compat)
230 except MergeError:
231 if compat != "minimal":

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/merge.py in unique_variable(name, variables, compat, equals)
118 if compat == "broadcast_equals":
119 dim_lengths = broadcast_dimension_size(variables)
--> 120 out = out.set_dims(dim_lengths)
121
122 if compat == "no_conflicts":

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/variable.py in set_dims(self, dims, shape)
1438 # don't use broadcast_to unless necessary so the result remains
1439 # writeable if possible
-> 1440 expanded_data = self.data
1441 elif shape is not None:
1442 dims_map = dict(zip(dims, shape))

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/variable.py in data(self)
357 return self._data
358 else:
--> 359 return self.values
360
361 @data.setter

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/variable.py in values(self)
508 def values(self):
509 """The variable's data as a numpy.ndarray"""
--> 510 return _as_array_or_item(self._data)
511
512 @values.setter

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/variable.py in _as_array_or_item(data)
270 data = data.get()
271 else:
--> 272 data = np.asarray(data)
273 if data.ndim == 0:
274 if data.dtype.kind == "M":

~/anaconda/envs/py38/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
683
684 def __array__(self, dtype=None):
--> 685 self._ensure_cached()
686 return np.asarray(self.array, dtype=dtype)
687

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py in _ensure_cached(self)
680 def _ensure_cached(self):
681 if not isinstance(self.array, NumpyIndexingAdapter):
--> 682 self.array = NumpyIndexingAdapter(np.asarray(self.array))
683
684 def array(self, dtype=None):

~/anaconda/envs/py38/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
653
654 def __array__(self, dtype=None):
--> 655 return np.asarray(self.array, dtype=dtype)
656
657 def __getitem__(self, key):

~/anaconda/envs/py38/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/core/indexing.py in __array__(self, dtype)
558 def __array__(self, dtype=None):
559 array = as_indexable(self.array)
--> 560 return np.asarray(array[self.key], dtype=None)
561
562 def transpose(self, order):

~/anaconda/envs/py38/lib/python3.8/site-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
81
82 """
---> 83 return array(a, dtype, copy=False, order=order)
84
85

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/coding/variables.py in __array__(self, dtype)
68
69 def __array__(self, dtype=None):
---> 70 return self.func(self.array)
71
72 def __repr__(self):

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/coding/times.py in decode_cf_datetime(num_dates, units, calendar, use_cftime)
156 if use_cftime is None:
157 try:
--> 158 dates = _decode_datetime_with_pandas(flat_num_dates, units, calendar)
159 except (KeyError, OutOfBoundsDatetime, OverflowError):
160 dates = _decode_datetime_with_cftime(

~/anaconda/envs/py38/lib/python3.8/site-packages/xarray/coding/times.py in _decode_datetime_with_pandas(flat_num_dates, units, calendar)
121 with warnings.catch_warnings():
122 warnings.filterwarnings("ignore", "invalid value encountered", RuntimeWarning)
--> 123 pd.to_timedelta(flat_num_dates.min(), delta) + ref_date
124 pd.to_timedelta(flat_num_dates.max(), delta) + ref_date
125

~/anaconda/envs/py38/lib/python3.8/site-packages/numpy/core/_methods.py in _amin(a, axis, out, keepdims, initial, where)
41 def _amin(a, axis=None, out=None, keepdims=False,
42 initial=_NoValue, where=True):
---> 43 return umr_minimum(a, axis, None, out, keepdims, initial, where)
44
45 def _sum(a, axis=None, dtype=None, out=None, keepdims=False,

ValueError: zero-size array to reduction operation minimum which has no identity

Problem Description

Obviously, argopy must work with the latest versions of xarray.

I noticed in the breaking changes of the 0.16.1 version that:

DataArray.astype() and Dataset.astype() now preserve attributes. Keep the old behavior by passing keep_attrs=False

This may be the origin of the issue here.
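A minimal illustration of that change, assuming a small DataArray with attributes; with xarray >= 0.16.1 the attrs now survive astype unless keep_attrs=False is passed:

import numpy as np
import xarray as xr

da = xr.DataArray(np.array(['R', 'D', 'A']), dims='N_POINTS', attrs={'long_name': 'Data mode'})
print(da.astype(str).attrs)                     # {'long_name': 'Data mode'} since 0.16.1
print(da.astype(str, keep_attrs=False).attrs)   # {} restores the pre-0.16.1 behaviour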

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.8.6 | packaged by conda-forge | (default, Oct 7 2020, 18:42:56)
[Clang 10.0.1 ]
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

argopy: 999
xarray: 0.16.1
pandas: 1.1.3
numpy: 1.19.2
scipy: 1.5.2
fsspec: 0.8.4
erddapy: 0.7.2
netCDF4: 1.5.4
pydap: installed
h5netcdf: 0.8.1
h5py: 2.10.0
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: 1.2.0
PseudoNetCDF: installed
rasterio: 1.1.7
cfgrib: 0.9.8.4
iris: 2.4.0
bottleneck: 1.3.2
dask: 2.30.0
distributed: 2.30.0
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: 0.11.0
numbagg: installed
gsw: 3.4.0
setuptools: 49.6.0.post20201009
pip: 20.2.3
conda: None
pytest: 6.1.1
IPython: 7.18.1
sphinx: None

Misleading error message for Sprof fetch

When fetching data from a local netcdf Sprof file, a preprocessing error leading to an empty result at the end of open_mfdataset raises the 'DataNotFound' error. This could be misleading, since the data were found but preprocessing failed.

MCVE Code Sample

import argopy
from argopy import DataFetcher as ArgoDataFetcher

# Download ftp sample and get the ftp local path:
ftproot = argopy.tutorial.open_dataset('localftp')[0]

# then fetch data:
with argopy.set_options(src='localftp', local_ftp=ftproot):
    ds = ArgoDataFetcher(ds='bgc', mode='expert').float(3902131).to_xarray()
    print(ds)

This code will raise the following error:

/srv/conda/envs/notebook/lib/python3.8/site-packages/argopy/stores/filesystems.py in open_mfdataset(self, urls, concat_dim, max_workers, method, progress, concat, preprocess, errors, *args, **kwargs)
    306                 return results
    307         else:
--> 308             raise DataNotFound(urls)
    309 
    310     def read_csv(self, url, **kwargs):

DataNotFound: ['/home/jovyan/.argopy_tutorial_data/ftp/dac/coriolis/3902131/3902131_Sprof.nc']

Expected Output

Since the file exists and was found, we can't say that data were not found, so a more appropriate error message is needed here.

Versions

Output of `argopy.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.8.5 | packaged by conda-forge | (default, Jul 24 2020, 01:25:15)
[GCC 7.5.0]
python-bits: 64
OS: Linux
OS-release: 4.19.112+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.7.4

argopy: 0.1.6
xarray: 0.16.1
pandas: 1.1.4
numpy: 1.19.4
scipy: 1.5.3
fsspec: 0.8.3
erddapy: 0.7.2
netCDF4: 1.5.4
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: 2.4.0
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: 0.9.8.4
iris: None
bottleneck: 1.3.2
dask: 2.30.0
distributed: 2.30.0
matplotlib: 3.3.2
cartopy: 0.18.0
seaborn: 0.11.0
numbagg: None
gsw: 3.4.0
setuptools: 49.6.0.post20201009
pip: 20.2.3
conda: None
pytest: 6.1.1
IPython: 7.18.1
sphinx: 3.3.1

Failed CI testings because of remote API errors

No or slow erddap responses leading to failed CI testing

Tests fail nearly every time because of a timeout from the erddap server.
This is seriously interfering with CI tests!

See for instance line 1549 of this one: https://github.com/euroargodev/argopy/runs/794341074?check_suite_focus=true

         except (ProtocolError, socket.error) as err:
>           raise ConnectionError(err, request=request)
E           requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

/Users/runner/hostedtoolcache/Python/3.7.7/x64/lib/python3.7/site-packages/requests/adapters.py:498: ConnectionError

This may be related to #16

When an action run fails because of this error, simply relaunching the run always succeeds (see here for the run above).

So I wonder how to address this:

  1. live with it, because it truly highlights something not working correctly that users could also experience when using the library, and that needs a fix (whether in argopy or in erddap is not clear)
  2. consider a fake erddap server to be used for CI testing, maybe based on the mock package of unittest (see this example, and the sketch below).
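
A rough sketch of option 2 (the patch target is an assumption about argopy internals, not a finished test): make the http store return a canned dataset so that CI tests never hit the real erddap server.

from unittest import mock
import xarray as xr
from argopy.stores import httpstore

# canned dataset returned by the fake erddap
canned = xr.Dataset({"TEMP": (("N_POINTS",), [10.0, 10.5])})

with mock.patch("argopy.stores.httpstore.open_dataset", return_value=canned):
    ds = httpstore().open_dataset("https://fake-erddap.example/whatever.nc")
    assert "TEMP" in ds  # no network access happened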

How to manage dependencies versions for unit testing, releases, binder, etc ...

I have a lot of difficulty ensuring successful unit testing on github actions, simply because the test environments evolve with new versions of dependencies, while my local dev environment is rather fixed (i.e. it evolves much less frequently than regular builds on github actions).
Unit testing thus becomes quite difficult, because it is not easy to distinguish errors due to code design from errors due to new versions of dependencies (this is happening right now, with xarray version 0.16 breaking argopy in several places).

Also, @jpivarski pointed out (#57) that our binder link should point to the latest release of argopy (instead of the master branch). This implies that we should also control the environment built by binder, and points to the need for a record of the dependency versions that a given release works with.

So, how do we manage these environments ?

  1. A "free" env with whatever the last versions of dependencies is available, to be used in unit testing to future proof argopy
  2. A "dev" env with a controlled set of dependencies versions, to be used in unit testing
  3. A "release" env with a controlled set of dependencies versions, to be used with Binder (and that would also inform users of the expected env one specific release is working with)

How do we make use of conda-lock for all of that ?

I had a look at how xarray is managing this (e.g. here), but this is way too complex for me.

@ocefpaf, as you mentioned on gitter, would you be interested in helping out with this ?

Feature Request: Subset index by date?

I'm trying to load profiles from 2020 onward for a certain set of floats. It would of course be easy to filter these using the to_dataframe() or to_xarray() functions, but I also want to use idx.plot('trajectory') on just those 2020-and-later profiles.

MCVE Code Sample

import datetime as dt
import argopy
argopy.set_options(mode='expert')
fetcher = argopy.IndexFetcher()

my_wmo_list = [
    4902417,
    4902418,
    4902424,
    4902434,
    4902435,
    4902452
]

idx = fetcher.float(my_wmo_list) # want to subset this by date!
# perhaps a kwarg?
# idx = fetcher.float(my_wmo_list, startdate=dt.date(2020, 1, 1))
# could also have an enddate=, but finds all profiles after startdate if not set, as above

Expected Output

Same indexfetcher object with appropriate dates removed.

Problem Description

Not a problem so much as a desired feature. Of course I can work around this but it would be nice to directly use the plotting features that the indexfetcher object offers!
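
For reference, a sketch of the current workaround (assuming the index dataframe exposes a 'date' column): filter after loading, which loses the index fetcher plotting features.

import pandas as pd

idx = fetcher.float(my_wmo_list)
df = idx.to_dataframe()
df_2020 = df[pd.to_datetime(df['date']) >= pd.Timestamp('2020-01-01')]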

Versions

INSTALLED VERSIONS

commit: None
python: 3.8.5 (default, Aug 5 2020, 09:44:06) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: English_United States.1252
libhdf5: 1.10.4
libnetcdf: 4.7.3

argopy: 0.1.7
xarray: 0.16.1
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.5.0
fsspec: 0.8.0
erddapy: 0.6.0
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.27.0
distributed: 2.27.0
matplotlib: 3.3.1
cartopy: 0.18.0
seaborn: 0.11.0
numbagg: None
gsw: 3.3.1
setuptools: 49.6.0.post20200814
pip: 20.2.2
conda: 4.9.2
pytest: 6.0.2
IPython: 7.18.1
sphinx: 3.2.1

ValueError: Cannot seek strteaming HTTP file

In unit testing, a lot of tests fail because of the following error:

 _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
argopy/tests/test_data_fetchers.py:123: in __test_region
    ds = ArgoDataFetcher(src=bk).region(arg).to_xarray()
argopy/fetchers.py:222: in to_xarray
    xds = self.fetcher.to_xarray(**kwargs)
argopy/data_fetchers/erddap_data.py:321: in to_xarray
    ds = self.fs.open_dataset(self.url)
argopy/stores/fsspec_wrappers.py:253: in open_dataset
    ds = xr.open_dataset(of, **kwargs)
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/xarray/backends/api.py:536: in open_dataset
    engine = _get_engine_from_magic_number(filename_or_obj)
../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/xarray/backends/api.py:122: in _get_engine_from_magic_number
    filename_or_obj.seek(0)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <File-like object HTTPFileSystem, https://www.ifremer.fr/erddap/tabledap/ArgoFloats.nc?data_mode,latitude,longitude,po...titude>=40.0&latitude<=45.0&pres>=0.0&pres<=10.0&time>=1185926400.0&time<=1188604800.0&distinct()&orderBy("time,pres")>
args = (0,), kwargs = {}

    def seek(self, *args, **kwargs):
>       raise ValueError("Cannot seek strteaming HTTP file")
E       ValueError: Cannot seek strteaming HTTP file

../../../hostedtoolcache/Python/3.7.9/x64/lib/python3.7/site-packages/fsspec/implementations/http.py:433: ValueError

This seems to happen in argopy/stores/fsspec_wrappers.py when xarray opens a remote dataset with:

with fs.open(url) as of:
  ds = xr.open_dataset(of, **kwargs)

where fs is an fsspec http file store.
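
A possible workaround sketch (not necessarily the fix adopted in argopy): read the whole streamed response into memory first, so that xarray receives a seekable buffer instead of a non-seekable streaming HTTP file.

import io
import xarray as xr

with fs.open(url) as of:   # fs and url as in the snippet above
    data = of.read()
ds = xr.open_dataset(io.BytesIO(data))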

Data fetcher "Coriolis"

This fetcher will be based on the Cassandra/Elastic Search API(s) developed for the Coriolis database by the Ifremer IT department.
Those API(s) are mostly meant for web portals, but we plan to integrate a new fetcher based on them.

To start, I create this issue to list our feedback & ideas on various aspects :

  • how to use it from python
  • how to access data
  • how to access metadata
  • how do we plan to integrate it into argopy fetching architecture

Profiles/cycles and the second profile

I have noticed that the documentation says, in several places, something like:

for profiles of a given float:
ds = argo_loader.profile(6901254, 1).to_xarray()

I think it should say cycle rather than profile, since if there is a descending profile, the above command would bring back the two profiles within the first cycle, the descending and the ascending one (please see page 13 of the Argo Data Management Team, Argo user's manual, https://doi.org/10.13155/29825).

Maybe it is a different issue, but I have noticed that for floats with a second profile, such as 6901254 (i.e. N_PROF=2), the above command only brings back the core Argo mission profile. This second profile is described in the variable VERTICAL_SAMPLING_SCHEME (Reference table 16: vertical sampling schemes).

Sorry if this is not an issue, and just my lack of Python skills.

FR? // Filter float/profile data by available measurement (oxygen)?

Aloha ARGOpy team. First, thanks for these great efforts to make Argo data easier to use and more impactful. Well done and much appreciated.

This may be a feature request, OR it may be my ignorance in interpreting the docs, but I'm unclear whether this package provides flags to filter float/profile data to only include floats/profiles with a certain measurement, for example oxygen.

If this isn't possible what approach might one take to try to add this feature?

Thanks again.

consider using filesystem-spec

I had a quick look at your backend code, and I wanted to suggest you investigate filesystem-spec: https://filesystem-spec.readthedocs.io/en/latest

Filesystem Spec is a project to unify various projects and classes to work with remote filesystems and file-system-like abstractions using a standard pythonic interface.

Using fsspec might allow you to remove some of your code related to file downloading, caching, etc. It might also make it easier to point at different endpoints for the data (e.g. ftp, http, s3). We use it, for example, in llcreader, which is similar to this project (tries to provide a uniform API for reading ECCO LLC data regardless of where it is stored).
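
For illustration, a minimal sketch of what fsspec could bring (the URL and cache directory are illustrative, and an xarray backend able to read file-like objects, e.g. h5netcdf or scipy, must be installed): remote access plus local caching with no custom download code.

import fsspec
import xarray as xr

# the filecache:: protocol chains a local cache in front of the http access
url = "filecache::https://data-argo.ifremer.fr/dac/coriolis/6902746/6902746_prof.nc"
with fsspec.open(url, filecache={"cache_storage": "/tmp/argopy_fsspec_cache"}) as f:
    ds = xr.open_dataset(f)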

An added benefit of using fsspec is its end-to-end compatibility with dask, which is somewhat related to #14.

User experience when a webAPI is not alive ?

Today the argovis webAPI was down.
This resulted in a cascade of small issues, from unit testing to documentation builds failing.

In #34 I implemented a function to test if a webAPI entry point is alive or not.
This is used in unit testing to determine if tests can be run or not.

But what about regular argopy usage ? What to do when a webAPI is not alive to improve user experience ?

For instance, when the argovis API was not available, a fetch like this:

ArgoDataFetcher().float(1900857).to_xarray()

resulted in a FileNotFoundError.
But this is slightly misleading: the file/data is there, it is the API that is not available.

When argopy is loaded, available fetchers are listed by the utilities.list_available_data_src() and utilities.list_available_index_src() functions. But this is merely an internal compliance check, no real availability is tested.

When a user loads argopy:

import argopy

should we also test the real availability of the fetchers relying on a webAPI ?

or should we try to catch the FileNotFoundError, then test for the API availability and provide a more informative error message ? (a sketch of this second option is given below)
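
A sketch of the second option (the URL, wording and error type are illustrative, this is not argopy's actual behaviour): only when the fetch fails do we probe the API, so that the user can be told whether the data is missing or the API is down.

import urllib.request
from argopy import DataFetcher as ArgoDataFetcher

def fetch_argovis_float(wmo, api_root="https://argovis.colorado.edu"):
    try:
        return ArgoDataFetcher(src="argovis").float(wmo).to_xarray()
    except FileNotFoundError:
        try:
            # probe the API root to distinguish "API down" from "data missing"
            urllib.request.urlopen(api_root, timeout=5)
        except Exception:
            raise RuntimeError("The argovis API at %s seems to be down, try again later" % api_root)
        raise  # the API is up, so the data really could not be found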

Error using argopy for the first time

Hi, I installed argopy using

pip install git+http://github.com/euroargodev/argopy.git@master

However initializing the DataFetcher gives me an error:

Python 3.7.6 (default, Jan  8 2020, 13:42:34) 
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from argopy import DataFetcher as ArgoDataFetcher
/Users/pvb/miniconda3/lib/python3.7/site-packages/argopy/fetchers.py:48: UserWarning: An error occured while loading the ERDDAP data fetcher, it will not be available !
<class 'ModuleNotFoundError'>
  warnings.warn("An error occured while loading the ERDDAP data fetcher, it will not be available !\n%s" % e)
>>> argo_loader = ArgoDataFetcher()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pvb/miniconda3/lib/python3.7/site-packages/argopy/fetchers.py", line 96, in __init__
    raise ValueError("The %s data fetcher is not available" % backend)
ValueError: The erddap data fetcher is not available

I am using a MacBook Pro with kernel version 19.3.0 (Catalina, although this should not be the problem since I am even able to use dask to analyse model data, ...), and I have installed conda with homebrew and pip from conda. Installing miniconda directly and then using pip gives the same result.

Any help?
thanks a lot
Pedro

Accept numpy.int32 as wmo input type

Ran into this trying to grab some float information while looping through some WMO numbers. The following error is thrown

ValueError: WMO must be a list of integers

Another less important note: the error states that WMO must be a list of integers, but a single integer (not in a list) is also ok, so this message could be improved.

MCVE Code Sample

import numpy as np
from argopy import DataFetcher as ArgoDataFetcher
argo_loader = ArgoDataFetcher()

WMOs = np.array([4902553, 4902554, 4902555])
cycles = np.array([2, 3, 6])
print(type(WMOs[0]))  # numpy.int32

for f, c in zip(WMOs, cycles):
    ds = argo_loader.profile(f, c).to_xarray()       # throws an error
    ds = argo_loader.profile(int(f), c).to_xarray()  # does not

Expected Output

The same dataset as the second line in the for loop, and an improved error message, perhaps:

ValueError: WMO must be an integer or a list of integers

Problem Description

Users should not need to concern themselves with what kind of integer an integer is.
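
A minimal sketch of a more permissive check (not argopy's actual code): accept plain ints, numpy integers, or any sequence of them, and normalise everything to a list of built-in ints.

import numpy as np

def check_wmo(wmo):
    # wrap a scalar into a list, then validate each element
    if not isinstance(wmo, (list, tuple, np.ndarray)):
        wmo = [wmo]
    if not all(isinstance(w, (int, np.integer)) for w in wmo):
        raise ValueError("WMO must be an integer or a list of integers")
    return [int(w) for w in wmo]

check_wmo(np.int32(4902553))             # [4902553]
check_wmo([4902553, np.int64(4902554)])  # [4902553, 4902554]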

Versions

INSTALLED VERSIONS

commit: None
python: 3.8.5 (default, Aug 5 2020, 09:44:06) [MSC v.1916 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: English_United States.1252
libhdf5: 1.10.4
libnetcdf: 4.7.3

argopy: 0.1.7
xarray: 0.16.1
pandas: 1.1.1
numpy: 1.19.1
scipy: 1.5.0
fsspec: 0.8.0
erddapy: 0.6.0
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.2.1
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.27.0
distributed: 2.27.0
matplotlib: 3.3.1
cartopy: 0.18.0
seaborn: 0.11.0
numbagg: None
gsw: 3.3.1
setuptools: 49.6.0.post20200814
pip: 20.2.2
conda: 4.9.2
pytest: 6.0.2
IPython: 7.18.1
sphinx: 3.2.1

General options

Would it be nice to have a user-level way of defining general default options ?

Something along the lines of what is possible with:

import argopy
argopy.DataFetcher.use('localftp')

or, using a context manager:

import argopy
with argopy.set_options(backend='localftp'):
    argopy.DataFetcher.profile(6902746, 34).to_xarray()

This should be very useful in situations where users will have to define parameters for their environment like:

  • path to index files
  • path to local ftp copy
  • etc ...

These are not options we want users to define on every API call.
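
For illustration, the kind of usage such general options could enable (option names follow the localftp examples used elsewhere in this document; the path is illustrative):

import argopy
from argopy import DataFetcher as ArgoDataFetcher

argopy.set_options(src='localftp', local_ftp='/path/to/my/gdac/copy')  # global default

with argopy.set_options(src='erddap'):                                 # temporary override
    ds = ArgoDataFetcher().profile(6902746, 34).to_xarray()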

Improve performance when fetching large datasets online

Using the erddap data source to fetch a large amount of data has poor performance (in terms of wall time).
This kind of fetching is problematic because:

  • the erddap server has a 1-minute timeout that will throw a bad response if it takes too long to prepare the output on the server side (see #16).
  • the erddap server can only handle requests leading to no more than 9308MB of data.

To check this out, we can use the example from #16:

from argopy import DataFetcher as ArgoDataFetcher
box = [-120., -85., -10, 10, 0, 500, '2019-01-01','2019-12-31']
ds = ArgoDataFetcher(src='erddap').region(box).to_xarray()

To fix this issue and, more importantly, improve the performance and reliability of fetching large datasets, I propose to test the following pattern:

In argopy.data_fetchers.ErddapArgoDataFetcher, implement a chunking procedure for http requests:

  • Given a box request: split the domain into chunks/sub-domains, in space, in time, or maybe both.
  • Given a wmo request: split the request into chunks, one per wmo.

Each chunk of the full request would be fetched with its own http request, and chunk data would be gathered/concatenated as they come back from the server.

Chunking the request, i.e. creating the list of urls to fetch, would be done at the fetcher level in argopy.data_fetchers.ErddapArgoDataFetcher.
Managing the pool of requests would be done at the file system level in argopy.stores.httpstore.
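
A rough sketch of the proposed pattern (the chunk count, worker count and 'N_POINTS' concatenation dimension are assumptions; a real implementation would live in the fetcher and http store as described above): split a region box along longitude, fetch each sub-domain with its own request, then concatenate.

import concurrent.futures
import xarray as xr
from argopy import DataFetcher as ArgoDataFetcher

def split_box(box, n=4):
    # box = [lon_min, lon_max, lat_min, lat_max, pres_min, pres_max, date_min, date_max]
    lon_min, lon_max = box[0], box[1]
    edges = [lon_min + i * (lon_max - lon_min) / n for i in range(n + 1)]
    return [[edges[i], edges[i + 1]] + list(box[2:]) for i in range(n)]

def fetch_chunk(sub_box):
    # one http request per sub-domain
    return ArgoDataFetcher(src='erddap').region(sub_box).to_xarray()

def fetch_region_chunked(box, n=4, max_workers=4):
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        datasets = list(ex.map(fetch_chunk, split_box(box, n)))
    return xr.concat(datasets, dim='N_POINTS')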

I suspect zarr or fsspec can already manage an (asynchronous) pool of requests.
The zarr documentation states that it can "Read an array concurrently from multiple threads or processes.", so this looks very close to the pattern above. Maybe @rabernat can give us his impressions on this ?

easy access for synthetic files?

Is there a way to batch-download synthetic files for a given lat/lon/time selection? My current approach is to call 'wget' to mirror the database onto my university's server, and then query the files on my personal computer (wget -r -A Sprof.nc -nH --cut-dirs=4 ftp://ftp.ifremer.fr/ifremer/argo/dac/ -A Sprof.nc). This is messy because I end up with way more files than I need for my particular purpose.

Ideally I'd like to use your code to download synthetic float profiles for particular locations and times. Could it be a simple addition to the API query? I couldn't find it in my search of the documentation.
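
In the meantime, a sketch based on the bgc/localftp example earlier in this document (the local path is illustrative, and this selects by float rather than by lat/lon/time): read the synthetic (Sprof) file of a given float from a local GDAC copy without mirroring the whole dac/ tree.

import argopy
from argopy import DataFetcher as ArgoDataFetcher

with argopy.set_options(src='localftp', local_ftp='/path/to/local/gdac'):
    ds = ArgoDataFetcher(ds='bgc', mode='expert').float(3902131).to_xarray()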
