c-proof / pyglider
glider software
Home Page: https://pyglider.readthedocs.io/
License: Apache License 2.0
Are the fields in glider_devices from the deployment yaml used anywhere? I couldn't find them being accessed by any of the scripts in gliderpy. These fields include sensor names, serial numbers and calibration information. Could be useful to include in netCDF metadata, but I don't know how that would be done. Could just append them to metadata?
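If we did append them, one possible sketch (the device dict and the attribute naming are illustrative, not the actual yaml schema) is to flatten glider_devices into prefixed global netCDF attributes:

```python
# Hypothetical glider_devices block as parsed from the deployment yaml.
devices = {
    "ctd": {"make": "RBR", "model": "legato", "serial_number": "0001"},
    "optode": {"make": "JFE", "model": "AROD-FT", "serial_number": "0123"},
}

# Flatten each device's fields into prefixed global attributes; these could
# then be merged into the output dataset with ds.attrs.update(attrs).
attrs = {}
for device, fields in devices.items():
    for key, value in fields.items():
        attrs[f"{device}_{key}"] = value
```

This keeps the sensor names, serial numbers, and calibration information discoverable in the netCDF without needing new variables.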
At line ~131 of seaexplorer.py, `pl.read_csv(f, separator=';')` cannot parse extreme microstructure values as i64:

ComputeError: Could not parse `2.22e-101` as dtype `i64` at column 'MR1000G-RDL_EPS2' (column number 22).
The current offset in the file is 490928 bytes.
You might want to try:
- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),
- specifying the correct dtype with the `dtypes` argument,
- setting `ignore_errors` to True,
- adding `2.22e-101` to the `null_values` list.

The pytest suite currently fails. It looks like the raw data is there, but the test is not producing the parquet file. I am using the latest commit: 71c6e13
============================================================= test session starts ==============================================================
platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/portal/src/pyglider
plugins: cov-4.1.0
collected 26 items / 1 error
==================================================================== ERRORS ====================================================================
___________________________________________________ ERROR collecting tests/test_pyglider.py ____________________________________________________
tests/test_pyglider.py:21: in <module>
outname = seaexplorer.raw_to_L0timeseries(rawncdir, l0tsdir,
../upstream/pyglider/pyglider/seaexplorer.py:317: in raw_to_timeseries
gli = pl.read_parquet(f'{indir}/{id}-rawgli.parquet')
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/_utils/deprecation.py:134: in wrapper
return function(*args, **kwargs)
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/_utils/deprecation.py:134: in wrapper
return function(*args, **kwargs)
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/io/parquet/functions.py:171: in read_parquet
lf = scan_parquet(
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/_utils/deprecation.py:134: in wrapper
return function(*args, **kwargs)
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/_utils/deprecation.py:134: in wrapper
return function(*args, **kwargs)
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/io/parquet/functions.py:311: in scan_parquet
return pl.LazyFrame._scan_parquet(
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/lazyframe/frame.py:458: in _scan_parquet
scan = _scan_parquet_fsspec(source, storage_options) # type: ignore[arg-type]
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/io/parquet/anonymous_scan.py:21: in _scan_parquet_fsspec
schema = polars.io.parquet.read_parquet_schema(data)
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/io/parquet/functions.py:213: in read_parquet_schema
return _read_parquet_schema(source)
E FileNotFoundError: No such file or directory (os error 2): ...yglider/tests/example-data/example-seaexplorer/realtime_rawnc//dfo-eva035-rawgli.parquet
=========================================================== short test summary info ============================================================
ERROR tests/test_pyglider.py - FileNotFoundError: No such file or directory (os error 2): ...yglider/tests/example-data/example-seaexplorer/realtime_rawnc//dfo-eva035-raw...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================================== 1 error in 3.86s ===============================================================
Note I think that we should force timebase for slocums as well.
Originally posted by @jklymak in #139 (comment)
We could do with clearing out unused imports from the scripts and unused dependencies from the environment. The code could also be linted by something like black to keep the formatting consistent; this could be a pre-commit hook if we're feeling fancy. The same pass could clean out some of the excess print statements and improve the logging.
Heading of the seaexplorer is recorded at a lower temporal resolution than the science sensors, approx. 20 seconds between measurements. The current solution is to interpolate onto the timebase of the science sensors. This can cause unintended results when the glider is heading near north, e.g. interpolating to 180 between two values of 355 and 005.
There must be a smart way to fix this. Maybe using deg2rad and some kind of normalisation?
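One possibility along those lines, sketched with numpy (function name is illustrative): interpolate the sine and cosine components of the heading and recombine, so the wraparound at north is handled naturally:

```python
import numpy as np

def interp_heading(t_new, t, heading_deg):
    # Interpolating raw degrees gives 180 halfway between 355 and 005;
    # interpolating the unit-circle components gives the expected ~0.
    rad = np.deg2rad(heading_deg)
    s = np.interp(t_new, t, np.sin(rad))
    c = np.interp(t_new, t, np.cos(rad))
    return np.rad2deg(np.arctan2(s, c)) % 360
```

Halfway between 355 and 005 this returns approximately 0 rather than 180.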
dbdreader seems to have started returning a tuple from data access. We need to pin to old dbdreader and maybe update our code. An example that fails would be helpful.
A few active acoustic standard names exist in the current CF vocabulary. This just serves as a notices that additional standard names have been proposed and invite comments from the community. The issue is linked here: cf-convention/vocabularies#44
Currently, binary_to_timeseries (Slocum) and raw_to_timeseries (SeaExplorer) both contain the "profile"-finding logic. Recent experience has shown that this step is often somewhat flaky, and causes the raw time series processing to fail to complete.
I'd propose we break the "profile" logic into its own routine and ask users to call it after making the time series to add profile info. That will also allow more modularity with the "best" profile logic, which I find depends on the sampling frequency of the pressure sensor. It would also allow for taking advantage of any metadata sent up by the glider as well. So now users would do:
outname = slocum.raw_to_timeseries(
rawdir, l1tsdir, deploymentyaml)
outname = ncprocess.get_profiles(outname, ...)
From @bastienqueste:
Note that oxygen from the AROD is wrong as it assumes a salinity of 0 PSU. To get proper oxygen, calculate saturation from the bad oxygen concentration and solubility calculated with good temperature and 0 salinity.
From that saturation (which is a useful quantity to plot), we can get the good concentration using a solubility calculated with the good temperature and salinity.
This salinity-related error is probably about 15-30% of the value, so it's not worth ignoring.
Does this fall within the remit of pyglider? It's not calibration or correction, just reporting accurate values. It would involve a recalculation though.
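A sketch of the recalculation described above (function and argument names are hypothetical; the solubilities would in practice come from a TEOS-10 implementation such as gsw):

```python
def correct_arod_oxygen(o2_raw, sol_good, sol_zero):
    """Correct AROD oxygen reported assuming 0 PSU.

    o2_raw:   concentration as reported by the optode (0-salinity assumption)
    sol_zero: solubility at the good temperature and salinity 0
    sol_good: solubility at the good temperature and the good salinity
    """
    saturation = o2_raw / sol_zero   # this saturation is itself worth plotting
    return saturation * sol_good     # corrected concentration
```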
Both the dbdreader code and the parquet seaexplorer code linearly interpolate onto a timebase. We should document this in some detail now that we have a variety of different deployment setups. Probably a bit of a project to do comprehensively, but I think we should maybe have a utility that creates a report where the user just needs to specify a cast number (to avoid shallow/bad casts) and it makes some plots of raw data and what we give them. Then we can compile some of the examples. This appendix to the documentation could also include what settings on the gliders were used to get the data set so that we understand what settings set the sampling frequencies and the results are reproducible.
In running the example-slocum scripts using the example data, a value error arises which reads "ValueError: conflicting sizes for dimension 'time': length 3487 on 'distance_over_ground' and length 3489 on {'time': 'time'}." I have not manipulated the example scripts given at all and have the required packages installed. Attached is a screenshot of the terminal after running process_deploymentRealTime.py.
Linked to Issue #15, the seawater package is deprecated. We should aim to migrate to gsw once tests are in place to ensure nothing is changed
If fed a pld1 file with duplicate timestamps, the reindex call in utils.oxygen_concentration_correction will fail
ds_temp = data.potential_temperature[~np.isnan(data.potential_temperature)].reindex(time=ds_oxy.time, method="nearest")
Instances of duplicate timestamps happen very rarely. In this snippet of pld file, we see timestamp 11:12:28.730
repeated.
18/11/2021 11:12:28.730;116;1604.485;5521.078;;;;;;;;;;;;;;9.0473;9.7355;1.6952;7.3451;9.7404;
18/11/2021 11:12:28.761;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.792;116;1604.485;5521.078;;;;;;;;;;;;;;9.0473;9.7354;1.7010;7.3450;9.6665;
18/11/2021 11:12:28.823;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.838;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.871;116;1604.485;5521.078;;;;;;;;;;;;;;9.0467;9.7354;1.6963;7.3445;9.7035;
18/11/2021 11:12:28.886;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.901;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.935;116;1604.485;5521.078;;;;;;;;;;;;;;9.0453;9.7354;1.6883;7.3433;9.6665;
18/11/2021 11:12:28.952;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.970;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.664;116;1604.485;5521.079;-0.026;;;;;;;;;;;;;9.0462;9.7353;1.6874;7.3441;9.6665;
18/11/2021 11:12:28.680;116;1604.485;5521.079;-0.026;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.699;116;1604.485;5521.079;-0.026;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.730;116;1604.485;5521.079;-0.026;;;;;;;;;;;;;9.0462;9.7353;1.6639;7.3441;9.6295;
This is easily fixed using xarray Dataset.drop_duplicates. Should this be performed as a catching step in the oxygen correction function? Or earlier in processing? Or should some other workaround be used?
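For illustration, the drop_duplicates catch might look like this (variable names invented for the sketch):

```python
import numpy as np
import xarray as xr

# Two of the three samples share the 11:12:28.730 timestamp, as in the
# pld snippet above.
time = np.array(["2021-11-18T11:12:28.730", "2021-11-18T11:12:28.761",
                 "2021-11-18T11:12:28.730"], dtype="datetime64[ns]")
ds = xr.Dataset({"temperature": ("time", [9.7355, 9.7354, 9.7353])},
                coords={"time": time})

# Keeping the first occurrence makes the index unique, so a later
# reindex(time=..., method="nearest") no longer fails.
ds = ds.drop_duplicates(dim="time", keep="first")
```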
The legato pressure sensor is used to calculate glider depth. When the sensor is switched off/malfunctioning, the seaexplorer reports a value of 9999 for pressure. This is converted to a depth of 9690 m by utils.get_glider_depth
. While 9999 is a fairly obvious error value, 9690 is not.
This function has a filter step for "good" pressures which it later uses to interpolate over the bad depths. Should we perhaps exclude pressure readings greater than the sensor's maximum rated pressure from the "good" subset used to calculate depth?
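A sketch of that exclusion (threshold and function name are assumptions; the actual rated pressure is sensor- and deployment-specific):

```python
import numpy as np

def mask_bad_pressure(pressure_dbar, max_rated_dbar=1000.0):
    # Replace the 9999 error flag, and anything else beyond the sensor's
    # rated range, with NaN so it never reaches the depth conversion and
    # cannot masquerade as a plausible 9690 m depth.
    pressure = np.asarray(pressure_dbar, dtype=float).copy()
    pressure[pressure > max_rated_dbar] = np.nan
    return pressure
```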
Following a discussion with SeaExplorer data users, several additional pieces of metadata have been suggested:
- glider mass and glider volume for flight model regression. This should be simple to add to the yaml at the metadata level. Not sure if we'd need a structure to include units as well, or have names like glider_mass_kg and glider_volume_m3.
- location of sensors: this is a little more involved, but seemingly the location of the sensors on the glider can be important in post-processing. One suggested method was to define x, y, z of the sensors relative to some consistent point of the glider chassis. These could be added to each sensor in the glider_sensors structure of the yaml.
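A hypothetical yaml sketch of both suggestions (the field names, and the x/y/z structure, are illustrative rather than an agreed schema):

```yaml
metadata:
  glider_mass_kg: 60.0
  glider_volume_m3: 0.0585
glider_sensors:
  ctd:
    make: RBR
    model: legato
    serial: "0001"
    # position relative to a reference point on the chassis, metres
    position: {x: 0.35, y: 0.0, z: -0.02}
```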
These appear to be inconsistent between the seaexplorer and slocum examples in their creation in process_deploymentRealTime and removal via the Makefile.
These levels are:
Lx-timeseries
Ly-profiles (seaexplorer only?)
Lz-gridfiles
Across the four files, x,y,z are some variant of 0,0,0 or 0,1,2
What processing level is correct for these products?
I haven't used logging before. I understand it is better than my usual method of sprinkling print statements throughout the code. How do we access the log though?
I can't find anywhere in the code that specifies a log file, following the example in the docs I'd expect something like:
logging.basicConfig(filename='example.log', level=logging.INFO)
Any ideas?
If the oxygen correction is applied to raw SeaExplorer data, the majority of oxygen data can be mistakenly converted to nan. This is due to a timestamp issue.
Oxygen data and CTD data are typically recorded at different times by the SeaExplorer, so most CTD observations do not have an exactly corresponding oxygen observation, though they may be temporally very close. glider_utils.oxy.oxygen_concentration_correction applies a correction on the assumption that oxygen measurements have corresponding temperature and salinity measurements, which are used to correct for the 0 salinity assumed by the AROD oxygen optode. This is often not the case. Where the oxygen data do not have corresponding temp/sal, they are converted to nan.
This issue does not occur with sub data, as used in the tests, as these data are comprised of lines where each timestamp has a measurement for every variable. For this reason, the issue escaped detection by the tests.
In my recent trials with pyglider+seaexplorer, I noticed that the underwater positions for the most recent missions are a bit funky, and after some digging I realized it's because of a (somewhat recent) firmware change that writes the dead-reckoning determined positions into the data file, along with an extra column to indicate whether the location was measured (GPS) or calculated (Alseamar dead reckoning algorithm).
Below is an example from a recent mission, showing how the dead-reckoning positions are propagating through the processing, but because they don't account for the depth-averaged current, the final position of a dive cycle ends up being discontinuous with the start of the next dive cycle.
There's something weird going on with salinity. The factor-of-10 multiplication of conductivity at utils.py line 200 wrecks our data, resulting in salinities up around 100 g/kg.
Is this factor of 10 necessary for processing GPCTD data? If so, it should be applied in place to conductivity. At present, the SX example produces final netcdfs where salinity is roughly 10x conductivity. I think this is most likely an error.
If this function is to be refactored, I would recommend moving away from seawater. That package has not been updated since 2015; the maintainer (Filipe) recommends moving to TEOS-10 as implemented in gsw.
As per comments by @jklymak in PR #14:
Maybe we should go back and simplify the logic, because the separate sensor streams were a hack anyway, and not integral to the processing.
I wonder if we can get rid of this hack, and just process without the prefix at all? That greatly simplifies all this code and we don't need to assume anything about the carrier instrument for each sensor.
python packages can have data, but usually that is for things the routines need at runtime. Not 100% sure how we should distribute the examples here. Maybe as a tarball in doc/_static/? Not sure what other libraries do for examples like this that require significant data.
If a seaexplorer reboots mid-mission, a few 1970 datestamps will be produced before it gets a timefix from the GPS. This can mess up processing down the line, particularly as xarray assumes monotonically increasing datetime index for several operations.
It would make sense to fix this as far upstream as possible. A crude but simple fix could be to drop any lines with datestamps before 1971 at the raw to rawnc stage. Or should it be left until later?
I don't know if this function should be deprecated or updated since it returns a 404 error.
The simplest test could make use of http://xarray.pydata.org/en/stable/generated/xarray.DataArray.equals.html to check that the final datasets do not change. This would require uploading a finished dataset to the repo. The timeseries dataset is the smallest and would make a good target.
Eventually tests should cover all of the functions in pyglider. This will take time, but we should try to ensure that all new functionality comes with tests
As recommended by Bastien Queste: PAR should be logged before gridding. This is sensible as PAR is an exponential variable; we wouldn't want to bias it when taking the mean while gridding.
This changed averaging strategy should be logged in the metadata of the variable. Maybe it should be controllable from the yaml too?
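A sketch of the idea (function name invented): average in log space within each grid cell, which amounts to taking the geometric mean:

```python
import numpy as np

def grid_par_cell(par):
    # The arithmetic mean of an exponentially varying quantity is biased
    # high; exponentiating the mean of the logs gives the geometric mean.
    par = np.asarray(par, dtype=float)
    return np.exp(np.nanmean(np.log(par)))
```

For example, grid_par_cell([1, 100]) gives 10, where a plain mean would give 50.5.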
Perhaps this is just our internal c-proof usage, but having yaml files be subtly different between "realtime" and delayed mode processing leads to transcription errors. It would be good to have more robust ways of either making the yaml files or to make them a bit more modular so information doesn't need to be copied over as much.
Using the example code, I am having trouble with slocum.raw_to_timeseries. It doesn't seem to be running correctly and therefore isn't defining "outname". I get the attached error message.
I am using the example code to try to run this (from example-slocum), so I think it is something with how I am running the code and not my files. Is anyone familiar with this and can provide any advice? I'm not very familiar with python so unsure what this error means or why I am having this issue.
Currently pyglider has its own python-only dbd-numpy converter for slocums vendored from https://gitlab.oceantrack.org/ceotr-public/dinkum
https://github.com/smerckel/dbdreader reads dinkum binary files (slocum)
A quick look indicates this would be pretty easy to implement. We could probably even have both backends, with the faster C backend optional in case the C code turns out to be problematic.
ping @callumrollo @hvdosser @smerckel...
Hello everyone,
I would like to use pyglider to process Slocum glider data. When I installed pyglider today I ran into a couple of issues that I wanted to raise here. In the following I will list my steps to ensure that I have not made a mistake (I am very new to python and Slocum gliders):
I have installed the required packages and created a new environment as suggested on the website.
conda create -n gliderwork
conda activate gliderwork
conda install -c conda-forge pyglider
I downloaded the example data manually, as
import pyglider.example_data as pexamp
pexamp.get_example_data('./')
is not working.
Having the example data, I wanted to run the example code to make sure that everything is working properly. The code is shown on the Getting Started: Slocum page (https://pyglider.readthedocs.io/en/latest/getting-started-slocum.html).
I wanted to run the first couple of lines to ensure the packages are imported properly. I received the following error message:
AttributeError: module 'pyglider.slocum' has no attribute 'binary_to_timeseries'
I sort of solved this issue by copying the code from pyglider.slocum from github and replacing the old code.
I then received another error message that dbdreader is not defined. I realized that dbdreader was not installed with the packages (I think it's important for reading the binary *.dbd files?). I have tried to install dbdreader separately, but it failed miserably on both my Mac and my Windows laptop.
I received the following error message:
%%%%%
Collecting dbdreader
Using cached dbdreader-0.4.11.tar.gz (768 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy in ./opt/anaconda3/envs/gliderwork/lib/python3.11/site-packages (from dbdreader) (1.24.1)
Building wheels for collected packages: dbdreader
Building wheel for dbdreader (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [15 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-cpython-311
creating build/lib.macosx-10.9-x86_64-cpython-311/dbdreader
copying dbdreader/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/dbdreader
copying dbdreader/dbdreader.py -> build/lib.macosx-10.9-x86_64-cpython-311/dbdreader
running build_ext
building '_dbdreader' extension
creating build/temp.macosx-10.9-x86_64-cpython-311
creating build/temp.macosx-10.9-x86_64-cpython-311/extension
clang -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/riaoelerich/opt/anaconda3/envs/gliderwork/include -fPIC -O2 -isystem /Users/riaoelerich/opt/anaconda3/envs/gliderwork/include -Iextension/include -I/Users/riaoelerich/opt/anaconda3/envs/gliderwork/include/python3.11 -c extension/dbdreader.c -o build/temp.macosx-10.9-x86_64-cpython-311/extension/dbdreader.o
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
error: command '/usr/bin/clang' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for dbdreader
Running setup.py clean for dbdreader
Failed to build dbdreader
Installing collected packages: dbdreader
Running setup.py install for dbdreader ... error
error: subprocess-exited-with-error
× Running setup.py install for dbdreader did not run successfully.
│ exit code: 1
╰─> [17 lines of output]
running install
/Users/riaoelerich/opt/anaconda3/envs/gliderwork/lib/python3.11/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-cpython-311
creating build/lib.macosx-10.9-x86_64-cpython-311/dbdreader
copying dbdreader/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/dbdreader
copying dbdreader/dbdreader.py -> build/lib.macosx-10.9-x86_64-cpython-311/dbdreader
running build_ext
building '_dbdreader' extension
creating build/temp.macosx-10.9-x86_64-cpython-311
creating build/temp.macosx-10.9-x86_64-cpython-311/extension
clang -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/riaoelerich/opt/anaconda3/envs/gliderwork/include -fPIC -O2 -isystem /Users/riaoelerich/opt/anaconda3/envs/gliderwork/include -Iextension/include -I/Users/riaoelerich/opt/anaconda3/envs/gliderwork/include/python3.11 -c extension/dbdreader.c -o build/temp.macosx-10.9-x86_64-cpython-311/extension/dbdreader.o
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
error: command '/usr/bin/clang' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> dbdreader
note: This is an issue with the package mentioned above, not pip.
%%%%
Sorry for this rather lengthy issue. I wonder whether some of you have encountered this? Or whether I did something wrong?
Best wishes,
Ria
There appears to be a performance bottleneck in raw_to_rawnc; the docstring notes that it is slow for large datasets. As this function converts many small files individually, it looks ideal for multiprocessing.
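A sketch of the shape that could take with the standard library (convert_one is a hypothetical stand-in for the per-file conversion; for CPU-bound parsing a ProcessPoolExecutor might be swapped in):

```python
from concurrent.futures import ThreadPoolExecutor

def convert_one(path):
    # Placeholder for a single binary-file -> netCDF conversion.
    return path.replace(".dbd", ".nc")

def convert_all(paths, max_workers=4):
    # Convert independent small files concurrently, preserving input order.
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(convert_one, paths))
```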
Tests should include a check on the time variables (time and profile_time), specifically the values and units, to ensure that data and metadata remains IOOS compliant.
Ultimately, we should aim for the OceanGliders 1.0 format https://github.com/OceanGlidersCommunity/OG-format-user-manual
This will take some time. If you spot an atomic change that can be made to move toward OG 1.0, please submit an Issue for it
@Maribaken discovered a bug when trying to process AD2CP_TIME from a SeaExplorer. This is timing from the ADCP internal clock, which can differ substantially from the SX payload clock, so it's important to carry this data forward.
These are correctly interpreted as datetimes at the raw_to_rawnc stage:
array(['NaT', 'NaT', '2021-12-03T10:45:39.000000000',
       '2021-12-03T10:46:09.000000000', '2021-12-03T10:46:39.000000000'],
      dtype='datetime64[ns]')
but get converted to floats during raw_to_L0timeseries, becoming
array([ nan, nan, 0., ..., 1990., 2020., 2050.])
The culprit is the decode_times=False kwarg in sensor = xr.open_dataset(f'{indir}/{id}-{kind}pld.nc', decode_times=False) at seaexplorer.py line 248.
I think this flag is in here to force the main timestamp to be in seconds since 1970, so that utils.get_profiles_new works, as it calculates a median time difference between samples in seconds.
Should I do a specific fix for the AD2CP? Are there other sensors that record internal time we need to keep? Should I try to rework utils.get_profiles_new to work with datetimes rather than seconds since 1970?
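For the last option, a sketch of computing the median sampling interval directly from datetime64 values (so decode_times=False would no longer be needed just for this calculation):

```python
import numpy as np

def median_dt_seconds(time):
    # Convert datetime64[ns] to float seconds, then take the median of the
    # successive differences, as utils.get_profiles_new requires.
    t = np.asarray(time, dtype="datetime64[ns]").astype("int64") / 1e9
    return float(np.median(np.diff(t)))
```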
When I run my profile files through the IOOS compliance checker the only issue I get is "Variable profile_time must contain attribute: _FillValue" - is there a way to add _FillValue to profile_time? I checked the files and _FillValue isn't included with profile_time.
On another note - we went through our profile .nc files and it seems that half the files captured the start/end of a profile - all the expected profiles are reflected, but every other .nc file generated has this structure with 25 sample points either from the bottom or top of the profile, but is not a continuous dive. I have added the first few EBD/DBD files along with the resulting .nc files (one is a usable profile, one has the 25 samples) and the timeseries/gridfile. Any explanation for why this is happening and how to fix it? https://github.com/mackenziemeier86/Pyglider-Test-Data-and-Output-.git
When running the seaexplorer example, xarray throws an error in seaexplorer.py line 361:
INFO:pyglider.seaexplorer:interpolating latitude
Traceback (most recent call last):
File "/home/callum/Documents/data-flow/raw-to-nc/pyglider/example-seaexplorer/process_deploymentRealTime.py", line 35, in <module>
outname = seaexplorer.raw_to_L0timeseries(rawncdir, l0tsdir, deploymentyaml, kind='sub')
File "/home/callum/Documents/data-flow/raw-to-nc/pyglider/pyglider/seaexplorer.py", line 361, in raw_to_L0timeseries
ds[name] = (('time'), val, attrs)
File "/home/callum/anaconda3/envs/pyglider/lib/python3.9/site-packages/xarray/core/dataset.py", line 1563, in __setitem__
self.update({key: value})
File "/home/callum/anaconda3/envs/pyglider/lib/python3.9/site-packages/xarray/core/dataset.py", line 4208, in update
merge_result = dataset_update_method(self, other)
File "/home/callum/anaconda3/envs/pyglider/lib/python3.9/site-packages/xarray/core/merge.py", line 984, in dataset_update_method
return merge_core(
File "/home/callum/anaconda3/envs/pyglider/lib/python3.9/site-packages/xarray/core/merge.py", line 632, in merge_core
collected = collect_variables_and_indexes(aligned)
File "/home/callum/anaconda3/envs/pyglider/lib/python3.9/site-packages/xarray/core/merge.py", line 294, in collect_variables_and_indexes
variable = as_variable(variable, name=name)
File "/home/callum/anaconda3/envs/pyglider/lib/python3.9/site-packages/xarray/core/variable.py", line 121, in as_variable
raise TypeError(
TypeError: Using a DataArray object to construct a variable is ambiguous, please extract the data using the .data property.
Process finished with exit code 1
I think this worked in earlier versions of xarray, but more recently the package is more strict on ambiguity. I used a conda environment created from the environment.yml included in the repo with python=3.9.7, xarray=0.19.0:
# packages in environment at /home/callum/anaconda3/envs/pyglider:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
alsa-lib 1.2.3 h516909a_0 conda-forge
bitstring 3.1.9 pyhd8ed1ab_0 conda-forge
bokeh 2.4.1 py39hf3d152e_1 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.2 h7f98852_0 conda-forge
ca-certificates 2021.10.8 ha878542_0 conda-forge
certifi 2021.10.8 py39hf3d152e_0 conda-forge
cftime 1.5.1 py39hce5d2b2_0 conda-forge
click 8.0.3 py39hf3d152e_0 conda-forge
cloudpickle 2.0.0 pyhd8ed1ab_0 conda-forge
curl 7.79.1 h2574ce0_1 conda-forge
cycler 0.10.0 py_2 conda-forge
cytoolz 0.11.0 py39h3811e60_3 conda-forge
dask 2021.9.1 pyhd8ed1ab_0 conda-forge
dask-core 2021.9.1 pyhd8ed1ab_0 conda-forge
dbus 1.13.6 h48d8840_2 conda-forge
distributed 2021.9.1 py39hf3d152e_0 conda-forge
expat 2.4.1 h9c3ff4c_0 conda-forge
fontconfig 2.13.1 hba837de_1005 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
fsspec 2021.10.1 pyhd8ed1ab_0 conda-forge
geojson 2.5.0 py_0 conda-forge
gettext 0.19.8.1 h73d1719_1008 conda-forge
glib 2.70.0 h780b84a_0 conda-forge
glib-tools 2.70.0 h780b84a_0 conda-forge
gst-plugins-base 1.18.5 hf529b03_0 conda-forge
gstreamer 1.18.5 h76c114f_0 conda-forge
hdf4 4.2.15 h10796ff_3 conda-forge
hdf5 1.12.1 nompi_h2750804_101 conda-forge
heapdict 1.0.1 py_0 conda-forge
icu 68.1 h58526e2_0 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jinja2 3.0.2 pyhd8ed1ab_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
kiwisolver 1.3.2 py39h1a9c180_0 conda-forge
krb5 1.19.2 hcc1bbae_2 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
lerc 3.0 h9c3ff4c_0 conda-forge
libblas 3.9.0 12_linux64_openblas conda-forge
libcblas 3.9.0 12_linux64_openblas conda-forge
libclang 11.1.0 default_ha53f305_1 conda-forge
libcurl 7.79.1 h2574ce0_1 conda-forge
libdeflate 1.8 h7f98852_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libevent 2.1.10 h9b69904_4 conda-forge
libffi 3.4.2 h9c3ff4c_4 conda-forge
libgcc-ng 11.2.0 h1d223b6_11 conda-forge
libgfortran-ng 11.2.0 h69a702a_11 conda-forge
libgfortran5 11.2.0 h5c6108e_11 conda-forge
libglib 2.70.0 h174f98d_0 conda-forge
libgomp 11.2.0 h1d223b6_11 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 12_linux64_openblas conda-forge
libllvm11 11.1.0 hf817b99_2 conda-forge
libnetcdf 4.8.1 nompi_hb3fd0d9_101 conda-forge
libnghttp2 1.43.0 h812cca2_1 conda-forge
libogg 1.3.4 h7f98852_1 conda-forge
libopenblas 0.3.18 pthreads_h8fe5266_0 conda-forge
libopus 1.3.1 h7f98852_1 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libpq 13.3 hd57d9b9_1 conda-forge
libssh2 1.10.0 ha56f1ee_2 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_11 conda-forge
libtiff 4.3.0 h6f004c6_2 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libvorbis 1.3.7 h9c3ff4c_0 conda-forge
libwebp-base 1.2.1 h7f98852_0 conda-forge
libxcb 1.13 h7f98852_1003 conda-forge
libxkbcommon 1.0.3 he3ba5ed_0 conda-forge
libxml2 2.9.12 h72842e0_0 conda-forge
libzip 1.8.0 h4de3113_1 conda-forge
libzlib 1.2.11 h36c2ea0_1013 conda-forge
locket 0.2.0 py_2 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
markupsafe 2.0.1 py39h3811e60_0 conda-forge
matplotlib 3.4.3 py39hf3d152e_1 conda-forge
matplotlib-base 3.4.3 py39h2fa2bec_1 conda-forge
msgpack-python 1.0.2 py39h1a9c180_1 conda-forge
mysql-common 8.0.26 ha770c72_0 conda-forge
mysql-libs 8.0.26 hfa10184_0 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
netcdf4 1.5.7 nompi_py39h64b754b_103 conda-forge
nspr 4.30 h9c3ff4c_0 conda-forge
nss 3.69 hb5efdd6_1 conda-forge
numpy 1.21.2 py39hdbf815f_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openjpeg 2.4.0 hb52868f_1 conda-forge
openssl 1.1.1l h7f98852_0 conda-forge
packaging 21.0 pyhd8ed1ab_0 conda-forge
pandas 1.3.4 py39hde0f152_0 conda-forge
partd 1.2.0 pyhd8ed1ab_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pillow 8.3.2 py39ha612740_0 conda-forge
pip 21.3 pyhd8ed1ab_0 conda-forge
psutil 5.8.0 py39h3811e60_1 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyqt 5.12.3 py39hf3d152e_7 conda-forge
pyqt-impl 5.12.3 py39h0fcd23e_7 conda-forge
pyqt5-sip 4.19.18 py39he80948d_7 conda-forge
pyqtchart 5.12 py39h0fcd23e_7 conda-forge
pyqtwebengine 5.12.1 py39h0fcd23e_7 conda-forge
python 3.9.7 hb7a2778_3_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytz 2021.3 pyhd8ed1ab_0 conda-forge
pyyaml 6.0 py39h3811e60_0 conda-forge
qt 5.12.9 hda022c4_4 conda-forge
readline 8.1 h46c0cb4_0 conda-forge
scipy 1.7.1 py39hee8e79c_0 conda-forge
seawater 3.3.4 py_1 conda-forge
setuptools 58.2.0 py39hf3d152e_0 conda-forge
simplekml 1.3.6 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
sqlite 3.36.0 h9cd32fc_2 conda-forge
tblib 1.7.0 pyhd8ed1ab_0 conda-forge
tk 8.6.11 h27826a3_1 conda-forge
toolz 0.11.1 py_0 conda-forge
tornado 6.1 py39h3811e60_1 conda-forge
typing_extensions 3.10.0.2 pyha770c72_0 conda-forge
tzdata 2021d he74cb21_0 conda-forge
webcolors 1.11.1 pyhd8ed1ab_0 conda-forge
wheel 0.37.0 pyhd8ed1ab_1 conda-forge
xarray 0.19.0 pyhd8ed1ab_1 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yaml 0.2.5 h516909a_0 conda-forge
zict 2.0.0 py_0 conda-forge
zlib 1.2.11 h36c2ea0_1013 conda-forge
zstd 1.5.0 ha95c52a_0 conda-forge
Specifying an xarray version could fix this. Or the offending line could be corrected. I'll submit a PR for the latter approach.
It could be worth adding test suites for other Python versions, at least 3.7 and 3.9. Testing on 3.10 is still a bit tricky at the moment but could be added later.
Originally posted by @callumrollo in #8 (comment)
We're currently using strings to interact with the filesystem. Ideally we'd change to using pathlib as it is more robust.
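A minimal sketch of what the pathlib migration looks like; `ensure_outdir` is an illustrative helper name, not an existing pyglider function:

```python
from pathlib import Path

def ensure_outdir(outdir):
    """Accept either a str or a Path and return a Path that exists.

    Illustrative helper, not part of pyglider.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    return outdir

# Path objects compose with '/' instead of string concatenation:
ncfile = ensure_outdir('/tmp/pyglider-demo/L0-timeseries') / 'deployment.nc'
print(ncfile.suffix)  # prints .nc
```

Because `Path` accepts both strings and existing `Path` objects, callers can migrate incrementally without breaking the string-based API.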
As discussed in PR #14, it would be good to control coarsening of data and perhaps other things? This could be done with optional additions to the deployment yaml. For discussion
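Coarsening control could ride along as optional keys in the deployment yaml. One possible shape, purely for discussion (every key name below is invented, not an implemented pyglider schema):

```yaml
# Hypothetical optional section in deployment.yml:
processing:
  coarsen:
    time: 2          # average every 2 samples in time
  gridding:
    depth_bin: 1.0   # metres per depth bin in the gridded product
```

Since the keys would be optional, existing deployment yamls would keep working unchanged.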
In testing the new code for this pull request, I found an issue with processing the delayed mode SeaExplorer time series data for missions for which certain sensors (oxygen in this case) are severely oversampled. These missions end up with delayed mode data files that contain fewer actual (non-nan) data points than the realtime files. In other words, we are losing data during the processing.
Currently, the `dropna` function is used to remove the oversampled oxygen data when converting the raw data. The `dropna` function is working correctly; however, note that the resulting data has many nan values in it, for both the CTD and optics, and these nan values will often not co-occur.
I think the problem in the processing is caused by using `GPCTD_TEMPERATURE` as the default time base in `seaexplorer.py`. This variable contains nan values that are not all co-located with the nan values in the oxygen and optical variables. It's desirable to use the CTD as the time base, but we may need to do some interpolation to avoid losing data when the other variables are mapped onto this base.
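The interpolation idea can be sketched with plain numpy (this is an illustration of the approach, not pyglider's actual code): instead of dropping rows where samples don't line up, map the sparsely sampled variable onto the CTD timestamps.

```python
import numpy as np

# CTD sample times, and a sparser oxygen record that doesn't align with them:
t_ctd = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # CTD time base
t_oxy = np.array([0.5, 2.5, 3.5])             # oxygen sample times
oxy = np.array([200.0, 210.0, 215.0])         # oxygen values

# np.interp fills between samples; points outside the oxygen record get
# the edge values, which you may prefer to mask out afterwards.
oxy_on_ctd = np.interp(t_ctd, t_oxy, oxy)
print(oxy_on_ctd)  # [200.  202.5 207.5 212.5 215. ]
```

A real implementation would also need to decide on a maximum gap beyond which interpolation should yield nan rather than bridging missing data.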
These lines (pyglider/pyglider/seaexplorer.py, line 323 at commit 67c6acd) error out if `timebase` is not specified in the yaml. I guess we could enforce `timebase` in the yaml, but this code should be fixed or removed.
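One option is to fail with an explicit message instead of letting a bare KeyError propagate. A hedged sketch (the yaml layout mirrors pyglider deployment files, but the guard itself is a proposal, not current code):

```python
# Proposal sketch: validate the timebase entry up front with a clear error.
def get_timebase(deployment):
    """Return the timebase entry from a parsed deployment yaml dict."""
    ncvar = deployment.get('netcdf_variables', {})
    if 'timebase' not in ncvar:
        raise ValueError(
            "deployment yaml must define netcdf_variables:timebase "
            "(e.g. pointing at the CTD temperature variable)")
    return ncvar['timebase']

dep = {'netcdf_variables': {'timebase': {'source': 'GPCTD_TEMPERATURE'}}}
print(get_timebase(dep)['source'])  # GPCTD_TEMPERATURE
```

Alternatively the code could fall back to a documented default variable when `timebase` is absent; either way the behaviour should be deliberate rather than an unhandled exception.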
@truedichotomy asked:
Would Zarr be a possible format option for pyglider?
Originally posted by @truedichotomy in #120 (comment)
Most of the other variables in our netcdf files have attributes, but `time` does not, despite it having an entry in deployment.yml. Should be easy to add a code block to fill the attributes into the time variable.
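A minimal sketch of copying yaml-supplied attributes onto the time variable; the attribute names follow common CF usage, but the exact set pyglider should write (and whether any conflict with xarray's time encoding on write) is up for discussion:

```python
import numpy as np
import xarray as xr

# Toy dataset standing in for a pyglider timeseries:
ds = xr.Dataset(coords={"time": np.array([0.0, 1.0, 2.0])})

# These would come from the deployment.yml time entry:
time_attrs = {
    "long_name": "Time",
    "standard_name": "time",
    "observation_type": "measured",
}
ds["time"].attrs.update(time_attrs)
print(ds["time"].attrs["standard_name"])  # time
```

Note that a `units` attribute on a datetime coordinate can clash with xarray's encoding at write time, so units may be better left to the encoding dict.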
We need to start making a set of documentation for this. I somewhat prefer Markdown, so if we can use Markdown in Sphinx, that would probably be easiest.
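Markdown in Sphinx is well supported via the `myst-parser` extension; a possible `conf.py` fragment (the extra extensions listed are common choices, not requirements):

```python
# Sphinx conf.py fragment: myst-parser lets Sphinx build Markdown (.md)
# sources alongside reStructuredText.
extensions = [
    "myst_parser",
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
]
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```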
This is a redo of an issue that originally only referred to the DFO east coast gliders which have an integrated SBE43 oxygen sensor instead of the Rinko AROD, but really it should be generalized to include a way of configuring pyglider to handle any glider payload configuration through the setup in the deployment.yml (the current version is hard-coded to assume GP-CTD+FLBBCD+AROD only).
If this should perhaps be split into more issues (e.g. to handle the general case vs the SBE43-specific one) just let me know.
A complication with the SBE43 is that the data stream from the sensor is raw frequency, and requires calculations using the calibration coefficients to turn it into a usable concentration. Below is an example data set from one of our recent missions, and I will also post details of the raw-to-calibrated data conversion in a follow-up comment.
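As a placeholder until those details are posted, the frequency-to-concentration step can be sketched in the shape of Sea-Bird's published SBE43F calibration equation. Everything below is hedged: `soc`, `foffset`, `a`, `b`, `c`, `e` stand for the per-sensor calibration coefficients, the oxygen solubility OxSol(T, S) is taken as an input rather than reimplemented, and all numeric values are invented for illustration.

```python
import numpy as np

def sbe43f_oxygen(freq, temp, pres, oxsol, soc, foffset, a, b, c, e):
    """Oxygen concentration (ml/l) from raw SBE43F frequency (Hz).

    Sketch following the general form of the Sea-Bird calibration
    equation; verify against the sensor's calibration sheet before use.
    """
    kelvin = temp + 273.15
    poly = 1.0 + a * temp + b * temp**2 + c * temp**3
    return soc * (freq + foffset) * poly * oxsol * np.exp(e * pres / kelvin)

# Illustrative call with invented coefficient values:
o2 = sbe43f_oxygen(freq=4000.0, temp=10.0, pres=100.0, oxsol=6.3,
                   soc=3.0e-4, foffset=-800.0, a=-3.0e-3, b=1.5e-4,
                   c=-2.0e-6, e=0.036)
```

Supporting this in pyglider would mean carrying the calibration coefficients in the deployment yaml (which ties back to the glider_devices metadata question) so the conversion can be configured per sensor.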
We went through our profile .nc files and it seems that only half of them capture a full profile: all the expected profiles are represented, but every other .nc file generated contains just 25 sample points from the bottom and top of the profile rather than a continuous dive. I have added the first few EBD/DBD files along with the resulting .nc files (one is a usable profile, one has the 25 samples) and the timeseries/grid file. Any explanation for why this is happening and how to fix it? https://github.com/mackenziemeier86/Pyglider-Test-Data-and-Output-.git