c-proof / pyglider
glider software
Home Page: https://pyglider.readthedocs.io/
License: Apache License 2.0
Are the fields in glider_devices from the deployment yaml used anywhere? I couldn't find them being accessed by any of the scripts in gliderpy. These fields include sensor names, serial numbers and calibration information. Could be useful to include in netCDF metadata, but I don't know how that would be done. Could just append them to metadata?
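If we did append them, one possible sketch (the device dict and the attribute naming are illustrative, not the actual yaml schema) is to flatten glider_devices into prefixed global netCDF attributes:

```python
# Hypothetical glider_devices block as parsed from the deployment yaml.
devices = {
    "ctd": {"make": "RBR", "model": "legato", "serial_number": "0001"},
    "optode": {"make": "JFE", "model": "AROD-FT", "serial_number": "0123"},
}

# Flatten each device's fields into prefixed global attributes; these could
# then be merged into the output dataset with ds.attrs.update(attrs).
attrs = {}
for device, fields in devices.items():
    for key, value in fields.items():
        attrs[f"{device}_{key}"] = value
```

This keeps the sensor names, serial numbers, and calibration information discoverable in the netCDF without needing new variables.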
At line ~131 of seaexplorer.py, `pl.read_csv(f, separator=';')` cannot parse extreme microstructure values as i64:

ComputeError: Could not parse `2.22e-101` as dtype `i64` at column 'MR1000G-RDL_EPS2' (column number 22).
The current offset in the file is 490928 bytes.
You might want to try:
- increasing `infer_schema_length` (e.g. `infer_schema_length=10000`),
- specifying the correct dtype with the `dtypes` argument,
- setting `ignore_errors` to True,
- adding `2.22e-101` to the `null_values` list.

The pytest suite currently fails. It looks like the raw data is there, but the test is not producing the parquet file. I am using the latest commit: 71c6e13
============================================================= test session starts ==============================================================
platform linux -- Python 3.12.2, pytest-8.1.1, pluggy-1.4.0
rootdir: /home/portal/src/pyglider
plugins: cov-4.1.0
collected 26 items / 1 error
==================================================================== ERRORS ====================================================================
___________________________________________________ ERROR collecting tests/test_pyglider.py ____________________________________________________
tests/test_pyglider.py:21: in <module>
outname = seaexplorer.raw_to_L0timeseries(rawncdir, l0tsdir,
../upstream/pyglider/pyglider/seaexplorer.py:317: in raw_to_timeseries
gli = pl.read_parquet(f'{indir}/{id}-rawgli.parquet')
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/_utils/deprecation.py:134: in wrapper
return function(*args, **kwargs)
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/_utils/deprecation.py:134: in wrapper
return function(*args, **kwargs)
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/io/parquet/functions.py:171: in read_parquet
lf = scan_parquet(
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/_utils/deprecation.py:134: in wrapper
return function(*args, **kwargs)
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/_utils/deprecation.py:134: in wrapper
return function(*args, **kwargs)
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/io/parquet/functions.py:311: in scan_parquet
return pl.LazyFrame._scan_parquet(
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/lazyframe/frame.py:458: in _scan_parquet
scan = _scan_parquet_fsspec(source, storage_options) # type: ignore[arg-type]
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/io/parquet/anonymous_scan.py:21: in _scan_parquet_fsspec
schema = polars.io.parquet.read_parquet_schema(data)
../../miniconda3/envs/pyglider/lib/python3.12/site-packages/polars/io/parquet/functions.py:213: in read_parquet_schema
return _read_parquet_schema(source)
E FileNotFoundError: No such file or directory (os error 2): ...yglider/tests/example-data/example-seaexplorer/realtime_rawnc//dfo-eva035-rawgli.parquet
=========================================================== short test summary info ============================================================
ERROR tests/test_pyglider.py - FileNotFoundError: No such file or directory (os error 2): ...yglider/tests/example-data/example-seaexplorer/realtime_rawnc//dfo-eva035-raw...
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
=============================================================== 1 error in 3.86s ===============================================================
Note I think that we should force timebase for slocums as well.
Originally posted by @jklymak in #139 (comment)
We could do with clearing out unused imports from the scripts and unused dependencies from the environment. The code could also be linted by something like black to keep the formatting consistent; this could be a pre-commit hook if we're feeling fancy. The same pass could clean out some of the excess print statements and improve the logging.
Heading of the seaexplorer is recorded at a lower temporal resolution than the science sensors, approx. 20 seconds between measurements. The current solution is to interpolate onto the timebase of the science sensors. This can cause unintended results when the glider is heading near north, e.g. interpolating to 180 between two values of 355 and 005.
There must be a smart way to fix this. Maybe using deg2rad and some kind of normalisation?
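One possibility along those lines, sketched with numpy (function name is illustrative): interpolate the sine and cosine components of the heading and recombine, so the wraparound at north is handled naturally:

```python
import numpy as np

def interp_heading(t_new, t, heading_deg):
    # Interpolating raw degrees gives 180 halfway between 355 and 005;
    # interpolating the unit-circle components gives the expected ~0.
    rad = np.deg2rad(heading_deg)
    s = np.interp(t_new, t, np.sin(rad))
    c = np.interp(t_new, t, np.cos(rad))
    return np.rad2deg(np.arctan2(s, c)) % 360
```

Halfway between 355 and 005 this returns approximately 0 rather than 180.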
dbdreader seems to have started returning a tuple from data access. We need to pin to old dbdreader and maybe update our code. An example that fails would be helpful.
A few active acoustic standard names exist in the current CF vocabulary. This just serves as a notices that additional standard names have been proposed and invite comments from the community. The issue is linked here: cf-convention/vocabularies#44
Currently, binary_to_timeseries (Slocum) and raw_to_timeseries (SeaExplorer) both contain the "profile"-finding logic. Recent experience has shown that this step is often somewhat flaky, and causes the raw time series processing to fail to complete.
I'd propose we break the "profile" logic into its own routine and ask users to call it after making the time series to add profile info. That will also allow more modularity with the "best" profile logic, which I find depends on the sampling frequency of the pressure sensor. It would also allow for taking advantage of any metadata sent up by the glider as well. So now users would do:
outname = slocum.raw_to_timeseries(
rawdir, l1tsdir, deploymentyaml)
outname = ncprocess.get_profiles(outname, ...)
From @bastienqueste:
Note that oxygen from the AROD is wrong as it assumes a salinity of 0 PSU. To get proper oxygen, calculate saturation from the bad oxygen concentration and solubility calculated with good temperature and 0 salinity.
From that saturation (which is a useful quantity to plot), we can get the good concentration using a solubility calculated with the good temperature and salinity.
This salinity-related error is probably about 15-30% of the value, so it's not worth ignoring.
Does this fall within the remit of pyglider? It's not calibration or correction, just reporting accurate values. It would involve a recalculation though.
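A sketch of the recalculation described above (function and argument names are hypothetical; the solubilities would in practice come from a TEOS-10 implementation such as gsw):

```python
def correct_arod_oxygen(o2_raw, sol_good, sol_zero):
    """Correct AROD oxygen reported assuming 0 PSU.

    o2_raw:   concentration as reported by the optode (0-salinity assumption)
    sol_zero: solubility at the good temperature and salinity 0
    sol_good: solubility at the good temperature and the good salinity
    """
    saturation = o2_raw / sol_zero   # this saturation is itself worth plotting
    return saturation * sol_good     # corrected concentration
```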
Both the dbdreader code and the parquet seaexplorer code linearly interpolate onto a timebase. We should document this in some detail now that we have a variety of different deployment setups. Probably a bit of a project to do comprehensively, but I think we should maybe have a utility that creates a report where the user just needs to specify a cast number (to avoid shallow/bad casts) and it makes some plots of raw data and what we give them. Then we can compile some of the examples. This appendix to the documentation could also include what settings on the gliders were used to get the data set so that we understand what settings set the sampling frequencies and the results are reproducible.
In running the example-slocum scripts using the example data, a value error arises which reads "ValueError: conflicting sizes for dimension 'time': length 3487 on 'distance_over_ground' and length 3489 on {'time': 'time'}." I have not manipulated the example scripts given at all and have the required packages installed. Attached is a screenshot of the terminal after running process_deploymentRealTime.py.
Linked to Issue #15, the seawater package is deprecated. We should aim to migrate to gsw once tests are in place to ensure nothing is changed
If fed a pld1 file with duplicate timestamps, the reindex call in utils.oxygen_concentration_correction will fail
ds_temp = data.potential_temperature[~np.isnan(data.potential_temperature)].reindex(time=ds_oxy.time, method="nearest")
Instances of duplicate timestamps happen very rarely. In this snippet of pld file, we see timestamp 11:12:28.730
repeated.
18/11/2021 11:12:28.730;116;1604.485;5521.078;;;;;;;;;;;;;;9.0473;9.7355;1.6952;7.3451;9.7404;
18/11/2021 11:12:28.761;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.792;116;1604.485;5521.078;;;;;;;;;;;;;;9.0473;9.7354;1.7010;7.3450;9.6665;
18/11/2021 11:12:28.823;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.838;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.871;116;1604.485;5521.078;;;;;;;;;;;;;;9.0467;9.7354;1.6963;7.3445;9.7035;
18/11/2021 11:12:28.886;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.901;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.935;116;1604.485;5521.078;;;;;;;;;;;;;;9.0453;9.7354;1.6883;7.3433;9.6665;
18/11/2021 11:12:28.952;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.970;116;1604.485;5521.078;;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.664;116;1604.485;5521.079;-0.026;;;;;;;;;;;;;9.0462;9.7353;1.6874;7.3441;9.6665;
18/11/2021 11:12:28.680;116;1604.485;5521.079;-0.026;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.699;116;1604.485;5521.079;-0.026;;;;;;;;;;;;;;;;;;
18/11/2021 11:12:28.730;116;1604.485;5521.079;-0.026;;;;;;;;;;;;;9.0462;9.7353;1.6639;7.3441;9.6295;
This is easily fixed using xarray Dataset.drop_duplicates. Should this be performed as a catching step in the oxygen correction function? Or earlier in processing? Or should some other workaround be used?
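For illustration, the drop_duplicates catch might look like this (variable names invented for the sketch):

```python
import numpy as np
import xarray as xr

# Two of the three samples share the 11:12:28.730 timestamp, as in the
# pld snippet above.
time = np.array(["2021-11-18T11:12:28.730", "2021-11-18T11:12:28.761",
                 "2021-11-18T11:12:28.730"], dtype="datetime64[ns]")
ds = xr.Dataset({"temperature": ("time", [9.7355, 9.7354, 9.7353])},
                coords={"time": time})

# Keeping the first occurrence makes the index unique, so a later
# reindex(time=..., method="nearest") no longer fails.
ds = ds.drop_duplicates(dim="time", keep="first")
```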
The legato pressure sensor is used to calculate glider depth. When the sensor is switched off/malfunctioning, the seaexplorer reports a value of 9999 for pressure. This is converted to a depth of 9690 m by utils.get_glider_depth
. While 9999 is a fairly obvious error value, 9690 is not.
This function has a filter step for "good" pressures which it later uses to interpolate over the bad depths. Should we perhaps exclude pressure readings greater than the sensor's maximum rated pressure from the "good" subset used to calculate depth?
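A sketch of that exclusion (threshold and function name are assumptions; the actual rated pressure is sensor- and deployment-specific):

```python
import numpy as np

def mask_bad_pressure(pressure_dbar, max_rated_dbar=1000.0):
    # Replace the 9999 error flag, and anything else beyond the sensor's
    # rated range, with NaN so it never reaches the depth conversion and
    # cannot masquerade as a plausible 9690 m depth.
    pressure = np.asarray(pressure_dbar, dtype=float).copy()
    pressure[pressure > max_rated_dbar] = np.nan
    return pressure
```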
Following a discussion with SeaExplorer data users, several additional pieces of metadata have been suggested:
- glider mass and glider volume for flight model regression. This should be simple to add to the yaml at the metadata level. Not sure if we'd need a structure to include units as well, or have names like glider_mass_kg and glider_volume_m3.
- location of sensors: this is a little more involved, but seemingly the location of the sensors on the glider can be important in post-processing. One suggested method was to define x, y, z of the sensors relative to some consistent point of the glider chassis. These could be added to each sensor in the glider_sensors structure of the yaml.
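A hypothetical yaml sketch of both suggestions (the field names, and the x/y/z structure, are illustrative rather than an agreed schema):

```yaml
metadata:
  glider_mass_kg: 60.0
  glider_volume_m3: 0.0585
glider_sensors:
  ctd:
    make: RBR
    model: legato
    serial: "0001"
    # position relative to a reference point on the chassis, metres
    position: {x: 0.35, y: 0.0, z: -0.02}
```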
These appear to be inconsistent between the seaexplorer and slocum examples in their creation in process_deploymentRealTime and removal via the Makefile.
These levels are:
Lx-timeseries
Ly-profiles (seaexplorer only?)
Lz-gridfiles
Across the four files, x,y,z are some variant of 0,0,0 or 0,1,2
What processing level is correct for these products?
I haven't used logging before. I understand it is better than my usual method of sprinkling print statements throughout the code. How do we access the log though?
I can't find anywhere in the code that specifies a log file, following the example in the docs I'd expect something like:
logging.basicConfig(filename='example.log', level=logging.INFO)
Any ideas?
If the oxygen correction is applied to raw SeaExplorer data, the majority of oxygen data can be mistakenly converted to nan. This is due to a timestamp issue.
Oxygen data and CTD data are typically recorded at different times by the SeaExplorer, so most CTD observations do not have an exactly corresponding oxygen observation, though they may be temporally very close. glider_utils.oxy.oxygen_concentration_correction applies a correction on the assumption that oxygen measurements have corresponding temperature and salinity measurements, which are used to correct for the 0 salinity assumed by the AROD oxygen optode. This is often not the case. Where the oxygen data do not have corresponding temp/sal, they are converted to nan.
This issue does not occur with sub data, as used in the tests, as these data are comprised of lines where each timestamp has a measurement for every variable. For this reason, the issue escaped detection by the tests.
In my recent trials with pyglider+seaexplorer, I noticed that the underwater positions for the most recent missions are a bit funky, and after some digging I realized it's because of a (somewhat recent) firmware change that writes the dead-reckoning determined positions into the data file, along with an extra column to indicate whether the location was measured (GPS) or calculated (Alseamar dead reckoning algorithm).
Below is an example from a recent mission, showing how the dead-reckoning positions are propagating through the processing, but because they don't account for the depth-averaged current, the final position of a dive cycle ends up being discontinuous with the start of the next dive cycle.
There's something weird going on with salinity. The factor-of-10 multiplication of conductivity at utils.py line 200 wrecks our data, resulting in salinities up around 100 g/kg.
Is this factor of 10 necessary for processing GPCTD data? If so, it should be applied in place to conductivity. At present, the SX example produces final netcdfs where salinity is roughly 10x conductivity. I think this is most likely an error.
If this function is to be refactored, I would recommend moving away from seawater. That package has not been updated since 2015; the maintainer (Filipe) recommends moving to TEOS-10 as implemented in gsw.
As per comments by @jklymak in PR #14:
Maybe we should go back and simplify the logic, because the separate sensor streams were a hack anyway, and not integral to the processing.
I wonder if we can get rid of this hack, and just process without the prefix at all? That greatly simplifies all this code and we don't need to assume anything about the carrier instrument for each sensor.
python packages can have data, but usually that is for things the routines need at runtime. Not 100% sure how we should distribute the examples here. Maybe as a tarball in doc/_static/? Not sure what other libraries do for examples like this that require significant data.
If a seaexplorer reboots mid-mission, a few 1970 datestamps will be produced before it gets a timefix from the GPS. This can mess up processing down the line, particularly as xarray assumes monotonically increasing datetime index for several operations.
It would make sense to fix this as far upstream as possible. A crude but simple fix could be to drop any lines with datestamps before 1971 at the raw to rawnc stage. Or should it be left until later?
I don't know if this function should be deprecated or updated since it returns a 404 error.
The simplest test could make use of http://xarray.pydata.org/en/stable/generated/xarray.DataArray.equals.html to check that the final datasets do not change. This would require uploading a finished dataset to the repo. The timeseries dataset is the smallest and would make a good target.
Eventually tests should cover all of the functions in pyglider. This will take time, but we should try to ensure that all new functionality comes with tests
As recommended by Bastien Queste: PAR should be logged before gridding. This is sensible as PAR is an exponential variable; we wouldn't want to bias it when taking the mean while gridding.
This changed averaging strategy should be logged in the metadata of the variable. Maybe it should be controllable from the yaml too?
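A sketch of the idea (function name invented): average in log space within each grid cell, which amounts to taking the geometric mean:

```python
import numpy as np

def grid_par_cell(par):
    # The arithmetic mean of an exponentially varying quantity is biased
    # high; exponentiating the mean of the logs gives the geometric mean.
    par = np.asarray(par, dtype=float)
    return np.exp(np.nanmean(np.log(par)))
```

For example, grid_par_cell([1, 100]) gives 10, where a plain mean would give 50.5.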
Perhaps this is just our internal c-proof usage, but having yaml files be subtly different between "realtime" and delayed mode processing leads to transcription errors. It would be good to have more robust ways of either making the yaml files or to make them a bit more modular so information doesn't need to be copied over as much.
Using the example code, I am having trouble with slocum.raw_to_timeseries. It doesn't seem to be running correctly and therefore isn't defining "outname". I get the attached error message.
I am using the example code to try to run this (from example-slocum), so I think it is something with how I am running the code and not my files. Is anyone familiar with this and can provide any advice? I'm not very familiar with python so unsure what this error means or why I am having this issue.
Currently pyglider has its own python-only dbd-numpy converter for slocums vendored from https://gitlab.oceantrack.org/ceotr-public/dinkum
https://github.com/smerckel/dbdreader reads dinkum binary files (slocum)
A quick look indicates this would be pretty easy to implement. We could probably even have both backends, with the faster C backend optional in case the C code turns out to be problematic.
ping @callumrollo @hvdosser @smerckel...
Hello everyone,
I would like to use pyglider to process Slocum glider data. When I installed pyglider today I ran into a couple of issues that I wanted to raise here. In the following I will list my steps to ensure that I have not made a mistake (I am very new to python and Slocum gliders):
I have installed the required packages and created a new environment as suggested on the website.
conda create -n gliderwork
conda activate gliderwork
conda install -c conda-forge pyglider
I downloaded the example data manually, as
import pyglider.example_data as pexamp
pexamp.get_example_data('./')
is not working.
Having the example data, I wanted to run the example code to make sure that everything is working properly. The code is shown on the Getting Started: Slocum page (https://pyglider.readthedocs.io/en/latest/getting-started-slocum.html).
I wanted to run the first couple of lines to ensure the packages are imported properly. I received the following error message:
AttributeError: module 'pyglider.slocum' has no attribute 'binary_to_timeseries'
I sort of solved this issue by copying the code from pyglider.slocum from github and replacing the old code.
I then received another error message that dbdreader is not defined. I realized that dbdreader was not installed with the packages (I think it's important for reading the binary *.dbd files?). I have tried to install dbdreader separately, but it failed miserably on both my Mac and my Windows laptop.
I received the following error message:
%%%%%
Collecting dbdreader
Using cached dbdreader-0.4.11.tar.gz (768 kB)
Preparing metadata (setup.py) ... done
Requirement already satisfied: numpy in ./opt/anaconda3/envs/gliderwork/lib/python3.11/site-packages (from dbdreader) (1.24.1)
Building wheels for collected packages: dbdreader
Building wheel for dbdreader (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [15 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-cpython-311
creating build/lib.macosx-10.9-x86_64-cpython-311/dbdreader
copying dbdreader/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/dbdreader
copying dbdreader/dbdreader.py -> build/lib.macosx-10.9-x86_64-cpython-311/dbdreader
running build_ext
building '_dbdreader' extension
creating build/temp.macosx-10.9-x86_64-cpython-311
creating build/temp.macosx-10.9-x86_64-cpython-311/extension
clang -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/riaoelerich/opt/anaconda3/envs/gliderwork/include -fPIC -O2 -isystem /Users/riaoelerich/opt/anaconda3/envs/gliderwork/include -Iextension/include -I/Users/riaoelerich/opt/anaconda3/envs/gliderwork/include/python3.11 -c extension/dbdreader.c -o build/temp.macosx-10.9-x86_64-cpython-311/extension/dbdreader.o
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
error: command '/usr/bin/clang' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for dbdreader
Running setup.py clean for dbdreader
Failed to build dbdreader
Installing collected packages: dbdreader
Running setup.py install for dbdreader ... error
error: subprocess-exited-with-error
× Running setup.py install for dbdreader did not run successfully.
│ exit code: 1
╰─> [17 lines of output]
running install
/Users/riaoelerich/opt/anaconda3/envs/gliderwork/lib/python3.11/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
running build
running build_py
creating build
creating build/lib.macosx-10.9-x86_64-cpython-311
creating build/lib.macosx-10.9-x86_64-cpython-311/dbdreader
copying dbdreader/__init__.py -> build/lib.macosx-10.9-x86_64-cpython-311/dbdreader
copying dbdreader/dbdreader.py -> build/lib.macosx-10.9-x86_64-cpython-311/dbdreader
running build_ext
building '_dbdreader' extension
creating build/temp.macosx-10.9-x86_64-cpython-311
creating build/temp.macosx-10.9-x86_64-cpython-311/extension
clang -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /Users/riaoelerich/opt/anaconda3/envs/gliderwork/include -fPIC -O2 -isystem /Users/riaoelerich/opt/anaconda3/envs/gliderwork/include -Iextension/include -I/Users/riaoelerich/opt/anaconda3/envs/gliderwork/include/python3.11 -c extension/dbdreader.c -o build/temp.macosx-10.9-x86_64-cpython-311/extension/dbdreader.o
xcrun: error: invalid active developer path (/Library/Developer/CommandLineTools), missing xcrun at: /Library/Developer/CommandLineTools/usr/bin/xcrun
error: command '/usr/bin/clang' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure
× Encountered error while trying to install package.
╰─> dbdreader
note: This is an issue with the package mentioned above, not pip.
%%%%
Sorry for this rather lengthy issue. I wonder whether some of you have encountered this? Or whether I did something wrong?
Best wishes,
Ria
There appears to be a performance bottleneck in raw_to_rawnc; the docstring notes that it is slow for large datasets. As this function converts many small files individually, it looks ideal for multiprocessing.
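A sketch of the shape that could take with the standard library (convert_one is a hypothetical stand-in for the per-file conversion; for CPU-bound parsing a ProcessPoolExecutor might be swapped in):

```python
from concurrent.futures import ThreadPoolExecutor

def convert_one(path):
    # Placeholder for a single binary-file -> netCDF conversion.
    return path.replace(".dbd", ".nc")

def convert_all(paths, max_workers=4):
    # Convert independent small files concurrently, preserving input order.
    with ThreadPoolExecutor(max_workers=max_workers) as ex:
        return list(ex.map(convert_one, paths))
```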
Tests should include a check on the time variables (time and profile_time), specifically the values and units, to ensure that data and metadata remains IOOS compliant.
Ultimately, we should aim for the OceanGliders 1.0 format https://github.com/OceanGlidersCommunity/OG-format-user-manual
This will take some time. If you spot an atomic change that can be made to move toward OG 1.0, please submit an Issue for it
@Maribaken discovered a bug when trying to process AD2CP_TIME from a SeaExplorer. This is timing from the ADCP internal clock, which can differ substantially from the SX payload clock, so it's important to carry this data forward.
These are correctly interpreted as datetimes at the raw_to_rawnc stage:
array(['NaT', 'NaT', '2021-12-03T10:45:39.000000000',
       '2021-12-03T10:46:09.000000000', '2021-12-03T10:46:39.000000000'],
      dtype='datetime64[ns]')
but get converted to floats during raw_to_L0timeseries, becoming
array([ nan, nan, 0., ..., 1990., 2020., 2050.])
The culprit is the decode_times=False kwarg in sensor = xr.open_dataset(f'{indir}/{id}-{kind}pld.nc', decode_times=False) at seaexplorer.py line 248.
I think this flag is in here to force the main timestamp to be in seconds since 1970, so that utils.get_profiles_new works, as it calculates a median time difference between samples in seconds.
Should I do a specific fix for the AD2CP? Are there other sensors that record internal time we need to keep? Should I try to rework utils.get_profiles_new to work with datetimes rather than seconds since 1970?
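For the last option, a sketch of computing the median sampling interval directly from datetime64 values (so decode_times=False would no longer be needed just for this calculation):

```python
import numpy as np

def median_dt_seconds(time):
    # Convert datetime64[ns] to float seconds, then take the median of the
    # successive differences, as utils.get_profiles_new requires.
    t = np.asarray(time, dtype="datetime64[ns]").astype("int64") / 1e9
    return float(np.median(np.diff(t)))
```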
When I run my profile files through the IOOS compliance checker the only issue I get is "Variable profile_time must contain attribute: _FillValue" - is there a way to add _FillValue to profile_time? I checked the files and _FillValue isn't included with profile_time.
On another note - we went through our profile .nc files and it seems that half the files captured the start/end of a profile - all the expected profiles are reflected, but every other .nc file generated has this structure with 25 sample points either from the bottom or top of the profile, but is not a continuous dive. I have added the first few EBD/DBD files along with the resulting .nc files (one is a usable profile, one has the 25 samples) and the timeseries/gridfile. Any explanation for why this is happening and how to fix it? https://github.com/mackenziemeier86/Pyglider-Test-Data-and-Output-.git
When running the seaexplorer example, xarray throws an error in seaexplorer.py line 361:
INFO:pyglider.seaexplorer:interpolating latitude
Traceback (most recent call last):
File "/home/callum/Documents/data-flow/raw-to-nc/pyglider/example-seaexplorer/process_deploymentRealTime.py", line 35, in <module>
outname = seaexplorer.raw_to_L0timeseries(rawncdir, l0tsdir, deploymentyaml, kind='sub')
File "/home/callum/Documents/data-flow/raw-to-nc/pyglider/pyglider/seaexplorer.py", line 361, in raw_to_L0timeseries
ds[name] = (('time'), val, attrs)
File "/home/callum/anaconda3/envs/pyglider/lib/python3.9/site-packages/xarray/core/dataset.py", line 1563, in __setitem__
self.update({key: value})
File "/home/callum/anaconda3/envs/pyglider/lib/python3.9/site-packages/xarray/core/dataset.py", line 4208, in update
merge_result = dataset_update_method(self, other)
File "/home/callum/anaconda3/envs/pyglider/lib/python3.9/site-packages/xarray/core/merge.py", line 984, in dataset_update_method
return merge_core(
File "/home/callum/anaconda3/envs/pyglider/lib/python3.9/site-packages/xarray/core/merge.py", line 632, in merge_core
collected = collect_variables_and_indexes(aligned)
File "/home/callum/anaconda3/envs/pyglider/lib/python3.9/site-packages/xarray/core/merge.py", line 294, in collect_variables_and_indexes
variable = as_variable(variable, name=name)
File "/home/callum/anaconda3/envs/pyglider/lib/python3.9/site-packages/xarray/core/variable.py", line 121, in as_variable
raise TypeError(
TypeError: Using a DataArray object to construct a variable is ambiguous, please extract the data using the .data property.
Process finished with exit code 1
I think this worked in earlier versions of xarray, but more recently the package is more strict on ambiguity. I used a conda environment created from the environment.yml included in the repo with python=3.9.7, xarray=0.19.0:
# packages in environment at /home/callum/anaconda3/envs/pyglider:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
alsa-lib 1.2.3 h516909a_0 conda-forge
bitstring 3.1.9 pyhd8ed1ab_0 conda-forge
bokeh 2.4.1 py39hf3d152e_1 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.2 h7f98852_0 conda-forge
ca-certificates 2021.10.8 ha878542_0 conda-forge
certifi 2021.10.8 py39hf3d152e_0 conda-forge
cftime 1.5.1 py39hce5d2b2_0 conda-forge
click 8.0.3 py39hf3d152e_0 conda-forge
cloudpickle 2.0.0 pyhd8ed1ab_0 conda-forge
curl 7.79.1 h2574ce0_1 conda-forge
cycler 0.10.0 py_2 conda-forge
cytoolz 0.11.0 py39h3811e60_3 conda-forge
dask 2021.9.1 pyhd8ed1ab_0 conda-forge
dask-core 2021.9.1 pyhd8ed1ab_0 conda-forge
dbus 1.13.6 h48d8840_2 conda-forge
distributed 2021.9.1 py39hf3d152e_0 conda-forge
expat 2.4.1 h9c3ff4c_0 conda-forge
fontconfig 2.13.1 hba837de_1005 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
fsspec 2021.10.1 pyhd8ed1ab_0 conda-forge
geojson 2.5.0 py_0 conda-forge
gettext 0.19.8.1 h73d1719_1008 conda-forge
glib 2.70.0 h780b84a_0 conda-forge
glib-tools 2.70.0 h780b84a_0 conda-forge
gst-plugins-base 1.18.5 hf529b03_0 conda-forge
gstreamer 1.18.5 h76c114f_0 conda-forge
hdf4 4.2.15 h10796ff_3 conda-forge
hdf5 1.12.1 nompi_h2750804_101 conda-forge
heapdict 1.0.1 py_0 conda-forge
icu 68.1 h58526e2_0 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jinja2 3.0.2 pyhd8ed1ab_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
kiwisolver 1.3.2 py39h1a9c180_0 conda-forge
krb5 1.19.2 hcc1bbae_2 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
lerc 3.0 h9c3ff4c_0 conda-forge
libblas 3.9.0 12_linux64_openblas conda-forge
libcblas 3.9.0 12_linux64_openblas conda-forge
libclang 11.1.0 default_ha53f305_1 conda-forge
libcurl 7.79.1 h2574ce0_1 conda-forge
libdeflate 1.8 h7f98852_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libevent 2.1.10 h9b69904_4 conda-forge
libffi 3.4.2 h9c3ff4c_4 conda-forge
libgcc-ng 11.2.0 h1d223b6_11 conda-forge
libgfortran-ng 11.2.0 h69a702a_11 conda-forge
libgfortran5 11.2.0 h5c6108e_11 conda-forge
libglib 2.70.0 h174f98d_0 conda-forge
libgomp 11.2.0 h1d223b6_11 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 12_linux64_openblas conda-forge
libllvm11 11.1.0 hf817b99_2 conda-forge
libnetcdf 4.8.1 nompi_hb3fd0d9_101 conda-forge
libnghttp2 1.43.0 h812cca2_1 conda-forge
libogg 1.3.4 h7f98852_1 conda-forge
libopenblas 0.3.18 pthreads_h8fe5266_0 conda-forge
libopus 1.3.1 h7f98852_1 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libpq 13.3 hd57d9b9_1 conda-forge
libssh2 1.10.0 ha56f1ee_2 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_11 conda-forge
libtiff 4.3.0 h6f004c6_2 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libvorbis 1.3.7 h9c3ff4c_0 conda-forge
libwebp-base 1.2.1 h7f98852_0 conda-forge
libxcb 1.13 h7f98852_1003 conda-forge
libxkbcommon 1.0.3 he3ba5ed_0 conda-forge
libxml2 2.9.12 h72842e0_0 conda-forge
libzip 1.8.0 h4de3113_1 conda-forge
libzlib 1.2.11 h36c2ea0_1013 conda-forge
locket 0.2.0 py_2 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
markupsafe 2.0.1 py39h3811e60_0 conda-forge
matplotlib 3.4.3 py39hf3d152e_1 conda-forge
matplotlib-base 3.4.3 py39h2fa2bec_1 conda-forge
msgpack-python 1.0.2 py39h1a9c180_1 conda-forge
mysql-common 8.0.26 ha770c72_0 conda-forge
mysql-libs 8.0.26 hfa10184_0 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
netcdf4 1.5.7 nompi_py39h64b754b_103 conda-forge
nspr 4.30 h9c3ff4c_0 conda-forge
nss 3.69 hb5efdd6_1 conda-forge
numpy 1.21.2 py39hdbf815f_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openjpeg 2.4.0 hb52868f_1 conda-forge
openssl 1.1.1l h7f98852_0 conda-forge
packaging 21.0 pyhd8ed1ab_0 conda-forge
pandas 1.3.4 py39hde0f152_0 conda-forge
partd 1.2.0 pyhd8ed1ab_0 conda-forge
pcre 8.45 h9c3ff4c_0 conda-forge
pillow 8.3.2 py39ha612740_0 conda-forge
pip 21.3 pyhd8ed1ab_0 conda-forge
psutil 5.8.0 py39h3811e60_1 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyqt 5.12.3 py39hf3d152e_7 conda-forge
pyqt-impl 5.12.3 py39h0fcd23e_7 conda-forge
pyqt5-sip 4.19.18 py39he80948d_7 conda-forge
pyqtchart 5.12 py39h0fcd23e_7 conda-forge
pyqtwebengine 5.12.1 py39h0fcd23e_7 conda-forge
python 3.9.7 hb7a2778_3_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.9 2_cp39 conda-forge
pytz 2021.3 pyhd8ed1ab_0 conda-forge
pyyaml 6.0 py39h3811e60_0 conda-forge
qt 5.12.9 hda022c4_4 conda-forge
readline 8.1 h46c0cb4_0 conda-forge
scipy 1.7.1 py39hee8e79c_0 conda-forge
seawater 3.3.4 py_1 conda-forge
setuptools 58.2.0 py39hf3d152e_0 conda-forge
simplekml 1.3.6 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
sqlite 3.36.0 h9cd32fc_2 conda-forge
tblib 1.7.0 pyhd8ed1ab_0 conda-forge
tk 8.6.11 h27826a3_1 conda-forge
toolz 0.11.1 py_0 conda-forge
tornado 6.1 py39h3811e60_1 conda-forge
typing_extensions 3.10.0.2 pyha770c72_0 conda-forge
tzdata 2021d he74cb21_0 conda-forge
webcolors 1.11.1 pyhd8ed1ab_0 conda-forge
wheel 0.37.0 pyhd8ed1ab_1 conda-forge
xarray 0.19.0 pyhd8ed1ab_1 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yaml 0.2.5 h516909a_0 conda-forge
zict 2.0.0 py_0 conda-forge
zlib 1.2.11 h36c2ea0_1013 conda-forge
zstd 1.5.0 ha95c52a_0 conda-forge
Specifying an xarray version could fix this. Or the offending line could be corrected. I'll submit a PR for the latter approach.
It could be worth adding test suites for other Python versions, at least 3.7 and 3.9. Testing on 3.10 is still a bit tricky at the moment but could be added later.
Originally posted by @callumrollo in #8 (comment)
We're currently using strings to interact with the filesystem. Ideally we'd change to using pathlib as it is more robust.
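A minimal sketch of what the pathlib migration looks like; `ensure_outdir` is an illustrative helper name, not an existing pyglider function:

```python
from pathlib import Path

def ensure_outdir(outdir):
    """Accept either a str or a Path and return a Path that exists.

    Illustrative helper, not part of pyglider.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    return outdir

# Path objects compose with '/' instead of string concatenation:
ncfile = ensure_outdir('/tmp/pyglider-demo/L0-timeseries') / 'deployment.nc'
print(ncfile.suffix)  # prints .nc
```

Because `Path` accepts both strings and existing `Path` objects, callers can migrate incrementally without breaking the string-based API.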
As discussed in PR #14, it would be good to control coarsening of data and perhaps other things? This could be done with optional additions to the deployment yaml. For discussion
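Coarsening control could ride along as optional keys in the deployment yaml. One possible shape, purely for discussion (every key name below is invented, not an implemented pyglider schema):

```yaml
# Hypothetical optional section in deployment.yml:
processing:
  coarsen:
    time: 2          # average every 2 samples in time
  gridding:
    depth_bin: 1.0   # metres per depth bin in the gridded product
```

Since the keys would be optional, existing deployment yamls would keep working unchanged.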
In testing the new code for this pull request, I found an issue with processing the delayed mode SeaExplorer time series data for missions for which certain sensors (oxygen in this case) are severely oversampled. These missions end up with delayed mode data files that contain fewer actual (non-nan) data points than the realtime files. In other words, we are losing data during the processing.
Currently, the `dropna` function is used to remove the oversampled oxygen data when converting the raw data. The `dropna` function is working correctly; however, note that the resulting data has many nan values in it, for both the CTD and optics, and these nan values will often not co-occur.
I think the problem in the processing is caused by using `GPCTD_TEMPERATURE` as the default time base in `seaexplorer.py`. This variable contains nan values that are not all co-located with the nan values in the oxygen and optical variables. It's desirable to use the CTD as the time base, but we may need to do some interpolation to avoid losing data when the other variables are mapped onto this base.
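The interpolation idea can be sketched with plain numpy (this is an illustration of the approach, not pyglider's actual code): instead of dropping rows where samples don't line up, map the sparsely sampled variable onto the CTD timestamps.

```python
import numpy as np

# CTD sample times, and a sparser oxygen record that doesn't align with them:
t_ctd = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # CTD time base
t_oxy = np.array([0.5, 2.5, 3.5])             # oxygen sample times
oxy = np.array([200.0, 210.0, 215.0])         # oxygen values

# np.interp fills between samples; points outside the oxygen record get
# the edge values, which you may prefer to mask out afterwards.
oxy_on_ctd = np.interp(t_ctd, t_oxy, oxy)
print(oxy_on_ctd)  # [200.  202.5 207.5 212.5 215. ]
```

A real implementation would also need to decide on a maximum gap beyond which interpolation should yield nan rather than bridging missing data.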
These lines (pyglider/pyglider/seaexplorer.py, line 323 at commit 67c6acd) error out if `timebase` is not specified in the yaml. I guess we could enforce `timebase` in the yaml, but this code should be fixed or removed.
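One option is to fail with an explicit message instead of letting a bare KeyError propagate. A hedged sketch (the yaml layout mirrors pyglider deployment files, but the guard itself is a proposal, not current code):

```python
# Proposal sketch: validate the timebase entry up front with a clear error.
def get_timebase(deployment):
    """Return the timebase entry from a parsed deployment yaml dict."""
    ncvar = deployment.get('netcdf_variables', {})
    if 'timebase' not in ncvar:
        raise ValueError(
            "deployment yaml must define netcdf_variables:timebase "
            "(e.g. pointing at the CTD temperature variable)")
    return ncvar['timebase']

dep = {'netcdf_variables': {'timebase': {'source': 'GPCTD_TEMPERATURE'}}}
print(get_timebase(dep)['source'])  # GPCTD_TEMPERATURE
```

Alternatively the code could fall back to a documented default variable when `timebase` is absent; either way the behaviour should be deliberate rather than an unhandled exception.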
@truedichotomy asked:
Would Zarr be a possible format option for pyglider?
Originally posted by @truedichotomy in #120 (comment)
Most of the other variables in our netcdf files have attributes, but `time` does not, despite it having an entry in deployment.yml. Should be easy to add a code block to fill the attributes into the time variable.
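A minimal sketch of copying yaml-supplied attributes onto the time variable; the attribute names follow common CF usage, but the exact set pyglider should write (and whether any conflict with xarray's time encoding on write) is up for discussion:

```python
import numpy as np
import xarray as xr

# Toy dataset standing in for a pyglider timeseries:
ds = xr.Dataset(coords={"time": np.array([0.0, 1.0, 2.0])})

# These would come from the deployment.yml time entry:
time_attrs = {
    "long_name": "Time",
    "standard_name": "time",
    "observation_type": "measured",
}
ds["time"].attrs.update(time_attrs)
print(ds["time"].attrs["standard_name"])  # time
```

Note that a `units` attribute on a datetime coordinate can clash with xarray's encoding at write time, so units may be better left to the encoding dict.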
We need to start making a set of documentation for this. I somewhat prefer Markdown, so if we can use Markdown in Sphinx, that would probably be easiest.
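Markdown in Sphinx is well supported via the `myst-parser` extension; a possible `conf.py` fragment (the extra extensions listed are common choices, not requirements):

```python
# Sphinx conf.py fragment: myst-parser lets Sphinx build Markdown (.md)
# sources alongside reStructuredText.
extensions = [
    "myst_parser",
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
]
source_suffix = {
    ".rst": "restructuredtext",
    ".md": "markdown",
}
```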
This is a redo of an issue that originally only referred to the DFO east coast gliders which have an integrated SBE43 oxygen sensor instead of the Rinko AROD, but really it should be generalized to include a way of configuring pyglider to handle any glider payload configuration through the setup in the deployment.yml (the current version is hard-coded to assume GP-CTD+FLBBCD+AROD only).
If this should perhaps be split into more issues (e.g. to handle the general case vs the SBE43-specific one) just let me know.
A complication with the SBE43 is that the data stream from the sensor is raw frequency, and requires calculations using the calibration coefficients to turn it into a usable concentration. Below is an example data set from one of our recent missions, and I will also post details of the raw-to-calibrated data conversion in a follow-up comment.
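As a placeholder until those details are posted, the frequency-to-concentration step can be sketched in the shape of Sea-Bird's published SBE43F calibration equation. Everything below is hedged: `soc`, `foffset`, `a`, `b`, `c`, `e` stand for the per-sensor calibration coefficients, the oxygen solubility OxSol(T, S) is taken as an input rather than reimplemented, and all numeric values are invented for illustration.

```python
import numpy as np

def sbe43f_oxygen(freq, temp, pres, oxsol, soc, foffset, a, b, c, e):
    """Oxygen concentration (ml/l) from raw SBE43F frequency (Hz).

    Sketch following the general form of the Sea-Bird calibration
    equation; verify against the sensor's calibration sheet before use.
    """
    kelvin = temp + 273.15
    poly = 1.0 + a * temp + b * temp**2 + c * temp**3
    return soc * (freq + foffset) * poly * oxsol * np.exp(e * pres / kelvin)

# Illustrative call with invented coefficient values:
o2 = sbe43f_oxygen(freq=4000.0, temp=10.0, pres=100.0, oxsol=6.3,
                   soc=3.0e-4, foffset=-800.0, a=-3.0e-3, b=1.5e-4,
                   c=-2.0e-6, e=0.036)
```

Supporting this in pyglider would mean carrying the calibration coefficients in the deployment yaml (which ties back to the glider_devices metadata question) so the conversion can be configured per sensor.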
We went through our profile .nc files and it seems that only half of them capture a full profile: all the expected profiles are represented, but every other .nc file generated contains just 25 sample points from the bottom and top of the profile rather than a continuous dive. I have added the first few EBD/DBD files along with the resulting .nc files (one is a usable profile, one has the 25 samples) and the timeseries/grid file. Any explanation for why this is happening and how to fix it? https://github.com/mackenziemeier86/Pyglider-Test-Data-and-Output-.git