
python-ctd's Introduction

python-ctd


Tools to load hydrographic data as a pandas DataFrame, with some handy methods for data pre-processing and analysis.

This module can load SeaBird CTD (CNV), Sippican XBT (EDF), and Falmouth CTD (ASCII) formats.

Quick intro

You can install the CTD package with

conda install ctd --channel conda-forge

or

pip install ctd

and then,

from pathlib import Path
import ctd

path = Path('tests', 'data', 'CTD')
fname = path.joinpath('g01l06s01.cnv.gz')

down, up = ctd.from_cnv(fname).split()
ax = down['t090C'].plot_cast()

Bad Processing

We can do better:

import matplotlib.pyplot as plt

temperature = down['t090C']

fig, ax = plt.subplots(figsize=(5.5, 6))
temperature.plot_cast(ax=ax)
temperature.remove_above_water()\
           .despike()\
           .lp_filter()\
           .press_check()\
           .interpolate(method='index',
                        limit_direction='both',
                        limit_area='inside')\
           .bindata(delta=1, method='interpolate')\
           .smooth(window_len=21, window='hanning') \
           .plot_cast(ax=ax)
ax.set_ylabel('Pressure (dbar)')
ax.set_xlabel('Temperature (°C)')

Good Processing

Try it out on mybinder


python-ctd's People

Contributors

adyork, brorfred, callumrollo, dependabot[bot], j08lue, lachlan00, mankoff, mdhdz91, michelly-gc, ocefpaf, pre-commit-ci[bot], rosmesquit, upsonp


python-ctd's Issues

Save to CNV

Hi!
I'm currently looking for a way to modify a Seabird CNV file, remove some values (first values, last values, etc.), and write it back out as a CNV file. Is that possible with python-ctd?

I already checked the docs and notebooks, but I couldn't find anything related to that.

Any help would be appreciated :D. Thank you all!

Reading BTL file with blank lines in header

I've run into an issue reading a BTL file using the read.from_btl() function. The error occurred because there are blank lines in the header of the BTL file:

22102006.txt

...
30: * 7 external voltages sampled
31: * stored voltage # 0 = external voltage 0
32: * stored voltage # 1 = external voltage 1
33: * stored voltage # 2 = external voltage 2
34: * stored voltage # 3 = external voltage 3
35: * stored voltage # 4 = external voltage 4
36: * stored voltage # 5 = external voltage 5
37: * stored voltage # 6 = external voltage 6
38: *
39: 
40: * S>
41: * dh
42: * cast 0  03/31 06:45:17  smpls 0 to 10459  nv = 7  avg = 1  stp = switch of
43: 
44: * S>
45: # interval = seconds: 0.125
46: # start_time = Mar 31 2022 06:45:17 [Instrument's time stamp, header]
...

On line 39 a blank line is encountered, which triggers this section of the read._parser_sea_save() function (line 169):

        else:  # btl.
            # There is no *END* like in a .cnv file, skip two after header info.
            if not (line.startswith("*") | line.startswith("#")):
                # Fix commonly occurring problem when Sbeox.* exists in the file
                # the name is concatenated to previous parameter
                # example:
                #   CStarAt0Sbeox0Mm/Kg to CStarAt0 Sbeox0Mm/Kg (really two different params)
                line = re.sub(r"(\S)Sbeox", "\\1 Sbeox", line)

                names = line.split()
                skiprows = k + 2
                break

Of course it hits the break and the rest of the file isn't read.

Any advice?

I have several thoughts on how we could solve this, but I don't want to modify something that will break the code for others.

Just spitballing, but a simple solution is to make sure the line isn't blank.

current code

        else:  # btl.
            # There is no *END* like in a .cnv file, skip two after header info.
            if not (line.startswith("*") | line.startswith("#")):
                # Fix commonly occurring problem when Sbeox.* exists in the file
                # the name is concatenated to previous parameter
                # example:
                #   CStarAt0Sbeox0Mm/Kg to CStarAt0 Sbeox0Mm/Kg (really two different params)
                line = re.sub(r"(\S)Sbeox", "\\1 Sbeox", line)

                names = line.split()
                skiprows = k + 2
                break

modified code

        else:  # btl.
            # There is no *END* like in a .cnv file, skip two after header info.
            if line != '' and not (line.startswith("*") | line.startswith("#")):
                # Fix commonly occurring problem when Sbeox.* exists in the file
                # the name is concatenated to previous parameter
                # example:
                #   CStarAt0Sbeox0Mm/Kg to CStarAt0 Sbeox0Mm/Kg (really two different params)
                line = re.sub(r"(\S)Sbeox", "\\1 Sbeox", line)

                names = line.split()
                skiprows = k + 2
                break

'DataFrame' object has no attribute

Tried the test data and notebook example, and ended up with the following error while removing spikes:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-9f765579d6d3> in <module>
      2 down = down[['t090C', 'c0S/m']]
      3 
----> 4 proc = down.remove_above_water()\
      5            .remove_up_to(idx=7)\
      6            .despike(n1=2, n2=20, block=100)\

~\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5134             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5135                 return self[name]
-> 5136             return object.__getattribute__(self, name)
   5137 
   5138     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'remove_up_to'

Binder notebooks fail on import of ctd

Attempting to run the notebooks from the Binder link in the README gives:

ModuleNotFoundError: No module named 'ctd'

Likely due to the lack of ctd in .binder/environment.yml.

Alternatively, one could add the parent directory to the path in the notebooks to use the local version of ctd.
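
A minimal sketch of that alternative, assuming the notebooks sit one directory below the repository root (adjust the relative path as needed):

import sys

sys.path.insert(0, '..')  # point the notebook at the local checkout
import ctd                # now resolves to the local source tree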

Checking 'pressure' upon reading cnv file

Hello,

I've been having trouble accessing a bunch of CNV files from the last 4 years of observations using the python-ctd module. My CNV files have no 'pressure' record, and the module shows the following error:

>>> cast = ctd.from_cnv('/Users/reno/Downloads/ctd_sample/Gisang1_New2020.8.22.2005.cnv')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/reno/anaconda3/lib/python3.8/site-packages/ctd/read.py", line 387, in from_cnv
    raise ValueError(f"Expectd one pressure column, got {prkey}.")
ValueError: Expectd one pressure column, got [].

Checking the CNV file:

# name 0 = depSM: Depth [salt water, m], lat = 0.00
# name 1 = t090C: Temperature [ITS-90, deg C]
# name 2 = sal00: Salinity, Practical [PSU]
# name 3 = sbeox0ML/L: Oxygen, SBE 43 [ml/l]
# name 4 = ph: pH
# name 5 = altM: Altimeter [m]
# name 6 = pumps: Pump Status
# name 7 = flag:  0.000e+00

Is 'pressure' absolutely necessary in the python-ctd module?
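
If pressure really is needed, one hedged workaround is to derive it from the depSM (depth) column with gsw.p_from_z, assuming gsw is installed and the cast latitude is known:

import gsw

# gsw.p_from_z expects z negative below the sea surface.
depth_m = 25.0   # example depSM value in metres, positive down
lat = 35.0       # cast latitude in degrees
pressure_dbar = gsw.p_from_z(-depth_m, lat)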

Parse hexfile

Add the capability to load directly from the Seabird HEX file.

Reading from Stream

I'm currently working on a web based interface for reading BTL files. One of the challenges this creates is that the web browser passes a "Stream" rather than a file to the server. I'd like to see if I can create a from_btl_stream(file_stream) function that will basically be the same as the from_btl(fname) function after the _read_file(fname) function has been called.

The Django web server gets an InMemoryUploadedFile, which can easily be opened as an io.StringIO stream, so I just need a hook to bypass _read_file().
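
A rough, hypothetical sketch of that hook (from_btl_stream does not exist in python-ctd; the helper below is an assumption, not the library API):

import io

def to_text_stream(file_obj, encoding='utf-8'):
    """Normalize any file-like upload (e.g. a Django InMemoryUploadedFile)
    into an io.StringIO ready for line-by-line BTL parsing, which is the
    point where _read_file() could be bypassed."""
    data = file_obj.read()
    if isinstance(data, bytes):
        data = data.decode(encoding)
    return io.StringIO(data)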

Unify bottle and cnv read

We should write a _parse_seabird that would unify the common parts of those readers and reduce the amount of code.

include "prDE" in pressure keys inside from_cnv function

I am working with .cnv files that have "prDE" as the pressure heading, so I had to add "prDE" to prkeys in the from_cnv function to be able to read and process my files. It would be great if "prDE" were added to the keys in case other users have a similar issue. Thank you.

prDM or prdM

Hi Filipe,

Great work! I had been looking for python software for Sea-Bird CTD data but did not find yours until I read your latest blog entry.

I tried it on some of my data but got a KeyError error because the pressure field is not labelled 'prDM', as your script assumes, but 'prdM' (lowercase 'd'). Don't ask me why. The instrument was a Sea-Bird SBE 19plus and the software version 2.2.6.

I guess there are multiple ways of fixing this:

  1. try both spellings, i.e. 'prDM' and 'prdM' (see my quickfix)
  2. make the pressure key an argument to the from_cnv-function
  3. ??

Cheers,
Jonas

Add SBE pressure names

Here is the list of pressure names for .cnv files:

  • prM Pressure [db] pr M db User-entry for moored pressure (instrument with no pressure sensor)
  • prE Pressure [psi] pr E psi User-entry for moored pressure (instrument with no pressure sensor)
  • prDM Pressure, Digiquartz [db] pr M db Digiquartz pressure sensor
  • pr50M Pressure, SBE 50 [db] pr50 M db 1st SBE 50 pressure sensor
  • pr50M1 Pressure, SBE 50, 2 [db] pr50 M2 db 2nd SBE 50 pressure sensor
  • prSM Pressure, Strain Gauge [db] pr M db strain-gauge pressure sensor
  • prdM Pressure, Strain Gauge [db] pr M db strain-gauge pressure sensor
  • pr Pressure [db] Specific python-ctd name, see #20

Marked items are recognized by the CNV parser.
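
For illustration, a hedged sketch of what such a collection of accepted pressure names could look like inside from_cnv (the variable and function names are assumptions, not the actual source):

PRESSURE_KEYS = {'prM', 'prE', 'prDM', 'pr50M', 'pr50M1', 'prSM', 'prdM', 'pr'}

def find_pressure_columns(names):
    """Return the column names that look like pressure."""
    return [name for name in names if name in PRESSURE_KEYS]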

Change the split() criteria

.split() separates the cast at the index argmax. This is fine the first time, but if it is executed on an already-split cast it will lose the last datapoint instead.
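
A minimal sketch illustrating the problem (not python-ctd's actual implementation): if the split point is the argmax of the pressure index, re-splitting an already-split downcast cuts off its deepest remaining sample.

import numpy as np
import pandas as pd

def naive_split(cast):
    # Split at the deepest sample (argmax of the pressure index).
    idx = int(np.argmax(cast.index.to_numpy()))
    return cast.iloc[:idx], cast.iloc[idx:][::-1]

cast = pd.Series([10.0, 11.0, 12.0, 11.5, 10.5], index=[0.0, 5.0, 10.0, 5.0, 0.0])
down, up = naive_split(cast)   # first split: down holds the 0 and 5 dbar samples
down2, _ = naive_split(down)   # second split: the 5 dbar sample is lost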

Use pocean-core...

... and output a valid profile DSG object. With this we could also save CF-compliant netCDF files easily.

GUI for CTD cast editing

I'm usually not a big fan of GUIs, but one would be pretty handy for editing CTD casts.

One could imagine some buttons for applying the different filters (with different parameters) to see in real time what impact they have on each profile before moving on to the next.

Using "Panel" with hvplot of course. :-)

Add an interpolate_cast method

Should be the normal pandas interpolation with the options:

.interpolate(
    method='index',
    limit_direction='both',
    limit_area='inside'
)

That would avoid extrapolation beyond the data domain and would obey the index spacing.

ASCII BTL File

I notice at the top of the from_btl(fname) function that it says:

DataFrame constructor to open Seabird CTD BTL-ASCII format.

But the _read_file(fname) function is actually opening the file with a UTF-8 encoding.

My workaround at the moment is to open the BTL file in Notepad++ and convert the ASCII files to UTF-8, but maybe we could pass the file encoding as from_btl(fname, encoding='utf-8'), then _read_file(fname, encoding='utf-8'). Doing so shouldn't change the existing behavior, but it would give developers the option to set the file encoding if necessary.

Thoughts?
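
A short sketch of the proposed pass-through (hypothetical signatures; the real functions handle more than this):

def _read_file(fname, encoding='utf-8'):
    return open(fname, encoding=encoding)

def from_btl(fname, encoding='utf-8'):
    with _read_file(fname, encoding=encoding) as f:
        text = f.read()
    # ...parse `text` as before; existing callers keep the utf-8 default.
    return text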

plot_section function in ctd

While reading the book "R and Python for Oceanographers", I found the author used the Python package CTD like this: from ctd import plot_section. But when I used it in my code an error happened, so I read the source code of CTD and there is no function named plot_section. Why?

Date Column

I've forked the project and have pytest running. I've added a test with my "alternate" BTL file and believe I've come up with a solution to issue #138, which I'll explain there. However, I've run into another issue: in the from_btl() function an assumption is made that column 2 will be the date column, but in my case the date column is column 3.

I think I'll solve this by doing a header search in the 'names' array for the keyword 'Date'.
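
A hedged sketch of that header search (hypothetical snippet; names is the parsed header list):

# Locate the date column by name rather than assuming it is always column 2.
names = ['Bottle', 'Position', 'Date', 'Sal00']   # example header
date_col = names.index('Date') if 'Date' in names else 2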

Load CTD data from CSV

I see there is DataFrame.from_{cnv,edf,fsi}. Is there a way to use this software with CSV or data that I've pre-read and have as lists or numpy arrays already in Python? Or do I need to add a reader function to the library?
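
Not confirmed by the docs, but since from_cnv returns ordinary pandas objects indexed by pressure, one untested approach is to build the same structure from a CSV and chain the processing methods that importing ctd makes available (the file and column names below are made up):

import pandas as pd
import ctd  # importing is assumed to attach the processing methods

df = pd.read_csv('my_cast.csv')   # hypothetical file with 'pressure' and 't090C' columns
cast = df.set_index('pressure')   # the methods expect a pressure index
clean = cast['t090C'].remove_above_water().despike()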

Create a "lighter" branch?

Hi @ocefpaf, today I cloned this repo again to re-run some analyses and noticed the enormous size of the tests/data/CTD/raw directory; the cloned repo amounted to almost 800 MB. Would you consider making a "lighter" branch available? I made one of my own in my fork of the repo, and just by deleting the "raw" directory the repo size decreased substantially. However, I did not submit a PR to the master branch because I do not know how important these files are to the integrity of the code.

I think for future users, having a branch which does not contain heavy data files would be useful. What do you think?

Duplicate Name in Header

Was glad to find this project, it should save me a pile of time in reading BTL files.

I ran into an issue with my BTL file header (I don't make them, I just parse them). The header looks like:

    Bottle     Bottle        Date Sbeox0ML/L Sbeox1ML/L      Sal00      Sal11 Potemp068C Potemp168C  Sigma-é00  Sigma-é11       Scan      TimeS       PrDM      T068C      C0S/m      T168C      C1S/m       AltM    Par/log    Sbeox0V    Sbeox1V    FlSPuv0       FlSP         Ph TurbWETbb0       Spar   Latitude  Longitude
  Position       S/N         Time                                                                                                                                                                                                                                                                                              

When parsed by the _parse_seabird() method, this produces names where the first two columns are both 'Bottle', which causes the df = pd.read_fwf() call to throw a duplicate-names error.

Is there a way you'd like to see this resolved that I could put in a PR for?

Maybe append the second row of the header to the first?
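
A rough sketch of that idea (hypothetical helper code, not part of python-ctd): merge the two header rows so the names are unique before pd.read_fwf sees them.

row1 = 'Bottle Bottle Date Sbeox0ML/L Sal00'.split()
row2 = 'Position S/N Time'.split()
# Pair up the leading columns, keep the remainder of the first row as-is.
names = [a + b for a, b in zip(row1, row2)] + row1[len(row2):]
# -> ['BottlePosition', 'BottleS/N', 'DateTime', 'Sbeox0ML/L', 'Sal00']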

Write a header parser test

This should check if the expected metadata is parsed and present in the header attribute and as properties.

Add inversion checks

After computing density, one can remove instabilities by checking for inversions and masking them. Most of the time these are due to CTD descent issues, and they are often not desired even when they are real features of the data (e.g., when feeding the data to a numerical model).
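
A minimal sketch of what such a check could look like, assuming density is a pandas Series indexed by increasing pressure (this is not an existing python-ctd method):

import numpy as np
import pandas as pd

def mask_inversions(density: pd.Series) -> pd.Series:
    """Mask samples that are lighter than any sample above them in the downcast."""
    running_max = np.maximum.accumulate(density.to_numpy())
    stable = density.to_numpy() >= running_max
    return density.where(stable)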

Problem while reading old Sea Bird MicroCat SBE 37SM

Hi @ocefpaf
we found this problem while working with this old ctd. @vinisalazar is already working on this and preparing a Pull Request.

The error message did not reveal the cause, but we were able to fix the problem in our input file.

With an old version of gsw we could install python-ctd under Python 2.7, and there the error is much clearer:

.virtualenvs/zurf/local/lib/python2.7/site-packages/ctd/ctd.pyc in from_cnv(fname, compression, below_water, lon, lat)
    200     f.close()
    201 
--> 202     cast.set_index('prDM', drop=True, inplace=True)
    203     cast.index.name = 'Pressure [dbar]'
    204 

.virtualenvs/zurf/local/lib/python2.7/site-packages/pandas/core/frame.pyc in set_index(self, keys, drop, append, inplace, verify_integrity)
   2915                 names.append(None)
   2916             else:
-> 2917                 level = frame[col]._values
   2918                 names.append(col)
   2919                 if drop:

.virtualenvs/zurf/local/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   2057             return self._getitem_multilevel(key)
   2058         else:
-> 2059             return self._getitem_column(key)
   2060 
   2061     def _getitem_column(self, key):

.virtualenvs/zurf/local/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   2064         # get column
   2065         if self.columns.is_unique:
-> 2066             return self._get_item_cache(key)
   2067 
   2068         # duplicate columns & possible reduce dimensionality

.virtualenvs/zurf/local/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1384         res = cache.get(item)
   1385         if res is None:
-> 1386             values = self._data.get(item)
   1387             res = self._box_item_values(item, values)
   1388             cache[item] = res

.virtualenvs/zurf/local/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
   3541 
   3542             if not isnull(item):
-> 3543                 loc = self.items.get_loc(item)
   3544             else:
   3545                 indexer = np.arange(len(self.items))[isnull(self.items)]

.virtualenvs/zurf/local/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
   2134                 return self._engine.get_loc(key)
   2135             except KeyError:
-> 2136                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2137 
   2138         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4433)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13742)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13696)()

KeyError: 'prDM'

The error under Python 3 is a UnicodeDecodeError and does not reveal the problem with the pressure string:

UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-4-afa11aeac91a> in <module>()
----> 1 a = ctd.DataFrame.from_cnv('tccvini/sampling/25-01-17/stations_25-01-2017_RAW.cnv')

.virtualenvs/envpy3/lib/python3.5/site-packages/ctd/ctd.py in from_cnv(fname, compression, below_water, lon, lat)
    161     f = read_file(fname, compression=compression)
    162     header, config, names = [], [], []
--> 163     for k, line in enumerate(f.readlines()):
    164         line = line.strip()
    165         if '# name' in line:  # Get columns names.

.virtualenvs/envpy3/lib/python3.5/codecs.py in decode(self, input, final)
    319         # decode input (taking the buffer into account)
    320         data = self.buffer + input
--> 321         (result, consumed) = self._buffer_decode(data, self.errors, final)
    322         # keep undecoded input until the next call
    323         self.buffer = data[consumed:]

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe1 in position 71: invalid continuation byte

Rosette Summary file name

I've come across an issue with, yet again, how DFO does things. I'm processing BTL/ROS files from what we call a 'fix station', and the files have next to no metadata. The BTL file processes fine, but when I try to use ctd.read.from_summary() I get an error on line 451, name = _basename(fname)[1].

The files don't have a * FileName = xxxx line so there is no file name.

The ctd.read.from_btl() function solves this on lines 278-280 with:

    if "name" not in metadata:
        name = _basename(fname)[1]
        metadata["name"] = str(name)

I'd like to do the same thing for the ctd.read.from_cnv() function, which is called by ctd.read.from_summary(), if that's not an issue.
