
hydrofunctions's Introduction

HydroFunctions

Visit Hydrofunctions on PyPI

Unit Testing Status

Code Coverage Status

Documentation Status

MIT license

A suite of convenience functions for exploring water data in Python.

Features

  • Retrieves stream data from the USGS NWIS service
  • Select data using multiple site numbers, state or county codes, or a bounding box
  • Preserves NWIS metadata, including NoData values
  • Helpful error messages that guide you toward valid requests
  • Extracts data into a Pandas dataframe, json, or dict
  • Plot beautiful graphs in Jupyter Notebooks
    • hydrographs (or time series of any data)
    • flow duration charts
    • cycle plots to illustrate annual or diurnal cycles
    • Interactive map for finding stream gauge ID numbers
  • Plotting and manipulation through Pandas dataframes
  • Retrieve USGS rating curves, peak discharges, field notes, and site files for gauging stations
  • Retrieve USGS daily, monthly, and annual statistics for gauging stations
  • Saves data in compact, easy-to-use parquet files instead of requesting the same dataset repeatedly
  • Massive Users Guide that makes Hydrology AND Data Science easy!

Still in active development! Let me know what features you want!

Read the Users Guide for more details.

Basic Usage

First, import hydrofunctions into your project. If you plan to work with Jupyter notebooks, then go ahead and enable automatic chart display:

In  [1]: import hydrofunctions as hf
         %matplotlib inline

Create an NWIS data object to hold our request and the data we will retrieve. We will request the instantaneous values ('iv') for site '01585200' for the past 55 days:

In  [2]: herring = hf.NWIS('01585200', 'iv', period='P55D')
Requested data from https://waterservices.usgs.gov/nwis/iv/?format=json%2C1.1&sites=01585200&period=P55D

You can check that the request went smoothly:

In  [3]: herring.ok
Out [3]: True

Find out what data we received:

In  [4]: herring
Out [4]: USGS:01585200: WEST BRANCH HERRING RUN AT IDLEWYLDE, MD
             00060: <5 * Minutes>  Discharge, cubic feet per second
             00065: <5 * Minutes>  Gage height, feet
         Start: 2019-05-25 01:05:00+00:00
         End:   2019-07-19 19:05:00+00:00

This tells us the name of our site and gives a list of the parameters that we have. For each parameter it lists how frequently the data were collected, and it shows the common name of the parameter and its units.

Create a dataframe using only our discharge data, and list the first five items:

In  [5]: herring.df('discharge').head()
Out [5]:

                           USGS:01585200:00060:00000
datetimeUTC
2019-05-25 01:05:00+00:00                       1.57
2019-05-25 01:10:00+00:00                       1.57
2019-05-25 01:15:00+00:00                       1.51
2019-05-25 01:20:00+00:00                       1.57
2019-05-25 01:25:00+00:00                       1.57

If we're using Jupyter Lab, we can plot a graph of the data using built-in methods from Pandas and matplotlib:

In  [6]: herring.df('q').plot()
Out [6]:

--a stream hydrograph appears--


Learn more:

Easy Installation

The easiest way to install Hydrofunctions is by typing this from your command line:

$ pip install hydrofunctions

Hydrofunctions depends upon Pandas and numerous other scientific packages for Python. Anaconda is an easy, safe, open-source method for downloading everything and avoiding conflicts with other versions of Python that might be running on your computer.

Visit the Installation Page in the Users Guide to learn how to install Anaconda, or if you have problems using the Easy Installation method above.

hydrofunctions's People

Contributors

elmerehbi, jdhughes-usgs, jmccreight, ktarbet, lapidesd, mopickel, mroberge, nkorinek


hydrofunctions's Issues

'discharge' does not filter to only discharge

  • HydroFunctions version: 0.2.1
  • Python version: 3.7.10
  • Operating System: Ubuntu 18.04.5 LTS

Description

Getting a DataFrame of discharge data from a gage: many gages without discharge records still return a valid DataFrame. hydrofunctions should raise a warning or exception when you request discharge from a gage that has no record of that type.

What I Did

import hydrofunctions as hf
start, end = '1984-01-01', '2020-12-31'
sid = '05015500'
nwis = hf.NWIS(sid, 'dv', start_date=start, end_date=end)
df = nwis.df('discharge')
print(list(df.columns))

['USGS:05015500:62614:00003', 'USGS:05015500:62614:32400']

The columns are not discharge, but 62614 data (reservoir surface elevation). This happens with many non-discharge parameters at many gages.

I needed to add a test for discharge data:

import numpy as np

if not np.any(['00060' in x for x in list(df.columns)]):
    print('no q data at {}'.format(sid))

add a save function to NWIS

Create a method for NWIS that allows user to save the dataframe as a local file. The current internal data structure could be saved as a parquet file, which does a great job compressing the data and working quickly.

Additionally, you could have the NWIS.get_data() method check for a file with the correct name before submitting a request to the NWIS. This would act as a cache, so that Jupyter notebooks that get re-run multiple times don't keep submitting the same request over and over.
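A cache along those lines might be sketched as follows (hypothetical helper; the `reader`/`writer` hooks are just for illustration and default to parquet):

```python
import os

import pandas as pd


def cached_request(path, fetch, reader=pd.read_parquet, writer=None):
    """Return cached data if a local file exists; otherwise fetch and save.

    `fetch` is a zero-argument callable that returns a DataFrame.
    By default the cache is a parquet file; pass a different
    reader/writer pair to change formats.
    """
    if writer is None:
        writer = lambda df, p: df.to_parquet(p)
    if os.path.exists(path):
        return reader(path)      # cache hit: no request to the NWIS
    df = fetch()                 # cache miss: contact the NWIS once
    writer(df, path)
    return df
```

Re-running a notebook would then hit the local file instead of re-submitting the same request.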

set parameterCd default to 'None'

If you specify a parameterCd in your request, the NWIS will return data only for that parameter. Instead, make the default be parameterCd = None, so that the default request will return all of the parameters collected at the site.

If I set parameterCd=None, will the request add parameterCd to the url, or does it drop parameters that are set to None?

[* edit mcr 2/11/19] ==> This works. If you don't specify a parameterCd now, NWIS will return every parameter.
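For reference, the requests library already omits None-valued entries from the query string. Building the query dict explicitly could look something like this (hypothetical helper, not the actual hydrofunctions internals):

```python
def build_params(sites, parameterCd=None, period=None):
    """Assemble the query dict; drop any entries that are None so they
    never appear in the request URL. (requests also omits None-valued
    params on its own.)"""
    params = {"sites": sites, "parameterCd": parameterCd, "period": period}
    return {k: v for k, v in params.items() if v is not None}
```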

discharge monthly statistics

Hi,
I am wondering if the monthly statistics data can also be downloaded through the function? Thanks!

Krystal

Need a better way to set freq for requests for recent data

  • HydroFunctions version: 0.1.8dev
  • Python version: 3.7

Description

If you make a request without specifying a period or a date, you will get the most recent data. Sometimes, the most recent data is very old. Sometimes, one parameter is very old, and another is very recent.

Hydrofunctions has no way of figuring out the frequency of a table that only has one row, so it assumes that the frequency is 15 minutes.

It would be better to assume the frequency is 1 day for 'dv', and 15 minutes for 'iv'.

Better still, it shouldn't assign a frequency to these types of request. If it does, and then attempts to interpolate between two different dates, you can get some crazy results.

What I Did

problem = hf.NWIS('01541200')
problem.df().shape
>>>(2017441, 4)

This crazy site collected temperature for the last time in 1961, and it also has stream discharge for right now. So hydrofunctions set the frequency to 15 minutes and created a dataframe with a row for every 15 minutes from 1961 to 2019.

Sigh.
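A fallback along the lines suggested above might look like this sketch (hypothetical helper names; only the service-dependent default is the point):

```python
import pandas as pd


def default_freq(service):
    """Fallback sampling frequency when a table is too short to infer one:
    daily values get one day, instantaneous values get 15 minutes."""
    return {"dv": "1D", "iv": "15min"}.get(service)


def infer_freq(index, service):
    """Infer the frequency from the datetime index; fall back to the
    service default only when the index has too few rows to infer from."""
    if len(index) < 3:  # pd.infer_freq needs at least 3 dates
        return default_freq(service)
    return pd.infer_freq(index) or default_freq(service)
```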

NWIS.df() should allow you to list more than one parameter or site

  • HydroFunctions version: 0.1.8dev
  • Python version: 3.7

Description

NWIS.df('discharge', 'stage')
should return both a discharge and a stage column. Instead, it only returns stage.

NWIS.df('siteA', 'siteB')
should return columns for site A and B, but it only returns site B.

NWIS.df('data', 'flags')
should return both data & the qualifier flags, but it doesn't. It only returns the flags.

Potential Solution

the sites, parameters, and metadata should each use an "OR" to process each new argument, so that the new set of columns returns everything that we are currently searching for, plus the latest argument.

I feel like I am not explaining this very well.

I'm going to add a test for the behavior that I want, and then try out my idea for how to solve this.
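The OR-combination could be sketched like this (hypothetical helper and alias table, not the actual NWIS.df() internals):

```python
import pandas as pd

# hypothetical mapping from friendly names to column-name fragments
ALIASES = {"discharge": "00060", "stage": "00065", "flags": "_qualifiers"}


def select(df, *args):
    """Keep a column if it matches ANY of the arguments (an OR),
    instead of filtering repeatedly (an AND)."""
    patterns = [ALIASES.get(a, a) for a in args]
    keep = [c for c in df.columns if any(p in c for p in patterns)]
    return df[keep]
```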

NWIS & get_nwis should print url before making request

Description

If you make an unusually large request for data, or if the NWIS takes a long time to respond for some other reason, the user will not get a message until there is a response. The url should be printed immediately, and say, "Requesting data from ..." instead of "Requested data from ..."

Suggested Solution

in get_nwis, swap the order of the print statement and the call to Requests.

Make pyarrow an optional dependency

Description

PyArrow is a relatively exotic dependency that doesn't ship a pure Python wheel. As a result, it is often the first dependency to cause import trouble with cloud-based kernels or in-browser kernels (for jupyter-lite). Pandas seems to handle PyArrow as an optional dependency too.

Is it possible to make PyArrow an optional dependency that won't prevent hydrofunctions from installing?

Once I Find a Solution

Pandas has this nice system that warns when an optional dependency isn't available:

import warnings

def import_lzma():
    """
    Importing the `lzma` module.
    Warns
    -----
    When the `lzma` module is not available.
    """
    try:
        import lzma

        return lzma
    except ImportError:
        msg = (
            "Could not import the lzma module. Your installed Python is incomplete. "
            "Attempting to use lzma compression will result in a RuntimeError."
        )
        warnings.warn(msg)
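A pyarrow analog of that pattern might look like this (a sketch, not hydrofunctions' actual code):

```python
import warnings


def import_pyarrow():
    """Import pyarrow if available; warn instead of failing, so that
    hydrofunctions can still be installed and imported without it."""
    try:
        import pyarrow

        return pyarrow
    except ImportError:
        warnings.warn(
            "Could not import pyarrow. Saving to parquet files will "
            "not be available until you install it."
        )
        return None
```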

Create a check for if an object is a JSON or NWIS Response object in extract_nwis_df

Not an issue, just a recommendation! Recently, hydrofunctions extract_nwis_df changed to accept the JSON of a response instead of the raw response object. The line that turns the response into JSON is still in the code, but commented out. It might be easier to use the function if it took either a JSON dict or a raw response. A simple type check at the top should make that doable. Something like:

if type(nwis_dict) is not dict:
    nwis_dict = nwis_dict.json()

Just a thought!

hf.map() is not a function

hf.map() is on the front page MD file, but it is not one of the functions in the package.

Either:

  • change the image on the front page
  • change the function name to .map()

is matplotlib no longer on the list of requirements?

  • HydroFunctions version: 0.1.6

Description

%matplotlib inline resulted in a

ModuleNotFoundError: No module named 'matplotlib'

Proposed Solution

Add matplotlib to the list of requirements. Perhaps add seaborn and bokeh too.

Update dependencies

  • HydroFunctions version: 0.2.0

Description

There are two changes to setup.py that need to be made:

  • nbsphinx does not need to be listed in setup.py
  • pyarrow should no longer be fixed to 0.16.0

NWIS sometimes responds with 400 status_codes (Bad Request)

  • HydroFunctions version: 0.1.5 develop
  • Python version: 3.6

Description

Sometimes requests to NWIS caused an error to occur in the parsing of the response.json into dataframes, I think.

My guess is that the NWIS returned a response.json = [] or None (I'm guessing) which resulted in parsing errors. The one time that I managed to capture the response and examine it, I had received a status code of 400: a bad request.

The problem is that I'm not sure what caused the early errors, because I couldn't or didn't capture the response, and it seemed like the error happened intermittently.

I can now get the error every time by doing something like making the start date occur after the end date. Most of the other errors get caught by the typing module and functions.

What I Did

In response, I changed hf.get_nwis() to return the response if it gets a status code of 200, and to raise an error for every other status_code after saving the response to NWIS.response.

Please revisit this Issue if the errors happen again!

Pandas error: TypeError: Cannot compare type 'Timedelta' with type 'Minute'

  • HydroFunctions version: hydrofunctions==0.2.0
  • Python version: Python 3.6.5 :: Anaconda, Inc.
  • Operating System: Microsoft Windows 10 Pro

I just installed Hydrofunctions and tried the "Basic Usage" commands here: https://github.com/mroberge/hydrofunctions:

INPUT

import hydrofunctions as hf
%matplotlib inline
herring = hf.NWIS('01585200', 'iv', period='P55D')

OUTPUT

Requested data from https://waterservices.usgs.gov/nwis/iv/?format=json%2C1.1&sites=01585200&period=P55D

TypeError Traceback (most recent call last)
in ()
----> 1 herring = hf.NWIS('01585200', 'iv', period='P55D')

~\AppData\Roaming\Python\Python36\site-packages\hydrofunctions\station.py in init(self, site, service, start_date, end_date, stateCd, countyCd, bBox, parameterCd, period, file)
132 try:
133 self.json = self.response.json()
--> 134 self._dataframe, self.meta = hf.extract_nwis_df(self.json)
135 self.ok = self.response.ok
136 if file is not None:

~\AppData\Roaming\Python\Python36\site-packages\hydrofunctions\hydrofunctions.py in extract_nwis_df(nwis_dict, interpolate)
511 )
512 DF = DF[~DF.index.duplicated(keep="first")]
--> 513 if local_freq > to_offset("0min"):
514 local_clean_index = pd.date_range(
515 start=local_start, end=local_end, freq=local_freq, tz="UTC"

pandas_libs\tslibs\timedeltas.pyx in pandas._libs.tslibs.timedeltas._Timedelta.richcmp()

TypeError: Cannot compare type 'Timedelta' with type 'Minute'

I imagine this is a Pandas issue and I'll attempt to resolve, but I thought you may have run into this already.

requesting rdb data from nonexistent site returns uncaught error

  • HydroFunctions version: 0.1.8dev

Description

Requesting data from a non-existent site will return a status code of 200, so everything is fine, you would think. But instead of returning an RDB table, you get a webpage with an error.

This link is for a non-existent site, 01581000:
https://waterdata.usgs.gov/pa/nwis/measurements?site_no=01581000&agency_cd=USGS&format=rdb_expanded

What I Did

hf.field_meas('01581000')

Possible Solution

  • Create a system for requesting data from the USGS that also handles errors.
  • Check for non-200 status codes.
  • Check the text of 200 responses for a <title>USGS NwisWeb error message</title>.
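A sniff test along those lines could look like this (hypothetical helper name; `response` is a requests Response):

```python
def looks_like_nwis_error(response):
    """A 200 response can still be an HTML error page rather than an
    RDB table; check the status code AND sniff the body for the
    NWIS error-page title."""
    return (response.status_code != 200
            or "<title>USGS NwisWeb error message</title>" in response.text)
```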

I don't like the NWIS object.

  • HydroFunctions version: 0.1.4 develop branch

Description

  • The NWIS object doesn't handle the same parameters as hf.get_nwis(). These need to be added.
  • The NWIS object should be updated on init(). Right now you need to call NWIS.get_data(), which just seems stupid. In the tests, you could mock out any function that gets called and check to see that they get called with the right parameters.
  • Why do we have to keep calling the NWIS.df() method every time we want a dataframe? Can't we just calculate it on init or at least .get_data() and store it in something like NWIS.df ?
    • A major problem with NWIS.df() is that for each of the following statements, herring is a different type of object, and it is not immediately clear what is going on:
      • herring = hf.NWIS('0158520', period='P10D').get_data().df() # herring is a dataframe
      • herring = hf.NWIS('0158520', period='P10D').get_data() # herring is an instance of NWIS
    • if you store the df as a property of NWIS, then it might get changed whenever you try to run a Pandas or Numpy function that returns a new dataframe.
    • it would be nice to have methods on NWIS that generate a hydrograph or a flow duration chart off of the stored dataframe.
  • The NWIS docstrings need improvement.

The whole point of hydroFUNCTIONS was to have a collection of simple functions that were useful. I don't want to make this class some sort of super-object that does all of your thinking for you. I just want a collection of methods that will remember all of the best-looking options for a flow duration chart, or allow you to download data without having to remember all of the details of how a service works.

REMINDER: keep it simple!

Create a new data structure

The current data format is to put everything into a single large dataframe, with each station getting two columns: one for discharge, one for data flags. Odd columns contain data from the different stations; even columns contain the corresponding data flags for each site.

The PROBLEM with adding all of these dataframes to a single large dataframe using pd.concat is that sites that collect less frequently will get padded with NaNs for all of the time indices that they don't have data for.

A second problem is that every other column will have data, and the remaining columns will have flags. There is no simple way to select only the data columns except to take the odd-numbered columns.

A POSSIBLE SOLUTION: create a data structure that is composed of stacked dataframes. Each data frame will correspond to a single site. The first column will correspond to discharge, the second to flags, and any others can be derived values like baseflow or other measured parameters. The dataframes will be stacked, and be part of an object that allows you to select by a range of dates, by sites, and by the type of column. In this respect, it might be similar to XArray, except that package requires their n-dimensional structures to all be the same datatype.

hydrofunctions.py extract_nwis_df(response_obj):
is where the USGS json gets processed. Correcting this would be a relatively simple fix: you would simply duplicate this function and have it collect all of the dataframes into an array instead of doing a pd.concat() at the end with each new dataframe.
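A minimal sketch of the stacked, per-site structure (hypothetical names; the real object would also carry metadata):

```python
import pandas as pd


class StackedFrames:
    """Dict-of-dataframes sketch: one frame per site, selectable by
    site and by date range, with no NaN padding across sites that
    collect at different frequencies."""

    def __init__(self):
        self.sites = {}

    def add(self, site, df):
        self.sites[site] = df

    def select(self, site, start=None, end=None):
        # .loc with a date-range slice on the site's own index
        return self.sites[site].loc[start:end]
```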

Add additional USGS webservices

The USGS provides several additional services, including:

  • Flood stage: returns a dictionary of stage heights for action level, flooding, moderate flooding, and severe flooding for about 5,000 gauge sites.
  • Real-time streamflow: current discharge, expressed in cfs and as a % of mean & median.
  • Current flood and highflow: similar to real-time, but returns only flooding sites
  • Average streamflow for 7, 14, and 28 day periods: similar to real-time, but returns either 7, or 14, or 28-day average discharge
  • Flow change: change expressed as a % change.

These services are described here: https://waterwatch.usgs.gov/webservices/
The most useful service seems to be flood stage. This could be used to annotate flow duration charts or other figures.

Add functionality for multiple sensor / parameter code combinations

  • HydroFunctions version: 0.2.0
  • Python version: 3.8
  • Operating System: Linux Mint 20

Description

When making a general data request using the 'NWIS' function, I could not differentiate between sensor/parameter code combinations and was ultimately unable to retrieve a full dataset for a location. More specifically, I am looking at a site with multiple instances of a parameter code, and the NWIS request cannot distinguish between them. For example, if you refer to USGS site 444306122144600, the measurements of a parameter (say Turbidity / 63680) are taken from different sensors. When querying using the parameter code '63680' there is no way to get either or both instances.

What I Did

Here is a basic query using the above mentioned specifications:

data = hf.NWIS(site = '444306122144600', service = 'iv', parameterCd = '63680', start_date = '2020-03-01', end_date = '2020-10-15') 

Here is a link to the same query via the web interface:

https://nwis.waterdata.usgs.gov/nwis/uv?cb_32295=on&cb_32316=on&cb_32319=on&cb_63680=on&format=gif_stats&site_no=444306122144600&period=&begin_date=2020-03-01&end_date=2020-07-20

Note the multiple instances of the WQ parameters.

Improve documentation

Here is a list of things that could be improved in the documentation:

  • Use Napoleon to improve the readability of Google-style docstrings and code pages.
  • Add the version number somewhere that shows up on every page.
  • Collect all of the todo's in the code into a page. sphinx.ext.todo
  • Convert the notebook pages into pages in the manual; add to the table of contents.
  • Create notebooks that demonstrate different applications or typical tasks.

Some NWIS sites have non-standard parameter codes

Description

Apparently the NWIS Parameter Codes can be supplemented with additional numbers in a way that breaks my current selection method for NWIS.df().

For example, USGS site #12010000: NASELLE RIVER NEAR NASELLE, WA records stage in two different ways, and then calculates discharge for each of these methods, resulting in two stage numbers and two discharge numbers. To keep these parameters separate, they have the normal parameter code 00060 along with a dash and a six-digit number that doesn't show up in the parameter code listing: https://help.waterdata.usgs.gov/code/parameter_cd_nm_query?parm_nm_cd=%25discharge%25&fmt=html

This completely breaks hf.NWIS.df('00060') !

Example:

>>> site_w_long_parmeter_name = hf.NWIS('12010000')
Requested data from https://waterservices.usgs.gov/nwis/dv/?format=json%2C1.1&sites=12010000

>>> site_w_long_parmeter_name

USGS:12010000: NASELLE RIVER NEAR NASELLE, WA
    00010: <0 * Minutes>  Temperature, water, degrees Celsius 
    00060-148364: <0 * Minutes>  Discharge, cubic feet per second STILLING WELL
    00060-148368: <0 * Minutes>  Discharge, cubic feet per second 
    00065-148365: <0 * Minutes>  Gage height, feet STILLING WELL
    00065-148369: <0 * Minutes>  Gage height, feet 
Start: 1973-09-29 00:00:00+00:00
End:   2021-07-01 00:00:00+00:00

Other strange examples:

USGS:12042800: BOGACHIEL RIVER NEAR FORKS, WA
00060-148506: <0 * Minutes> Discharge, cubic feet per second [(2)]
00060-243490: <0 * Minutes> Discharge, cubic feet per second

Possible Solution:

  1. I'm planning on parsing the JSON only when .df() is run... maybe I can deal with this then.
  2. I could split the column names at the : and search each field separately. This would improve the differentiation between the different types of numbers for station ID, parameterCd, and statistic.
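Option 2 could be sketched like this (hypothetical helper; assumes the column layout agency:site:parameter:statistic seen in the examples above):

```python
def matches_param(column_name, parameterCd):
    """Split a column name like 'USGS:12010000:00060-148364:00003'
    at the colons and compare the parameter field alone, so that a
    supplemental '-148364' suffix still matches '00060'."""
    fields = column_name.split(":")
    if len(fields) < 3:
        return False
    param = fields[2].split("-")[0]  # drop any '-nnnnnn' suffix
    return param == parameterCd
```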

Request site data from NWIS for stations

Description

It would be nice to be able to request site information for USGS stream gauges.

It is possible to get:

  • Latitude and Longitude
  • Drainage area

from waterdata.usgs.gov. The StreamStats station data site pulls drainage area from this source.

What I Did

https://waterdata.usgs.gov/nwis/inventory?search_site_no=01541200&format=sitefile_output&sitefile_output_format=rdb

returns:

#
#
# US Geological Survey
# retrieved: 2020-04-17 14:38:55 EDT
# URL: https://nwis.waterdata.usgs.gov/nwis/inventory
#
# The Site File stores location and general information about groundwater,
# surface water, and meteorological sites
# for sites in USA.
#
# The following selected fields are included in this output:
#
#  agency_cd       -- Agency
#  site_no         -- Site identification number
#  station_nm      -- Site name
#  state_cd        -- State code
#  county_cd       -- County code
#  huc_cd          -- Hydrologic unit code
#  lat_va          -- DMS latitude
#  long_va         -- DMS longitude
#  coord_acy_cd    -- Latitude-longitude accuracy
#  coord_datum_cd  -- Latitude-longitude datum
#  alt_va          -- Altitude of Gage/land surface
#  alt_acy_va      -- Altitude accuracy
#  alt_datum_cd    -- Altitude datum
#  drain_area_va   -- Drainage area
#  contrib_drain_area_va -- Contributing drainage area
#
#
# query started 2020-04-17 14:38:55 EDT
#
# there are 1 sites matching the search criteria.
#
#
agency_cd	site_no	station_nm	state_cd	county_cd	huc_cd	lat_va	long_va	coord_acy_cd	coord_datum_cd	alt_va	alt_acy_va	alt_datum_cd	drain_area_va	contrib_drain_area_va
5s	15s	50s	2s	3s	16s	11s	12s	1s	10s	8s	3s	10s	8s	8s
USGS	01541200	WB Susquehanna River near Curwensville, PA	42	033	02050201	405741	0783110	S	NAD27	 1124.66	.01	NGVD29	367	

Travis-CI failure: python 3.5 segmentation fault

  • HydroFunctions version: 0.1.6 all branches
  • Python version: 3.5
  • Operating System: Linux (Travis-ci build)
  • Travis Job #184

Description

Travis-ci suddenly exits during the first test with a segmentation fault. This does not affect Python 3.4 or 3.6.

What I Did

python setup.py test

...
running build_ext
test_charts_cycleplot_exists (tests.test_charts.TestCyclePlot) ... /home/travis/.travis/job_stages: line 57:  1794 Segmentation fault      (core dumped) python setup.py test

Add ability to modify the legend and titles

  • HydroFunctions version: 0.1.8dev
  • Python version: 3.7

Description

Hydrofunctions' built-in charts should have the ability to set a title or create a legend.

Potential Solutions:

  1. add new parameters legend and title

    • legend default could be False; otherwise you could provide a value for legend.
    • This approach might require a legend_loc parameter too.
    • title could be set to False or text.
  2. Use **kwargs and pass these on

  3. What does Pandas do??

Problem with extract_nwis_df

  • HydroFunctions version: 0.1.7
  • Python version: 3.6.6 64-bit
  • Operating System: Windows 10

Description

I'm trying to import NWIS data, just as in the sample code, and an error is returned from the extract_nwis_df function. Any ideas what the problem might be?

What I Did

import hydrofunctions as hf

site = '01646000'
start_date = '2017-10-01'
end_date = '2018-10-01'

response = hf.get_nwis(site,'iv', start_date,end_date,parameterCd='00060')
response.json()
hf.extract_nwis_df(response)
=========================================
Traceback (most recent call last):

  File "<ipython-input-1-01f3848c24e4>", line 15, in <module>
    hf.extract_nwis_df(response)

  File "C:\Users\dgbli\Anaconda3\lib\site-packages\hydrofunctions\hydrofunctions.py", line 277, in extract_nwis_df
    ts = nwis_dict['value']['timeSeries']

TypeError: 'Response' object is not subscriptable

extract_nwis_df() function returns a tuple with dataframe and dictionary

  • HydroFunctions version: 0.2.1
  • Python version: 3.8.8
  • Operating System: windows

Description

I wrote some code for hydrofunctions a year ago that worked in getting some processed dataframes. I ran it again today and found that the code no longer works. The problem lies in the extract_nwis_df function. It used to return just a dataframe, but now it returns a tuple with a df and a dictionary. In one instance it also returned 4 more columns than I called; this might have been a separate issue. I found a workaround by subsetting the tuple with [0]. Is there a more elegant way to fix this workflow?

What I Did

def create_df(site, start, end):
    # YOUR CODE HERE
    """Creates a Pandas DataFrame with data
    downloaded from NWIS using hydrofunctions.
    Renames columns containing discharge and
    qualification codes information to "discharge" and
    "flag", respectively. Creates "siteName", "latitude",
    and "longitude" columns. Outputs the new dataframe.

    Parameters
    ----------
    site : str
    The stream gauge site number.

    start : str
    The start date as (YYYY-MM-DD) of time period of interest.

    end : str
    The end date as (YYYY-MM-DD) of time period of interest.

    Returns
    -------
    discharge : Pandas DataFrame
    Returns a dataframe containing date, discharge, qualification
    codes, site name, and latitude and longitude data

    """

    # Response from site
    parameterCd = ["00065", "00060"]
    resp = hf.get_nwis(site, "dv", start, end).json()


    # Extract values to a pandas dataframe
    discharge = hf.extract_nwis_df(resp)
    
    # Rename columns
    discharge.columns = ["discharge", "flag", 'stage', 'flag']

    # Create sitename column
    site_name = hf.get_nwis_property(resp, key="siteName")[0]

    discharge['siteName'] = site_name

    # Create lat and long column
    geoloc = hf.get_nwis_property(resp, key="geoLocation")[0]["geogLocation"]
    lat = geoloc["latitude"]
    long = geoloc["longitude"]
    discharge["latitude"] = lat
    discharge["longitude"] = long
    return discharge

site = ["06479215","06479438","06479500","06479525","06479770","06480000"]
start = "2018-01-01"
end = "2020-12-01"
temp_list = []

for i in site:
    df = create_df(i, start, end)
    temp_list.append(df)
    
stream_gage_df = pd.concat(temp_list)
stream_gage_df

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
C:\Users\NICK~1.LAM\AppData\Local\Temp/ipykernel_20488/4087259266.py in <module>
      5 
      6 for i in site:
----> 7     df = create_df(i, start, end)
      8     temp_list.append(df)
      9 

C:\Users\NICK~1.LAM\AppData\Local\Temp/ipykernel_20488/3569674067.py in create_df(site, start, end)
     36 
     37     # Rename columns
---> 38     discharge.columns = ["discharge", "flag", 'stage', 'flag']
     39 
     40     # Create sitename column

AttributeError: 'tuple' object has no attribute 'columns'

.plot() no longer works on the dataframes.

  • HydroFunctions version: 0.1.6

Description

The new dataframe format has the data qualifiers adjacent to the data, so that different columns have different types of data placed together. Now when you try to plot using: my_dataframe.plot() it generates an error.

What I Did

import hydrofunctions as hf
%matplotlib inline
sites = ['01581830', '01589330']
data = hf.NWIS(sites, start_date='2002-01-01', end_date='2005-01-01').get_data()
data.ok

data.df().plot()
-error-

Pandas has no way of converting what is in the column into something it can plot.

Proposed Solution:

Create different functions for creating different kinds of dataframe layout. There is more than one way to organize the data into a dataframe. Create a method for outputting a dataframe that can be plotted automatically.

Perhaps put the qualifiers into an identical dataframe that matches cell for cell with the data.
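One way to sketch that split, assuming flag columns carry a '_qualifiers' suffix (an assumption about the column naming, not confirmed by the source):

```python
def split_data_and_flags(df):
    """Separate a combined dataframe into a numeric data frame and a
    matching qualifier frame. Flag columns are assumed to end in
    '_qualifiers'; the data frame alone can then be passed to .plot()."""
    flag_cols = [c for c in df.columns if c.endswith("_qualifiers")]
    data_cols = [c for c in df.columns if c not in flag_cols]
    return df[data_cols], df[flag_cols]
```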

don't run online resources tests in Travis

  • HydroFunctions version: 0.1.4
  • Python version: all
  • Operating System:

Description

When I switched to using pytest in TravisCI, it started finding tests/test_online_resources.py, which
isn't meant to be run every time you run the tests.

Solution

change .travis.yml to use unittest for the tests.

Create a 'verbose' mode that can be turned off.

  • HydroFunctions version: 0.2.0

Description

This request came in from Jibin Joseph:

Hello Professor Roberge:

I am creating a tool in Jupyter Notebook Appmode and I wanted to avoid the extra line from hydrofunctions ("Requested data from https://waterservices.usgs.gov/nwis/....." ). Is there a way to avoid printing in the output? Is it okay to do that?

-Jibin Joseph

Proposed Solution

The simplest solution for now is to create a verbose=True keyword and a line like:

if verbose:
    print(f"Requesting data from {url}...")

This will print under most cases, but if someone wants to cancel printing for some other application, it is still possible.

To make this work, it will be necessary to modify hydrofunctions.py get_nwis() and also station.py NWIS.

Potential Problems

  1. Testing
  2. Extract_nwis_df also has a print statement
    • There is a print when two indices of different frequencies are being joined.
  3. There are various warning messages that are printed.
    • These could be converted to warnings, perhaps.
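Threading the flag through could look like this sketch (hypothetical function name; the real change touches get_nwis() and NWIS, and real warnings would go through the warnings module so they can be filtered separately):

```python
def get_nwis_sketch(url, verbose=True):
    """Gate every informational print behind a verbose flag, and print
    the URL BEFORE the request is made so slow requests aren't silent."""
    if verbose:
        print(f"Requesting data from {url}...")
    # ... make the request here ...
    if verbose:
        print(f"Requested data from {url}")
```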

Change name & behavior of RDB object to make less confusing

  • HydroFunctions version: 0.1.8dev

Description

When you run the peaks(), rating_curve(), field_meas(), or stats() functions, you get an RDB object back. This can be confusing.

Problem

type(peaks('01542500'))

will return the type RDB which doesn't immediately make you realize that you are dealing with something from hydrofunctions. Maybe the name should be changed to hydroRDB or something like that.

Problem 2

When the return value of peaks() is printed to your screen, you get the header and a dataframe. I did this so that users would see the warnings in the header. But this means that you can't immediately use the output.

Could I simply set RDB.__repr__() to return the dataframe, and then let advanced users request the header manually?

Would this allow you to operate on the RDB the same as you would a dataframe?
Could you go:

my_rdb = peaks('01451200')
my_rdb.loc['2006']

Issue with converting to dataframe

  • HydroFunctions version: 0.1.7
  • Python version: 3.7.2
  • Operating System: Manjaro

Description

I tried to get the streamflow data for PA but when I tried to make a dataframe I got the following error:

Shape of passed values is (368, 546), indices imply (366, 546)

It works fine for other states; only PA fails.

What I Did

import hydrofunctions as hf
start = "2017-01-01"
end = "2017-12-31"
request = hf.NWIS(None, "dv", start, end, stateCd='PA').get_data()
request.df()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-f4a9c8304bcf> in <module>
      1 request = hf.NWIS(None, "dv", start, end, stateCd='PA').get_data()
----> 2 request.df()

~/anaconda/envs/hydro/lib/python3.7/site-packages/hydrofunctions/station.py in <lambda>()
    165         self.json = lambda: self.response.json()
    166         # set self.df without calling it.
--> 167         self.df = lambda: hf.extract_nwis_df(self.json())
    168 
    169         # Another option might be to do this:

~/anaconda/envs/hydro/lib/python3.7/site-packages/hydrofunctions/hydrofunctions.py in extract_nwis_df(nwis_dict)
    362         # except that package requires their n-dimensional structures to all be
    363         # the same datatype.
--> 364         DF = pd.concat([DF, dfa], axis=1)
    365 
    366     # replace missing values in the dataframe

~/.local/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
    227                        verify_integrity=verify_integrity,
    228                        copy=copy, sort=sort)
--> 229     return op.get_result()
    230 
    231 

~/.local/lib/python3.7/site-packages/pandas/core/reshape/concat.py in get_result(self)
    424             new_data = concatenate_block_managers(
    425                 mgrs_indexers, self.new_axes, concat_axis=self.axis,
--> 426                 copy=self.copy)
    427             if not self.copy:
    428                 new_data._consolidate_inplace()

~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in concatenate_block_managers(mgrs_indexers, axes, concat_axis, copy)
   2063         blocks.append(b)
   2064 
-> 2065     return BlockManager(blocks, axes)

~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in __init__(self, blocks, axes, do_integrity_check)
    112 
    113         if do_integrity_check:
--> 114             self._verify_integrity()
    115 
    116         self._consolidate_check()

~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in _verify_integrity(self)
    309         for block in self.blocks:
    310             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
--> 311                 construction_error(tot_items, block.shape[1:], self.axes)
    312         if len(self.items) != tot_items:
    313             raise AssertionError('Number of manager items must equal union of '

~/.local/lib/python3.7/site-packages/pandas/core/internals/managers.py in construction_error(tot_items, block_shape, axes, e)
   1689         raise ValueError("Empty data passed with indices specified.")
   1690     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 1691         passed, implied))
   1692 
   1693 

ValueError: Shape of passed values is (368, 546), indices imply (366, 546)

calc_freq & requests for just the most recent data

  • HydroFunctions version: 0.1.8dev
  • Python version: 3.7

Calc_freq() should take the 'iv' and 'dv' parameters into account

If you don't specify a start date or a period, then you will request only the most recent reading, like so:

hf.NWIS('015412000', 'dv') 

This only requests the most recent daily mean value. Since there is only one row of data, and only one time-stamp, calc_freq will have trouble figuring out the frequency, and will assign it a freq of 15 minutes.

Instead, the method of last resort for calc_freq should check to see if the data are daily mean or instantaneous. You can get this information in two ways:

  • the requesting parameters will either be 'dv' or 'iv'.
  • the 'stat' code in the column name is the last sequence of five digits. Daily mean will be ':00003', while instantaneous will be ':00000'.

It might also be possible to check the metadata at the start of the json file.
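The second way could be sketched like this (assuming the column-name layout USGS:&lt;site&gt;:&lt;parameter&gt;:&lt;stat&gt;):

```python
def infer_service(col_name):
    # The statistic code is the last colon-delimited field:
    # ':00003' marks a daily mean ('dv'), ':00000' an instantaneous value ('iv').
    stat = col_name.rsplit(":", 1)[-1]
    return {"00003": "dv", "00000": "iv"}.get(stat)

infer_service("USGS:01542500:00060:00003")  # 'dv'
```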

mysterious hydrofunctions_testing.log

  • HydroFunctions version: 0.2.1
  • Python version: 3.8.10
  • Operating System: Linux Mint 20.04

Description

A hydrofunctions_testing.log file is created when importing hydrofunctions in a Jupyter cell or Python script. What is the source of this log? Can it be suppressed?

What I Did

In jupyter cell:

import hydrofunctions

From terminal:

(hydro) mlr@mint-box:~/git/hydro-tools$ ls | grep hydro*.log
hydrofunctions_testing.log

redundant HydroUserWarning

  • HydroFunctions version: 0.1.8dev
  • Python version: 3.7.3
  • Operating System: Ubuntu

Description

When I use bBox to find a station I get a warning about frequency being set to zero. I think this warning is not relevant for this type of inquiry.

On another note, I noticed that if I use a number with more than 7 decimal places for the lat and long, say -121.74788222 instead of -121.74788 in the following example, it throws an error (JSONDecodeError: Expecting value: line 1 column 1 (char 0)). It is a limitation of the query that is sent to the USGS service. So I think it would be a good idea to round the lat and long to 7 decimal places before sending the query.

A small note on the error message: there is a space missing between 'frequency' and 'for' ('frequencyfor').

What I Did

import hydrofunctions as hf

hf.NWIS(bBox=[-121.74788, 47.38594, -121.54788, 47.58594])

....hydrofunctions/hydrofunctions/hydrofunctions.py:100: HydroUserWarning: It is not possible to determine the frequencyfor one of the datasets in this request.This dataset will be set to a frequency of 0 minutes
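The proposed rounding could be a small pre-processing step applied before the query string is built (a sketch; the function name is illustrative):

```python
def round_bbox(bbox, places=7):
    # Per the report above, coordinates with more than 7 decimal places
    # make the NWIS query fail, so trim each coordinate first.
    return [round(coord, places) for coord in bbox]

round_bbox([-121.74788222, 47.38594, -121.54788, 47.58594])
# [-121.7478822, 47.38594, -121.54788, 47.58594]
```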

What is the relationship between hydrofunctions, Ulmo, dataretrieval, HyRiver, and others?

In response to this issue DOI-USGS/dataretrieval-python#8 and comments from @emiliom @jkreft-usgs @DanCodigaMWRA

There are several open source software projects that allow you to request, parse, and analyze hydrology data from the USGS NWIS. Why are there overlapping projects? My guess is that it is a combination of scientists writing code to meet their very specific research needs, people creating projects without searching for what already exists, and because sometimes people feel uncomfortable trying to work with people they don't know yet. I'd like to work with the maintainers of other projects to eliminate some of the redundancy and improve the cooperation.

My name is Martin Roberge, and I'm the author of hydrofunctions. I do research on stream hydrology and I'm an educator. I made hydrofunctions to meet my specific needs: I download lots of stream gauge data from the USGS which I then analyze inside Pandas dataframes in a Jupyter notebook. Since most of my students do not come from programming backgrounds, I have spent most of my time trying to make hydrofunctions easy for beginners to use.

My main goals for hydrofunctions are:

  • Easy to Use
    • Comprehensive User's manual with lots of examples and applications
    • Extensive docstrings so you can use help() while coding. The docstrings have examples too.
    • Helpful error messages for common mistakes
  • Preserve the information from the NWIS
    • Maintain access to the original WaterML / json
    • Keep data quality flags
    • Access the sensor & station metadata inside the original WaterML
    • Keep track of sensor sampling frequency, in case you measure some things once an hour and other things once a day.
    • Print a quick data summary so you know what you have
  • Create custom dataframes from your request
    • Include one site or many
    • Output data for one sensor or all of the sensors.
    • Output only the data quality flags for QA/QC if you want
  • Code Quality:
    • I find it difficult to develop without lots of unit tests. If you don't have a test for every function and branch in the logic, how will you know if the latest change creates a new bug?
    • Write lots of tests
    • Use continuous integration to make sure everything works all the time.
    • Document everything

The problem is that I am just one person, and every hour I spend adding functionality to hydrofunctions is an hour I could have spent measuring how fast flood waves travel down a river, or whatever I'm up to that day. I would love to collaborate with someone else.

Other projects that work with NWIS data are:

  • PyGeoHydro package is part of the HyRiver project by Taher Chegini @cheginit. In a little more than a year he has created an enormous set of packages for data access, mapping, and analysis. HyRiver is mostly focused on working with mapping webservices, but the PyGeoHydro package allows you to request daily discharge from the NWIS and outputs it in a dataframe. He seems focused on watershed data: he makes it possible to calculate the landcover for your watershed, access the National Inventory of Dams, access meteorology data, make maps from the NHDplus data... he includes a lot of functionality.
  • Ulmo: this is the OG as the kids used to like to say. (first commit in 2011!) It has the ability to request data from at least fifteen different sources. ulmo.usgs.nwis.get_site_data() is the function that requests stream gauge data. Ulmo processes the original WaterML and returns a dictionary that needs further processing to use the data in a dataframe. It can be finicky when you are requesting stream gauge data, and I can't always figure out what is wrong with my requests. Emilio Mayorga @emiliom is the lead developer now.
  • dataretrieval was set up in 2018 as a Python alternative to the dataRetrieval R package. It is tailored specifically to the services provided by the NWIS and is maintained by USGS staff. You can access the different NWIS services easily by specifying the service in your request, and everything gets output to a dataframe. It is maintained by Timothy Hodson @thodson-usgs.
  • Pastas is for working with hydrological time series. It doesn't appear to have any functions that collect data; instead most of its functionality allows you to carefully control how you fill missing data, resample data to different frequencies, create artificial datasets, model how a dependent timeseries responds to an independent timeseries, and conduct various timeseries analyses. It is actively maintained by a large team of collaborators, and is led by Raoul Collenteur @raoulcollenteur
  • pywaterinfo is a package for downloading stream data from the Flanders Environment Agency. It is run by @stijnvanhoey
  • streamstats will retrieve the HUC8 watershed for a given point. It uses the USGS StreamStats API. This is from the very active earthlab group, who seem to do a lot of training and workshops for Earth Scientists. @mbjoseph @lwasser
  • WellApplication is a collection of tools for analyzing groundwater; it can collect data from NWIS. Maintained by Paul Inkenbrandt. @inkenbrandt
  • hydropy is no longer actively maintained.

Please let me know if anyone thinks that I have mischaracterized their project.
I would love to hear your opinion about how these different projects could collaborate or how we could 'stake out ground' so that we don't replicate functionality. Why re-invent the wheel?

Requests for only the most recent data should be parsed differently

  • HydroFunctions version: 0.1.8dev
  • Python version: 3.7

Description

If you don't specify a start date or a period, the NWIS will return the most recent data for your sites. This produces two problems:

  • you get a warning that HF can't figure out the period, so period gets set to <0 Minutes>
  • It produces a dataframe with lots of different rows, each with a different time. If you request a subset of the dataframe, you still get all of these rows.

What I Did

new = hf.NWIS(stateCd='MD')
new.df()

Potential Solutions

  • don't allow NWIS to make 'recent' requests; instead create a 'Search'/'Find_Sites' class or function that returns a dict
  • create a condition in the NWIS parameter parsing that identifies when a 'recent' request is being made, and treat it differently from your normal .df() parsing.
  • If you allow NWIS to make these requests, then at least have the NWIS.repr string list the most recent observation for each parameter.
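The condition in the second bullet could be as simple as (a sketch; the real NWIS parameter parsing holds more state):

```python
def is_recent_request(start_date=None, period=None):
    # With neither a start date nor a period, NWIS returns only the
    # most recent reading, which needs different parsing than a series.
    return start_date is None and period is None

is_recent_request()                         # True -> 'recent' branch
is_recent_request(start_date="2017-01-01")  # False -> normal .df() parsing
```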

annual statistics creates invalid URL

  • HydroFunctions version: 0.2.3

Description

When I added the 'statYearType' parameter to a request for annual statistics, it printed the wrong URL but still returned something.

What I Did

hf.stats('01542500', 'annual', parameterCd='00060', statYearType='water')

Retrieved annual statistics for site #01542500 from https://waterservices.usgs.gov/nwis/stat//

Although it prints the wrong URL, it still seems to return the correct data.

The hydrofunctions docstring example no longer works

  • HydroFunctions version: 0.2.3

Description

The docstring for hydrofunctions no longer works because the example stream gauge now collects different data.

help(hydrofunctions)

This gives us an example that uses the Harrisburg stream gauge: '01570500'.

>>> import hydrofunctions as hf
>>> site = '01570500'
>>> harrisburg = hf.NWIS(site, 'iv', period='P10D')

>>> harrisburg
    USGS:01570500: Susquehanna River at Harrisburg, PA
        00045: <30 * Minutes> Precipitation, total, inches
        00060: <30 * Minutes> Discharge, cubic feet per second
        00065: <30 * Minutes> Gage height, feet
    Start: 2019-04-06 00:30:00+00:00
    End:   2019-04-15 23:00:00+00:00

The example requests the last 10 days of IV data, returning precipitation, discharge, and stage, each with 30 minute sampling.

As of a few days ago (as far as I can tell), if you do this request again, you'll also get some data that gets collected every 15 minutes, leading to an 'upsampling' warning.

One way to avoid this is to use a different gauge in the example, or simply to request data from a range of dates that only has the 30-minute data, like back in 2019 when the example was first run.

Two Possible Solutions:

  1. Have the example request data from a time period that returns a known set of data:
harrisburg = hf.NWIS(site, 'iv', start_date='2019-04-06', end_date='2019-04-15')
  2. Request data from a different site that is less likely to cause problems in the future:
  • It should collect every 15 minutes so that this problem is less likely to repeat
  • It should be from a site that is unlikely to be 'upgraded'... Karthaus? 01542500
>>> site = '01542500'
>>> karthaus = hf.NWIS(site, 'iv', period='P10D')

Hydrofunctions should be completely pip installable

  • HydroFunctions version: 0.1.7
  • Python version: 3.4+
  • Operating System: Windows, Mac, Linux

Description

It should be possible to install Hydrofunctions using only pip. The .travis.yml basically does it this way, despite the difficulties of installing Numpy and Pandas. This process was sped up considerably by using the --only-binary option. It might be a good idea to work this into the regular requirements.txt file as well as the test requirements and the requirements-dev.txt files.

What I Did

conda create -n myenv python=3.4
conda activate myenv
pip install --only-binary=numpy,scipy,pandas,Ipython numpy scipy pandas ipython
pip install hydrofunctions
python
import hydrofunctions as hf
dir(hf)  

works!

Obviously, if I used conda to create the environment and set it up with Python 3.4, then I could use it to install numpy! The POINT, however, is that it is now possible to install the whole complicated set of dependencies without relying on Anaconda.

My goals are to:

  • see if I can somehow get the --only-binary option to fit inside of requirements.txt or something so that a simple pip install hydrofunctions will work from scratch
  • investigate how normal people install python and set up a virtual environment without conda
  • create alternative documentation for beginners that installs Python and then installs hydrofunctions and everything else.
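Pip requirements files accept the --only-binary option on its own line, so a sketch of a requirements.txt that forces wheel installs might look like this (package set and absence of version pins are illustrative):

```text
--only-binary=numpy,scipy,pandas
numpy
scipy
pandas
ipython
```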

The __repr__() keeps getting printed when you use NWIS methods.

  • HydroFunctions version: 0.1.8dev
  • Python version: 3.7

Description

I never really thought about this until now, but I've made it so that the __repr__ output gets printed every time you use a method.

I set it up so that NWIS has a NWIS.__repr__() method that prints a nice listing of the variables in the dataframe every time you type the object's name.

I also set it up so that you can chain the methods in NWIS by returning the NWIS object.

But now you will trigger .__repr__() every time you .save() or whatever.

EXAMPLE:

>>> herring = hf.NWIS('01585200', 'iv', period='P2D')
>>> herring.save('delete_me.parq')

USGS:01585200: WEST BRANCH HERRING RUN AT IDLEWYLDE, MD
    00060: <5 * Minutes>  Discharge, cubic feet per second
    00065: <5 * Minutes>  Gage height, feet
Start: 2019-04-17 22:25:00+00:00
End:   2019-04-19 22:05:00+00:00

Possible Solution:

  • Maybe __repr__ should return NWIS('01585200', 'iv', period='P2D')
  • Maybe there could be a NWIS.info() method that returns the stuff that is in the __repr__ now.
  • Alternative 2: never print anything
  • Alternative 3, don't allow method chaining.

What does Pandas do?

  • It has a special _repr_html_() method that returns HTML in Jupyter.
  • what happens in a command line situation?
  • what happens when you chain methods in pandas?
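The first two bullets could be sketched with a minimal stand-in for the real NWIS class (names and summary text are illustrative):

```python
class NWIS:
    # Minimal stand-in for hydrofunctions.NWIS, showing the proposed
    # split between a terse __repr__ and an explicit .info() method.
    def __init__(self, site, service, period=None):
        self.site, self.service, self.period = site, service, period

    def __repr__(self):
        # Echo the constructor call instead of the full data listing.
        return f"NWIS('{self.site}', '{self.service}', period='{self.period}')"

    def info(self):
        # The verbose summary (sites, parameters, start/end) moves here.
        print(f"USGS:{self.site}: ...")
        return self  # still chainable

repr(NWIS('01585200', 'iv', period='P2D'))
# "NWIS('01585200', 'iv', period='P2D')"
```

Chained calls like herring.save('delete_me.parq') would then echo only the short constructor string instead of the full listing.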

hf.peaks ParserError month

  • HydroFunctions version: 0.2.0
  • Python version: 3.7.10
  • Operating System: Windows

Description

Date parsing error when downloading peak data for USGS gauge no. 06813500.

What I Did

hf.peaks('06813500')
Retrieving annual peak discharges for site # 06813500  from  https://nwis.waterdata.usgs.gov/nwis/peak?site_no=06813500&agency_cd=USGS&format=rdb
Traceback (most recent call last):

  File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 655, in parse
    ret = self._build_naive(res, default)

  File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 1241, in _build_naive
    naive = default.replace(**repl)

ValueError: month must be in 1..12


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "pandas\_libs\tslib.pyx", line 514, in pandas._libs.tslib.array_to_datetime

  File "pandas\_libs\tslibs\parsing.pyx", line 243, in pandas._libs.tslibs.parsing.parse_datetime_string

  File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 1374, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)

  File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 657, in parse
    six.raise_from(ParserError(e.args[0] + ": %s", timestr), e)

  File "<string>", line 3, in raise_from

ParserError: month must be in 1..12: 1881-00-00


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "pandas\_libs\tslib.pyx", line 525, in pandas._libs.tslib.array_to_datetime

TypeError: invalid string coercion to datetime


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 655, in parse
    ret = self._build_naive(res, default)

  File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 1241, in _build_naive
    naive = default.replace(**repl)

ValueError: month must be in 1..12


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "<ipython-input-45-74616d452e0e>", line 1, in <module>
    hf.peaks('06813500')

  File "C:\Anaconda3\envs\py37\lib\site-packages\hydrofunctions\usgs_rdb.py", line 384, in peaks
    outputDF.peak_dt = pd.to_datetime(outputDF.peak_dt)

  File "C:\Anaconda3\envs\py37\lib\site-packages\pandas\core\tools\datetimes.py", line 805, in to_datetime
    values = convert_listlike(arg._values, format)

  File "C:\Anaconda3\envs\py37\lib\site-packages\pandas\core\tools\datetimes.py", line 472, in _convert_listlike_datetimes
    allow_object=True,

  File "C:\Anaconda3\envs\py37\lib\site-packages\pandas\core\arrays\datetimes.py", line 2090, in objects_to_datetime64ns
    raise e

  File "C:\Anaconda3\envs\py37\lib\site-packages\pandas\core\arrays\datetimes.py", line 2081, in objects_to_datetime64ns
    require_iso8601=require_iso8601,

  File "pandas\_libs\tslib.pyx", line 364, in pandas._libs.tslib.array_to_datetime

  File "pandas\_libs\tslib.pyx", line 591, in pandas._libs.tslib.array_to_datetime

  File "pandas\_libs\tslib.pyx", line 726, in pandas._libs.tslib.array_to_datetime_object

  File "pandas\_libs\tslib.pyx", line 717, in pandas._libs.tslib.array_to_datetime_object

  File "pandas\_libs\tslibs\parsing.pyx", line 243, in pandas._libs.tslibs.parsing.parse_datetime_string

  File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 1374, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)

  File "C:\Anaconda3\envs\py37\lib\site-packages\dateutil\parser\_parser.py", line 657, in parse
    six.raise_from(ParserError(e.args[0] + ": %s", timestr), e)

  File "<string>", line 3, in raise_from

ParserError: month must be in 1..12: 1881-00-00
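The failing value 1881-00-00 appears to be a partial date (the 00 month suggests the month is unknown). One way to parse the peak_dt column without crashing is to coerce unparseable entries to NaT (a sketch, not necessarily how usgs_rdb.py should fix it):

```python
import pandas as pd

def parse_peak_dates(series):
    # errors='coerce' turns partial dates like '1881-00-00' into NaT
    # instead of raising "ParserError: month must be in 1..12".
    return pd.to_datetime(series, errors="coerce")

parsed = parse_peak_dates(pd.Series(["1881-00-00", "1952-04-28"]))
```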

SSL certificate error

  • HydroFunctions version: 0.2.0
  • Python version: 3.8
  • Operating System: Windows 10

Description

In one of my client environments, certain Python libraries like Siphon produce SSL certificate errors. Siphon has a session manager that can either bypass this or add a certificate. Is there a similar kwarg or capability in hydrofunctions?

What I Did

Commands: current_test = hf.NWIS(['03352988'], 'iv', start_date=begin, end_date=today)

Traceback: requests.exceptions.SSLError: HTTPSConnectionPool(host='waterservices.usgs.gov', port=443): Max retries exceeded with url: /nwis/iv/?format=json%2C1.1&sites=03352988&startDT=2021-06-07&endDT=2021-07-08 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)')))

Thank you!
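If hydrofunctions does not expose a certificate kwarg, the traceback shows it uses requests underneath, and requests honors the standard REQUESTS_CA_BUNDLE environment variable. A possible workaround until a kwarg exists (the path is illustrative):

```python
import os

# requests (used internally by hydrofunctions) reads this variable and
# verifies HTTPS connections against the given CA bundle.
os.environ["REQUESTS_CA_BUNDLE"] = "/path/to/corporate-ca.pem"

# import hydrofunctions as hf
# current_test = hf.NWIS(['03352988'], 'iv', start_date=begin, end_date=today)
```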

Update CI system

  • HydroFunctions version: hydrofunctions 0.2.1

Problem

Travis CI has been changing, making my Continuous Integration system less convenient.

Solution

I would like to:

  • Switch from Travis CI to GitHub Actions
  • Start using Tox to manage my testing environments
    • add a tox.ini file
    • remove travis.yml
    • How do I keep local coverage reporting and online Codecov working?
  • Remove all unnecessary requirements
    • ipykernel
    • nbsphinx
  • keep using Black

should raise warning when NWIS.get_data() is called.

  • HydroFunctions version: 0.1.8dev
  • Python version: 3.6

Description

When the user tries to use .get_data(), a warning should be raised instead of printing something.

What I Did

import hydrofunctions as hf
test = hf.NWIS('01542500', period='P5D').get_data()

Replacement

Instead, the code should:

import warnings
...
def get_data(self):
    warnings.warn("don't do this anymore", FutureWarning)
    return self

extract_nwis_df wrong variable name

  • HydroFunctions version:
  • Python version:
  • Operating System:

Description

I ran a hydrofunctions.NWIS query to get all temperature data in the state of California. When converting to a dataframe, there was a division by zero error since the data frequency was miscalculated.

What I Did to resolve this error

I adjusted the source code so that in extract_nwis_df, at lines 571 and 572 of hydrofunctions.py (version on GitHub), I replaced the variable 'freqs' with 'freqs2', which removes the zeros and prevents the division-by-zero error.
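The shape of the fix, sketched generically (the names freqs and freqs2 follow the report; the surrounding code in extract_nwis_df differs, and the division shown is illustrative):

```python
freqs = [15, 0, 60, 0]  # inferred sampling intervals in minutes; 0 = unknown
freqs2 = [f for f in freqs if f != 0]  # drop the zeros first
rows_per_hour = [60 / f for f in freqs2]  # safe: no ZeroDivisionError
```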

The parameters used to request data should be stored as properties of NWIS.

The most important parameters that define the data request from the NWIS should be stored as properties of the NWIS object. These are:

  • start: the start time
  • stop: the stop time
  • time: alternatively, this could be a tuple (start, stop) or something.
  • sites: the site IDs
  • parameters: the parameter codes that were collected

These values should be parsed and stored as a string or a list of strings, and converted to a comma-delimited list when .get_data() is performed, or .get_nwis() runs.

Use the @property decorator to create getter methods and make this read-only.
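A sketch of the pattern (attribute names from the list above; the real class would parse and validate more carefully):

```python
class NWIS:
    # Minimal stand-in showing request parameters stored at creation
    # and exposed as read-only properties.
    def __init__(self, sites, start, stop):
        self._sites = list(sites)  # stored as a list of strings
        self._start = start
        self._stop = stop

    @property
    def sites(self):
        # @property with no setter makes this read-only.
        return self._sites

    @property
    def start(self):
        return self._start

    def _site_param(self):
        # Converted to a comma-delimited list when the request is built.
        return ",".join(self._sites)

nwis = NWIS(["01585200", "01542500"], "2017-01-01", "2017-12-31")
nwis._site_param()  # '01585200,01542500'
```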

Add error messages to read_dbf

  • HydroFunctions version: 0.1.8dev
  • Python version: 3.6+

Description

If you request a USGS data table for a site that doesn't exist, you get a nonsense error that is hard for a beginner to figure out.

What I Did

hf.rating_curve('666')
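One way to produce a friendlier message is to validate the response before parsing it (a sketch; the real validation logic in hydrofunctions may look different):

```python
def check_rdb(text, site):
    # Valid USGS RDB tables begin with '#' comment lines; an HTML error
    # page or empty body usually means the site number doesn't exist.
    if not text.strip().startswith("#"):
        raise ValueError(
            f"The NWIS has no table for site '{site}'. "
            "Check that this is a valid USGS site number."
        )
    return text
```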
