
pygeohydro's Introduction

Package | Description | Status
PyNHD | Navigate and subset NHDPlus (MR and HR) using web services | Github Actions
Py3DEP | Access topographic data through National Map's 3DEP web service | Github Actions
PyGeoHydro | Access NWIS, NID, WQP, eHydro, NLCD, CAMELS, and SSEBop databases | Github Actions
PyDaymet | Access daily, monthly, and annual climate data via Daymet | Github Actions
PyGridMET | Access daily climate data via GridMET | Github Actions
PyNLDAS2 | Access hourly NLDAS-2 data via web services | Github Actions
HydroSignatures | A collection of tools for computing hydrological signatures | Github Actions
AsyncRetriever | High-level API for asynchronous requests with persistent caching | Github Actions
PyGeoOGC | Send queries to any ArcGIS RESTful-, WMS-, and WFS-based services | Github Actions
PyGeoUtils | Utilities for manipulating geospatial, (Geo)JSON, and (Geo)TIFF data | Github Actions

PyGeoHydro: Retrieve Geospatial Hydrology Data

Features

PyGeoHydro (formerly named hydrodata) is part of the HyRiver software stack, which is designed to aid in hydroclimate analysis through web services. This package provides access to several public web services that offer geospatial hydrology data. It has three main modules: pygeohydro, plot, and helpers.

PyGeoHydro supports the following datasets:

  • gNATSGO for US soil properties.
  • SoilGrids for seamless global soil properties.
  • Derived Soil Properties for soil porosity, available water capacity, and field capacity across the US.
  • NWIS for daily mean streamflow observations (returned as a pandas.DataFrame or xarray.Dataset with station attributes).
  • SensorThings API for accessing real-time data of USGS sensors.
  • CAMELS for accessing streamflow observations (1980-2014) and basin-level attributes of 671 stations within CONUS.
  • Water Quality Portal for accessing current and historical water quality data from more than 1.5 million sites across the US.
  • NID for accessing the National Inventory of Dams web service.
  • HCDN 2009 for identifying sites where human activity affects the natural flow of the watercourse.
  • NLCD 2021 for land cover/land use, imperviousness descriptor, and canopy data. You can get data using both geometries and coordinates.
  • WBD for accessing Hydrologic Unit (HU) polygon boundaries within the US (all HUC levels).
  • SSEBop for daily actual evapotranspiration, for both single pixel and gridded data.
  • Irrigation Withdrawals for estimated monthly water use for irrigation by 12-digit hydrologic unit in the CONUS for 2015.
  • STN for accessing the USGS Short-Term Network (STN).
  • eHydro for accessing USACE Hydrographic Surveys, which include topobathymetry data.
  • NFHL for accessing FEMA's National Flood Hazard Layer (NFHL) data.

Also, it includes several other functions:

  • interactive_map: Interactive map for exploring NWIS stations within a bounding box.
  • cover_statistics: Categorical statistics of land use/land cover data.
  • overland_roughness: Estimate overland roughness from land use/land cover data.
  • streamflow_fillna: Fill missing daily streamflow values with day-of-year averages. The streamflow record must be at least 10 years long.
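
The day-of-year filling described for streamflow_fillna can be sketched with plain pandas. This is only an illustration of the idea, not PyGeoHydro's implementation; the function name fill_with_doy_mean and the synthetic series are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def fill_with_doy_mean(q: pd.Series) -> pd.Series:
    """Fill NaNs in a daily series with the mean of the same day of year."""
    # Mean over all years for each day of year (NaNs are skipped),
    # broadcast back onto the original index.
    doy_mean = q.groupby(q.index.dayofyear).transform("mean")
    return q.fillna(doy_mean)


# Synthetic 10-year daily record with a one-week gap (hypothetical data).
idx = pd.date_range("2000-01-01", "2009-12-31", freq="D")
q = pd.Series(np.sin(2 * np.pi * idx.dayofyear / 365.25) + 2.0, index=idx)
q.loc["2005-06-01":"2005-06-07"] = np.nan

filled = fill_with_doy_mean(q)
```

A long record matters here because each gap is filled from the other years' values for that calendar day; with only a few years, the average is dominated by individual events.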

The plot module includes three main functions:

  • signatures: Hydrologic signature graphs.
  • cover_legends: Official NLCD land cover legends for plotting a land cover dataset.
  • descriptor_legends: Color map and legends for plotting an imperviousness descriptor dataset.

The helpers module includes:

  • nlcd_helper: A roughness coefficients lookup table for each land cover and imperviousness descriptor type which is useful for overland flow routing among other applications.
  • nwis_error: A dataframe for finding information about NWIS requests' errors.

You can find some example notebooks here.

Under the hood, PyGeoHydro uses the PyGeoOGC and AsyncRetriever packages for making requests in parallel and storing responses in chunks. This significantly improves the reliability and speed of data retrieval.

You can control the request/response caching behavior and verbosity of the package by setting the following environment variables:

  • HYRIVER_CACHE_NAME: Path to the caching SQLite database for asynchronous HTTP requests. It defaults to ./cache/aiohttp_cache.sqlite
  • HYRIVER_CACHE_NAME_HTTP: Path to the caching SQLite database for HTTP requests. It defaults to ./cache/http_cache.sqlite
  • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds. It defaults to one week.
  • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache. The default is false.
  • HYRIVER_SSL_CERT: Path to an SSL certificate file.

For example, in your code before making any requests you can do:

import os

os.environ["HYRIVER_CACHE_NAME"] = "path/to/aiohttp_cache.sqlite"
os.environ["HYRIVER_CACHE_NAME_HTTP"] = "path/to/http_cache.sqlite"
os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
os.environ["HYRIVER_CACHE_DISABLE"] = "true"
os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"
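
If the settings should apply only to part of a script, one option is to wrap the assignments in a small context manager that restores the previous values afterwards. The helper below, hyriver_env, is hypothetical and not part of HyRiver; whether the packages re-read these variables after the first request is not stated here, so treat it as a sketch.

```python
import os
from contextlib import contextmanager


@contextmanager
def hyriver_env(**settings: str):
    """Temporarily set HYRIVER_* environment variables, then restore them."""
    keys = [f"HYRIVER_{name.upper()}" for name in settings]
    previous = {key: os.environ.get(key) for key in keys}
    try:
        for key, value in zip(keys, settings.values()):
            os.environ[key] = str(value)
        yield
    finally:
        # Put back whatever was there before, or remove the variable.
        for key, old in previous.items():
            if old is None:
                os.environ.pop(key, None)
            else:
                os.environ[key] = old


# One day in seconds; the documented default of one week is 604800 seconds.
with hyriver_env(cache_expire="86400", cache_disable="false"):
    expire_inside = os.environ["HYRIVER_CACHE_EXPIRE"]
```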

You can also try using PyGeoHydro without installing it on your system by clicking on the binder badge. A Jupyter Lab instance with the HyRiver stack pre-installed will be launched in your web browser, and you can start coding!

Requests for additional functionality can be submitted via the issue tracker.

Citation

If you use any of the HyRiver packages in your research, we appreciate citations:

@article{Chegini_2021,
    author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
    doi = {10.21105/joss.03175},
    journal = {Journal of Open Source Software},
    month = {10},
    number = {66},
    pages = {1--3},
    title = {{HyRiver: Hydroclimate Data Retriever}},
    volume = {6},
    year = {2021}
}

Installation

You can install PyGeoHydro using pip after installing libgdal on your system (for example, on Ubuntu run sudo apt install libgdal-dev). PyGeoHydro also has an optional dependency, requests-cache, for persistent caching. We highly recommend installing it, since it can significantly speed up sending and receiving queries. No code changes are needed: PyGeoHydro looks for requests-cache and, if it is available, uses persistent caching automatically:

$ pip install pygeohydro

Alternatively, PyGeoHydro can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge pygeohydro
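
To see whether the optional requests-cache dependency is importable in the current environment, a quick standard-library check can help. The helper name has_module is an assumption for this sketch; it only reports importability and says nothing about how PyGeoHydro performs its own detection.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return find_spec(name) is not None


# The distribution is named "requests-cache"; its import name is "requests_cache".
print("requests_cache importable:", has_module("requests_cache"))
```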

Quick start

We can obtain river topobathymetry data using the EHydro class. The dataset can be subset by a geometry or bounding box, by feature IDs, or by an SQL query:

from pygeohydro import EHydro

ehydro = EHydro("points")
topobathy = ehydro.bygeom((-122.53, 45.57, -122.52, 45.59))

We can explore the available NWIS stations within a bounding box using the interactive_map function. It returns an interactive map, and clicking on a station shows some of its most important properties.

import pygeohydro as gh

bbox = (-69.5, 45, -69, 45.5)
gh.interactive_map(bbox)

We can select all the stations within this bounding box that have daily mean streamflow data from 2000-01-01 to 2010-12-31:

from pygeohydro import NWIS

nwis = NWIS()
query = {
    "bBox": ",".join(f"{b:.06f}" for b in bbox),
    "hasDataTypeCd": "dv",
    "outputDataTypeCd": "dv",
}
info_box = nwis.get_info(query)
dates = ("2000-01-01", "2010-12-31")
stations = info_box[
    (info_box.begin_date <= dates[0]) & (info_box.end_date >= dates[1])
].site_no.tolist()
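
The two pieces of logic in the snippet above, formatting the bounding box for the query and keeping only stations whose record spans the whole period, can be checked in isolation. The station records below are hypothetical stand-ins for rows of info_box.

```python
from datetime import date

bbox = (-69.5, 45, -69, 45.5)  # (west, south, east, north) in degrees

# Same formatting as in the query: six decimals, comma separated.
bbox_str = ",".join(f"{b:.06f}" for b in bbox)

# Hypothetical station records standing in for rows of info_box.
records = [
    {"site_no": "A", "begin_date": date(1990, 1, 1), "end_date": date(2020, 1, 1)},
    {"site_no": "B", "begin_date": date(2005, 1, 1), "end_date": date(2020, 1, 1)},
]
start, end = date(2000, 1, 1), date(2010, 12, 31)

# A station qualifies only if its record starts on or before the first
# requested day and ends on or after the last one.
covered = [
    r["site_no"]
    for r in records
    if r["begin_date"] <= start and r["end_date"] >= end
]
```

Station B is dropped because its record begins in 2005, after the start of the requested period.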

Then, we can get the daily streamflow data in mm/day (by default the values are in cubic meters per second, cms) and plot them:

from pygeohydro import plot

qobs = nwis.get_streamflow(stations, dates, mmd=True)
plot.signatures(qobs)
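
Converting discharge from cms to mm/day amounts to dividing the daily volume by a drainage area. The arithmetic is sketched below, assuming the area is known in square kilometers; how get_streamflow obtains the area internally is not described here, and cms_to_mmd is a hypothetical helper.

```python
def cms_to_mmd(q_cms: float, area_km2: float) -> float:
    """Convert discharge in m^3/s to runoff depth in mm/day over a drainage area."""
    seconds_per_day = 86_400
    area_m2 = area_km2 * 1e6
    # m^3/s * s/day / m^2 gives m/day; multiply by 1000 for mm/day.
    return q_cms * seconds_per_day / area_m2 * 1e3


# 10 m^3/s over an 864 km^2 basin corresponds to 1 mm/day.
depth = cms_to_mmd(10.0, 864.0)
```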

By default, get_streamflow returns a pandas.DataFrame with an attrs property that contains metadata for all the stations; you can access it as qobs.attrs. We can also get the same data as an xarray.Dataset:

qobs_ds = nwis.get_streamflow(stations, dates, to_xarray=True)

This xarray.Dataset has two dimensions: time and station_id. It has 10 variables: discharge is two-dimensional, while the other variables, which are station attributes, are one-dimensional.

We can also get instantaneous streamflow data using get_streamflow. This method assumes that the input dates are in UTC time zone and returns the data in UTC time zone as well.

date = ("2005-01-01 12:00", "2005-01-12 15:00")
qobs = nwis.get_streamflow("01646500", date, freq="iv")
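
Because the input dates are interpreted as UTC, timestamps recorded in local time need converting first. The sketch below uses a fixed UTC-5 offset purely as an example; the offset and the helper name to_utc_string are assumptions, and a real workflow would use the station's actual time zone.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset as a stand-in for a local standard time (assumption).
local_tz = timezone(timedelta(hours=-5))


def to_utc_string(local_time: str, tz: timezone) -> str:
    """Convert a 'YYYY-MM-DD HH:MM' local timestamp to the same format in UTC."""
    fmt = "%Y-%m-%d %H:%M"
    aware = datetime.strptime(local_time, fmt).replace(tzinfo=tz)
    return aware.astimezone(timezone.utc).strftime(fmt)


date_utc = (
    to_utc_string("2005-01-01 07:00", local_tz),
    to_utc_string("2005-01-12 10:00", local_tz),
)
```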

We can query USGS stations of type "stream" in Arizona using SensorThings API as follows:

from pygeohydro import SensorThings

sensor = SensorThings()
odata = {
    "filter": "properties/monitoringLocationType eq 'Stream' and properties/stateFIPS eq 'US:04'",
}
df = sensor.query_byodata(odata)
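
When several equality conditions are combined, the filter string can be assembled programmatically. The helper odata_eq_filter below is hypothetical and handles only simple equality tests on properties; it does not escape quotes inside values.

```python
def odata_eq_filter(**conditions: str) -> str:
    """Join equality conditions on 'properties/<name>' with 'and'."""
    return " and ".join(
        f"properties/{key} eq '{value}'" for key, value in conditions.items()
    )


odata = {
    "filter": odata_eq_filter(monitoringLocationType="Stream", stateFIPS="US:04"),
}
```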

Irrigation withdrawals data can be obtained as follows:

irr = gh.irrigation_withdrawals()

We can get the CAMELS dataset as a geopandas.GeoDataFrame that includes geometry and basin-level attributes of 671 natural watersheds within CONUS, along with their streamflow observations for 1980-2014 as an xarray.Dataset, like so:

attrs, qobs = gh.get_camels()

The WaterQuality class has a number of convenience methods to retrieve data from the web service. Since there are many parameter combinations that can be used to retrieve data, a general method is also provided to retrieve data from any of the valid endpoints. You can use get_json to retrieve station information as a geopandas.GeoDataFrame or get_csv to retrieve station data as a pandas.DataFrame. You can construct a dictionary of the parameters and pass it to one of these functions. For more information on the parameters, please consult the Water Quality Data documentation. For example, let's find all the stations within a bounding box that have Caffeine data:

from pygeohydro import WaterQuality

bbox = (-92.8, 44.2, -88.9, 46.0)
kwds = {"characteristicName": "Caffeine"}
wq = WaterQuality()
stations = wq.station_bybbox(bbox, kwds)

Or the same criterion but within a 30-mile radius of a point:

stations = wq.station_bydistance(-92.8, 44.2, 30, kwds)

Then we can get the data for all these stations like this:

sids = stations.MonitoringLocationIdentifier.tolist()
caff = wq.data_bystation(sids, kwds)

Moreover, we can get land use/land cover data using the nlcd_bygeom or nlcd_bycoords functions, percentages of land cover types using cover_statistics, and overland roughness using overland_roughness. The nlcd_bycoords function returns a geopandas.GeoDataFrame with the NLCD layers as columns and input coordinates as the geometry column. The nlcd_bygeom function accepts either a single geometry or a geopandas.GeoDataFrame as the input.

from pynhd import NLDI

basins = NLDI().get_basins(["01031450", "01318500", "01031510"])
lulc = gh.nlcd_bygeom(basins, 100, years={"cover": [2016, 2019]})
stats = gh.cover_statistics(lulc["01318500"].cover_2016)
roughness = gh.overland_roughness(lulc["01318500"].cover_2019)
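
Conceptually, categorical statistics of a land cover raster reduce to counting pixels per class and dividing by the number of valid pixels. The standard-library sketch below illustrates that idea on a hypothetical flattened list of class codes; it is not how cover_statistics is implemented, and the function name and nodata handling are assumptions.

```python
from collections import Counter


def category_percentages(values, nodata=None):
    """Percentage of valid pixels that fall in each category."""
    counts = Counter(v for v in values if v != nodata)
    total = sum(counts.values())
    return {code: 100 * n / total for code, n in sorted(counts.items())}


# Hypothetical flattened class codes; 0 marks pixels without data.
pixels = [41, 41, 41, 41, 42, 42, 81, 81, 0]
stats_sketch = category_percentages(pixels, nodata=0)
```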

Next, let's use ssebopeta_bygeom to get actual ET data for a basin. Note that there is also an ssebopeta_bycoords function that returns an ETA time series for a single coordinate.

geometry = NLDI().get_basins("01315500").geometry[0]
eta = gh.ssebopeta_bygeom(geometry, dates=("2005-10-01", "2005-10-05"))

Additionally, we can pull all the US dams data using NID. Let's get dams that are within this bounding box and have a maximum storage larger than 200 acre-feet.

from pygeohydro import NID

nid = NID()
dams = nid.get_bygeom((-65.77, 43.07, -69.31, 45.45), 4326)
dams = nid.inventory_byid(dams.id.to_list())
dams = dams[dams.maxStorage > 200]

We can also get all dams within CONUS with maximum storage larger than 2500 acre-feet:

conus_geom = gh.get_us_states("contiguous")

dam_list = nid.get_byfilter([{"maxStorage": ["[2500 +inf]"]}])
dams = nid.inventory_byid(dam_list[0].id.to_list(), stage_nid=True)

conus_dams = dams[dams.stateKey.isin(conus_geom.STUSPS)].reset_index(drop=True)

The WBD class allows us to get Hydrologic Unit (HU) polygon boundaries. Let's get the two Hudson HUC4s:

from pygeohydro import WBD

wbd = WBD("huc4")
hudson = wbd.byids("huc4", ["0202", "0203"])
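
Hydrologic unit codes are nested: each level adds two digits to its parent, so the enclosing units of a code can be read off by slicing. The helper huc_levels below is a hypothetical illustration that involves no web service, and the 12-digit code is used only to show the slicing.

```python
def huc_levels(huc: str) -> dict[str, str]:
    """Return the code itself and all enclosing units, keyed by level name."""
    if not huc.isdigit() or len(huc) % 2 or not 2 <= len(huc) <= 12:
        raise ValueError(f"not a valid HUC: {huc!r}")
    return {f"huc{n}": huc[:n] for n in range(2, len(huc) + 1, 2)}


levels = huc_levels("020200030604")
```

Here levels["huc4"] is "0202", one of the two HUC4 identifiers requested above.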

The NFHL class allows us to retrieve FEMA's National Flood Hazard Layer (NFHL) data. Let's get the cross-section data for a small region in Vermont:

from pygeohydro import NFHL

nfhl = NFHL("NFHL", "cross-sections")
gdf_xs = nfhl.bygeom((-73.42, 43.28, -72.9, 43.52), geo_crs=4269)

Contributing

Contributions are very welcome. Please read the CONTRIBUTING.rst file for instructions.

Credits

This package was created based on the audreyr/cookiecutter-pypackage project template.

pygeohydro's People

Contributors

aaraney, cheginit, deepsourcebot, dependabot-preview[bot], dependabot[bot], emiliom, fernando-aristizabal, pre-commit-ci[bot]

pygeohydro's Issues

ImportError: DLL load failed: The specified procedure could not be found.

What happened?

I just installed PyGeoHydro in my Anaconda environment using conda install. When I imported the package, this error occurred.

What did you expect to happen?

No response

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

ImportError                               Traceback (most recent call last)
<ipython-input-3-9e1575f9c961> in <module>
----> 1 import pygeohydro

C:\Anaconda\lib\site-packages\pygeohydro\__init__.py in <module>
      4 from .exceptions import InvalidInputRange, InvalidInputType, InvalidInputValue, ZeroMatched
      5 from .print_versions import show_versions
----> 6 from .pygeohydro import (
      7     NID,
      8     NWIS,

C:\Anaconda\lib\site-packages\pygeohydro\pygeohydro.py in <module>
     14 import pandas as pd
     15 import pygeoogc as ogc
---> 16 import pygeoutils as geoutils
     17 import rasterio as rio
     18 import xarray as xr

C:\Anaconda\lib\site-packages\pygeoutils\__init__.py in <module>
      4 from .exceptions import InvalidInputType, InvalidInputValue
      5 from .print_versions import show_versions
----> 6 from .pygeoutils import MatchCRS, arcgis2geojson, gtiff2xarray, json2geodf
      7 
      8 try:

C:\Anaconda\lib\site-packages\pygeoutils\pygeoutils.py in <module>
     14 import orjson as json
     15 import pyproj
---> 16 import rasterio as rio
     17 import rasterio.mask as rio_mask
     18 import rasterio.transform as rio_transform

C:\Anaconda\lib\site-packages\rasterio\__init__.py in <module>
     20             pass
     21 
---> 22 from rasterio._base import gdal_version
     23 from rasterio.drivers import is_blacklisted
     24 from rasterio.dtypes import (

ImportError: DLL load failed: The specified procedure could not be found.

Anything else we need to know?

I changed the Python version to 3.6 and this issue was resolved. However, a new issue appeared:

AttributeError                            Traceback (most recent call last)
in <module>
----> 1 import pygeohydro

C:\Anaconda\envs\env\lib\site-packages\pygeohydro\__init__.py in <module>
      1 from pkg_resources import DistributionNotFound, get_distribution
      2
----> 3 from . import helpers, plot
      4 from .exceptions import InvalidInputRange, InvalidInputType, InvalidInputValue
      5 from .print_versions import show_versions

C:\Anaconda\envs\env\lib\site-packages\pygeohydro\helpers.py in <module>
      5 import numpy as np
      6 import pandas as pd
----> 7 from pygeoogc import RetrySession
      8
      9

C:\Anaconda\envs\env\lib\site-packages\pygeoogc\__init__.py in <module>
     24
     25 if sys.platform.startswith("win"):
---> 26     asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

AttributeError: module 'asyncio' has no attribute 'WindowsSelectorEventLoopPolicy'
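
For context, asyncio.WindowsSelectorEventLoopPolicy was added in Python 3.7, so the attribute lookup fails on Python 3.6. A defensive version of the same call could look like the sketch below; the function name is hypothetical and this is not how pygeoogc handles it.

```python
import asyncio
import sys


def use_selector_loop_on_windows() -> bool:
    """Set the selector event loop policy on Windows when Python provides it."""
    policy = getattr(asyncio, "WindowsSelectorEventLoopPolicy", None)
    if sys.platform.startswith("win") and policy is not None:
        asyncio.set_event_loop_policy(policy())
        return True
    # Not on Windows, or the running Python does not offer the policy.
    return False


applied = use_selector_loop_on_windows()
```

On Python 3.6 this would skip the call instead of raising, although using a Python version that the installed packages support is likely the more robust fix.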

Environment

0.11.0

Error "cannot import name 'MatchCRS' from 'pygeoogc'" when importing pynhd or pygeohydro

What happened?

Got error: "cannot import name 'MatchCRS' from 'pygeoogc'" when importing pynhd. Windows conda env info:

  • python: 3.9.16
  • pynhd: 0.2.0
  • pygeoogc: 0.14.0
  • pygeohydro: 0.11.0

What did you expect to happen?

No response

Minimal Complete Verifiable Example

import pynhd

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[3], line 1
----> 1 import pynhd
      2 import pandas
      3 import geopandas

File ~\anaconda3\envs\flow_Ml\lib\site-packages\pynhd\__init__.py:7
      5 from .network_tools import prepare_nhdplus, topoogical_sort, vector_accumulation
      6 from .print_versions import show_versions
----> 7 from .pynhd import NLDI, NHDPlusHR, WaterData
      9 try:
     10     __version__ = get_distribution(__name__).version

File ~\anaconda3\envs\flow_Ml\lib\site-packages\pynhd\pynhd.py:10
      8 import pygeoogc as ogc
      9 import pygeoutils as geoutils
---> 10 from pygeoogc import WFS, ArcGISRESTful, MatchCRS, RetrySession, ServiceURL
     11 from requests import Response
     12 from shapely.geometry import MultiPolygon, Polygon

ImportError: cannot import name 'MatchCRS' from 'pygeoogc'
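
The versions listed above mix a very old pynhd (0.2.0) with a much newer pygeoogc (0.14.0), so a version mismatch between HyRiver packages is a plausible cause, though that is an assumption rather than a confirmed diagnosis. A standard-library sketch for listing what is installed follows; the package tuple and helper name are chosen for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed for this sketch.
HYRIVER_PACKAGES = ("pynhd", "pygeoogc", "pygeoutils", "pygeohydro", "async-retriever")


def installed_versions(packages=HYRIVER_PACKAGES) -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```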

Anything else we need to know?

No response

Environment

get_streamflow() to_xarray inconsistent dtypes

What happened:
Repeated calls to get_streamflow() returning an xarray Dataset have different dtypes for some fields (notably, strings).

What you expected to happen:
The returned encodings/schema would be consistent for all calls, and match the internal
schema of the NWIS database from which the data is fetched.

Minimal Complete Verifiable Example:

from pygeohydro import NWIS
nwis=NWIS()
DATE_RANGE=("2020-01-01", "2020-12-31")
site_A = nwis.get_streamflow('USGS-402114105350101', DATE_RANGE, to_xarray=True )
site_B = nwis.get_streamflow('USGS-02277600', DATE_RANGE, to_xarray=True )

assert site_A['station_nm'].dtype == site_B['station_nm'].dtype
## fails

assert site_A['alt_datum_cd'].dtype == site_B['alt_datum_cd'].dtype
## fails

Anything else we need to know?:
This has come up for me as I try to fetch streamflow data one gage at a time as part of a parallelized workflow -- each worker fetches one streamgage, manipulates it, then appends to a common dataset (in my case, a zarr store). The common zarr store was templated using NWIS.get_streamflow() data, which established the 'standard' dtypes.

The dtypes for these particular fields (station_nm and alt_datum_cd) are unicode strings, with the length of the string (and the dtype) being that of the returned data for a given request. That is, the dtype for site_A's alt_datum_cd (above) is '<U6' because the data happens to be 6 chars for that gage. For site_B's alt_datum_cd, the dtype is '<U1'. It isn't just that the string is shorter; the dtype is different, which causes the zarr write to fail.

I can work around this by re-casting in the case of these two strings:

site_B['alt_datum_cd'] = xr.DataArray(data=site_B['alt_datum_cd'].values.astype('<U6'), dims='gage_id')

But in the case of the station name field, I don't know what the max length might be from the database. I can cast to '<U46' (the dtype for Site_A's station_nm), but other gages may have longer names, which will be truncated when cast to this dtype.

It would be useful to have get_streamflow() return the same string encoding/dtype in all cases, so that separate calls can be treated identically.
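
Until that is available, one possible workaround is to cast every string field to a single, generously sized fixed-width dtype and refuse to truncate. The NumPy sketch below uses a width of 64 as an arbitrary assumption (the true maximum length in NWIS is not known here), and to_fixed_width is a hypothetical helper.

```python
import numpy as np


def to_fixed_width(values, width: int = 64) -> np.ndarray:
    """Cast strings to a common fixed-width unicode dtype without truncating."""
    arr = np.asarray(values, dtype=str)
    # Width, in characters, of the dtype NumPy inferred from the data.
    current = arr.dtype.itemsize // np.dtype("U1").itemsize
    if current > width:
        raise ValueError(f"strings of length {current} exceed width {width}")
    return arr.astype(f"U{width}")


# Hypothetical values with different natural widths.
a = to_fixed_width(["NAVD88"])
b = to_fixed_width(["X"])
```

Both arrays now share one dtype, so they can be appended to the same store. Casting to object dtype is an alternative when variable-length strings are acceptable to the storage backend.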

Environment:

Output of pygeohydro.show_versions()
INSTALLED VERSIONS
------------------
commit: None
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21) 
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.181-99.354.amzn2.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.2
libnetcdf: 4.8.1
aiodns: 3.0.0
aiohttp: 3.8.3
aiohttp-client-cache: 0.7.3
aiosqlite: 0.17.0
async-retriever: 0.3.6
bottleneck: 1.3.5
brotli: installed
cchardet: 2.1.7
click: 8.0.2
cytoolz: 0.12.0
dask: 2022.04.2
defusedxml: 0.7.1
folium: 0.13.0
geopandas: 0.11.1
lxml: 4.9.1
matplotlib: 3.4.3
netCDF4: 1.6.0
networkx: 2.8.7
numpy: 1.23.3
owslib: 0.27.2
pandas: 1.4.2
py3dep: 0.13.6
pyarrow: 9.0.0
pydantic: 1.10.2
pydaymet: 0.13.6
pygeohydro: 0.13.6
pygeoogc: 0.13.6
pygeos: 0.13
pygeoutils: 0.13.6
pynhd: 0.13.6
pyproj: 3.4.0
pytest: None
pytest-cov: None
rasterio: 1.3.2
requests: 2.28.1
requests-cache: 0.9.6
richdem: None
rioxarray: 0.12.2
scipy: 1.9.1
shapely: 1.8.4
tables: 3.7.0
ujson: 5.5.0
urllib3: 1.26.11
xarray: 2022.9.0
xdist: None
yaml: 5.4.1

StreamStats vs. NLDI Watershed

I was compiling a list of projects/users for the StreamStats services and Dave Blodgett indicated you were using StreamStats. I see you've replaced StreamStats with NLDI services. I have a couple of additional pieces of information for you:
NLDI and StreamStats are working together to revise the NLDI delineation tools so they will delineate from a click point not just from the catchment.
The data processing steps and quality assurance work, as well as the underlying data in StreamStats, typically mean that delineations from StreamStats will be more accurate than from the NHDPlus datasets being queried in NLDI. For example, South Carolina data is based on lidar data, we're currently working on 3-meter lidar data in Nebraska. Thus, depending on the use, you may want to include the option of using StreamStats as well as NLDI.

Thanks!

`pygeohydro.soil_gnatsgo()` function erroring out

What happened?

I was following the soil storage capacity tutorial and was unable to run the pygeohydro.soil_gnatsgo() function without getting errors. My code is below. I'm not sure if this is an issue with my conda environment or something with the gnatsgo database related so I've also provided my environment yaml and the error message. It's uploaded as a text file because I couldn't upload it as a .yaml file.

I'm on Windows 11 with Python 3.12.2. I've defined my PROJ_LIB and PROJ_DATA paths at the start of the script and those should be pointing to the environment created by the yaml. I've tried to update rasterio with mamba update rasterio, but it's telling me everything is up-to-date.

What did you expect to happen?

I expected to have a similar output for soil thickness as is shown in the soil storage capacity tutorial, but my code is erroring out and I'm not able to access the thickness data from gNATSGO. I've tried other STAC variables as well, but have had no luck with those either.

Minimal Complete Verifiable Example

import pygeohydro
import pygeoutils
import pynhd
import rasterio

# get basin
test_basin = pynhd.NLDI().get_basins("11092450")

# get basin CRS as a rasterio CRS object built from the WKT string
test_basin_rasterio_wkt = rasterio.crs.CRS.from_wkt(test_basin.crs.to_wkt())

# get basin geom
test_basin_geom = test_basin.geometry["USGS-11092450"]

# get soil properties data (this works fine)
test_soils_data = pygeohydro.soil_properties() # this runs with rasterio warnings but gives result

# mask soil properties data with basin geom
test_soils_data_mask = pygeoutils.xarray_geomask(test_soils_data, test_basin_geom, test_basin_rasterio_wkt)
# i kept getting rasterio errors if i used test_basin.crs.to_wkt() here rather than test_basin_rasterio_wkt

# get soil thickness data
# (test_field_geom is not defined in this snippet; the log below uses the same name)
test_thickness_data = pygeohydro.soil_gnatsgo("tk0_999a", test_field_geom, test_basin_rasterio_wkt)
# this has similar rasterio warnings as above but errors out with more rasterio errors

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

Error messages:

>>> test_mukey_data = pygeohydro.soil_gnatsgo("tk0_999a", test_field_geom, test_basin_rasterio_wkt)

WARNING:rasterio._env:CPLE_AppDefined in PROJ: proj_create_from_database: SQLite error on SELECT name, type, coordinate_system_auth_name, coordinate_system_code, datum_auth_name, datum_code, area_of_use_auth_name, area_of_use_code, text_definition, deprecated FROM geodetic_crs WHERE auth_name = ? AND code = ?: no such column: area_of_use_auth_name
WARNING:rasterio._env:CPLE_AppDefined in PROJ: proj_create_from_database: SQLite error on SELECT name, ellipsoid_auth_name, ellipsoid_code, prime_meridian_auth_name, prime_meridian_code, area_of_use_auth_name, area_of_use_code, deprecated FROM geodetic_datum WHERE auth_name = ? AND code = ?: no such column: area_of_use_auth_name
Traceback (most recent call last):
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\xarray\backends\file_manager.py", line 211, in _acquire_with_cache_info
    file = self._cache[self._key]
           ~~~~~~~~~~~^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\xarray\backends\lru_cache.py", line 56, in __getitem__
    value = self._cache[key]
            ~~~~~~~~~~~^^^^^
KeyError: [<function open at 0x000001F4F2AAE3E0>, (WindowsPath('cache/005089ad56d76b182f3308ea5dc486455e1b3e28e2af21f4c554edab4c89a04a.tiff'),), 'r', (('sharing', False),), '69f84f8f-772c-4148-8a69-80468c6750b4']

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "rasterio\\crs.pyx", line 730, in rasterio.crs.CRS.from_wkt
  File "rasterio\\_err.pyx", line 209, in rasterio._err.exc_wrap_ogrerr
rasterio._err.CPLE_BaseError: OGR Error code 5

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\pygeohydro\pygeohydro.py", line 800, in soil_gnatsgo
    ds = xr.merge((get_layer(lyr) for lyr in lyrs), combine_attrs="drop_conflicts")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\xarray\core\merge.py", line 1015, in merge
    for obj in objects:
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\pygeohydro\pygeohydro.py", line 800, in <genexpr>
    ds = xr.merge((get_layer(lyr) for lyr in lyrs), combine_attrs="drop_conflicts")
                   ^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\pygeohydro\pygeohydro.py", line 791, in get_layer
    ds = xr.merge(_open_tiff(f, lyr) for f in fpaths)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\xarray\core\merge.py", line 1015, in merge
    for obj in objects:
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\pygeohydro\pygeohydro.py", line 791, in <genexpr>
    ds = xr.merge(_open_tiff(f, lyr) for f in fpaths)
                  ^^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\pygeohydro\pygeohydro.py", line 740, in _open_tiff
    ds = rxr.open_rasterio(file)
         ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\rioxarray\_io.py", line 1124, in open_rasterio
    riods = manager.acquire()
            ^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\xarray\backends\file_manager.py", line 193, in acquire
    file, _ = self._acquire_with_cache_info(needs_lock)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\xarray\backends\file_manager.py", line 217, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\rasterio\env.py", line 451, in wrapper
    return f(*args, **kwds)
           ^^^^^^^^^^^^^^^^
  File "C:\Users\sheila.saia\AppData\Local\anaconda3\envs\esmc_env\Lib\site-packages\rasterio\__init__.py", line 304, in open
    dataset = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "rasterio\\_base.pyx", line 331, in rasterio._base.DatasetBase.__init__
  File "rasterio\\_base.pyx", line 350, in rasterio._base.DatasetBase._set_attrs_from_dataset_handle
  File "rasterio\\_base.pyx", line 408, in rasterio._base.DatasetBase.read_crs
  File "rasterio\\_base.pyx", line 385, in rasterio._base.DatasetBase._handle_crswkt
  File "rasterio\\crs.pyx", line 732, in rasterio.crs.CRS.from_wkt
rasterio.errors.CRSError: The WKT could not be parsed. OGR Error code 5
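
The "no such column: area_of_use_auth_name" warnings may indicate that PROJ is reading a proj.db whose schema does not match the PROJ library in use, which can happen when PROJ_LIB or PROJ_DATA points at a different installation. That is a guess, not a confirmed diagnosis. A standard-library sketch for checking where those variables point follows; the helper name is hypothetical.

```python
import os
from pathlib import Path


def proj_db_candidates(env=os.environ) -> dict:
    """Report, per PROJ variable, whether each listed directory holds a proj.db."""
    report = {}
    for var in ("PROJ_DATA", "PROJ_LIB"):
        value = env.get(var)
        if value is None:
            report[var] = None  # variable is not set
        else:
            # Several directories may be listed, separated by os.pathsep.
            report[var] = {
                d: (Path(d) / "proj.db").is_file()
                for d in value.split(os.pathsep)
                if d
            }
    return report


print(proj_db_candidates())
```

If a listed directory belongs to another environment or an older installation, unsetting the variable or pointing it at the active environment's PROJ data directory is one thing to try.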

Anything else we need to know?

Thank you for creating the HyRiver python tools and the tutorials! They are very helpful and I'm excited to use them more in my work. 💦🐍

Environment

sheila_env_yaml.txt

The tree canopy layer name has changed

What happened?

The MRLC web service has changed the layer names of the tree canopy layers.

What did you expect to happen?

Requesting canopy data shouldn't fail.

Minimal Complete Verifiable Example

from pygeoogc import WMS, ServiceURL

wms = WMS(
    ServiceURL().wms.mrlc,
    layers="NLCD_2011_Tree_Canopy_L48",
    outformat="image/geotiff",
    crs=4326,
    validation=False,
)
wms_resp = wms.getmap_bybox(
    (-69.77, 45.07, -69.31, 45.45),
    1e3,
)

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

SYS INFO

commit: None
python: 3.10.11 | packaged by conda-forge | (main, May 10 2023, 19:07:22) [Clang 14.0.6 ]
python-bits: 64
OS: Darwin
OS-release: 22.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: en_US.UTF-8
libhdf5: 1.14.0
libnetcdf: 4.9.2

PACKAGE VERSION

aiodns 3.0.0
aiohttp 3.8.4
aiohttp-client-cache 0.8.1
aiosqlite 0.19.0
async-retriever 0.15.0
bottleneck 1.3.7
brotli 1.0.9
click 8.0.3
cytoolz 0.12.0
dask 2023.5.1
defusedxml 0.7.1
folium 0.14.0
geopandas 0.13.0
h5netcdf 1.2.0
hydrosignatures 0.15.1.dev3+gfaa6354
lxml 4.9.2
matplotlib 3.7.1
netCDF4 1.6.3
networkx 3.1
numba 0.57.0
numpy 1.24.3
owslib 0.29.2
pandas 2.0.2
py3dep 0.14.1.dev30+g4eb740f
pyarrow 12.0.0
pydaymet 0.14.1.dev20+g9aa0d8b
pygeohydro 0.15.1.dev1+g1c902b0.d20230523
pygeoogc 0.14.1.dev31+g0c6d4f1
pygeos N/A
pygeoutils 0.14.1.dev22+gf377c19
pynhd 0.14.1.dev38+g7d12f75
pynldas2 0.14.1.dev27+g8d2f7cb
pyproj 3.5.0
pytest 7.3.1
pytest-cov 4.1.0
rasterio 1.3.7
requests 2.31.0
requests-cache 1.0.1
richdem N/A
rioxarray 0.14.1
scipy 1.10.1
shapely 2.0.1
tables 3.8.0
ujson 5.7.0
urllib3 2.0.2
xarray 2023.5.0
xdist N/A
yaml N/A

'utf-8' codec error from pynhd

What happened?

Passing nhd_info=True to the nwis.get_info() function raises an error. I was able to replicate the error in a new Colab environment with pygeohydro-0.16.0 and pynhd-0.16.2.

from pygeohydro import NWIS

Outlet = '01500500'
ParamCd = '00060'

nwis = NWIS()

query = {
    "site": Outlet,
    "parameterCd": ParamCd,
    "siteTypeCd": "ST",
    "hasDataTypeCd": "dv"
}
Outlet_gdf = nwis.get_info(query, expanded=True, nhd_info=True)

What did you expect to happen?

It worked until recently, but raised an error today.

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

UnicodeDecodeError                        Traceback (most recent call last)
[/usr/local/lib/python3.10/dist-packages/async_retriever/_utils.py](https://localhost:8080/#) in retriever(uid, url, s_kwds, session, read_type, r_kwds, raise_status)
     81         try:
---> 82             return uid, await getattr(response, read_type)(**r_kwds)
     83         except (ClientResponseError, ValueError) as ex:

17 frames
[/usr/local/lib/python3.10/dist-packages/aiohttp/client_reqrep.py](https://localhost:8080/#) in text(self, encoding, errors)
   1147 
-> 1148         return self._body.decode(  # type: ignore[no-any-return,union-attr]
   1149             encoding, errors=errors

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 31378: invalid start byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError                        Traceback (most recent call last)
[<ipython-input-2-89c816bf29ae>](https://localhost:8080/#) in <cell line: 14>()
     12     "hasDataTypeCd": "dv"
     13 }
---> 14 Outlet_gdf = nwis.get_info(query, expanded=True, nhd_info=True)

[/usr/local/lib/python3.10/dist-packages/pygeohydro/nwis.py](https://localhost:8080/#) in get_info(self, queries, expanded, fix_names, nhd_info)
    385 
    386         if nhd_info:
--> 387             nhd = self._nhd_info(sites["site_no"].to_list())
    388             sites = pd.merge(sites, nhd, left_on="site_no", right_on="site_no", how="left")
    389 

[/usr/local/lib/python3.10/dist-packages/pygeohydro/nwis.py](https://localhost:8080/#) in _nhd_info(site_ids)
    296         except (TypeError, IntCastingNaNError):
    297             area["comid"] = area["comid"].astype("Int32")
--> 298         nhd_area = pynhd.streamcat("fert", comids=area["comid"].dropna().to_list())
    299         area = area.merge(
    300             nhd_area[["COMID", "WSAREASQKM"]], left_on="comid", right_on="COMID", how="left"

[/usr/local/lib/python3.10/dist-packages/pynhd/nhdplus_derived.py](https://localhost:8080/#) in streamcat(metric_names, metric_areas, comids, regions, states, counties, conus, percent_full, area_sqkm)
    666         A dataframe with the requested metrics.
    667     """
--> 668     sc = StreamCatValidator()
    669     names = [metric_names] if isinstance(metric_names, str) else metric_names
    670     names = [sc.alt_names.get(s.lower(), s.lower()) for s in names]

[/usr/local/lib/python3.10/dist-packages/pynhd/nhdplus_derived.py](https://localhost:8080/#) in __init__(self)
    533 class StreamCatValidator(StreamCat):
    534     def __init__(self) -> None:
--> 535         super().__init__()
    536 
    537     def validate(

[/usr/local/lib/python3.10/dist-packages/pynhd/nhdplus_derived.py](https://localhost:8080/#) in __init__(self)
    508 
    509         url_vars = f"{self.base_url}/variable_info.csv"
--> 510         names = pd.read_csv(io.StringIO(ar.retrieve_text([url_vars])[0]))
    511         names["METRIC_NAME"] = names["METRIC_NAME"].str.replace(r"\[AOI\]|Slp[12]0", "", regex=True)
    512         names["SLOPE"] = [

[/usr/local/lib/python3.10/dist-packages/async_retriever/async_retriever.py](https://localhost:8080/#) in retrieve_text(urls, request_kwds, request_method, max_workers, cache_name, timeout, expire_after, ssl, disable, raise_status)
    500     '01646500'
    501     """
--> 502     return retrieve(
    503         urls,
    504         "text",

[/usr/local/lib/python3.10/dist-packages/async_retriever/async_retriever.py](https://localhost:8080/#) in retrieve(urls, read_method, request_kwds, request_method, max_workers, cache_name, timeout, expire_after, ssl, disable, raise_status)
    433     results = (loop.run_until_complete(session(url_kwds=c)) for c in chunked_reqs)
    434 
--> 435     resp = [r for _, r in sorted(tlz.concat(results))]
    436     if new_loop:
    437         loop.close()

[/usr/local/lib/python3.10/dist-packages/async_retriever/async_retriever.py](https://localhost:8080/#) in <genexpr>(.0)
    431     chunked_reqs = tlz.partition_all(max_workers, inp.url_kwds)
    432     loop, new_loop = utils.get_event_loop()
--> 433     results = (loop.run_until_complete(session(url_kwds=c)) for c in chunked_reqs)
    434 
    435     resp = [r for _, r in sorted(tlz.concat(results))]

[/usr/local/lib/python3.10/dist-packages/nest_asyncio.py](https://localhost:8080/#) in run_until_complete(self, future)
     96                 raise RuntimeError(
     97                     'Event loop stopped before Future completed.')
---> 98             return f.result()
     99 
    100     def _run_once(self):

[/usr/lib/python3.10/asyncio/futures.py](https://localhost:8080/#) in result(self)
    199         self.__log_traceback = False
    200         if self._exception is not None:
--> 201             raise self._exception.with_traceback(self._exception_tb)
    202         return self._result
    203 

[/usr/lib/python3.10/asyncio/tasks.py](https://localhost:8080/#) in __step(***failed resolving arguments***)
    230                 # We use the `send` method directly, because coroutines
    231                 # don't have `__iter__` and `__next__` methods.
--> 232                 result = coro.send(None)
    233             else:
    234                 result = coro.throw(exc)

[/usr/local/lib/python3.10/dist-packages/async_retriever/async_retriever.py](https://localhost:8080/#) in async_session_with_cache(url_kwds, read, r_kwds, request_method, cache_name, timeout, expire_after, ssl, raise_status)
    233             for uid, url, kwds in url_kwds
    234         )
--> 235         return await asyncio.gather(*tasks)  # pyright: ignore[reportGeneralTypeIssues]
    236 
    237 

[/usr/lib/python3.10/asyncio/tasks.py](https://localhost:8080/#) in __wakeup(self, future)
    302     def __wakeup(self, future):
    303         try:
--> 304             future.result()
    305         except BaseException as exc:
    306             # This may also be a cancellation.

[/usr/lib/python3.10/asyncio/tasks.py](https://localhost:8080/#) in __step(***failed resolving arguments***)
    230                 # We use the `send` method directly, because coroutines
    231                 # don't have `__iter__` and `__next__` methods.
--> 232                 result = coro.send(None)
    233             else:
    234                 result = coro.throw(exc)

[/usr/local/lib/python3.10/dist-packages/async_retriever/_utils.py](https://localhost:8080/#) in retriever(uid, url, s_kwds, session, read_type, r_kwds, raise_status)
     83         except (ClientResponseError, ValueError) as ex:
     84             if raise_status:
---> 85                 raise ServiceError(await response.text(), str(response.url)) from ex
     86             return uid, None
     87 

[/usr/local/lib/python3.10/dist-packages/aiohttp/client_reqrep.py](https://localhost:8080/#) in text(self, encoding, errors)
   1146             encoding = self.get_encoding()
   1147 
-> 1148         return self._body.decode(  # type: ignore[no-any-return,union-attr]
   1149             encoding, errors=errors
   1150         )

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 31378: invalid start byte
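
For context on the error above: byte 0xb0 is not a valid UTF-8 start byte, but it is the degree sign (°) in Latin-1 and Windows-1252. A plausible reading, which this report does not confirm, is that the downloaded CSV is not UTF-8 encoded. The sketch below reproduces the failure on a made-up byte string and shows one possible fallback. `decode_with_fallback` and the sample text are hypothetical and are not part of async_retriever or pynhd.

```python
def decode_with_fallback(body: bytes, encodings=("utf-8", "latin-1")) -> str:
    """Try each encoding in order and return the first successful decode."""
    for encoding in encodings:
        try:
            return body.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Reached only if every listed encoding failed (latin-1 itself never fails).
    return body.decode("utf-8", errors="replace")


# Hypothetical CSV cell containing a degree sign, encoded as Latin-1.
raw = "Mean temperature (\u00b0C)".encode("latin-1")

try:
    raw.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False  # same failure mode as in the traceback

text = decode_with_fallback(raw)
```

Falling back to Latin-1 always succeeds because every byte maps to a character, so it can hide a genuinely different encoding. Treat it as a diagnostic aid, not a fix.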

Anything else we need to know?

No response

Environment

SYS INFO

commit: None
python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
python-bits: 64
OS: Linux
OS-release: 6.1.58+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')

PACKAGE VERSION

async-retriever 0.16.0
pygeoogc 0.16.1
pygeoutils 0.16.1
py3dep N/A
pynhd 0.16.2
pygridmet N/A
pydaymet N/A
hydrosignatures 0.16.0
pynldas2 N/A
pygeohydro 0.16.0
aiohttp 3.9.3
aiohttp-client-cache 0.11.0
aiosqlite 0.20.0
cytoolz 0.12.3
ujson 5.9.0
defusedxml 0.7.1
joblib 1.3.2
multidict 6.0.5
owslib 0.30.0
pyproj 3.6.1
requests 2.31.0
requests-cache 1.2.0
shapely 2.0.3
url-normalize 1.4.3
urllib3 2.0.7
yarl 1.9.4
geopandas 0.13.2
netcdf4 1.6.5
numpy 1.25.2
rasterio 1.3.9
rioxarray 0.15.3
scipy 1.11.4
xarray 2023.7.0
click 8.1.7
pyflwdir N/A
networkx 3.2.1
pyarrow 14.0.2
folium 0.14.0
h5netcdf 1.3.0
matplotlib 3.7.1
pandas 2.0.3
numba 0.58.1
bottleneck N/A
py7zr N/A
pyogrio N/A

NLCD not working

What happened?

Running any nlcd* fails, with the following error:
ServiceUnavailableError: Service is currently not available, try again later: https://www.mrlc.gov/geoserver/mrlc_download/wms

What did you expect to happen?

I expected the service to be available again within a few days, but the outage has lasted longer than I expected. I am wondering if there is a breaking change in the API.

Minimal Complete Verifiable Example

import pygeohydro as gh
nlcd = gh.nlcd_bycoords([(-87, 34)])

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:56:21)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.0-72-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.6
libnetcdf: 4.8.0

aiodns: 3.0.0
aiohttp: 3.8.3
aiohttp-client-cache: 0.7.3
aiosqlite: 0.17.0
async-retriever: 0.3.6
bottleneck: 1.3.5
brotli: installed
cchardet: 2.1.7
click: 7.1.2
cytoolz: 0.12.0
dask: 2022.10.0
defusedxml: 0.7.1
folium: 0.13.0
geopandas: 0.11.1
lxml: 4.8.0
matplotlib: 3.4.3
netCDF4: 1.5.7
networkx: 2.8.7
numpy: 1.23.4
owslib: 0.27.2
pandas: 1.5.1
py3dep: 0.13.6
pyarrow: 5.0.0
pydantic: 1.10.2
pydaymet: 0.13.6
pygeohydro: 0.13.6
pygeoogc: 0.13.6
pygeos: 0.10.2
pygeoutils: 0.13.6
pynhd: 0.13.6
pyproj: 3.3.1
pytest: None
pytest-cov: None
rasterio: 1.2.1
requests: 2.28.1
requests-cache: 0.9.6
richdem: 0.3.4
rioxarray: 0.12.2
scipy: 1.9.2
shapely: 1.8.0
tables: None
ujson: 5.5.0
urllib3: 1.26.11
xarray: 2022.10.0
xdist: None
yaml: 6.0

NWIS data retrieval enhancement ideas

It would be great to be able to set some parameter to ensure that the retrieved NWIS data are in UTC.

Also it would be nice to have the ability to return the data (along with metadata such as units!) as an xarray dataset instead of a pandas dataframe.

There is an example NWIS code here by @dnowacki-usgs that optionally returns an xarray dataset.
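
To illustrate the UTC request, here is a minimal sketch of converting a local timestamp to UTC from a time-zone code such as the `tz_cd` column returned by `get_info`. The `TZ_OFFSETS` table and the `to_utc` helper are hypothetical and cover only a few US codes. This is not how pygeohydro handles time zones.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical lookup: a few US time-zone codes and their fixed UTC offsets in hours.
TZ_OFFSETS = {
    "EST": -5, "EDT": -4,
    "CST": -6, "CDT": -5,
    "MST": -7, "MDT": -6,
    "PST": -8, "PDT": -7,
}


def to_utc(local_timestamp: str, tz_cd: str) -> datetime:
    """Interpret an ISO timestamp in the given zone code and convert it to UTC."""
    local_tz = timezone(timedelta(hours=TZ_OFFSETS[tz_cd]))
    local_dt = datetime.fromisoformat(local_timestamp).replace(tzinfo=local_tz)
    return local_dt.astimezone(timezone.utc)


utc_dt = to_utc("2020-01-01T12:00:00", "EST")
```

Fixed offsets ignore daylight-saving transitions, so a real implementation would need to know whether each record is reported in standard or local time.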

Example code for NWIS query does not work

What happened:
Running the example code to generate a list of NWIS sites throws the following error: 'NWIS' object has no attribute 'query_bybox'

Minimal Complete Verifiable Example:

from pygeohydro import NWIS

nwis = NWIS()
query = {
    **nwis.query_bybox(bbox),
    "hasDataTypeCd": "dv",
    "outputDataTypeCd": "dv",
}
info_box = nwis.get_info(query)
dates = ("2000-01-01", "2010-12-31")
stations = info_box[
    (info_box.begin_date <= dates[0]) & (info_box.end_date >= dates[1])
].site_no.tolist()

Environment:

Output of pygeohydro.show_versions()
INSTALLED VERSIONS

commit: None
python: 3.8.13 | packaged by conda-forge | (default, Mar 25 2022, 06:04:10)
[GCC 10.3.0]
python-bits: 64
OS: Linux
OS-release: 4.15.0-167-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US
LOCALE: en_US.ISO8859-1
libhdf5: 1.12.1
libnetcdf: 4.8.1

aiodns: 3.0.0
aiohttp: 3.8.1
aiohttp-client-cache: 0.7.1
aiosqlite: 0.17.0
async-retriever: 0.3.3
bottleneck: 1.3.4
brotli: installed
cchardet: 2.1.7
click: 6.7
cytoolz: 0.11.2
dask: 2022.6.1
defusedxml: 0.7.1
folium: 0.12.1.post1
geopandas: 0.11.0
lxml: 4.8.0
matplotlib: 3.4.3
netCDF4: 1.6.0
networkx: 2.8.4
numpy: 1.23.0
owslib: 0.25.0
pandas: 1.4.3
py3dep: 0.13.1
pyarrow: 6.0.1
pydantic: 1.9.1
pydaymet: 0.13.1
pygeohydro: 0.13.2
pygeoogc: 0.13.2
pygeos: 0.12.0
pygeoutils: 0.13.2
pynhd: 0.13.2
pyproj: 3.3.0
pytest: None
pytest-cov: None
rasterio: 1.2.10
requests: 2.28.1
requests-cache: 0.9.4
richdem: 0.3.4
rioxarray: 0.11.1
scipy: 1.8.1
shapely: 1.8.2
tables: 3.7.0
ujson: 5.3.0
urllib3: 1.26.9
xarray: 2022.3.0
xdist: None
yaml: 6.0


NLCD not working

What happened:
Running any nlcd* fails.

What you expected to happen:
It should work!

Minimal Complete Verifiable Example:

import pygeohydro as gh

nlcd = gh.nlcd_bycoords([(-87.11890, 34.70421)])

Anything else we need to know?:

Environment:

Output of pygeohydro.show_versions()
INSTALLED VERSIONS
------------------
commit: 2d4c4ed0aa39f85ff62a47c52645dbc80b9dceb0
python: 3.10.2 | packaged by conda-forge | (main, Jan 14 2022, 08:03:02) [Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 21.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.12.1
libnetcdf: 4.8.1

aiodns: 3.0.0
aiohttp: 3.8.1
aiohttp-client-cache: 0.6.1
aiosqlite: 0.17.0
async-retriever: 0.3.2.dev20+g8f29acf
bottleneck: 1.3.2
brotli: installed
cchardet: 2.1.7
click: 8.0.3
cytoolz: 0.11.2
dask: 2022.02.0
defusedxml: 0.7.1
folium: 0.12.1.post1
geopandas: 0.10.2
lxml: 4.7.1
matplotlib: 3.5.1
netCDF4: 1.5.8
networkx: 2.6.3
numpy: 1.21.5
owslib: 0.25.0
pandas: 1.4.1
py3dep: 0.12.3.dev9+g469244f
pyarrow: 7.0.0
pydantic: 1.8.1
pydaymet: 0.12.3
pygeohydro: 0.12.4
pygeoogc: 0.12.3.dev16+g99eff81
pygeos: 0.12.0
pygeoutils: 0.12.4.dev2+g7f077f9
pynhd: 0.3.2.dev20+g8f29acf
pyproj: 3.3.0
pytest: 7.0.1
pytest-cov: 3.0.0
rasterio: 1.2.10
requests: 2.26.0
requests-cache: 0.9.2
richdem: 2.3.0
rioxarray: 0.10.0
scipy: 1.8.0
shapely: 1.8.0
tables: 3.7.0
ujson: 5.1.0
urllib3: 1.26.7
xarray: 0.21.1
xdist: 2.5.0
yaml: 6.0
None

Inclusion of NLCD 2021 Data

Is your feature request related to a problem?

Following the NLCD 2021 data release announcement, I'm looking forward to updating workflows to utilize the most recent land cover data. Would it be possible to add the 2021 data for the pygeohydro nlcd functions?

Describe the solution you'd like

Adding 2021 as a selectable option for the "years" arguments.

Describe alternatives you've considered

No response

Additional context

No response

Retrieve records availability from NWIS

Is your feature request related to a problem?

Currently, the get_info function from NWIS does not return info like data availability range.

from pygeohydro import NWIS

nwis = NWIS()
SiteID = "01636500"
ParamCd = "00060"
query = {
    "site": SiteID,
    "parameterCd": ParamCd,
    "siteStatus": "all",
}
SiteInfo = nwis.get_info(query, expanded=True)
print(SiteInfo.columns)
Index(['agency_cd', 'site_no', 'station_nm', 'site_tp_cd', 'dec_lat_va',
       'dec_long_va', 'coord_acy_cd', 'dec_coord_datum_cd', 'alt_va',
       'alt_acy_va', 'alt_datum_cd', 'huc_cd', 'lat_va', 'long_va',
       'coord_meth_cd', 'coord_datum_cd', 'district_cd', 'state_cd',
       'county_cd', 'country_cd', 'land_net_ds', 'map_nm', 'map_scale_fc',
       'alt_meth_cd', 'basin_cd', 'topo_cd', 'instruments_cd',
       'construction_dt', 'inventory_dt', 'drain_area_va',
       'contrib_drain_area_va', 'tz_cd', 'local_time_fg', 'reliability_cd',
       'gw_file_cd', 'hcdn_2009', 'geometry'],
      dtype='object')

But begin_date and end_date are available in the xarray.Dataset returned by the get_streamflow function.

SiteFlow = nwis.get_streamflow(SiteID, dates=("2010-01-01", "2010-01-05"), to_xarray=True)
SiteFlow

image

I feel it's better to examine the availability range, then decide the dates we use in get_streamflow.

Describe the solution you'd like

It's directly available through the NWIS site service by setting seriesCatalogOutput to true:

import pandas as pd
import requests

url = "https://waterservices.usgs.gov/nwis/site/?format=rdb&sites=01636500&seriesCatalogOutput=true&siteStatus=all&hasDataTypeCd=dv&outputDataTypeCd=dv"
r = requests.get(url, allow_redirects=True)
content = r.content.decode('utf-8')
lines = content.split('\n')
# Skip the leading "#" comment lines; the first remaining line is the header.
start_index = next(i for i, line in enumerate(lines) if not line.startswith('#'))
column_names = lines[start_index].split('\t')
# The line right after the header is skipped, so data rows start two lines down.
data_rows = [line.split('\t') for line in lines[start_index+2:] if line.strip()]
df = pd.DataFrame(data_rows, columns=column_names)
df

image
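
The parsing steps above can also be tried offline. The sketch below applies the same logic (skip `#` comment lines, read the header, skip the line that follows it) to a short made-up string using only the standard library. The sample text, its dates, and the `parse_rdb` helper are hypothetical and do not reproduce a real NWIS response.

```python
import csv
import io

# Hypothetical, heavily abbreviated RDB-style text; the values are made up.
SAMPLE_RDB = (
    "# comment line\n"
    "# another comment line\n"
    "agency_cd\tsite_no\tparm_cd\tbegin_date\tend_date\n"
    "5s\t15s\t5s\t10d\t10d\n"
    "USGS\t01636500\t00060\t1900-01-01\t2020-12-31\n"
)


def parse_rdb(text: str) -> list[dict[str, str]]:
    """Parse tab-separated RDB-style text into a list of row dictionaries."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    header, _column_formats, *rows = lines  # second line is not data
    reader = csv.DictReader(io.StringIO("\n".join([header, *rows])), delimiter="\t")
    return list(reader)


records = parse_rdb(SAMPLE_RDB)
availability = (records[0]["begin_date"], records[0]["end_date"])
```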

Describe alternatives you've considered

No response

Additional context

No response

Add support for SensorThings

Is your feature request related to a problem? Please describe.
No. USGS water data has a new web service called SensorThings that provides access to many USGS datasets.

Describe the solution you'd like
A demo repository of its initial implementation is here.

Describe alternatives you've considered
N/A

Additional context
N/A

NHDPlus Implementation

Before I offer my suggestion: I may be missing the utility of shipping the NHD with the repo. With that in mind, and if you don't mind elaborating later, what are your thoughts on moving away from shipping the NHDPlus dataset to users and instead relying on the USGS's API to verify and obtain gauge metadata? It should be a straightforward call that doesn't require a key.

Error when using readme tutorial on gh.nlcd_bygeom() and gh.cover_statistics()... Updated syntax?

Hello,

First of all thanks for this tutorial & code, super helpful. Just wondering if the syntax has been updated or if I'm just misunderstanding:

I tried to run the following code in the readme:

import pygeohydro as gh
from pynhd import NLDI

basins = NLDI().get_basins(["01031450", "01031500", "01031510"])
lulc = gh.nlcd_bygeom(geometry, 100, years={"cover": [2016, 2019]})
stats = gh.cover_statistics(lulc.cover_2016) 

and got the following error:
NameError: name 'geometry' is not defined

Well, no surprise... I changed "geometry" to "basins" and reran, which gave the following error message:
AttributeError: 'dict' object has no attribute 'cover_2016'

So ultimately I ran something like this:


basins = NLDI().get_basins(["01031450", "01031500", "01031510"])
lulc = gh.nlcd_bygeom(basins, 100, years={"cover": [2016, 2019]})
stats = gh.cover_statistics(lulc["01031450"]['cover_2016'])
stats

{'classes': {'Open Water': 2.846106932303314,
'Developed, Open Space': 2.240292742427323,
'Developed, Low Intensity': 0.5488920512299248,
'Developed, Medium Intensity': 0.1910957511689368,
'Developed, High Intensity': 0.028461069323033134,
'Deciduous Forest': 28.660296808294365,
'Evergreen Forest': 14.474486684285424,
'Mixed Forest': 29.58731449481602,
'Shrub-Forest': 11.20552957918276,
'Herbaceous-Forest': 2.4232567595039645,
'Shrub/Scrub': 0.14637121366131328,
'Grassland/Herbaceous': 0.028461069323033134,
'Pasture/Hay': 0.5976824557836958,
'Cultivated Crops': 0.06505387273836145,
'Woody Wetlands': 6.753405163651148,
'Emergent Herbaceous Wetlands': 0.20329335230737955},
'categories': {'Background': 0.0,
'Unclassified': 0.0,
'Water': 2.846106932303314,
'Developed': 3.0087416141492174,
'Barren': 0.0,
'Forest': 86.35088432608254,
'Shrubland': 0.14637121366131328,
'Herbaceous': 0.028461069323033134,
'Planted/Cultivated': 0.6627363285220573,
'Wetlands': 6.956698515958529}}
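
The percentages in the output above are shares of cells per land-cover class. As a rough illustration of that arithmetic only, the sketch below counts class codes in a small made-up list. `class_percentages` and the sample values are hypothetical, and this is not pygeohydro's implementation of `cover_statistics`.

```python
from collections import Counter


def class_percentages(cells, nodata=None):
    """Return the percentage of valid cells that fall in each class code."""
    valid = [c for c in cells if c != nodata]
    counts = Counter(valid)
    return {code: 100.0 * n / len(valid) for code, n in counts.items()}


# Made-up class codes for a tiny raster, with 0 treated as no-data.
cells = [41, 41, 42, 11, 0]
percentages = class_percentages(cells, nodata=0)
```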

WBD Feature returning keyerror: 'layers'

What happened?

image
Loading a HUC6 with byid raised KeyError: "layers".

What did you expect to happen?

Expected huc6 170900 polygon to be loaded into notebook.

Minimal Complete Verifiable Example

No response

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

No response

Anything else we need to know?

No response

Environment

Handling server disconnects

I need to run millions of points against the new gh.nlcd_bycoords. During a recent run, the server disconnected:

  File "/home/kovar/anaconda3/envs/a50-dev/lib/python3.9/site-packages/aiohttp/client.py", line 559, in _request
    await resp.start(conn)
  File "/home/kovar/anaconda3/envs/a50-dev/lib/python3.9/site-packages/aiohttp/client_reqrep.py", line 898, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/home/kovar/anaconda3/envs/a50-dev/lib/python3.9/site-packages/aiohttp/streams.py", line 616, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected

Questions:

  1. Should pygeohydro handle this more gracefully, or
  2. is it up to me to handle it and, if so, how?
  3. I need to rerun this, and many other collections of points. Am I abusing the server? Is there a way to do this locally?

Thank you.

-David
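
On question 2, one common client-side pattern is to split the points into batches and retry a failed batch with exponential backoff. The sketch below shows the pattern with a stand-in fetch function. `chunked`, `fetch_with_retry` and `flaky_fetch` are hypothetical names, and nothing here reflects how pygeohydro or async_retriever handle disconnects internally.

```python
import time


def chunked(items, size):
    """Yield successive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start : start + size]


def fetch_with_retry(fetch, batch, retry_on, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(batch), retrying with exponential backoff on `retry_on` errors."""
    for attempt in range(retries + 1):
        try:
            return fetch(batch)
        except retry_on:
            if attempt == retries:
                raise  # give up after the last allowed attempt
            sleep(base_delay * 2**attempt)


# Stand-in fetcher that fails on its first call, then succeeds.
calls = {"n": 0}


def flaky_fetch(batch):
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("simulated disconnect")
    return [f"value-for-{point}" for point in batch]


coords = [(-87.0, 34.0), (-88.0, 40.0), (-95.0, 38.0)]
results = []
for batch in chunked(coords, 2):
    # sleep is replaced with a no-op so the demo runs instantly.
    results.extend(
        fetch_with_retry(flaky_fetch, batch, retry_on=ConnectionError, sleep=lambda s: None)
    )
```

In real use, `fetch` could wrap a call such as `gh.nlcd_bycoords` and `retry_on` could be the `ServerDisconnectedError` class from the traceback above. Saving each batch's results to disk as it completes means an interrupted run can resume without repeating finished work.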

Monthly & annual SSEBop ET available via OPeNDAP from USGS THREDDS server

Re: https://github.com/cheginit/hydrodata/blob/master/hydrodata/datasets.py#L854

Since there's still no web service available for subsetting SSEBop, the data first
needs to be downloaded for the requested period then it is masked by the
region of interest locally. Therefore, it's not as fast as other functions and
the bottleneck could be the download speed.

FYI, there is an OPeNDAP endpoint available from the USGS CIDA THREDDS server (managed by David Blodgett, I think) for monthly and annual SSEBop ET -- though not daily:

Missing examples/tutorial.ipynb

Description

It seems examples/tutorial.ipynb, which was added in commit 68fe37f, has been removed. What are the future plans for examples/? If you would like me to write something up, do you have features in mind that you would like me to showcase?

Shapely import issue on Darwin

  • Hydrodata version: 0.4.4
  • Python version: 3.7.7
  • Operating System: macOS 10.14.4
  • Using Conda

Description

I found an issue when importing anything from the shapely.geometry package with Shapely 1.7.0.
Getting the following:

File "/Users/austinraney/miniconda3/envs/hydrodata/lib/python3.7/site-packages/shapely/geos.py", line 62, in load_dll
    libname, fallbacks or []))
OSError: Could not find lib cxx or load any of its variants [].

I found the related issue and it seems that the PR fixed it (I tested the change on my system at least). Just something to know about. Hopefully they will update the package on PyPI soon.

What I Did

python -c "from shapely.geometry import Point"

Inconsistent results between groups of coordinates and single coordinates.

    import geopandas as gpd
    import pygeohydro as gh

    DATA_URL = (
        "Resources/Overlays/Landmarks/Energy_-_Nuclear/Energy_-_Nuclear.shp"
    )
    gdf = gpd.read_file(DATA_URL).to_crs("epsg:4326").head(3)
    coords = list(zip(gdf.geometry.x, gdf.geometry.y))

    print("Data read in...")

    result = gh.nlcd_bycoords(coords, years={"cover": [2019]})
    print(f"PyGeoHydro\n----------\n{result}")

    print("\n\nTesting single points\n")

    coord = (gdf.geometry[0].x, gdf.geometry[0].y)
    result = gh.nlcd_bycoords([coord], years={"cover": [2019]})
    print(f"PyGeoHydro\n----------\n{result}")

This produces:

Data read in...
PyGeoHydro
----------
                     geometry  cover_2019
0  POINT (-87.11890 34.70420)          23
1  POINT (-88.83390 40.17190)          24
2  POINT (-95.68978 38.23926)          24


Testing single points

PyGeoHydro
----------
                     geometry  cover_2019
0  POINT (-87.11890 34.70420)          24

The first point is the same in each case but the land use value differs.

Anything else we need to know?:

Environment:

Output of pygeohydro.show_versions()
INSTALLED VERSIONS
------------------
commit: 9729f67e75fe31fa6b5eb122562e4c0c22792c6d
python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:07) [Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 21.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

aiohttp-client-cache>=0.5.1: None
aiohttp>=3.8.1: None
aiosqlite: 0.17.0
async-retriever: 0.3.1
async-retriever>=0.3.1: None
cytoolz: 0.11.2
dask: 2021.12.0
defusedxml: 0.7.1
folium: 0.12.1.post1
geopandas>=0.7: None
lxml: 4.7.1
matplotlib>=3.0: None
netCDF4: 1.5.8
networkx: 2.6.3
numpy>=1.17: None
owslib: 0.25.0
pandas>=1.0: None
pip: 21.3.1
py3dep: None
pyarrow: 6.0.1
pydantic: 1.9.0
pydaymet: None
pygeohydro: 0.12.2
pygeoogc: 0.12.1
pygeoogc>=0.12: None
pygeoutils: 0.12.1
pygeoutils>=0.12: None
pynhd: 0.3.1
pynhd>=0.12: None
pyproj>=2.2: None
pytest: None
rasterio>=1.2: None
requests: 2.27.1
requests-cache>=0.8: None
rioxarray>=0.8: None
scipy: 1.7.3
setuptools: 60.5.0
shapely>=1.6: None
ujson: 5.1.0
urllib3: 1.26.8
ward: None
xarray>=0.18: None
yaml: 6.0


Add support for the new NWCC REST API

Is your feature request related to a problem?

NWCC released a new REST API that provides access to many snow-related datasets. The Swagger UI can be found here.

Describe the solution you'd like

They have a demo provided here.

Describe alternatives you've considered

No response

Additional context

No response

NLCD by location

For an aviation use case ("Where did the drone launch?"), I'd like to get the land use for a lot of points in the U.S.

A similar use case that I use is "What is the elevation at a particular point?" To answer this, I run an https://open-elevation.com/ docker instance and use the API to pass in thousands of lat/lon pairs. It returns the pairs with the elevation and I take the elevation and add it as a column to my dataframe.

A good solution would be a function that operated on a local copy of the NLCD database, took lat/lon pairs, and returned the text description of the land use. The pairs could be distinct lat / lon values or possibly a geodataframe with one or more points.

If this function was vectorized and could process large numbers of points quickly that would be a bonus but not necessary.

The best option I could come up with involved setting a bounding box around each point. (See Discussions for details.)
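
As a rough sketch of the bounding-box idea mentioned above, the helper below builds a small box of a chosen half-width around each point. `bbox_around`, the default half-width and the sample points are hypothetical. The box is in degrees, so its ground size varies with latitude.

```python
def bbox_around(lon, lat, half_width=0.001):
    """Return (west, south, east, north) in degrees, centered on the point."""
    return (lon - half_width, lat - half_width, lon + half_width, lat + half_width)


# Sample points given as (lon, lat).
points = [(-87.1189, 34.7042), (-88.8339, 40.1719)]
boxes = [bbox_around(lon, lat) for lon, lat in points]
```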

NLCD not working

What happened:
The NLCD service is down.

What you expected to happen:
Layer names have been changed and some of the science products are not available as well.

Minimal Complete Verifiable Example:

import pygeohydro as gh
from pynhd import NLDI

geometry = NLDI().get_basins("01031500").geometry[0]
lulc = gh.nlcd(geometry, 100, years={"impervious": None, "cover": 2016, "canopy": None})

Anything else we need to know?:

The MRLC developers are working on updating the database and adding 2019 version. On their website they state that the science product will be added soon. I contacted them and they gave a two-week time frame for bringing the service back with the new dataset.

I have already added support for the new dataset (with this commit) and tested it with the mrlc_display layers (not the science product). Once it's back up I will carry out the final tests and release a new version.

Environment:

Output of pygeohydro.show_versions()
INSTALLED VERSIONS
------------------
commit: 90ee816f2741e0f969327406fd17f77676a8a62e
python: 3.9.5 | packaged by conda-forge | (default, Jun 19 2021, 00:32:32) 
[GCC 9.3.0]
python-bits: 64
OS: Linux
OS-release: 5.8.0-59-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

affine: 2.3.0
aiohttp: 3.7.4.post0
aiohttp-client-cache: 0.4.0
aiosqlite: 0.17.0
async-retriever: 0.2.1.dev28+ga7c8f33.d20210706
click: 7.1.2
cytoolz: 0.11.0
dask: 2021.06.2
defusedxml: 0.7.1
folium: unknown
geopandas: 0.9.0
lxml: 4.6.3
matplotlib: 3.4.2
nest-asyncio: installed
netCDF4: 1.5.6
networkx: 2.5.1
numpy: 1.21.0
orjson: 3.5.4
owslib: 0.24.1
pandas: 1.3.0
pip: 21.1.3
py3dep: 0.11.1.dev18+g13e8ea1
pyarrow: 4.0.0
pydantic: 1.8.2
pydaymet: 0.11.1.dev12+g1096693.d20210706
pygeohydro: 0.11.1.dev11+g90ee816.d20210706
pygeoogc: 0.11.1.dev31+g1c457f5.d20210706
pygeoutils: 0.11.2.dev25+g05892c6.d20210706
pynhd: 0.11.1.dev13+g3b76c3d.d20210706
pyproj: 3.1.0
pytest: 6.2.4
rasterio: 1.2.6
requests: 2.25.1
requests-cache: 0.6.4
scipy: 1.7.0
setuptools: 49.6.0.post20210108
shapely: 1.7.1
simplejson: 3.17.2
urllib3: 1.26.6
ward: None
xarray: 0.18.2
yaml: 5.4.1
None

py3dep.elevation_bycoords returns 'None' for valid locations

What happened:
Passing a list of zipped lat,lon values to py3dep.elevation_bycoords returned 'None' for all pairs.

What you expected to happen:

Expected a list of elevations.

Minimal Complete Verifiable Example:

import py3dep

coords = [(42.69513, -71.030437),
 (42.694901, -71.027653),
 (42.695388, -71.026931),
 (42.695383, -71.026942),
 (42.696471, -71.023837),
 (42.697136, -71.023545),
 (42.699233, -71.024387),
 (42.698356, -71.021488),
 (42.696643, -71.023499),
 (42.694305, -71.030054),
 (42.693343, -71.03474),
 (42.693349, -71.034757),
 (42.694002, -71.035491),
 (42.693452, -71.033743)]

elev = py3dep.elevation_bycoords(coords, crs="epsg:4326")
print(elev)

[None, None, None, None, None, None, None, None, None, None, None, None, None, None]
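
One thing worth checking here is coordinate order. The report describes the input as lat,lon pairs, while the `nlcd_bycoords` examples elsewhere on this page pass (lon, lat), for example (-87, 34). Assuming `elevation_bycoords` follows the same (x, y) convention, which this report does not confirm, the pairs would need to be swapped first. The variable names below are illustrative.

```python
# First two pairs from the report, in (lat, lon) order.
coords_latlon = [(42.69513, -71.030437), (42.694901, -71.027653)]

# Swap each pair to (lon, lat), i.e. (x, y) order for EPSG:4326.
coords_lonlat = [(lon, lat) for lat, lon in coords_latlon]

# Read unswapped as (x, y), the first pair would sit at latitude -71.03,
# far south of the area these coordinates are meant to describe.
misread_latitude = coords_latlon[0][1]
```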

Anything else we need to know?:

Environment:

Output of pygeohydro.show_versions()
INSTALLED VERSIONS
------------------
commit: 9729f67e75fe31fa6b5eb122562e4c0c22792c6d
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:24:02) [Clang 11.1.0 ]
python-bits: 64
OS: Darwin
OS-release: 21.1.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

aiohttp: 3.8.1
aiohttp-client-cache: 0.5.2
aiosqlite: 0.17.0
async-retriever: 0.2.0
async-retriever>=0.2: None
click: 8.0.3
cytoolz: 0.11.2
dask: 2021.12.0
defusedxml: 0.7.1
geopandas>=0.7: None
lxml: 4.7.1
nest-asyncio: installed
netCDF4: 1.5.8
networkx: 2.6.3
numpy>=1.17: None
orjson: 3.6.5
owslib: 0.25.0
pandas>=1.0: None
pip: 21.3.1
py3dep: 0.11.4
py3dep>=0.11.3: None
pyarrow: 6.0.1
pydantic: 1.8.2
pydaymet: 0.11.4
pygeohydro: 0.11.4
pygeoogc: 0.11.7
pygeoogc>=0.11: None
pygeoogc>=0.11.5: None
pygeoutils: 0.11.7
pygeoutils>=0.11: None
pygeoutils>=0.11.5: None
pynhd: 0.11.1
pyproj>=2.2: None
pytest: None
rasterio>=1.2: None
requests: 2.26.0
requests-cache>=0.8: None
rioxarray>=0.8: None
scipy: 1.7.3
setuptools: 59.4.0
shapely>=1.6: None
simplejson: 3.17.6
ujson: 4.2.0
urllib3: 1.26.7
ward: None
xarray>=0.18: None
yaml: 6.0


Did something change with NWIS?

What happened:
My NWIS example stopped working.

What you expected to happen:

Minimal Complete Verifiable Example:

from pygeohydro import NWIS

nwis = NWIS()

start = '1979-02-01T01:00:00'
stop = '2020-12-31T23:00:00'

sta = ['USGS-01030350', 'USGS-01030500']

ds_obs = nwis.get_streamflow(sta, (start, stop), to_xarray=True)

I tried pygeohydro versions 0.13.0 and 0.13.1.

Add support for Water Quality Portal

Is your feature request related to a problem? Please describe.
The Water Quality Portal has a RESTful service that could be useful.

Describe the solution you'd like
The documentation for this service can be found here
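
As a rough illustration of what a request to such a RESTful service could look like, the sketch below only builds a query URL and sends nothing. The base URL, the `Station` endpoint, and the `statecode` and `mimeType` parameters are assumptions based on the public Water Quality Portal web services and should be checked against the service documentation. `wqp_url` is a hypothetical helper, not a PyGeoHydro function.

```python
from urllib.parse import urlencode

# Assumed base URL of the Water Quality Portal web services.
WQP_BASE = "https://www.waterqualitydata.us/data"


def wqp_url(endpoint: str, **params: str) -> str:
    """Build a WQP search URL for an endpoint and its query parameters.

    Hypothetical helper for illustration only. Parameters are sorted so
    the resulting URL is deterministic.
    """
    query = urlencode(sorted(params.items()))
    return f"{WQP_BASE}/{endpoint}/search?{query}"


# Example: stations in one state, requested as CSV (assumed parameters).
url = wqp_url("Station", statecode="US:04", mimeType="csv")
print(url)
```

Keeping URL construction separate from the request makes it easy to test the query logic offline before wiring it to an HTTP client.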

Describe alternatives you've considered
N/A

Additional context
N/A
