Comments (11)
Thanks for filing this! I'm looking into this problem now.
What I've figured out so far:
- This is a large request! It has 546 stations in it. Good thing you didn't ask for iv values!
- One of the stations returned some duplicate values in the time index, somehow. Instead of 366 days of data being returned, apparently 368 were returned by one of the datasets.
I'm still working on this!
-Marty
from hydrofunctions.
It turns out that the request is 31.2 MB! That's without zip compression.
I added a few lines of code that check for duplicated rows and get rid of them. This request works now, but it takes forever to combine all of the data series into one table. Your error message had 546 data series in it, but it was just getting started when it choked on the bad data! The final dataframe has 2618 columns!! Many of these are for temperature readings, which get summarized with a daily max, a daily min, and a third column too.
BTW, this is a much smaller request that duplicates your problem:
request2 = hf.NWIS('03107698', "dv", start, end)
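For anyone hitting the same error before upgrading, the duplicate check described above can be sketched roughly like this. This is not the code from the actual fix, just a minimal pandas illustration; the sample series and the helper name drop_duplicate_timestamps are made up for the example.

```python
import pandas as pd

# Hypothetical daily series with two repeated timestamps, mimicking on a
# small scale the case where more rows than days come back.
index = pd.to_datetime(
    ["2016-01-01", "2016-01-02", "2016-01-02",
     "2016-01-03", "2016-01-03", "2016-01-04"]
)
series = pd.Series([10.0, 11.0, 11.0, 12.0, 12.5, 13.0],
                   index=index, name="discharge")


def drop_duplicate_timestamps(data):
    """Keep the first row for each timestamp and discard later repeats."""
    return data[~data.index.duplicated(keep="first")]


clean = drop_duplicate_timestamps(series)
print(len(series), "rows before,", len(clean), "rows after")
```

Keeping the first row is an arbitrary choice here; if the repeated rows disagree, you may want to inspect them before deciding which one to keep.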
from hydrofunctions.
closed with merged pull request #47.
from hydrofunctions.
@taataam
I'm trying to think how you can get this bugfix installed. Unfortunately, my new version has changed the internals substantially, so I can't just patch my old version with the fix. I'm getting ready to release version 0.1.8, but I've got to rewrite a lot of the docstrings and the user's manual, so you probably don't want to wait a week or two for that.
You can install the new version directly from GitHub, however. Try using:
pip install git+https://github.com/mroberge/hydrofunctions.git@develop
I'm about to merge the bugfix into develop now too.
from hydrofunctions.
@mroberge Thank you for your quick response and help. I will give it a try.
All the other states worked fine. I think it took about half an hour for the data of all the states over a period of one year to be downloaded and saved to an HDF file. My final goal is to get the data for a period of 20 or 30 years.
from hydrofunctions.
@taataam So you are trying to download all of the data from all of the states for the past 20 to 30 years?
That is a lot!!!
One thing you can do is to limit your requests to only the discharge data. You probably don't want the temperature or chemistry data, for example.
Also, you might want to reconsider getting all of the data locally. Why not use the internet as your hard drive, and request the data at the moment you need it? For example, if you wanted to calculate a flow duration chart for every station, you could download all of the data for one station, create your chart, and then move on to the next station.
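The one-station-at-a-time idea could be sketched like this. It is only an illustration: fake_fetch is a stand-in that generates random data so the example runs offline (in practice it might wrap an hf.NWIS request), the second site ID is a placeholder, and the Weibull plotting position is one assumed choice among several for the exceedance probability.

```python
import numpy as np
import pandas as pd


def flow_duration(discharge):
    """Return discharge sorted high to low, indexed by exceedance probability (%)."""
    values = discharge.dropna().sort_values(ascending=False).to_numpy()
    n = len(values)
    # Weibull plotting position: rank / (n + 1), expressed as a percentage.
    exceedance = 100.0 * np.arange(1, n + 1) / (n + 1)
    return pd.Series(values, index=exceedance, name=discharge.name)


def process_stations(site_ids, fetch):
    """Fetch one station at a time, summarize it, and keep only the summary."""
    summary = {}
    for site in site_ids:
        discharge = fetch(site)  # only one station's data is held at a time
        curve = flow_duration(discharge)
        # Flow exceeded 50% of the time, read off the curve.
        summary[site] = float(np.interp(50.0, curve.index, curve.to_numpy()))
    return summary


def fake_fetch(site):
    """Hypothetical stand-in for a download: one year of synthetic daily flow."""
    rng = np.random.default_rng(int(site))
    return pd.Series(rng.lognormal(mean=3.0, sigma=1.0, size=365), name=site)


results = process_stations(["03107698", "00000001"], fake_fetch)
print(results)
```

Only the small summary dictionary grows with the number of stations, so memory use stays flat no matter how many sites are processed.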
If you include all of the EPA chemistry data, there are over a million data collection sites!!!
from hydrofunctions.
@mroberge I think I read somewhere in your documentation that by default it downloads only the discharge data. In the final data that I got with my code, there were only two columns other than the date: discharge and the qualification. So do I have to explicitly give the data type in the request line?
The reason that I download it locally is exactly because of the large amount of computation that I am planning to do with the data. The files act as checkpoints, so if something goes wrong somewhere in the code, whether a bug or a hardware issue (especially on a cluster), I don't have to do everything from the beginning.
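A checkpoint pattern like the one described might look roughly as follows. This is a sketch rather than the code used in this thread: it uses pickle from the standard library instead of HDF so that it runs without extra dependencies, and load_or_compute and download_state are hypothetical names.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_compute(path, compute):
    """Return the cached result at `path`, or compute it, save it, and return it."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = compute()
    # Write to a temporary name first so an interrupted run never leaves
    # a half-written checkpoint behind.
    tmp = path.with_suffix(path.suffix + ".tmp")
    with tmp.open("wb") as fh:
        pickle.dump(result, fh)
    tmp.replace(path)
    return result


calls = []


def download_state(state):
    """Hypothetical stand-in for a slow download; records each real call."""
    calls.append(state)
    return {"state": state, "rows": 366}


with tempfile.TemporaryDirectory() as workdir:
    # Run the same loop twice to simulate a restart after a crash.
    for attempt in range(2):
        for state in ["pa", "md"]:
            data = load_or_compute(
                Path(workdir) / f"{state}.pkl",
                lambda: download_state(state),
            )

print(calls)
```

On the second pass every state is read back from disk, so the slow download step runs only once per state.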
from hydrofunctions.
In the new versions, the software will request every variable that gets measured at a site unless you specify which parameter you want. So, for example, if you only want discharge, then you can do this:
my_PA_discharge = hf.NWIS(service='dv', parameterCd='00060', stateCd='pa')
I'm sorry that the User's Guide is in such a woeful state! The docstrings do a much better job of explaining the parameters, and I've kept them up to date better. You can access them in IPython by typing ?func_name or using the help() function, like this: help(hf.NWIS).
I haven't been updating the User's Guide much lately because the code has been going through some major changes. Now that I've merged everything into my develop branch, I'm going to be working on the documentation before releasing version 0.1.8. I may even make this 0.2.0, but we'll see.
Please feel free to contact me by email too.
-Marty
from hydrofunctions.
Thanks for the tip. Then, I will check the help for now. The library is very useful, thanks for the time and effort.
from hydrofunctions.
Thank you. Sure, would be happy to contribute as much as I can.
from hydrofunctions.