Comments (11)

mroberge commented on May 20, 2024

Thanks for filing this! I'm looking into this problem now.
What I've figured out so far:

  • This is a large request! It covers 546 stations. Good thing you didn't ask for iv (instantaneous) values!
  • One of the stations somehow returned duplicate values in its time index: one of the datasets apparently came back with 368 days of data instead of the expected 366.

I'm still working on this!
-Marty

from hydrofunctions.

mroberge commented on May 20, 2024

It turns out that the request is 31.2 MB! That's without zip compression.

I added a few lines of code that check for duplicated rows and get rid of them. This request works now, but it takes forever to combine all of the data series into one table. Your error message listed 546 data series, but it was just getting started when it choked on the bad data! The final dataframe has 2618 columns!! Many of these are for temperature readings, which get summarized with a daily max, a daily min, and a third column too.
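For anyone curious what a duplicate check like this can look like, here is a minimal pandas sketch. It is not necessarily the code that went into the fix; the dates, values, and the `discharge` column name are made up for illustration.

```python
import pandas as pd

# Hypothetical daily series with two repeated timestamps, standing in for
# a station that returns more rows than there are days in the request.
index = pd.to_datetime(
    ["2016-01-01", "2016-01-02", "2016-01-02", "2016-01-03", "2016-01-03"]
)
flow = pd.DataFrame({"discharge": [10.0, 11.0, 11.0, 12.0, 12.5]}, index=index)

# Index.duplicated(keep="first") flags every repeat after the first
# occurrence of a timestamp; inverting the mask keeps one row per day.
deduped = flow[~flow.index.duplicated(keep="first")]

print(len(flow), "rows before,", len(deduped), "rows after")
```

Keeping the first occurrence is an arbitrary choice here; if the repeated rows disagree, it may be worth inspecting them before deciding which one to keep.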

BTW, this is a much smaller request that duplicates your problem:
request2 = hf.NWIS('03107698', "dv", start, end)

mroberge commented on May 20, 2024

Closed with merged pull request #47.

mroberge commented on May 20, 2024

@taataam
I'm trying to think how you can get this bugfix installed. Unfortunately, my new version has changed the internals substantially, so I can't just patch my old version with the fix. I'm getting ready to release version 0.1.8, but I've got to rewrite a lot of the docstrings and the user's manual, so you probably don't want to wait a week or two for that.

You can, however, install the new version directly from GitHub. Try using:
pip install git+https://github.com/mroberge/hydrofunctions.git@develop

I'm about to merge the bugfix into develop now too.

cheginit commented on May 20, 2024

@mroberge Thank you for your quick response and help. I will give it a try.

All the other states worked fine. I think it took about half an hour to download one year of data for all the states and save it to an HDF file. My final goal is to get the data for a period of 20 or 30 years.

mroberge commented on May 20, 2024

@taataam So you are trying to download all of the data from all of the states for the past 20 to 30 years?
That is a lot!!!

One thing you can do is to limit your requests to only the discharge data. You probably don't want the temperature or chemistry data, for example.

Also, you might want to reconsider getting all of the data locally. Why not use the internet as your hard drive, and request the data at the moment you need it? For example, if you wanted to calculate a flow duration chart for every station, you could download all of the data for one station, create your chart, and then move on to the next station.
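A rough sketch of that one-station-at-a-time loop, in plain Python. The `fetch` argument is a hypothetical stand-in for whatever call returns the discharge values for a station; it is not a hydrofunctions function. The exceedance percentages use the Weibull plotting position, rank / (n + 1), which is one common convention for flow duration curves and an assumption on my part.

```python
def flow_duration(discharges):
    """Return (exceedance_percent, discharge) pairs, highest flow first."""
    ranked = sorted(discharges, reverse=True)
    n = len(ranked)
    # Weibull plotting position: rank / (n + 1), expressed as a percent.
    return [(100.0 * rank / (n + 1), q) for rank, q in enumerate(ranked, start=1)]


def process_stations(station_ids, fetch):
    """Yield one station's curve at a time so only one dataset is in memory."""
    for station in station_ids:
        yield station, flow_duration(fetch(station))


# Made-up data standing in for downloaded discharge records.
fake_records = {"A": [3.0, 1.0, 4.0, 2.0], "B": [10.0, 20.0]}

for station, curve in process_stations(fake_records, fake_records.get):
    print(station, curve)
```

Because `process_stations` is a generator, each station's data can be discarded as soon as its chart has been drawn.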

If you include all of the EPA chemistry data, there are over a million data collection sites!!!

cheginit commented on May 20, 2024

@mroberge I think I read somewhere in your documentation that by default it downloads only the discharge data. In the final data that I got with my code, there were only two columns other than the date: discharge and the qualification. So do I have to explicitly give the data type in the request line?

The reason I download it locally is exactly because of the large amount of computation that I am planning to do with the data. The local files act as checkpoints, so if something goes wrong somewhere in the code, whether a bug or a hardware issue (especially on a cluster), I don't have to do everything from the beginning.
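To illustrate the checkpoint idea, here is a minimal sketch using only the standard library. The `load_or_fetch` helper and its `fetch` argument are hypothetical names, and JSON files stand in for the HDF file mentioned earlier so that the example has no extra dependencies.

```python
import json
from pathlib import Path


def load_or_fetch(key, fetch, checkpoint_dir):
    """Return the saved data for `key` if it exists, else fetch and save it."""
    checkpoint_dir = Path(checkpoint_dir)
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    path = checkpoint_dir / f"{key}.json"

    if path.exists():
        # A previous run already finished this piece; skip the download.
        return json.loads(path.read_text())

    data = fetch(key)

    # Write to a temporary file and rename it afterwards, so an interrupted
    # run never leaves a half-written checkpoint behind.
    tmp = path.parent / (path.name + ".tmp")
    tmp.write_text(json.dumps(data))
    tmp.replace(path)
    return data
```

Calling `load_or_fetch("pa", my_download_function, "checkpoints")` once per state would then let a restarted job pick up where the previous one stopped; `my_download_function` is whatever function performs the actual request.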

mroberge commented on May 20, 2024

In the new versions, the software will request every variable that gets measured at a site unless you specify which parameter you want. So, for example, if you only want discharge, then you can do this:

import hydrofunctions as hf
my_PA_discharge = hf.NWIS(service='dv', parameterCd='00060', stateCd='pa')

I'm sorry that the User's Guide is in such a woeful state! The docstrings do a much better job of explaining the parameters, and I've kept them up to date better. You can access them in IPython by typing ?func_name or using the help() function, like this: help(hf.NWIS).

I haven't been updating the User's Guide much lately because the code has been going through some major changes. Now that I've merged everything into my develop branch, I'm going to be working on the documentation before releasing version 0.1.8. I may even make this 0.2.0, but we'll see.

Please feel free to contact me by email too.

-Marty

cheginit commented on May 20, 2024

Thanks for the tip. Then, I will check the help for now. The library is very useful, thanks for the time and effort.

mroberge commented on May 20, 2024

cheginit commented on May 20, 2024

Thank you. Sure, would be happy to contribute as much as I can.
