Comments (4)

lekoenig commented on September 13, 2024

Thanks, @msleckman! This sounds really useful for speeding up our build time for the met data. I'm kind of confused where we would implement this swap though - for example, I tried replacing ds_to_dataframe_faster(ds_comids) with ds.to_dataframe(ds_comids).reset_index() in the subset_nc_to_comids() function (in 2_process/src/subset_nc_to_comid.py), but I got an error when I tried to run that. I'm probably misinterpreting how we'd use this other xarray function and so any tips you have would be great 🙂
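
One likely reading of the error described above (the message itself is not shown in the thread, so this is an assumption): `to_dataframe` is a method called on the dataset being converted, so the subset goes before the dot rather than inside the parentheses. A minimal sketch on a small synthetic dataset follows; the variable name `precip`, the COMID values, and the dates are all made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the met data NetCDF file.
# `precip` and these COMID values are hypothetical.
ds = xr.Dataset(
    {"precip": (("COMID", "time"), np.arange(12.0).reshape(4, 3))},
    coords={
        "COMID": [101, 102, 103, 104],
        "time": pd.date_range("2020-01-01", periods=3),
    },
)

# Subset first, then call to_dataframe() on the subset itself.
ds_comids = ds.sel(COMID=[101, 103])
ds_comids_df = ds_comids.to_dataframe().reset_index()

print(ds_comids_df)
```

Here `reset_index()` turns the (COMID, time) index into ordinary columns, giving one row per COMID and time step.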

from drb-gw-hw-model-prep.

lekoenig commented on September 13, 2024

I was able to modify the function to incorporate your suggestions and use ds_comids.to_dataframe().reset_index() instead of calling our previously-defined ds_to_dataframe_faster() function.

import numpy as np
import xarray as xr


def subset_nc_to_comids(nc_file, comids):
    comids = [int(c) for c in comids]

    ds = xr.open_dataset(nc_file, decode_times=True)

    # filter out comids that are not in climate drivers (should only be 4781767)
    comids = np.array(comids)
    comids_in_climate = comids[np.isin(comids, ds.COMID.values)]
    comids_not_in_climate = comids[~np.isin(comids, ds.COMID.values)]
    print(comids_not_in_climate)

    # We know of one COMID that has no catchment and so should be included
    # in `comids_not_in_climate` if passed through in `comids`. Use assert
    # statement to make sure we are aware of any others. COMIDs within
    # `comids_not_in_climate` will not have matched climate data.
    if len(comids_not_in_climate) > 0:
        assert list(comids_not_in_climate) == [4781767]
    ds_comids = ds.sel(COMID=comids_in_climate)
    # [Lauren] we have been using a function written by Jeff Sadler for the DRB
    # PGDL-DO project to process the xarray object to a ~tidy data frame. Below
    # I've replaced ds_to_dataframe_faster(ds_comids) with a more generic function
    # to speed up the run time. See this issue for further details:
    # https://github.com/USGS-R/drb-gw-hw-model-prep/issues/44.
    ds_comids_df = ds_comids.to_dataframe().reset_index()
    return ds_comids_df

However, I don't notice great time improvements like you report in your examples above. The build time was previously ~34 min for me:

> tar_meta() %>% filter(name == "p2_met_data_nhd_mainstem_reaches") %>% pull(seconds)/60
[1] 33.42533

And with the changes to subset_nc_to_comids(), it's ~30 min:

> tar_meta() %>% filter(name == "p2_met_data_nhd_mainstem_reaches") %>% pull(seconds)/60
[1] 29.52967

Did you have other edits in mind besides what I pasted here?
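
Since swapping the conversion function changed the build time by only about 4 minutes, it may be worth checking which step of the function actually dominates. Below is a minimal sketch of a timing helper that uses only the standard library; `timed` and `timings` are hypothetical names, and the workload shown is a placeholder rather than the real met data processing.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, store):
    """Add the wall-clock seconds spent inside the block to store[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[label] = store.get(label, 0.0) + (time.perf_counter() - start)


timings = {}

# Placeholder workload; in practice each step of subset_nc_to_comids()
# would get its own `with timed(...)` block.
with timed("stand-in step", timings):
    sum(range(100_000))

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```

Wrapping the `xr.open_dataset(...)`, `ds.sel(...)`, and `to_dataframe()` calls in separate `with timed(...)` blocks would give a per-step breakdown. Because xarray typically reads data lazily, the disk read may be counted under whichever step first touches the values; calling `ds_comids.load()` in its own timed block is one way to separate it out.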

lekoenig commented on September 13, 2024

@msleckman I made an attempt to incorporate your suggestions (see above) but didn't see much improvement in the build time. So I've unassigned myself from this issue and added a wontfix label. I'm not sure if my attempt fully captured what you had in mind, so if you're able to implement this please feel free to do so.

lekoenig commented on September 13, 2024

Closing this issue for now as wontfix.
