Comments (4)
Thanks, @msleckman! This sounds really useful for speeding up our build time for the met data. I'm not sure where we would implement this swap, though. For example, I tried replacing `ds_to_dataframe_faster(ds_comids)` with `ds.to_dataframe(ds_comids).reset_index()` in the `subset_nc_to_comids()` function (in `2_process/src/subset_nc_to_comid.py`), but I got an error when I tried to run that. I'm probably misinterpreting how we'd use this other xarray function, so any tips you have would be great 🙂
from drb-gw-hw-model-prep.
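For reference, the pattern being suggested is to subset the dataset first and then call `.to_dataframe()` as a method on the subsetted object (rather than passing the subset in as an argument, which raises an error). A minimal sketch on a toy dataset, with variable and coordinate names (`tmax`, `time`, `COMID`) assumed for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset standing in for the climate-driver NetCDF
# (variable/coordinate names here are illustrative assumptions).
ds = xr.Dataset(
    {"tmax": (("time", "COMID"), np.random.rand(3, 4))},
    coords={
        "time": pd.date_range("2020-01-01", periods=3),
        "COMID": [101, 102, 103, 104],
    },
)

# Subset to the COMIDs of interest, then flatten to a tidy data frame:
# one row per (time, COMID) pair.
ds_comids = ds.sel(COMID=[101, 103])
df = ds_comids.to_dataframe().reset_index()
print(df.head())
```

Note that `to_dataframe()` takes no dataset argument; it converts the dataset it is called on, returning a MultiIndex frame that `reset_index()` flattens.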
I was able to modify the function to incorporate your suggestions and use `ds_comids.to_dataframe().reset_index()` instead of calling our previously-defined `ds_to_dataframe_faster()` function.
```python
import numpy as np
import xarray as xr

def subset_nc_to_comids(nc_file, comids):
    comids = [int(c) for c in comids]
    ds = xr.open_dataset(nc_file, decode_times=True)
    # filter out comids that are not in climate drivers (should only be 4781767)
    comids = np.array(comids)
    comids_in_climate = comids[np.isin(comids, ds.COMID.values)]
    comids_not_in_climate = comids[~np.isin(comids, ds.COMID.values)]
    print(comids_not_in_climate)
    # We know of one COMID that has no catchment and so should be included
    # in `comids_not_in_climate` if passed through in `comids`. Use an assert
    # statement to make sure we are aware of any others. COMIDs within
    # `comids_not_in_climate` will not have matched climate data.
    if len(comids_not_in_climate) > 0:
        assert list(comids_not_in_climate) == [4781767]
    ds_comids = ds.sel(COMID=comids_in_climate)
    # [Lauren] we have been using a function written by Jeff Sadler for the DRB
    # PGDL-DO project to process the xarray object to a ~tidy data frame. Below
    # I've replaced ds_to_dataframe_faster(ds_comids) with a more generic call
    # to speed up the run time. See this issue for further details:
    # https://github.com/USGS-R/drb-gw-hw-model-prep/issues/44.
    ds_comids_df = ds_comids.to_dataframe().reset_index()
    return ds_comids_df
```
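One way to check whether this swap can matter is to time the xarray-to-DataFrame conversion in isolation on data of a comparable shape. This is a hypothetical sketch on a synthetic dataset (the variable name `prcp` and the array sizes are assumptions, not the project's actual data); if the conversion itself is fast relative to the ~30 min build, the bottleneck is likely elsewhere, such as reading the NetCDF or downstream I/O:

```python
import time
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the subsetted climate data (sizes are illustrative).
ds_comids = xr.Dataset(
    {"prcp": (("time", "COMID"), np.random.rand(1000, 200))},
    coords={
        "time": pd.date_range("1980-01-01", periods=1000),
        "COMID": np.arange(200),
    },
)

# Time only the conversion step that the swap replaces.
t0 = time.perf_counter()
df = ds_comids.to_dataframe().reset_index()
elapsed = time.perf_counter() - t0
print(f"conversion took {elapsed:.3f} s for {len(df):,} rows")
```

If this step accounts for only seconds of a half-hour build, profiling the full target would be a more direct way to find where the time goes.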
However, I don't see the large time improvements you reported in your examples above. The build time was previously ~34 min for me:

```r
> tar_meta() %>% filter(name == "p2_met_data_nhd_mainstem_reaches") %>% pull(seconds)/60
[1] 33.42533
```

And with the changes to `subset_nc_to_comids()`, it's ~30 min:

```r
> tar_meta() %>% filter(name == "p2_met_data_nhd_mainstem_reaches") %>% pull(seconds)/60
[1] 29.52967
```
Did you have other edits in mind besides what I pasted here?
@msleckman I made an attempt to incorporate your suggestions (see above) but didn't see much improvement in the build time, so I've unassigned myself from this issue and added a `wontfix` label. I'm not sure whether my attempt fully captured what you had in mind, so if you're able to implement this, please feel free to do so.
Closing this issue for now as `wontfix`.