Comments (4)
Thanks, @msleckman! This sounds really useful for speeding up our build time for the met data. I'm not sure where we would implement this swap, though. For example, I tried replacing `ds_to_dataframe_faster(ds_comids)` with `ds.to_dataframe(ds_comids).reset_index()` in the `subset_nc_to_comids()` function (in `2_process/src/subset_nc_to_comid.py`), but I got an error when I tried to run that. I'm probably misinterpreting how we'd use this other xarray function, so any tips you have would be great 🙂
from drb-gw-hw-model-prep.
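For reference, the pattern being suggested is to subset the dataset first and then call `.to_dataframe()` as a method on the subsetted object (rather than passing the subset in as an argument, which raises an error). A minimal sketch on a toy dataset, with variable and coordinate names (`tmax`, `time`, `COMID`) assumed for illustration:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy dataset standing in for the climate-driver NetCDF
# (variable/coordinate names here are illustrative assumptions).
ds = xr.Dataset(
    {"tmax": (("time", "COMID"), np.random.rand(3, 4))},
    coords={
        "time": pd.date_range("2020-01-01", periods=3),
        "COMID": [101, 102, 103, 104],
    },
)

# Subset to the COMIDs of interest, then flatten to a tidy data frame:
# one row per (time, COMID) pair.
ds_comids = ds.sel(COMID=[101, 103])
df = ds_comids.to_dataframe().reset_index()
print(df.head())
```

Note that `to_dataframe()` takes no dataset argument; it converts the dataset it is called on, returning a MultiIndex frame that `reset_index()` flattens.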
I was able to modify the function to incorporate your suggestions and use `ds_comids.to_dataframe().reset_index()` instead of calling our previously-defined `ds_to_dataframe_faster()` function.
```python
import numpy as np
import xarray as xr

def subset_nc_to_comids(nc_file, comids):
    comids = [int(c) for c in comids]
    ds = xr.open_dataset(nc_file, decode_times=True)
    # filter out comids that are not in climate drivers (should only be 4781767)
    comids = np.array(comids)
    comids_in_climate = comids[np.isin(comids, ds.COMID.values)]
    comids_not_in_climate = comids[~np.isin(comids, ds.COMID.values)]
    print(comids_not_in_climate)
    # We know of one COMID that has no catchment and so should be included
    # in `comids_not_in_climate` if passed through in `comids`. Use an assert
    # statement to make sure we are aware of any others. COMIDs within
    # `comids_not_in_climate` will not have matched climate data.
    if len(comids_not_in_climate) > 0:
        assert list(comids_not_in_climate) == [4781767]
    ds_comids = ds.sel(COMID=comids_in_climate)
    # [Lauren] we have been using a function written by Jeff Sadler for the DRB
    # PGDL-DO project to process the xarray object to a ~tidy data frame. Below
    # I've replaced ds_to_dataframe_faster(ds_comids) with a more generic call
    # to speed up the run time. See this issue for further details:
    # https://github.com/USGS-R/drb-gw-hw-model-prep/issues/44.
    ds_comids_df = ds_comids.to_dataframe().reset_index()
    return ds_comids_df
```
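One way to check whether this swap can matter is to time the xarray-to-DataFrame conversion in isolation on data of a comparable shape. This is a hypothetical sketch on a synthetic dataset (the variable name `prcp` and the array sizes are assumptions, not the project's actual data); if the conversion itself is fast relative to the ~30 min build, the bottleneck is likely elsewhere, such as reading the NetCDF or downstream I/O:

```python
import time
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the subsetted climate data (sizes are illustrative).
ds_comids = xr.Dataset(
    {"prcp": (("time", "COMID"), np.random.rand(1000, 200))},
    coords={
        "time": pd.date_range("1980-01-01", periods=1000),
        "COMID": np.arange(200),
    },
)

# Time only the conversion step that the swap replaces.
t0 = time.perf_counter()
df = ds_comids.to_dataframe().reset_index()
elapsed = time.perf_counter() - t0
print(f"conversion took {elapsed:.3f} s for {len(df):,} rows")
```

If this step accounts for only seconds of a half-hour build, profiling the full target would be a more direct way to find where the time goes.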
However, I don't see the large time improvements you reported in your examples above. The build time was previously ~34 min for me:

```r
> tar_meta() %>% filter(name == "p2_met_data_nhd_mainstem_reaches") %>% pull(seconds)/60
[1] 33.42533
```

And with the changes to `subset_nc_to_comids()`, it's ~30 min:

```r
> tar_meta() %>% filter(name == "p2_met_data_nhd_mainstem_reaches") %>% pull(seconds)/60
[1] 29.52967
```
Did you have other edits in mind besides what I pasted here?
@msleckman I made an attempt to incorporate your suggestions (see above) but didn't see much improvement in the build time, so I've unassigned myself from this issue and added a `wontfix` label. I'm not sure whether my attempt fully captured what you had in mind, so if you're able to implement this, please feel free to do so.
Closing this issue for now as `wontfix`.