Here's an idea that should help speed up row extraction: extraction should be organized around (coordinate, URI) pairs, not just URIs.
In the BQ pipeline, break up the `extract_rows` step into two steps. The first of the new steps should open the dataset, filter by area, and then produce chunks of (coordinate, URI) pairs. Each chunk should be a range of the output of `get_coordinates(data_ds, uri)`. Maybe ~1k coordinates is a good unit; this will likely have to be verified experimentally.
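A minimal sketch of the first step, assuming the area-filtered dataset can be summarized by its dimension sizes (e.g. `{'time': 4, 'latitude': 721, 'longitude': 1440}`) and that `get_coordinates` enumerates coordinates in row-major order over those dimensions. `coordinate_ranges` is a hypothetical helper, not part of weather-tools. Instead of materializing every (coordinate, URI) pair, it emits `(uri, start, stop)` index ranges of at most `chunk_size` coordinates, which are small to pass between pipeline stages.

```python
import math
import typing as t


def coordinate_ranges(
    uri: str, dim_sizes: t.Mapping[str, int], chunk_size: int = 1000
) -> t.Iterator[t.Tuple[str, int, int]]:
    """Hypothetical step 1: split a dataset's flattened coordinate space into ranges.

    Only the dimension sizes are needed, so no variable data has to be read.
    """
    # Total number of coordinates = product of all dimension sizes.
    total = math.prod(dim_sizes.values())
    for start in range(0, total, chunk_size):
        # The last range may be shorter than chunk_size.
        yield uri, start, min(start + chunk_size, total)
```

For example, dimension sizes of `{'time': 2, 'latitude': 3, 'longitude': 500}` (3,000 coordinates) yield three ranges: `[0, 1000)`, `[1000, 2000)` and `[2000, 3000)`. The `chunk_size` default of 1000 mirrors the ~1k guess above and would still need tuning.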
The second extract step should consume these pairs, do the preprocessing for the rows (e.g. filtering out variables), and produce rows.
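The second step can rebuild only the coordinates in its range and turn them into rows. The sketch below is a hypothetical stand-in for the real row extraction: it models an opened dataset as plain Python data (a dict of coordinate values per dimension, plus a dict of row-major flat value lists per variable, which assumes every variable spans all dimensions) and filters out unwanted variables via an illustrative `skip_vars` parameter.

```python
import typing as t

Row = t.Dict[str, t.Any]


def unravel(flat: int, sizes: t.Sequence[int]) -> t.List[int]:
    """Convert a row-major flat index into one index per dimension."""
    idx = []
    for size in reversed(sizes):
        flat, i = divmod(flat, size)
        idx.append(i)
    return idx[::-1]


def rows_in_range(
    coords: t.Mapping[str, t.Sequence[t.Any]],
    data_vars: t.Mapping[str, t.Sequence[t.Any]],
    start: int,
    stop: int,
    skip_vars: t.Collection[str] = (),
) -> t.Iterator[Row]:
    """Hypothetical step 2: yield one row per coordinate in [start, stop)."""
    dims = list(coords)
    sizes = [len(coords[d]) for d in dims]
    # Filter variables once per chunk rather than once per row.
    kept = {name: values for name, values in data_vars.items() if name not in skip_vars}
    for flat in range(start, stop):
        row: Row = {d: coords[d][i] for d, i in zip(dims, unravel(flat, sizes))}
        row.update({name: values[flat] for name, values in kept.items()})
        yield row
```

Because each call only walks `[start, stop)`, workers handling different ranges of the same URI never enumerate each other's coordinates.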
Other tactics to investigate:
- Move these lines outside of the `to_rows` loop; instead, perform this operation once on the whole xarray dataset.
- Determine whether xarray's native parallelism capabilities are a good fit for producing rows with multiple threads (https://xarray.pydata.org/en/stable/user-guide/dask.html).
- Investigate whether there's a lightweight way to get coordinate information for the first of the two processing steps (e.g. can we get just the coordinates and not the data? Can we open the dataset without loading it into memory?)
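The first tactic is loop-invariant hoisting. The specific lines to move are not reproduced here, so the sketch below uses a hypothetical per-row operation (formatting a time coordinate as an ISO 8601 string) to illustrate the pattern: do the work once per coordinate value, of which a gridded dataset has far fewer than rows, and reuse the result inside the row loop.

```python
import datetime
import itertools
import typing as t


def rows_per_row_formatting(
    times: t.Sequence[datetime.datetime], lats: t.Sequence[float]
) -> t.Iterator[t.Dict[str, t.Any]]:
    """Slow pattern: the time is re-formatted for every row."""
    for time, lat in itertools.product(times, lats):
        yield {"time": time.isoformat(), "latitude": lat}


def rows_hoisted_formatting(
    times: t.Sequence[datetime.datetime], lats: t.Sequence[float]
) -> t.Iterator[t.Dict[str, t.Any]]:
    """Hoisted pattern: each time value is formatted once, outside the row loop."""
    formatted = [time.isoformat() for time in times]
    for time_str, lat in itertools.product(formatted, lats):
        yield {"time": time_str, "latitude": lat}
```

Both functions produce identical rows; the hoisted version calls `isoformat()` once per time value instead of once per (time, latitude) pair. With xarray, the analogous move would be applying the operation to the whole coordinate or variable array once before `to_rows` runs.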
from weather-tools.
Related Issues (20)
- `gcloud alpha commands` used but not installed in environment
- ruff not used in CI pipeline
- Missing ruff checks
- Don't keep NULLs in the CSVs for feature collection
- Provide support to give time range while opening zarr
- weather-mv rg gave data offset by 180 degrees of longitude.
- weather-sp: Provide an option to append the filename with the split filename.
- weather-mv bq raster issue while reading ecmwf grib file
- Find a way to exclude test data when building docker image.
- All tools should make use of public runtime container image to manage dependencies
- weather-mv ee: Add a couple of time-metrics to asset attributes
- Deprecated Apache Beam Version Causing Error in weather-dl tool.
- Make use of secret-manager while using weather-dl for license keys.
- Enhanced support in weather-dl for downloading data across month ranges spanning multiple years.
- Add new functionality (--async) in weather-dl to terminate tool after dataflow job launched.
- Strengthen feature collection ingestion logic in weather-mv
- [CI/CD failing] Ruff version deprecated.
- Add a feature in weather-mv to extract specific date's data from any files.
- Faster ingestion into BQ by converting the chunk into pd.Dataframe
- Pangeo Showcase talk on weather-tools/xql?