Comments (3)
Right now, we have a two-path templating system for our config language. The default way to template files is to use {} in the target_path as placeholders for the values specified in the partition_keys section. See this config, for example. In addition, we have a system to create files in a date hierarchy via a boolean flag. The process for doing this is documented here.
There are a few problems with the date-name hierarchy approach. Mainly, it confuses users and introduces more places for code to go wrong. For example, #127 is an issue that has cropped up due to this more complex implementation. Instead, it would be preferable if the values referenced by the partition_keys could be used in the target_path via Python's format string syntax. Then, users could format file patterns in arbitrary ways without us having to support each corner case individually. Beyond date hierarchies, users currently cannot express that an integer string has more than one digit. For example, if I had a config like:
[parameters]
target_path=gs://my-bucket/{}-data.nc
partition_keys=
days
[selection]
days=1/2/3/4
I have no way of expressing paths like gs://my-bucket/01-data.nc, gs://my-bucket/02-data.nc, etc. These require that, in the template, I use something like gs://my-bucket/{days:02d}-data.nc.
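As a quick illustration of the format spec in question, Python's str.format zero-pads an integer when given `02d`. The bucket name and day values come from the example config above; the loop is only a sketch and assumes the values have already been converted to ints.

```python
# Template with a named field and a zero-padded, two-digit integer spec.
target_path = "gs://my-bucket/{days:02d}-data.nc"

# Values from the [selection] section, assumed to be ints already.
days = [1, 2, 3, 4]

# Expand the template once per day.
paths = [target_path.format(days=day) for day in days]
for path in paths:
    print(path)
```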
To support this, somewhere, we basically need to run:
target_template.format(*parameter_keys.values(), **parameter_keys)
Today, this is approximately done here:
In fact, a naive implementation of this issue would involve:
- Deleting the append_date_dir code (or raising an error if a user tries this)
- Structuring partition_key_values as an ordered dictionary
- Calling format
- Updating the process_config function to encourage correct usage of the parser
- Documenting the usage everywhere
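The "ordered dictionary, then call format" steps could be sketched roughly as follows. The helper name expand_target_path and the `year` key are hypothetical and are not part of weather-tools. Passing the values both positionally and by keyword lets a template use either bare {} placeholders (filled in partition_keys order) or named ones.

```python
from collections import OrderedDict


def expand_target_path(target_path: str, partition_key_values: OrderedDict) -> str:
    """Hypothetical helper: fill a target_path template from ordered partition values."""
    # Positional args serve bare `{}` placeholders, in partition_keys order.
    # Keyword args serve named placeholders such as `{days:02d}`.
    return target_path.format(*partition_key_values.values(), **partition_key_values)


# Illustrative values only; `year` is an assumed key.
values = OrderedDict([("year", 2017), ("days", 3)])

positional = expand_target_path("gs://my-bucket/{}/{}-data.nc", values)
named = expand_target_path("gs://my-bucket/{year}/{days:02d}-data.nc", values)
print(positional)
print(named)
```

One caveat with this design: a template that mixes bare {} and numbered {0} placeholders raises a ValueError, so the docs would need to say which styles are supported together.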
A problem you would run into is that pretty much all of the values in partition_key_values are strings! You can't format a string like you would an int, so you would not be able to use formatting options like {days:02d}.nc. Thus, a prerequisite ticket is required: #5.
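To make the string problem concrete, the snippet below shows the `d` format code rejecting a str, followed by one possible coercion step. The coerce helper is an assumption for illustration only; it is not necessarily how #5 resolves typing.

```python
template = "{days:02d}.nc"

# The integer format code `d` is not valid for str values.
try:
    template.format(days="1")
    failed = False
except ValueError as error:
    failed = True
    print(f"formatting a str failed: {error}")


def coerce(value: str):
    """Hypothetical helper: convert to int where possible, else keep the string."""
    try:
        return int(value)
    except ValueError:
        return value


# Raw config values arrive as strings; coerce them before formatting.
raw = {"days": "1", "variable": "geopotential"}
typed = {key: coerce(value) for key, value in raw.items()}

# Unused keyword arguments (here `variable`) are ignored by str.format.
result = template.format(**typed)
print(result)
```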
from weather-tools.
Note: Ideally, a side effect of implementing this change (and its siblings) is that the following method:
does not fundamentally alter the data in the config. Specifically, the code around this block:
should no longer be needed and can be removed.
There's a tricky case to handle here: partition_keys that have a date. These can't be parsed like integers, and it's hard to specify all of the fields in the target template (e.g., I want to give day and month two digits, and year four digits).
I think the simplest solution here would be to parse the date fields as Python datetime.date objects. Then, we can encourage config writers to use Python's date string formatting (native to the call to format) to incorporate date information into the config target path.
See this SO post: https://stackoverflow.com/a/22842734
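A minimal sketch of that idea, assuming the date arrives as an ISO-style string: once it is parsed into a datetime.date, str.format hands anything after the colon to strftime, so the template controls the digit widths directly.

```python
import datetime

# Parse the raw config string into a datetime.date.
raw_date = "2017-01-02"
parsed = datetime.datetime.strptime(raw_date, "%Y-%m-%d").date()

# The spec after the colon is passed to date.strftime by str.format.
template = "gs://ecmwf-output-test/era5/{:%Y/%m/%d}-pressure-{}.nc"
path = template.format(parsed, 500)
print(path)
```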
For example, after the change, the MARS config with append_date_dir could look like:
[parameters]
client=cds
dataset=reanalysis-era5-pressure-levels
# This config creates a date-based directory hierarchy.
# In this case, the two files that will be created are
# gs://ecmwf-output-test/era5/2017/01/01-pressure-500.nc
# gs://ecmwf-output-test/era5/2017/01/02-pressure-500.nc
# gs://ecmwf-output-test/era5/2017/01/01-pressure-1000.nc
# gs://ecmwf-output-test/era5/2017/01/02-pressure-1000.nc
target_filename=
target_path=gs://ecmwf-output-test/era5/{:%Y/%m/%d}-pressure-{}.nc
partition_keys=
date
pressure_level
[selection]
product_type=reanalysis
format=netcdf
variable=
divergence
fraction_of_cloud_cover
geopotential
pressure_level=
500
1000
date=2017-01-01/to/2017-01-02
time=
00:00
06:00
12:00
18:00
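To check that this template would yield the four paths listed in the config comments, here is a rough sketch of the expansion. The handling of the `/to/` range syntax and the date_range helper are simplifying assumptions made for this example, not the parser's actual behavior.

```python
import datetime
import itertools


def date_range(start: datetime.date, end: datetime.date):
    """Hypothetical helper: yield each date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += datetime.timedelta(days=1)


def parse_date(text: str) -> datetime.date:
    return datetime.datetime.strptime(text, "%Y-%m-%d").date()


# Assumed handling of the `start/to/end` range syntax from the config.
start_text, end_text = "2017-01-01/to/2017-01-02".split("/to/")
dates = list(date_range(parse_date(start_text), parse_date(end_text)))
pressure_levels = [500, 1000]

template = "gs://ecmwf-output-test/era5/{:%Y/%m/%d}-pressure-{}.nc"

# One file per (date, pressure_level) combination, in partition_keys order.
paths = [
    template.format(date, level)
    for date, level in itertools.product(dates, pressure_levels)
]
for path in paths:
    print(path)
```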