arcticsnow / TopoPyScale
TopoPyScale: a Python library to perform simplistic climate downscaling at the hillslope scale
Home Page: https://topopyscale.readthedocs.io
License: MIT License
Replace the for loop at l.162 with a pooling solution, spreading each horizon computation across different cores in parallel.
Resources:
Solar geometry is computed via a for loop. Could this be parallelized?
l.74 solar_geom.py
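A minimal sketch of the kind of pooling approach suggested above, assuming a hypothetical compute_horizon(azimuth, dem) worker; the actual horizon / solar geometry functions and signatures in TopoPyScale may differ:

# Sketch only: spread independent per-azimuth computations across cores.
# compute_horizon() and dem are hypothetical placeholders, not the library API.
from functools import partial
from multiprocessing import Pool
import numpy as np

def compute_horizon(azimuth, dem):
    # placeholder: real code would scan the DEM along the given azimuth
    return np.zeros_like(dem)

def compute_all_horizons(dem, n_azimuths=72, n_cores=4):
    azimuths = np.arange(0, 360, 360 / n_azimuths)
    with Pool(n_cores) as pool:
        horizons = pool.map(partial(compute_horizon, dem=dem), azimuths)
    return horizons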
running "swat" sim (points in Karakorum)
Same issue as this:
pydata/xarray#2931
Solved as suggested, by changing
"point_id": df_position.index),
to
"point_id":(("point_id"), df_position.index),
But why?! I have not seen this in map jobs.
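For the record, a toy sketch (not the TopoPyScale code) of the explicit form: passing a bare pandas index leaves xarray to infer the dimension, while the ("point_id", values) form ties the coordinate to the point_id dimension unambiguously:

# Toy example only: explicit (dims, values) coordinate assignment.
import numpy as np
import pandas as pd
import xarray as xr

df_position = pd.DataFrame({'x': [0.0, 1.0, 2.0]}, index=[1, 2, 3])

ds = xr.Dataset(
    {'elevation': (('point_id',), np.array([100.0, 200.0, 300.0]))},
    coords={'point_id': (('point_id',), df_position.index)},
)
print(ds.point_id)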
All official WMO in-situ observations are available from: https://cds.climate.copernicus.eu/cdsapp#!/dataset/insitu-observations-surface-land?tab=overview
Similarly, we could fetch the NOAA weather datasets too.
Add functionalities to:
download observations using spatial and temporal extents
write a tool to perform comparisons with downscaled data (lat/lon available in a csv file)
write functions
add functions to topoclass.py
The paper by Jennings et al. (2018) presents a new method to compute the solid vs. liquid precipitation partition when close to 0°C. This method could be implemented in TopoPyScale.
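If I remember the paper right, the partition is a bivariate logistic regression on air temperature and relative humidity. A rough sketch only; the coefficient values below are quoted from memory and must be verified against Jennings et al. (2018) before implementing:

# Rough sketch of the Jennings et al. (2018) bivariate logistic snow/rain partition.
# Coefficients from memory -- verify against the paper before use.
import numpy as np

def snow_fraction_jennings(ta, rh, alpha=-10.04, beta=1.41, gamma=0.09):
    """Probability of snow given air temperature ta (deg C) and relative humidity rh (%)."""
    return 1.0 / (1.0 + np.exp(alpha + beta * ta + gamma * rh))

# e.g. snow_fraction_jennings(0.5, 80) gives the snow probability near the 0C transition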
config:
  split:
    IO: False     # Flag to split downscaling in time or not
    time: 2       # number of years to split timeline in
    space: None   # NOT IMPLEMENTED
fails at this line (l.327):
horizon = horizon_da.sel(x=row.x, y=row.y, azimuth=np.rad2deg(ds_solar.azimuth.isel(point_id=31)), method='nearest')
with:
IndexError: Index is not smaller than dimension 31 >= 31
Explanation:
The object ds_solar.azimuth is:
Out[2]:
<xarray.DataArray 'azimuth' (point_id: 31, time: 50376)>
dask.array<open_dataset-6baaeed24838583c892469b5b4861c12azimuth, shape=(31, 50376), dtype=float32, chunksize=(31, 50376), chunktype=numpy.ndarray>
Coordinates:
* point_id (point_id) int64 1 2 3 4 5 6 7 8 9 ... 23 24 25 26 27 28 29 30 31
* time (time) datetime64[ns] 2016-09-01 ... 2022-05-31T23:00:00
point_id = 1:31
The iterator here is the issue (l.266):
for i, row in df_centroids.iterrows():
It yields the values 1:31 but SHOULD yield 0:30.
this works:
ds_solar.azimuth.isel(point_id=0)
this fails:
ds_solar.azimuth.isel(point_id=31)
More info:
print('Downscaling LW, SW for point: {} out of {}'.format(pt_id+1,
df_centroids.point_id.max()+1))
This print statement assumes a Python index and adds one. For this example the print statements give iterations 2:32, but should give 1:31.
I guess this is connected to the recent changes making downscaling split in time?
Solution:
Add -1 here:
horizon = horizon_da.sel(x=row.x, y=row.y, azimuth=np.rad2deg(ds_solar.azimuth.isel(point_id=pt_id-1)), method='nearest')
All other occurrences of pt_id are correct with the current indexing.
Removed the +1 here:
print('Downscaling LW, SW for point: {} out of {}'.format(pt_id,
df_centroids.point_id.max()))
to get the correct index in the print statement.
Check the output of the q_2_rh() function, as the current version seems to have a problem.
TopoPyScale/TopoPyScale/meteo_util.py
Line 49 in 7a79dff
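For cross-checking, a textbook version of the specific-humidity-to-RH conversion (Magnus form for saturation vapour pressure); this is a generic sketch, not necessarily what meteo_util.py implements:

# Generic q -> RH conversion for sanity-checking q_2_rh(); not the library implementation.
import numpy as np

def q_to_rh(q, t, p):
    """q: specific humidity (kg/kg), t: air temperature (K), p: pressure (Pa). Returns RH in %."""
    e = q * p / (0.622 + 0.378 * q)                      # vapour pressure (Pa)
    t_c = t - 273.15
    es = 611.2 * np.exp(17.62 * t_c / (243.12 + t_c))    # saturation vapour pressure (Pa), Magnus
    return np.clip(100.0 * e / es, 0, 100)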
Not really a bug, as the behaviour is, I think, as expected. The map below is the monthly precip total over Davos with a range of c. 90-130 mm. I have applied an elevation lapse rate. The distribution of precip is correct (mainly from the NW): the SE is drier, the tops are wetter, the valleys are drier. What I think is off, though, is that there is way too much slope detail; aspect seems to be coming through here. I would expect to see mainly an elevation gradient together with the NW to SE precip gradient. I think this is because elevation and aspect are weighted the same in toposub. For a variable like precip we probably just want elevation and x/y as the dimensions of variability. In my original TopoSUB, elevation was weighted much higher, so with few samples the main variability is elevation; I had a weighting factor for each dimension before it went into KMeans.
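A minimal sketch of the kind of per-feature weighting described above, applied after scaling and before KMeans; the feature names and weights are illustrative, not the toposub defaults:

# Illustrative per-feature weighting before clustering; names and weights are examples only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def weighted_kmeans(df_param, weights, n_clusters=50):
    """df_param: DataFrame of topographic features; weights: dict feature -> weight."""
    features = list(weights.keys())
    scaled = StandardScaler().fit_transform(df_param[features].values)
    scaled *= np.array([weights[f] for f in features])   # e.g. boost elevation for precip
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(scaled)

# e.g. weighted_kmeans(df_param, {'elevation': 4, 'x': 1, 'y': 1, 'aspect': 0.5, 'slope': 0.5})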
Many interesting datasets are available at the Copernicus data center:
Observation:
GCMs:
Reanalysis:
When no variables are defined, the to_netcdf() function of the Topoclass (not topoexport!) raises an error.
Suggestion for how to fix it:
Add the following at the beginning of the function:
if variables is None:
    variables = list(self.downscaled_pts.keys())
This initializes variables to all available variables when none are defined (which is described as the default in the function description).
Another option would be to remove the sub-setting of self.downscaled_pts[variables] in the te.to_netcdf call and leave it to the topoexport class, which handles it already. But I would not recommend this, since it is prone to errors (when the function changes, for example).
In the case where the elevation of the point is lower than the geopotential elevation of the lowest pressure level (1000 hPa for ERA5), toposcale currently breaks down.
Strategies:
Implementation
topo_scale.py
Find out how to use the interpolation method of xarray to avoid looping over each point, at least for the horizontal/planar interpolation in topo_scale.py, lines 94-132.
http://xarray.pydata.org/en/stable/user-guide/interpolation.html
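A minimal sketch of the vectorized interpolation xarray supports, using DataArray indexers that share a new point_id dimension; the dataset and variable names are illustrative, not the topo_scale.py objects:

# Toy example of xarray's vectorized interpolation over a new 'point_id' dimension.
import numpy as np
import pandas as pd
import xarray as xr

# toy ERA5-like grid and point list (illustrative only)
ds_era5 = xr.Dataset(
    {'t2m': (('latitude', 'longitude'), np.random.rand(5, 5))},
    coords={'latitude': np.linspace(60, 62, 5), 'longitude': np.linspace(7, 9, 5)},
)
df_centroids = pd.DataFrame({'lat': [60.5, 61.2], 'lon': [7.3, 8.8]})

# DataArray indexers sharing the 'point_id' dimension interpolate all points in one call,
# instead of looping point by point.
lons = xr.DataArray(df_centroids.lon.values, dims='point_id')
lats = xr.DataArray(df_centroids.lat.values, dims='point_id')
ds_pts = ds_era5.interp(longitude=lons, latitude=lats, method='linear')
print(ds_pts.t2m)  # dims: (point_id,)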
commit: ebf380c#diff-14a14030423355c28b612bef60be37e718aeed0e301499661d91f8f84a78c46f
ds_plev = xr.open_mfdataset(path_forcing + 'PLEV*.nc', engine='h5netcdf',parallel=True).sel(time=slice(start_date, end_date))
ds_surf = xr.open_mfdataset(path_forcing + 'SURF*.nc', engine='h5netcdf', parallel=True).sel(time=slice(start_date, end_date))
The option engine='h5netcdf' causes an error when opening the netcdf files.
From the docs:
https://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html
engine ({"netcdf4", "scipy", "pydap", "h5netcdf", "pynio", "cfgrib", "zarr"}, optional) – Engine to use when reading files. If not provided, the default engine is chosen based on available dependencies, with a preference for “netcdf4”.
engine="netcdf4" is probably most useful for era5 data?
Or add an option to config if there are other use cases?
Both open_mfdataset() and save_mfdataset() are not effective. They should be replaced with the method used in topo_scale.py:
import glob
import xarray as xr

flist = glob.glob(file_pattern)
flist.sort()
ds_list = []
for file in flist:
    ds_list.append(xr.open_dataset(file))
ds = xr.concat(ds_list, dim='time')
Do a simple horizontal interpolation of surface ERA5 and no vertical interpolation for large flat areas. This is relevant in places where there is close to no discrepancy between the ERA5 topography and the actual topography.
Modify the code so that we do not need to indicate the lat/lon extent to download. Instead, grab the extent directly from the DEM. Why indicate twice something that is readily available in the DEM?!
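A minimal sketch of grabbing the download extent from the DEM itself, assuming the DEM is a GeoTIFF readable by rasterio ('dem.tif' is a placeholder path; the exact config wiring would differ):

# Sketch: derive the lat/lon download extent directly from the DEM file.
import rasterio
from rasterio.warp import transform_bounds

with rasterio.open('dem.tif') as dem:
    # bounds in the DEM's CRS, reprojected to WGS84 lat/lon for the climate data request
    west, south, east, north = transform_bounds(dem.crs, 'EPSG:4326', *dem.bounds)

extent = [north, west, south, east]  # e.g. the N/W/S/E order the CDS API expects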
Add an export function to_musa().
The netcdf should contain:
Currently neither LW nor SW has a component coming from the surrounding terrain. In snow I/O types of places, this radiation budget can be greatly affected.
Idea: the terrain exposure of one pixel could be simplified as the half sphere minus the sky view factor (i.e. a terrain view factor of 1 - SVF).
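One common, very simple parameterization along those lines (a sketch only, not what the code currently does): weight the sky LW by the sky view factor and add a grey-body emission term from the surrounding terrain for the remaining (1 - SVF) fraction. The surface temperature and emissivity inputs are assumptions:

# Sketch of a simple terrain contribution to LW using the sky view factor.
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant (W m-2 K-4)

def lw_with_terrain(lw_sky, svf, t_surf, emissivity=0.98):
    """lw_sky: downwelling LW from the sky (W m-2); svf: sky view factor [0-1]; t_surf in K."""
    lw_terrain = (1.0 - svf) * emissivity * SIGMA * t_surf**4
    return svf * lw_sky + lw_terrain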
First comparison of this version with previous ones:
Temperature -> good, but a bias of a few degrees
SW -> good, but presence of an artefact in topopyscale close to 0
LW -> completely whack in topopyscale
ws -> looks good
q -> good
precip -> ??
pressure -> ?? (not available in Kris's file)
The topo_scale.py routine currently works well for small projects, but quickly runs into trouble when used for larger projects. The internal computation of topo_scale could be improved and re-implemented using pure Dask syntax in order to limit the intermediate computation and memory usage.
TODO:
This will justify a new release version.
Write a new version of the code to avoid the for loop, and use the full potential of xarray. Add a new dimension called point_id along which to do the interpolation.
In ERA5 the treatment of accumulated variables changed from accumulated since forecast start to accumulated over the last timestep. As I understood it, a 3h-step precip value is still precip over 1h. To be honest the docs confuse the hell out of me, so we should double check this once, e.g.:
https://confluence.ecmwf.int/pages/viewpage.action?pageId=197702790
https://confluence.ecmwf.int/pages/viewpage.action?pageId=74764925
A different issue, but interesting, about "rain bombs" in ERA5:
https://confluence.ecmwf.int/display/CUSF/ERA5+versus+ERA-Interim+Total+Precipitation
Example issue:
topo_scale.py l.169
down_pt['tp'] = surf_interp.tp * 1 / tstep_dict.get(tstep) * 10**3  # Convert to mm/hr
should be, I think:
down_pt['tp'] = surf_interp.tp * 1 / 1 * 10**3  # Convert to mm/hr
or just:
down_pt['tp'] = surf_interp.tp * 10**3  # Convert m/hr to mm/hr
Add a routine to fetch CORDEX climate data. Check compatibility with the current code developed following the ERA5 convention.
The toolbox HORAYZON offers a new computation scheme to derive SVF and horizon angles. It may also be capable of computing the SW adjustment (atmospheric effect). The authors claim a large improvement in computation time in comparison to Dozier's method.
We could add tools that help decide what is an appropriate number of clusters to run a job. This may be simple metrics such as the average size of a cluster, but also more advanced metrics such as the one presented here (a simple sketch is given below):
If each pixel is assigned to more than one cluster, then this can be used to smooth out the output of toposub. @joelfiddes, I'll let you fill in the details of the idea ;)
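On the first point, a minimal sketch of one simple diagnostic for choosing the number of clusters (silhouette score); X stands for the scaled feature array and is an assumption, not the toposub variable name:

# Sketch: compare a few candidate cluster counts with the silhouette score (higher is better).
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def score_cluster_counts(X, candidates=(25, 50, 100, 200)):
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
        # sample_size keeps the O(n^2) silhouette computation affordable on large DEMs
        scores[k] = silhouette_score(X, labels, sample_size=10000)
    return scores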
We could implement our own version of the sky view factor method using parallelization for the computation of all horizons (72).
Original TopoCalc code avail: https://github.com/USDA-ARS-NWRC/topocalc/blob/main/topocalc/viewf.py
Add a routine to fetch the ASTER DEM to fetch_dem.py.
Write routines to evaluate bias and scaling issues against local observations.
A mesh can be an alternative to approximate the DEM and reduce the number of points in areas with little topography. There are many tools available to generate meshes:
This paper provides a good explanation.
Try to remove the o.is_file() in the lambda at line 71. The current implementation, checking whether each file exists, is slow when looping over more than 30 files.
One option is to load the file names with glob once and compare strings instead.
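A minimal sketch of the glob-and-compare approach; path_forcing and expected_files are hypothetical names, not the actual variables at line 71:

# Sketch: list existing files once with glob instead of calling is_file() per file.
import glob
import os

path_forcing = 'inputs/climate'                       # hypothetical forcing directory
expected_files = [f'PLEV_2021{m:02d}.nc' for m in range(1, 13)]

existing = {os.path.basename(p) for p in glob.glob(os.path.join(path_forcing, '*.nc'))}
missing = [f for f in expected_files if f not in existing]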
Hi, I'm currently reviewing the software for JOSS here, and have got through the install ok, but am running into problems when getting to the extraction of topographic metrics stage. I'm not entirely sure from the error output message where exactly it is falling over. I'm using the Norway ex1 example dataset.
2023-02-21 21:56:22,639 INFO Download rate 2.5M/s
[USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/inputs/climate/PLEV_202107.nc complete
2023-02-21 22:09:21,200 INFO Request is completed
2023-02-21 22:09:21,200 INFO Downloading https://download-0008-clone.copernicus-climate.eu/cache-compute-0008/cache/data7/adaptor.mars.internal-1677017196.2052243-11103-8-173d15ea-9bf4-42a7-9dd6-1ff637b84745.nc to [USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/inputs/climate/PLEV_202108.nc (2M)
2023-02-21 22:09:21,862 INFO Download rate 3.1M/s
[USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/inputs/climate/PLEV_202108.nc complete
---> Extracting DEM parameters (slope, aspect, svf)
Computing slope and aspect ...
Computing svf ...
---> File [USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/outputs/ds_param.nc saved
---> Scaling data prior to clustering
Traceback (most recent call last):
File "[USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/pipeline.py", line 9, in <module>
mp.extract_topo_param()
File "[USER_DIRS]/TopoPyScale/TopoPyScale/topoclass.py", line 321, in extract_topo_param
self.extract_topo_cluster_param()
File "[USER_DIRS]/dev/TopoPyScale/TopoPyScale/topoclass.py", line 269, in extract_topo_cluster_param
df_scaled, self.toposub.scaler = ts.scale_df(df_param,
File "[USER_DIRS]/dev/TopoPyScale/TopoPyScale/topo_sub.py", line 57, in scale_df
df_scaled = pd.DataFrame(scaler.fit_transform(df_param[feature_list].values),
File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/base.py", line 859, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 824, in fit
return self.partial_fit(X, y, sample_weight)
File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 861, in partial_fit
X = self._validate_data(
File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/base.py", line 546, in _validate_data
X = check_array(X, input_name="X", **check_params)
File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/utils/validation.py", line 940, in check_array
raise ValueError(
ValueError: Found array with 0 feature(s) (shape=(40512, 0)) while a minimum of 1 is required by StandardScaler.
Problem: precipitation frequency too high.
Probable reason: this comes (I think) from the fact that we do a 2D interpolation, which will lead to more non-zero events (even if very small) than a single gridbox would contain. This can of course cause problems for frequency-related processes.
Solutions (see the sketch after this list):
(1) filter small precip events out (lose precip, conserve frequency)
(2) allocate small precip amounts to larger wet days (conserve annual total and frequency) - take frequency from the gridbox? Just apply a simple threshold of 1 mm/day
(3) others?
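A minimal sketch combining options (1) and (2): threshold out drizzle, then optionally rescale the remaining events to conserve the total. The threshold and the variable name tp (hourly precipitation in mm/hr as an xarray DataArray) are assumptions:

# Sketch: drop drizzle below a threshold and optionally conserve the precipitation budget.
def filter_small_precip(tp, threshold=1.0 / 24, conserve_total=True):
    """tp: precipitation (mm/hr) as an xarray DataArray; threshold ~1 mm/day expressed per hour."""
    tp_filtered = tp.where(tp >= threshold, 0.0)                     # option (1): remove drizzle
    if conserve_total and float(tp_filtered.sum()) > 0:
        tp_filtered = tp_filtered * (tp.sum() / tp_filtered.sum())   # option (2): conserve total
    return tp_filtered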
Hi Joel,
thanks a lot for the data!
Quick analysis - toposcale temperature looks better than original era5, R^2 vs AWS measurements is respectively 0.87 and 0.81.
For precipitation, the toposcale predicts at least some precipitation 4.5x more frequently compared to era5 - 82 % of the hourly timesteps have some precipitation, vs 18 % for era5 (see attached plot). At daily aggregation (used by the mass balance model) toposcale has some precipitation on 96 % of days, vs 46 % for era5. This is not a problem for the model, but I'm just curious - have you observed a similar pattern at other locations?
Cheers,
Enrico
Check the code of the partition_snow() function, as the rain fraction produces some negative values.
TopoPyScale/TopoPyScale/meteo_util.py
Line 31 in 7a79dff
The current implementation follows the one in CryoGrid; there might be a bug there too:
https://github.com/CryoGrid/CryoGrid/blob/1436176a76ccdec9d9f0ddf34f153ffbc27ad51f/source/IO/FORCING/FORCING_slope_seb_readNc.m#L111
Write a function in the class topoclass to downscale in chunks of time (about 10 years max) in the case of long time series.
This includes the solar_geom() and downscale_climate() functions.
See this project for an example: https://github.com/ArcticSnow/TPS_project_tibet/blob/main/lexiewudan/run_lexeiwudan.py
In places close to the coast or large flat areas (below the 1000 hPa level) we could have the code simply perform a horizontal interpolation based on surface-level ERA5 data.
To be implemented.
execution:
mp = tc.Topoclass(config_file)
mp.compute_dem_param()
mp.extract_topo_param()
-> The same works correctly in an environment with the release version of topopyscale in it.
Terminal output
Traceback (most recent call last):
File "pipeline.py", line 9, in
mp.extract_topo_param()
File ".../TopoPyScale/TopoPyScale/topoclass.py", line 308, in extract_topo_param
self.extract_topo_cluster_param()
File ".../TopoPyScale/TopoPyScale/topoclass.py", line 261, in extract_topo_cluster_param
df_scaled, self.toposub.scaler = ts.scale_df(df_param, features=self.config.sampling.toposub.clustering_features)
File ".../TopoPyScale/TopoPyScale/topo_sub.py", line 56, in scale_df
df_scaled = pd.DataFrame(scaler.fit_transform(df_param[feature_list].values),
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/base.py", line 859, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 824, in fit
return self.partial_fit(X, y, sample_weight)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 861, in partial_fit
X = self._validate_data(
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/base.py", line 546, in _validate_data
X = check_array(X, input_name="X", **check_params)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/utils/validation.py", line 940, in check_array
raise ValueError(
Wind can be corrected using a simplistic correction method described in Liston and Elder (2006).
This can be implemented as a separate/optional method.
See Appendix C in Fiddes and Gruber (2014) for the algorithm structure.
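A rough sketch of the structure of that correction as I remember it from Liston and Elder (2006): a multiplicative weighting built from the slope in the wind direction and the curvature. The formulation and the gamma coefficients below are placeholders to be checked against the paper and Fiddes and Gruber (2014, Appendix C) before use:

# Rough sketch of the Liston & Elder (2006) terrain wind weighting -- verify against the papers.
import numpy as np

def wind_terrain_weight(slope, aspect, curvature, wind_dir, gamma_s=0.5, gamma_c=0.5):
    """slope, aspect, wind_dir in radians; curvature scaled to roughly [-0.5, 0.5]."""
    omega_s = slope * np.cos(wind_dir - aspect)   # slope in the direction of the wind
    return 1.0 + gamma_s * omega_s + gamma_c * curvature

# the downscaled wind speed would then be: ws_corrected = wind_terrain_weight(...) * ws_coarse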
Write a simple check for the overlap of the lat/lon extents between the climate data and the DEM.
Do that in topoclass.py.
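A minimal sketch of such a check, here testing that the DEM extent is fully covered by the climate data extent; the bound tuples and their (west, south, east, north) order are assumptions:

# Sketch: check the DEM extent is fully covered by the climate data extent (same CRS assumed).
def dem_within_climate(dem_bounds, climate_bounds):
    """Bounds are (west, south, east, north) tuples in the same CRS; names are hypothetical."""
    w1, s1, e1, n1 = dem_bounds
    w2, s2, e2, n2 = climate_bounds
    return w1 >= w2 and e1 <= e2 and s1 >= s2 and n1 <= n2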
Precip in ERA5 is still hourly sums (m) even at timesteps of more than 1h - we are simply missing precip budget. E.g. 6h data could be:
0, 0, 0.002, 0.002 (m)
This doesn't mean a daily sum of 0.004 m but more likely 0.004*4 = 0.016 m.
Here is the bug in topo_scale.py:
down_pt['tp'] = surf_interp.tp * 1 / tstep_dict.get(tstep) * 10**3  # Convert to mm/hr
Do not divide by the timestep.
Furthermore, a rough way to get the budget right is to multiply by the timestep.
The current code produces precip budgets around 9 times too low for a 3h timestep - an error of a factor of three, twice over.
There are a number of sensitivity analyses to perform:
Write another method for the toposub use case where the horizon value should be derived from the properties of the pixels falling in each cluster.
Keep the current logic in the case of a predefined list of points.
TopoPyScale/TopoPyScale/topoclass.py
Line 124 in ebf380c
Use case: starting a new job as a Slurm batch job on a cluster, without a DEM.
What happens? It fails, as the interactive prompt "do you want to download..." cannot be handled in a batch job.
Solutions:
Run toposcale on a number of sites with both the original Matlab implementation and this new implementation for validation. Use the sites listed in: TopoPyScale_examples
This could later be set up as one of the GitHub Actions.
Crucial need to improve the precip downscaling, as right now there are no orographic or hillslope effects on the precip.
https://www.frontiersin.org/articles/10.3389/feart.2018.00020/full
https://onlinelibrary.wiley.com/doi/abs/10.1002/hyp.7073?casa_token=l8QTJPbx-oAAAAAA:vq850HMWgVRAg14HR3esF0bvU7Qq-6R3jLXpmtuZVAQ4AFyA-HjiA39GIXuurUkee23eXFXLSaLJs2k-Uw