arcticsnow / TopoPyScale
TopoPyScale: a Python library to perform simplistic climate downscaling at the hillslope scale
Home Page: https://topopyscale.readthedocs.io
License: MIT License
Replace the for loop at l.162 with a pooling solution, spreading each horizon computation across different cores in parallel.
Resources:
Solar geometry is computed via a for loop. Could this be parallelized?
l.74 solar_geom.py
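A minimal sketch of the kind of pooling approach suggested above, assuming a hypothetical compute_horizon(azimuth, dem) worker; the actual horizon / solar geometry functions and signatures in TopoPyScale may differ:

# Sketch only: spread independent per-azimuth computations across cores.
# compute_horizon() and dem are hypothetical placeholders, not the library API.
from functools import partial
from multiprocessing import Pool
import numpy as np

def compute_horizon(azimuth, dem):
    # placeholder: real code would scan the DEM along the given azimuth
    return np.zeros_like(dem)

def compute_all_horizons(dem, n_azimuths=72, n_cores=4):
    azimuths = np.arange(0, 360, 360 / n_azimuths)
    with Pool(n_cores) as pool:
        horizons = pool.map(partial(compute_horizon, dem=dem), azimuths)
    return horizons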
running "swat" sim (points in Karakorum)
Same issue as this:
pydata/xarray#2931
Solved as suggested, by changing
"point_id": df_position.index),
to
"point_id":(("point_id"), df_position.index),
But why?! I have not seen this in map jobs.
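For the record, a toy sketch (not the TopoPyScale code) of the explicit form: passing a bare pandas index leaves xarray to infer the dimension, while the ("point_id", values) form ties the coordinate to the point_id dimension unambiguously:

# Toy example only: explicit (dims, values) coordinate assignment.
import numpy as np
import pandas as pd
import xarray as xr

df_position = pd.DataFrame({'x': [0.0, 1.0, 2.0]}, index=[1, 2, 3])

ds = xr.Dataset(
    {'elevation': (('point_id',), np.array([100.0, 200.0, 300.0]))},
    coords={'point_id': (('point_id',), df_position.index)},
)
print(ds.point_id)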
All official WMO in-situ observations are available from: https://cds.climate.copernicus.eu/cdsapp#!/dataset/insitu-observations-surface-land?tab=overview
Similarly, we could fetch the NOAA weather datasets too.
Add functionalities to:
download observations using spatial and temporal extents
write a tool to perform comparisons with downscaled data (lat/lon available in a csv file)
write functions
add functions to topoclass.py
The paper by Jennings et al. (2018) presents a new method to compute the solid vs. liquid precipitation partition when close to 0°C. This method could be implemented in TopoPyScale.
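If I remember the paper right, the partition is a bivariate logistic regression on air temperature and relative humidity. A rough sketch only; the coefficient values below are quoted from memory and must be verified against Jennings et al. (2018) before implementing:

# Rough sketch of the Jennings et al. (2018) bivariate logistic snow/rain partition.
# Coefficients from memory -- verify against the paper before use.
import numpy as np

def snow_fraction_jennings(ta, rh, alpha=-10.04, beta=1.41, gamma=0.09):
    """Probability of snow given air temperature ta (deg C) and relative humidity rh (%)."""
    return 1.0 / (1.0 + np.exp(alpha + beta * ta + gamma * rh))

# e.g. snow_fraction_jennings(0.5, 80) gives the snow probability near the 0C transition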
config:
  split:
    IO: False     # Flag to split downscaling in time or not
    time: 2       # number of years to split timeline in
    space: None   # NOT IMPLEMENTED
fails at this line (l.327):
horizon = horizon_da.sel(x=row.x, y=row.y, azimuth=np.rad2deg(ds_solar.azimuth.isel(point_id=31)), method='nearest')
with:
IndexError: Index is not smaller than dimension 31 >= 31
Explanation:
The object ds_solar.azimuth is:
Out[2]:
<xarray.DataArray 'azimuth' (point_id: 31, time: 50376)>
dask.array<open_dataset-6baaeed24838583c892469b5b4861c12azimuth, shape=(31, 50376), dtype=float32, chunksize=(31, 50376), chunktype=numpy.ndarray>
Coordinates:
* point_id (point_id) int64 1 2 3 4 5 6 7 8 9 ... 23 24 25 26 27 28 29 30 31
* time (time) datetime64[ns] 2016-09-01 ... 2022-05-31T23:00:00
point_id = 1:31
The iterator here is the issue (l.266):
for i, row in df_centroids.iterrows():
It yields the values 1:31 but SHOULD yield 0:30.
this works:
ds_solar.azimuth.isel(point_id=0)
this fails:
ds_solar.azimuth.isel(point_id=31)
More info:
print('Downscaling LW, SW for point: {} out of {}'.format(pt_id+1,
df_centroids.point_id.max()+1))
This print statement assumes a Python index and adds one. For this example the print statements give iterations 2:32, but should give 1:31.
I guess this is connected to the recent changes making downscaling split in time?
Solution:
Add -1 here:
horizon = horizon_da.sel(x=row.x, y=row.y, azimuth=np.rad2deg(ds_solar.azimuth.isel(point_id=pt_id-1)), method='nearest')
All other occurrences of pt_id are correct with the current indexing.
Removed the +1 here:
print('Downscaling LW, SW for point: {} out of {}'.format(pt_id,
df_centroids.point_id.max()))
to get the correct index in the print statement.
Check the output of the q_2_rh() function, as the current version seems to have a problem.
TopoPyScale/TopoPyScale/meteo_util.py
Line 49 in 7a79dff
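For cross-checking, a textbook version of the specific-humidity-to-RH conversion (Magnus form for saturation vapour pressure); this is a generic sketch, not necessarily what meteo_util.py implements:

# Generic q -> RH conversion for sanity-checking q_2_rh(); not the library implementation.
import numpy as np

def q_to_rh(q, t, p):
    """q: specific humidity (kg/kg), t: air temperature (K), p: pressure (Pa). Returns RH in %."""
    e = q * p / (0.622 + 0.378 * q)                      # vapour pressure (Pa)
    t_c = t - 273.15
    es = 611.2 * np.exp(17.62 * t_c / (243.12 + t_c))    # saturation vapour pressure (Pa), Magnus
    return np.clip(100.0 * e / es, 0, 100)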
Not really a bug, as the behaviour is, I think, as expected. The map below is the monthly precip total over Davos with a range of c. 90-130 mm. I have applied an elevation lapse rate. The distribution of precip is correct (mainly from the NW): the SE is drier, the tops are wetter, the valleys are drier. What I think is off, though, is that there is way too much slope detail; aspect seems to be coming through here. I would expect to see mainly an elevation gradient together with the NW to SE precip gradient. I think this is because elevation and aspect are weighted the same in toposub. For a variable like precip we probably just want elevation and x/y as the dimensions of variability. In my original TopoSUB, elevation was weighted much higher, so with few samples the main variability is elevation; I had a weighting factor for each dimension before it went into KMeans.
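A minimal sketch of the kind of per-feature weighting described above, applied after scaling and before KMeans; the feature names and weights are illustrative, not the toposub defaults:

# Illustrative per-feature weighting before clustering; names and weights are examples only.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def weighted_kmeans(df_param, weights, n_clusters=50):
    """df_param: DataFrame of topographic features; weights: dict feature -> weight."""
    features = list(weights.keys())
    scaled = StandardScaler().fit_transform(df_param[features].values)
    scaled *= np.array([weights[f] for f in features])   # e.g. boost elevation for precip
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(scaled)

# e.g. weighted_kmeans(df_param, {'elevation': 4, 'x': 1, 'y': 1, 'aspect': 0.5, 'slope': 0.5})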
Many interesting datasets are available at the Copernicus data center:
Observation:
GCMs:
Reanalysis:
When no variables are defined, the to_netcdf() function of the Topoclass (not topoexport!) raises an error.
Suggestion for how to fix it:
Add the following at the beginning of the function:
if variables is None:
    variables = list(self.downscaled_pts.keys())
This initializes variables to all available variables when none are defined (which is described as the default in the function description).
Another option would be to remove the sub-setting of self.downscaled_pts[variables] in the te.to_netcdf call and leave it to the topoexport class, which handles it already. But I would not recommend this, since it is prone to errors (when the function changes, for example).
In the case where the elevation of the point is lower than the geopotential elevation of the lowest pressure level (1000 hPa for ERA5), toposcale currently breaks down.
Strategies:
Implementation
topo_scale.py
Find out how to use the interpolation method of xarray to avoid looping over each point, at least for the horizontal/planar interpolation in topo_scale.py, lines 94-132.
http://xarray.pydata.org/en/stable/user-guide/interpolation.html
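A minimal sketch of the vectorized interpolation xarray supports, using DataArray indexers that share a new point_id dimension; the dataset and variable names are illustrative, not the topo_scale.py objects:

# Toy example of xarray's vectorized interpolation over a new 'point_id' dimension.
import numpy as np
import pandas as pd
import xarray as xr

# toy ERA5-like grid and point list (illustrative only)
ds_era5 = xr.Dataset(
    {'t2m': (('latitude', 'longitude'), np.random.rand(5, 5))},
    coords={'latitude': np.linspace(60, 62, 5), 'longitude': np.linspace(7, 9, 5)},
)
df_centroids = pd.DataFrame({'lat': [60.5, 61.2], 'lon': [7.3, 8.8]})

# DataArray indexers sharing the 'point_id' dimension interpolate all points in one call,
# instead of looping point by point.
lons = xr.DataArray(df_centroids.lon.values, dims='point_id')
lats = xr.DataArray(df_centroids.lat.values, dims='point_id')
ds_pts = ds_era5.interp(longitude=lons, latitude=lats, method='linear')
print(ds_pts.t2m)  # dims: (point_id,)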
commit: ebf380c#diff-14a14030423355c28b612bef60be37e718aeed0e301499661d91f8f84a78c46f
ds_plev = xr.open_mfdataset(path_forcing + 'PLEV*.nc', engine='h5netcdf',parallel=True).sel(time=slice(start_date, end_date))
ds_surf = xr.open_mfdataset(path_forcing + 'SURF*.nc', engine='h5netcdf', parallel=True).sel(time=slice(start_date, end_date))
The option engine='h5netcdf' causes an error when opening the netcdf files.
From the docs:
https://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html
engine ({"netcdf4", "scipy", "pydap", "h5netcdf", "pynio", "cfgrib", "zarr"}, optional) – Engine to use when reading files. If not provided, the default engine is chosen based on available dependencies, with a preference for “netcdf4”.
engine="netcdf4" is probably most useful for era5 data?
Or add an option to config if there are other use cases?
Both open_mfdataset() and save_mfdataset() are not effective. They should be replaced with the method used in topo_scale.py:
import glob
import xarray as xr

flist = glob.glob(file_pattern)
flist.sort()
ds_list = []
for file in flist:
    ds_list.append(xr.open_dataset(file))
ds = xr.concat(ds_list, dim='time')
Do a simple horizontal interpolation of surface ERA5 and no vertical interpolation for large flat areas. This is relevant in places where there is close to no discrepancy between the ERA5 topography and the actual topography.
Modify the code so that we do not need to indicate the lat/lon extent to download. Instead, grab the extent directly from the DEM. Why indicate twice something that is readily available in the DEM?!
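A minimal sketch of grabbing the download extent from the DEM itself, assuming the DEM is a GeoTIFF readable by rasterio ('dem.tif' is a placeholder path; the exact config wiring would differ):

# Sketch: derive the lat/lon download extent directly from the DEM file.
import rasterio
from rasterio.warp import transform_bounds

with rasterio.open('dem.tif') as dem:
    # bounds in the DEM's CRS, reprojected to WGS84 lat/lon for the climate data request
    west, south, east, north = transform_bounds(dem.crs, 'EPSG:4326', *dem.bounds)

extent = [north, west, south, east]  # e.g. the N/W/S/E order the CDS API expects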
Add an export function to_musa().
The netcdf should contain:
Currently neither LW nor SW has a component coming from the surrounding terrain. In snow I/O types of places, this radiation budget can be greatly affected.
Idea: the terrain exposure of one pixel could be simplified as the half sphere minus the sky view factor (i.e. a terrain view factor of 1 - SVF).
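One common, very simple parameterization along those lines (a sketch only, not what the code currently does): weight the sky LW by the sky view factor and add a grey-body emission term from the surrounding terrain for the remaining (1 - SVF) fraction. The surface temperature and emissivity inputs are assumptions:

# Sketch of a simple terrain contribution to LW using the sky view factor.
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant (W m-2 K-4)

def lw_with_terrain(lw_sky, svf, t_surf, emissivity=0.98):
    """lw_sky: downwelling LW from the sky (W m-2); svf: sky view factor [0-1]; t_surf in K."""
    lw_terrain = (1.0 - svf) * emissivity * SIGMA * t_surf**4
    return svf * lw_sky + lw_terrain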
First comparison of this version with previous ones:
Temperature -> good, but a bias of a few degrees
SW -> good, but presence of an artefact in topopyscale close to 0
LW -> completely whack in topopyscale
ws -> looks good
q -> good
precip -> ??
pressure -> ?? (not available in Kris's file)
The topo_scale.py routine currently works well for small projects, but quickly runs into trouble when used for larger projects. The internal computation of topo_scale could be improved and re-implemented using pure Dask syntax in order to limit the intermediate computation and memory usage.
TODO:
This will justify a new release version.
Write a new version of the code to avoid the for loop, and use the full potential of xarray. Add a new dimension called point_id along which to do the interpolation.
In ERA5 the treatment of accumulated variables changed from accumulated since forecast start to accumulated over the last timestep. As I understood it, a 3h-step precip value is still precip over 1h. To be honest the docs confuse the hell out of me, so we should double check this once, e.g.:
https://confluence.ecmwf.int/pages/viewpage.action?pageId=197702790
https://confluence.ecmwf.int/pages/viewpage.action?pageId=74764925
A different issue, but interesting, about "rain bombs" in ERA5:
https://confluence.ecmwf.int/display/CUSF/ERA5+versus+ERA-Interim+Total+Precipitation
Example issue:
topo_scale.py l.169
down_pt['tp'] = surf_interp.tp * 1 / tstep_dict.get(tstep) * 10**3  # Convert to mm/hr
should be, I think:
down_pt['tp'] = surf_interp.tp * 1 / 1 * 10**3  # Convert to mm/hr
or just:
down_pt['tp'] = surf_interp.tp * 10**3  # Convert m/hr to mm/hr
Add a routine to fetch CORDEX climate data. Check compatibility with the current code developed following the ERA5 convention.
The toolbox HORAYZON offers a new computation scheme to derive SVF and horizon angles. It may also be capable of computing the SW adjustment (atmospheric effect). The authors claim a large improvement in computation time in comparison to Dozier's method.
We could add tools that help decide what is an appropriate number of clusters to run a job. This may be simple metrics such as the average size of a cluster, but also more advanced metrics such as the one presented here (a simple sketch is given below):
If each pixel is assigned to more than one cluster, then this can be used to smooth out the output of toposub. @joelfiddes, I'll let you fill in the details of the idea ;)
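On the first point, a minimal sketch of one simple diagnostic for choosing the number of clusters (silhouette score); X stands for the scaled feature array and is an assumption, not the toposub variable name:

# Sketch: compare a few candidate cluster counts with the silhouette score (higher is better).
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def score_cluster_counts(X, candidates=(25, 50, 100, 200)):
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(X)
        # sample_size keeps the O(n^2) silhouette computation affordable on large DEMs
        scores[k] = silhouette_score(X, labels, sample_size=10000)
    return scores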
We could implement our own version of the sky view factor method using parallelization for the computation of all horizons (72).
Original TopoCalc code avail: https://github.com/USDA-ARS-NWRC/topocalc/blob/main/topocalc/viewf.py
Add a routine to fetch the ASTER DEM to fetch_dem.py.
Write routines to evaluate bias and scaling issues against local observations.
A mesh can be an alternative to approximate the DEM and reduce the number of points in areas with little topography. There are many tools available to generate meshes:
This paper provides a good explanation.
Try to remove the o.is_file() in the lambda at line 71. The current implementation, checking whether each file exists, is slow when looping over more than 30 files.
One option is to load the file names with glob once and compare strings instead.
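A minimal sketch of the glob-and-compare approach; path_forcing and expected_files are hypothetical names, not the actual variables at line 71:

# Sketch: list existing files once with glob instead of calling is_file() per file.
import glob
import os

path_forcing = 'inputs/climate'                       # hypothetical forcing directory
expected_files = [f'PLEV_2021{m:02d}.nc' for m in range(1, 13)]

existing = {os.path.basename(p) for p in glob.glob(os.path.join(path_forcing, '*.nc'))}
missing = [f for f in expected_files if f not in existing]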
Hi, I'm currently reviewing the software for JOSS here, and have got through the install ok, but am running into problems when getting to the extraction of topographic metrics stage. I'm not entirely sure from the error output message where exactly it is falling over. I'm using the Norway ex1 example dataset.
2023-02-21 21:56:22,639 INFO Download rate 2.5M/s
[USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/inputs/climate/PLEV_202107.nc complete
2023-02-21 22:09:21,200 INFO Request is completed
2023-02-21 22:09:21,200 INFO Downloading https://download-0008-clone.copernicus-climate.eu/cache-compute-0008/cache/data7/adaptor.mars.internal-1677017196.2052243-11103-8-173d15ea-9bf4-42a7-9dd6-1ff637b84745.nc to [USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/inputs/climate/PLEV_202108.nc (2M)
2023-02-21 22:09:21,862 INFO Download rate 3.1M/s
[USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/inputs/climate/PLEV_202108.nc complete
---> Extracting DEM parameters (slope, aspect, svf)
Computing slope and aspect ...
Computing svf ...
---> File [USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/outputs/ds_param.nc saved
---> Scaling data prior to clustering
Traceback (most recent call last):
File "[USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/pipeline.py", line 9, in <module>
mp.extract_topo_param()
File "[USER_DIRS]/TopoPyScale/TopoPyScale/topoclass.py", line 321, in extract_topo_param
self.extract_topo_cluster_param()
File "[USER_DIRS]/dev/TopoPyScale/TopoPyScale/topoclass.py", line 269, in extract_topo_cluster_param
df_scaled, self.toposub.scaler = ts.scale_df(df_param,
File "[USER_DIRS]/dev/TopoPyScale/TopoPyScale/topo_sub.py", line 57, in scale_df
df_scaled = pd.DataFrame(scaler.fit_transform(df_param[feature_list].values),
File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/base.py", line 859, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 824, in fit
return self.partial_fit(X, y, sample_weight)
File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 861, in partial_fit
X = self._validate_data(
File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/base.py", line 546, in _validate_data
X = check_array(X, input_name="X", **check_params)
File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/utils/validation.py", line 940, in check_array
raise ValueError(
ValueError: Found array with 0 feature(s) (shape=(40512, 0)) while a minimum of 1 is required by StandardScaler.
Problem: precipitation frequency too high.
Probable reason: this comes (I think) from the fact that we do a 2D interpolation, which will lead to more non-zero events (even if very small) than a single gridbox would contain. This can of course cause problems for frequency-related processes.
Solutions (see the sketch after this list):
(1) filter small precip events out (lose precip, conserve frequency)
(2) allocate small precip amounts to larger wet days (conserve annual total and frequency) - take frequency from the gridbox? Just apply a simple threshold of 1 mm/day
(3) others?
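A minimal sketch combining options (1) and (2): threshold out drizzle, then optionally rescale the remaining events to conserve the total. The threshold and the variable name tp (hourly precipitation in mm/hr as an xarray DataArray) are assumptions:

# Sketch: drop drizzle below a threshold and optionally conserve the precipitation budget.
def filter_small_precip(tp, threshold=1.0 / 24, conserve_total=True):
    """tp: precipitation (mm/hr) as an xarray DataArray; threshold ~1 mm/day expressed per hour."""
    tp_filtered = tp.where(tp >= threshold, 0.0)                     # option (1): remove drizzle
    if conserve_total and float(tp_filtered.sum()) > 0:
        tp_filtered = tp_filtered * (tp.sum() / tp_filtered.sum())   # option (2): conserve total
    return tp_filtered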
Hi Joel,
thanks a lot for the data!
Quick analysis - toposcale temperature looks better than original era5, R^2 vs AWS measurements is respectively 0.87 and 0.81.
For precipitation, the toposcale predicts at least some precipitation 4.5x more frequently compared to era5 - 82 % of the hourly timesteps have some precipitation, vs 18 % for era5 (see attached plot). At daily aggregation (used by the mass balance model) toposcale has some precipitation on 96 % of days, vs 46 % for era5. This is not a problem for the model, but I'm just curious - have you observed a similar pattern at other locations?
Cheers,
Enrico
Check the code of the partition_snow() function, as the rain fraction produces some negative values.
TopoPyScale/TopoPyScale/meteo_util.py
Line 31 in 7a79dff
The current implementation follows the one in CryoGrid; there might be a bug there too:
https://github.com/CryoGrid/CryoGrid/blob/1436176a76ccdec9d9f0ddf34f153ffbc27ad51f/source/IO/FORCING/FORCING_slope_seb_readNc.m#L111
Write a function in the class topoclass to downscale in chunks of time (about 10 years max) in the case of long time series.
This includes the solar_geom() and downscale_climate() functions.
See this project for an example: https://github.com/ArcticSnow/TPS_project_tibet/blob/main/lexiewudan/run_lexeiwudan.py
In places close to the coast or large flat areas (below the 1000 hPa level) we could have the code simply perform a horizontal interpolation based on surface-level ERA5 data.
To be implemented.
execution:
mp = tc.Topoclass(config_file)
mp.compute_dem_param()
mp.extract_topo_param()
-> The same works correctly in an environment with the release version of topopyscale in it.
Terminal output
Traceback (most recent call last):
File "pipeline.py", line 9, in
mp.extract_topo_param()
File ".../TopoPyScale/TopoPyScale/topoclass.py", line 308, in extract_topo_param
self.extract_topo_cluster_param()
File ".../TopoPyScale/TopoPyScale/topoclass.py", line 261, in extract_topo_cluster_param
df_scaled, self.toposub.scaler = ts.scale_df(df_param, features=self.config.sampling.toposub.clustering_features)
File ".../TopoPyScale/TopoPyScale/topo_sub.py", line 56, in scale_df
df_scaled = pd.DataFrame(scaler.fit_transform(df_param[feature_list].values),
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/base.py", line 859, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 824, in fit
return self.partial_fit(X, y, sample_weight)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 861, in partial_fit
X = self._validate_data(
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/base.py", line 546, in _validate_data
X = check_array(X, input_name="X", **check_params)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/utils/validation.py", line 940, in check_array
raise ValueError(
Wind can be corrected using a simplistic correction method described in Liston and Elder (2006).
This can be implemented as a separate/optional method.
See Appendix C in Fiddes and Gruber (2014) for the algorithm structure.
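A rough sketch of the structure of that correction as I remember it from Liston and Elder (2006): a multiplicative weighting built from the slope in the wind direction and the curvature. The formulation and the gamma coefficients below are placeholders to be checked against the paper and Fiddes and Gruber (2014, Appendix C) before use:

# Rough sketch of the Liston & Elder (2006) terrain wind weighting -- verify against the papers.
import numpy as np

def wind_terrain_weight(slope, aspect, curvature, wind_dir, gamma_s=0.5, gamma_c=0.5):
    """slope, aspect, wind_dir in radians; curvature scaled to roughly [-0.5, 0.5]."""
    omega_s = slope * np.cos(wind_dir - aspect)   # slope in the direction of the wind
    return 1.0 + gamma_s * omega_s + gamma_c * curvature

# the downscaled wind speed would then be: ws_corrected = wind_terrain_weight(...) * ws_coarse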
Write a simple check for the overlap of the lat/lon extents between the climate data and the DEM.
Do that in topoclass.py.
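A minimal sketch of such a check, here testing that the DEM extent is fully covered by the climate data extent; the bound tuples and their (west, south, east, north) order are assumptions:

# Sketch: check the DEM extent is fully covered by the climate data extent (same CRS assumed).
def dem_within_climate(dem_bounds, climate_bounds):
    """Bounds are (west, south, east, north) tuples in the same CRS; names are hypothetical."""
    w1, s1, e1, n1 = dem_bounds
    w2, s2, e2, n2 = climate_bounds
    return w1 >= w2 and e1 <= e2 and s1 >= s2 and n1 <= n2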
Precip in ERA5 is still hourly sums (m) even at timesteps of more than 1h - we are simply missing precip budget. E.g. 6h data could be:
0, 0, 0.002, 0.002 (m)
This doesn't mean a daily sum of 0.004 m but more likely 0.004*4 = 0.016 m.
Here is the bug in topo_scale.py:
down_pt['tp'] = surf_interp.tp * 1 / tstep_dict.get(tstep) * 10**3  # Convert to mm/hr
Do not divide by the timestep.
Furthermore, a rough way to get the budget right is to multiply by the timestep.
The current code produces precip budgets around 9 times too low for a 3h timestep - an error of a factor of three, twice over.
There are a number of sensitivity analyses to perform:
Write another method for the toposub use case where the horizon value should be derived from the properties of the pixels falling in each cluster.
Keep the current logic in the case of a predefined list of points.
TopoPyScale/TopoPyScale/topoclass.py
Line 124 in ebf380c
Use case: starting a new job as a Slurm batch job on a cluster, without a DEM.
What happens? It fails, as the interactive prompt "do you want to download..." cannot be handled in a batch job.
Solutions:
Run toposcale on a number of sites with both the original Matlab implementation and this new implementation for validation. Use the sites listed in: TopoPyScale_examples
This could later be set up as one of the GitHub Actions.
Crucial need to improve the precip downscaling, as right now there are no orographic or hillslope effects on the precip.
https://www.frontiersin.org/articles/10.3389/feart.2018.00020/full
https://onlinelibrary.wiley.com/doi/abs/10.1002/hyp.7073?casa_token=l8QTJPbx-oAAAAAA:vq850HMWgVRAg14HR3esF0bvU7Qq-6R3jLXpmtuZVAQ4AFyA-HjiA39GIXuurUkee23eXFXLSaLJs2k-Uw