Git Product home page Git Product logo

topopyscale's People

Contributors

arcticsnow avatar ealonsogzl avatar hugoledoux avatar joelfiddes avatar krisaalstad avatar nilick avatar paswyss avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

topopyscale's Issues

topo_scale.py downscaling fails on last itersation of SW/LW section

config:

split:
    IO: False       # Flag to split downscaling in time or not
    time: 2         # number of years to split timeline in
    space: None     # NOT IMPLEMENTED

fails at this line (l.327):

horizon = horizon_da.sel(x=row.x, y=row.y, azimuth=np.rad2deg(ds_solar.azimuth.isel(point_id=31)), method='nearest')

with:

IndexError: Index is not smaller than dimension 31 >= 31

Explanation:

The object ds_solar.azimuth is:

Out[2]: 
<xarray.DataArray 'azimuth' (point_id: 31, time: 50376)>
dask.array<open_dataset-6baaeed24838583c892469b5b4861c12azimuth, shape=(31, 50376), dtype=float32, chunksize=(31, 50376), chunktype=numpy.ndarray>
Coordinates:
  * point_id  (point_id) int64 1 2 3 4 5 6 7 8 9 ... 23 24 25 26 27 28 29 30 31
  * time      (time) datetime64[ns] 2016-09-01 ... 2022-05-31T23:00:00

point_id = 1:31

the iterator here is the issue (l.266):

for i, row in df_centroids.iterrows():

makes values 1:31 SHOULD be 0:30

this works:
ds_solar.azimuth.isel(point_id=0)

this causes fail:
ds_solar.azimuth.isel(point_id=31)

More info:

        print('Downscaling LW, SW for point: {} out of {}'.format(pt_id+1,
                                                                  df_centroids.point_id.max()+1))

this print statement assumes a python index as adds one. for this example print statemnts give iterrations 2:32, should be 1:31.

I guess this is connected to the last changes of making downscaling split in time?

Solution:

add -1 here:

horizon = horizon_da.sel(x=row.x, y=row.y, azimuth=np.rad2deg(ds_solar.azimuth.isel(point_id=pt_id-1)), method='nearest')

All other incidences of pt_id are correct with current indexing.

removed +1 here:

        print('Downscaling LW, SW for point: {} out of {}'.format(pt_id,
                                                                  df_centroids.point_id.max()))

to get correct index in print statement

Precip distribution in Toposub

Not really a bug as behaviour is I think as expected. Map below is monthly precip total over davos with range of c. 90-130mm. I have applied an elevation lapse rate. The distribution of precip is correct (mainly from NW). SE is drier. Tops are wetter valleys are drier. What I think is up though is that there is way too much slope detail. Aspect seems to be comming through here. I would expect to see mainly an elevation gradient with also the NW to SE Precip gradient. I think this is because elev and aspect are weighted the same in toposub. I guess for a variable like precip we just want elevation and x y as dimensions of variability. I think in my Toposub elevation was weighted much higher so if I have few samples the main variability is elevation. I had a weighting factor for each dimension before it went into Kmeans.
image

export netcdf function from Topoclass

When no variables defined, the to_netcdf() function of the Topoclass (not topoexport!) raises an error.

Suggestion how to fix it:
add the follow at the beginning of the function:

if variables is None:
    variables = list(self.downscaled_pts.keys())

This initializes the variables when not defined to all (which is described as default in the function description.)

Another option would be to remove the sub-setting of 'self.downscaled_pts[variables]' in the te.to_netcdf call and leave it to the topoexport class, which handles it already. But this I would not recommend, since it is prone to errors (when the function changes for example).

#toposcale: How to deal with situation where the lowest pressure level

In the case the elevation of the point is lower than the geopotential elevation of the lowest pressure level (1000hPa for ERA5), toposcale currently breaks down.

Strategies:

  1. skip the point and return NaNs
  2. take values from closest pressure level without interpolation

Implementation

  • see line 134 of topo_scale.py

unable to read some netcdf files with h5ncdf option

commit: ebf380c#diff-14a14030423355c28b612bef60be37e718aeed0e301499661d91f8f84a78c46f

ds_plev = xr.open_mfdataset(path_forcing + 'PLEV*.nc', engine='h5netcdf',parallel=True).sel(time=slice(start_date, end_date))
ds_surf = xr.open_mfdataset(path_forcing + 'SURF*.nc', engine='h5netcdf', parallel=True).sel(time=slice(start_date, end_date))

Option engine='h5netcdf' causes error in opening netcdf.

From docu:
https://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html

engine ({"netcdf4", "scipy", "pydap", "h5netcdf", "pynio", "cfgrib", "zarr"}, optional) – Engine to use when reading files. If not provided, the default engine is chosen based on available dependencies, with a preference for “netcdf4”.

engine="netcdf4" is probably most useful for era5 data?
Or add an option to config if there are other use cases?

Feature Cryogrid for flat areas

Do simple horizontal interpolation of surface ERA5 and no vertical interp for large flat areas. This is relevant in places where there close to no discrepancy in between ERA5 topography and the actual topography.

drop setting lat/lon extent from config file

Modify the code that we do not need to indicate the lat/lon extent to download. Instead grab directly the extent of the DEM. Why indicating twice something that is readily available in the DEM!!!

export to MuSa

Add an export function to_musa()
Netcdf should contain:

  • variables: RH, Tair, Precip, WS
  • cluster mapping

LW and SW, no contribution from surrounding terrain

Currently neither LW nor SW has a component coming from the surrunding terrain. In the case of snow I/O type of place, this radiation budget can be greatly affected.

idea: Terrain exposure of one pixel could be simplified by being a half sphere - sky view factor

Intra-comparison toposcale output

First comparison of this version with previous ones.

Temperature -> good but a biais of few degrees
SW -> good, but presence of an artefact in topopyscale close to 0
LW -> complete wack in topopyscale
ws -> look good
q -> good
precip -> ??
pressure -> ?? (not avail in Kris's file)

rewrite `topo_scale.py` using barebone Dask

The topo_scale.py routine currently works well for small project, but quickly runs into trouble when used for larger projects. The internal computation of topo_scale could be improved and re-imprelemented using pure Dask syntax in order to limit the intermediate computation and memory usage.

TODO:

  • figure Dask usage
  • seek potential Dash user
  • identify bottleneck of current method

This will justify a new release version

`topo_scale.py` skip for loop

write a new version of the code to avoid the for loop, and use the full potential of xarray. Add a new dimension called point_id along which to do the interpolation.

Documentation

Documentation

  1. which format (RTH or other)
  2. how to include?
  3. list config variable possibilities with compatibilities
  4. example codes with TopoPyScale_examples
  5. how to prepare DEM

Check all accumulated variables TP,ISWR, ILWR

In era5 accumulated variables treatment changed from accumulated since forecast start to accumulated over last timestep. As I understood it a 3h step precip value is still precip over 1h. To be honest docs confuse the hell out of me so we should double check this once eg:

https://confluence.ecmwf.int/pages/viewpage.action?pageId=197702790
https://confluence.ecmwf.int/pages/viewpage.action?pageId=74764925

Diffenernt issue buyt interesting about "rain bombs" in era5:
https://confluence.ecmwf.int/display/CUSF/ERA5+versus+ERA-Interim+Total+Precipitation

Example issue:

Topo_scale.py l.169
down_pt['tp'] = surf_interp.tp * 1 / tstep_dict.get(tstep) * 10**3 # Convert to mm/hr

should be, I think:
down_pt['tp'] = surf_interp.tp * 1 / 1* 10**3 # Convert to mm/hr

or just:
down_pt['tp'] = surf_interp.tp * 10**3 # Convert m/hr to mm/hr

Shortwave partitiong bug

  1. Kt is unconstrained to interval 0-1
  2. Kd was temporally constant (np.max term) - should be vector of same length as time.
  3. Inverted logic in illumination angle: down_pt['cos_illumination'] = down_pt.cos_illumination_tmp * (down_pt.cos_illumination_tmp < 0) # remove selfdowing ccuring when |Solar.azi - aspect| > 90

improve logic to check if forcing file exist

Try to remove the o.is_file() in the lambda line 71. The current implementatino checking if the file exist is slow when looping over more than 30 files.

One option is to load file name with glob and compare string instead

Error when getting to topo parameters extraction stage

Hi, I'm currently reviewing the software for JOSS here, and have got through the install ok, but am running into problems when getting to the extraction of topographic metrics stage. I'm not entirely sure from the error output message where exactly it is falling over. I'm using the Norway ex1 example dataset.

2023-02-21 21:56:22,639 INFO Download rate 2.5M/s
[USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/inputs/climate/PLEV_202107.nc complete
2023-02-21 22:09:21,200 INFO Request is completed
2023-02-21 22:09:21,200 INFO Downloading https://download-0008-clone.copernicus-climate.eu/cache-compute-0008/cache/data7/adaptor.mars.internal-1677017196.2052243-11103-8-173d15ea-9bf4-42a7-9dd6-1ff637b84745.nc to [USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/inputs/climate/PLEV_202108.nc (2M)
2023-02-21 22:09:21,862 INFO Download rate 3.1M/s
[USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/inputs/climate/PLEV_202108.nc complete

---> Extracting DEM parameters (slope, aspect, svf)
Computing slope and aspect ...
Computing svf ...
---> File [USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/outputs/ds_param.nc saved
---> Scaling data prior to clustering
Traceback (most recent call last):
  File "[USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/pipeline.py", line 9, in <module>
    mp.extract_topo_param()
  File "[USER_DIRS]/TopoPyScale/TopoPyScale/topoclass.py", line 321, in extract_topo_param
    self.extract_topo_cluster_param()
  File "[USER_DIRS]/dev/TopoPyScale/TopoPyScale/topoclass.py", line 269, in extract_topo_cluster_param
    df_scaled, self.toposub.scaler = ts.scale_df(df_param,
  File "[USER_DIRS]/dev/TopoPyScale/TopoPyScale/topo_sub.py", line 57, in scale_df
    df_scaled = pd.DataFrame(scaler.fit_transform(df_param[feature_list].values),
  File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/base.py", line 859, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 824, in fit
    return self.partial_fit(X, y, sample_weight)
  File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 861, in partial_fit
    X = self._validate_data(
  File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/base.py", line 546, in _validate_data
    X = check_array(X, input_name="X", **check_params)
  File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/utils/validation.py", line 940, in check_array
    raise ValueError(
ValueError: Found array with 0 feature(s) (shape=(40512, 0)) while a minimum of 1 is required by StandardScaler.

Frequency of precipitation

Problem: precipitation frequency too high

Probable reason: comes from the fact ( I think) that we do a 2d interpolation which will lead to more non zero events (even if very small) than just a single gridbox would contain. This can of course cause problems in terms of frequency related processes.

Solution:
(1) filter small precip events out (lose precip, conserve frequency)
(2) allocate small precip amounts to larger wet days (conserve annual total and frequency) - take frequency from the gridbox? Just apply a simple threshold of 1mm/day
(3) others?

image

Hi Joel,

thanks a lot for the data!

Quick analysis - toposcale temperature looks better than original era5, R^2 vs AWS measurements is respectively 0.87 and 0.81.

For precipitation, the toposcale predicts at least some precipitation 4.5x more frequently compared to era5 - 82 % of the hourly timesteps have some precipitation, vs 18 % for era5 (see attached plot). At daily aggregation (used by the mass balance model) toposcale has some precipitation on 96 % of days, vs 46 % for era5. This is not a problem for the model, but I'm just curious - have you observed a similar pattern at other locations?

Cheers,

Enrico

partition_snow() is provising negative value for rain

Check code of partition_snow() function as the rain fraction provides some negative values.

def partition_snow(precip, temp, tair_low_thresh=272.15, tair_high_thresh=274.15):

The current implementation follows the one in Cryogrid. there might be a bug there too:
https://github.com/CryoGrid/CryoGrid/blob/1436176a76ccdec9d9f0ddf34f153ffbc27ad51f/source/IO/FORCING/FORCING_slope_seb_readNc.m#L111

extract_topo_param() from Topoclass fails, when running with the version in current main branch.

execution:
mp = tc.Topoclass(config_file)
mp.compute_dem_param()
mp.extract_topo_param()

-> Same would work correctly with environment with release version of topopyscale in it.

Terminal output
Traceback (most recent call last):
File "pipeline.py", line 9, in
mp.extract_topo_param()
File ".../TopoPyScale/TopoPyScale/topoclass.py", line 308, in extract_topo_param
self.extract_topo_cluster_param()
File ".../TopoPyScale/TopoPyScale/topoclass.py", line 261, in extract_topo_cluster_param
df_scaled, self.toposub.scaler = ts.scale_df(df_param, features=self.config.sampling.toposub.clustering_features)
File ".../TopoPyScale/TopoPyScale/topo_sub.py", line 56, in scale_df
df_scaled = pd.DataFrame(scaler.fit_transform(df_param[feature_list].values),
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/base.py", line 859, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 824, in fit
return self.partial_fit(X, y, sample_weight)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 861, in partial_fit
X = self._validate_data(
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/base.py", line 546, in _validate_data
X = check_array(X, input_name="X", **check_params)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/utils/validation.py", line 940, in check_array
raise ValueError(

Wind: implement Liston and Elder method

Wind can be corrected using a simplistic correction method described in Liston and Elder (2006).
This can be implemented as a separate/optional method.
See Appendice C in Fiddes and Gruber (2014) for algorithm structure

Precipitation totals wrong at timesteps higher than 1h

precip in era5 is still hourly sums (m) even at timesteps more than 1h - we are just simply missing precip budget. eg 6h data could be:

0, 0, 0.002, 0.002 (m)

this doesnt mean daily sum of 0.004m but more likely 0.004*4 = 0.016m

here is the bug in topo_scale.py:

down_pt['tp'] = surf_interp.tp * 1 / tstep_dict.get(tstep) * 10**3 # Convert to mm/hr

do not divide by timestep.

Furthermore, a rough way to get the budget right is to multiply by timestep.

Current code produces precip budgets around 9 times too low for a 3h timestep - twice an error of factor three.

Sensitivity Analysis

There is a number of sensitivity analysis to perform:

  1. DEM resolution. How far does it make sense to push the DEM resolution. Does toposcale hit a plateau, stop adding values? Could there be an optimal resolution (given a certain landscape?)
  2. DEM segmentation sensitivity to the random seeding, and clustering parameters

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.