
xdem's People

Contributors

adehecq, alessandro-gentilini, atedstone, cusicand, dependabot[bot], erikmannerfelt, fnands, friedrichknuth, jlandmann, liuh886, matteae, rhugonnet


xdem's Issues

Rename the repository from DemUtils to xdem

I think this should be done as soon as possible, before this package becomes more popular. Running pip install git+https://github.com/GlacioHack/DemUtils.git only to then import xdem doesn't make much sense!

Setup a good test data set.

  • find a DEM that can be downloaded from a single URL and write a script to download the sample data set
  • set up a dataset path to access the sample data

Dependency handling needs revision

In setup.py, richdem is listed under extras_require, but it is needed to import xdem. In addition, scikit-gstat is needed. geoutils is obviously a requirement, but since it is not yet on PyPI, it won't be downloaded by pip install xdem. I don't know exactly how to fix that (apart from uploading geoutils to PyPI, of course)...

Right now, the "correct" way to install xdem without conda is quite complex:

pip install scikit-gstat
pip install richdem

pip install git+https://github.com/GlacioHack/GeoUtils.git
pip install git+https://github.com/GlacioHack/xdem.git  # or pip install xdem :)

Skipping any of the above steps leads to an ImportError when trying to import xdem.

If we added scikit-gstat and richdem as requirements, this would be simpler. The question is, however, whether we want them as mandatory requirements. As far as I'm aware, richdem only provides "nice to have" functions right now, unless @rhugonnet uses it more intricately somewhere? scikit-gstat is mandatory for variograms, right? Could the geoutils requirement somehow be declared as git+https://github.com/GlacioHack/GeoUtils.git so that pip automatically fetches the latest master when installing xdem?
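
For reference, a minimal sketch of how setup.py could declare these dependencies, assuming scikit-gstat and richdem become mandatory and geoutils is pulled straight from GitHub via a PEP 508 direct reference (the exact package list and pins are assumptions to be checked):

from setuptools import find_packages, setup

setup(
    name="xdem",
    packages=find_packages(),
    install_requires=[
        "numpy",
        "scipy",
        "rasterio",
        "scikit-gstat",
        "richdem",
        # Direct reference; only needed until geoutils is published on PyPI.
        "geoutils @ git+https://github.com/GlacioHack/GeoUtils.git",
    ],
)

One caveat: PyPI rejects packages whose requirements contain direct URL references, so this only helps for installs from GitHub, not for a future pip install xdem.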

I know that worrying about "ease of install" at such an early stage of development is kind of dumb. I feel, however, that it will be easier to keep up with these issues than to delay them to a "super-fix" later!

Add "equal count" hypsometric binning

Right now, the hypsometric approach (PR #36) only allows for equally spaced bins. An alternative to equal spacing is having bins with equal pixel counts.

Correct me if I'm wrong, but the advantage would be on glaciers with a combination of flat and steep surfaces. If a nonlinear elevation vs. elevation change relationship is assumed, flat surfaces would be under-represented with equally spaced bins. Bins with equal pixel counts, on the other hand, would ensure that each part of the glacier is better represented.

Well, you can calculate a percentile bin, then use it in your percentile functions:
pbins = np.arange(min, max, bin_size)
bins = np.percentile(mean_dem, pbins)

Originally posted by @adehecq in #36 (comment)
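
To make the idea concrete, a minimal sketch of equal-count ("quantile") binning with NumPy; the names (elevations, n_bins) are only illustrative:

import numpy as np

def equal_count_bins(elevations: np.ndarray, n_bins: int = 20) -> np.ndarray:
    """Return bin edges so that each bin holds roughly the same number of pixels."""
    percentiles = np.linspace(0, 100, n_bins + 1)
    return np.nanpercentile(elevations, percentiles)

# Usage: np.digitize(elevations, equal_count_bins(elevations)) assigns each pixel to a bin.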

Test different options for volume integration

Should we use the newer, older or mean elevation when calculating the mass balance?
We could write a small Jupyter notebook on a test case that implements all solutions and compares the differences.

Add normalized regional hypsometric interpolation

Currently, the implemented hypsometric approaches are local and (absolute) regional interpolation. Estimating a regional gradient using the normalized elevation range (0-1) for each glacier specifically has been shown by @rhugonnet to perform better. This should be included as an interpolation type somewhere.

Add a general-purpose `apply_matrix()` function for DEMs

In #71, we discussed errors associated with DEM gridding, since ICP (and possibly more future approaches) needs to convert the DEM into a point cloud. ICP currently does:

  1. Convert DEM to point cloud (preparation)
  2. Match the ref and tba point clouds (estimates the rigid transform)
  3. Re-grid the tba DEM after transforming it (applies the rigid transform)

Step 3 currently uses scipy.interpolate.griddata, which is extremely slow! A faster and more reliable approach would be fantastic to have. Something with the approximate signature:

def apply_matrix(dem: np.ndarray, transform: rio.transform.Affine, matrix: np.ndarray) -> np.ndarray:
    # do matrix magic
    return transformed_dem
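
For comparison, a minimal sketch of the current slow path (DEM to point cloud, transform, re-grid with griddata) that such a function would ideally replace; the coordinate handling here is an assumption:

import numpy as np
import rasterio as rio
import scipy.interpolate

def apply_matrix_pointcloud(dem: np.ndarray, transform: rio.transform.Affine, matrix: np.ndarray) -> np.ndarray:
    """Apply a 4x4 transformation matrix by round-tripping through a point cloud (slow baseline)."""
    rows, cols = np.meshgrid(np.arange(dem.shape[0]), np.arange(dem.shape[1]), indexing="ij")
    x, y = rio.transform.xy(transform, rows.ravel(), cols.ravel())
    x, y = np.asarray(x), np.asarray(y)
    # Homogeneous coordinates (x, y, z, 1) for every pixel.
    points = np.column_stack([x, y, dem.ravel(), np.ones(dem.size)])
    transformed = (matrix @ points.T).T
    # Re-grid the transformed points back onto the original pixel centres.
    new_z = scipy.interpolate.griddata(transformed[:, :2], transformed[:, 2], (x, y), method="linear")
    return new_z.reshape(dem.shape)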

One idea I had was to describe the 3D rigid transform as a 2D similarity transform and a linear vertical correction:

A 3D rigid transform from above (from 2D) would include horizontal translation, horizontal rotation, horizontal scaling (due to rotation-induced perspective differences), and elevation (bias + rotation) differences. As far as I understand, the first 3 parts could be described as a 2D similarity transform, and the final as a deramp.

The advantage of such an approach is that the DEM never has to be converted to a point cloud and regridded; it just needs to be moved and twisted a bit. I guess the most difficult part of this, however, would be to mathematically do the conversion...

This general-purpose function is integral to implementing #79.

I am also all-ears to other alternatives.

DEM.copy() returns a Raster instance

The .copy() method, which is inherited from the Raster class, does not work as expected on the DEM class:

In [1]: import xdem

In [2]: dem = xdem.dem.DEM("examples/Longyearbyen/data/DEM_2009_ref.tif")
No metadata could be read from filename.

In [3]: dem_copy = dem.copy()

In [4]: assert isinstance(dem_copy, xdem.dem.DEM)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-4-54169263df41> in <module>
----> 1 assert isinstance(dem_copy, xdem.dem.DEM)

AssertionError: 

In [5]: type(dem_copy)
Out[5]: geoutils.georaster.Raster

In [6]: assert isinstance(dem, xdem.dem.DEM)

This could be argued to be a flaw in geoutils, but could also be fixed here.
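
One possible fix on the geoutils side, sketched under the assumption that Raster.copy() rebuilds the object from the array, transform, CRS and nodata (the real signature may differ): use the runtime class instead of hard-coding Raster, so subclasses such as DEM keep their type.

class Raster:
    # ... existing attributes and methods ...

    def copy(self):
        # type(self), rather than a hard-coded Raster, makes subclasses such as DEM
        # get back an instance of their own class.
        return type(self).from_array(self.data.copy(), self.transform, self.crs, self.nodata)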

Duplicated examples

There is an examples folder in the base folder, and when running the tests or examples, a new folder is created under xdem. I think it's best to keep only one...

Coreg functions modify the `to_be_aligned_dem` inplace

Minimal example:

import geoutils as gu
import xdem

ref = xdem.dem.DEM(xdem.examples.FILEPATHS["longyearbyen_ref_dem"])
tba = xdem.dem.DEM(xdem.examples.FILEPATHS["longyearbyen_tba_dem"])
glacier_outlines = gu.geovector.Vector(xdem.examples.FILEPATHS["longyearbyen_glacier_outlines"])

tba_original = tba.copy()

xdem.coreg.coregister(ref, tba, mask=glacier_outlines)

assert tba_original == tba  # Raises AssertionError

This will fail, because the tba will be masked with the glacier_outlines. This can obviously lead to some unexpected side-effects, but we should still discuss it before I do something about it:

If the DEMs in question are huge, I can't simply do tba.copy() inside the coregister() function due to memory limits. Maybe an optional keyword should exist to choose between copying (the default) and modifying in place? What do you think, @rhugonnet and @adehecq?
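
Something like the following (hypothetical inplace keyword; the exact coregister() signature is assumed) would make the behaviour explicit:

def coregister(reference_dem, dem_to_be_aligned, mask=None, inplace=False):
    """Coregister dem_to_be_aligned to reference_dem.

    If inplace is False (default), work on a copy so the caller's DEM is untouched;
    if True, modify dem_to_be_aligned directly to save memory on huge DEMs.
    """
    if not inplace:
        dem_to_be_aligned = dem_to_be_aligned.copy()
    # ... masking and coregistration as before ...
    return dem_to_be_aligned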

Allow filters in a `CoregPipeline`

Copying the code suggested by @rhugonnet, this (or something very similar) should be possible:

step1 = filters.NMAD()
step2 = coreg.BiasCorr()
step3 = filters.MedianKernel()
step4 = coreg.ICP()
step5 = biascorr.AlongTrack()

pipeline = step1 + step2 + step3 + step4 + step5

This suggests either a more general Pipeline class, or that the (not yet implemented) Filter class simply assumes it will be used in a CoregPipeline when Filter.__add__() is called, as in the sketch below.
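
A minimal sketch of the second option, assuming CoregPipeline accepts a list of steps (the Filter class itself is hypothetical):

class Filter:
    """Base class for filters that can be chained into a CoregPipeline."""

    def __add__(self, other):
        # Adding a filter to a coreg step (or another filter) builds a pipeline.
        return CoregPipeline([self, other])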

Add subsampling setting to improve coregistration

Right now, all coreg functions work on the entire DEM. This may or may not be necessary, and may consequently be a performance bottleneck on large DEMs. Perhaps coregistration can work almost equally well on a randomly subsampled set of pixels.
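
A minimal sketch of such a subsampling step (the subsample parameter and where it would be applied are assumptions):

import numpy as np

def subsample_valid(ddem: np.ndarray, subsample: int = 100_000, random_state: int = 42) -> np.ndarray:
    """Randomly pick at most `subsample` finite elevation-difference values."""
    valid = ddem[np.isfinite(ddem)]
    if valid.size <= subsample:
        return valid
    rng = np.random.default_rng(random_state)
    return rng.choice(valid, size=subsample, replace=False)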

NMAD functions not working with masked_arrays

First of all, there are two identical functions:

xdem.coreg.calculate_nmad()
xdem.spatial_tools.nmad()

This is largely my fault, since I added my own in November when spatial_tools existed but wasn't in my field of vision.

Second, both trigger odd warnings when passing masked_arrays to them:

In [1]: import xdem

In [2]: import numpy as np

In [3]: masked = np.ma.masked_array([1, 2, 3, 3], mask=[True, True, False, False])

In [4]: equivalent_array = np.array([3, 3])

In [5]: masked.mean() == equivalent_array.mean()
Out [5]: True

In [6]: masked_nmad = xdem.spatial_tools.nmad(masked)
/home/erik/.local/share/conda/xdem/lib/python3.8/site-packages/numpy/core/fromnumeric.py:753: UserWarning: Warning: 'partition' will ignore the 'mask' of the MaskedArray.
  a.partition(kth, axis=axis, kind=kind, order=order)

In [7]: nmad = xdem.spatial_tools.nmad(equivalent_array)

In [8]: masked_nmad == nmad
Out [8]: False

In [9]: nmad
Out [9]: 0.0

In [10]: masked_nmad
Out [10]: 1.4826

For both masked and equivalent_array, I expected the NMAD to be 0. The output of masked_nmad suggests to me that it just returns the nfact factor, which is exactly 1.4826.

I think the function has to check first if the array is a masked_array or not, then act accordingly.
Also, in the coreg.py revision (#24), I should remove the duplicated calculate_nmad().
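
A minimal sketch of a masked-array-aware NMAD (nfact = 1.4826 as in the current functions; the exact NaN/nodata handling is an assumption):

import numpy as np

def nmad(data, nfact: float = 1.4826) -> float:
    """Normalized median absolute deviation, ignoring masked and NaN values."""
    if isinstance(data, np.ma.masked_array):
        data = data.compressed()  # keep only the unmasked values
    return nfact * np.nanmedian(np.abs(data - np.nanmedian(data)))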

Add DEMs with NaNs (or synthesise them) for tests

Right now, the Longyearbyen test DEMs are void-free. Therefore, all the tests that pass are only guaranteed to pass with void-free examples. I've had large issues with the coreg functions, and I think they are caused by NaNs, so DEMs with voids should be added to help find out what's wrong with them.

I think this would be a good first step toward fixing #61.
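
Synthesising voids could be as simple as this sketch (the void fraction and seed are arbitrary):

import numpy as np

def add_random_voids(dem: np.ndarray, fraction: float = 0.05, seed: int = 42) -> np.ndarray:
    """Return a copy of the DEM with a given fraction of pixels set to NaN."""
    dem_with_voids = dem.astype(float)  # astype returns a copy and allows NaN
    rng = np.random.default_rng(seed)
    dem_with_voids[rng.random(dem.shape) < fraction] = np.nan
    return dem_with_voids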

Add hypsometric extrapolation methods

See discussion in PR #85. Some methods should be implemented to extrapolate hypsometric bins when they are missing at the bottom/top part of the glacier. @rhugonnet suggested the following:

0/ Existing: fill with the mean regional signal at a given elevation (Fanny's paper, and in Bob's paper; I don't like this much and I quote Bob's results here: "the global hypsometric methods are furthest from the true values, which is perhaps not surprising in a region with a high variability of glacier elevation changes").
1/ Easy: loop through all local hypsometric pd.DataFrames, normalize them by their hypsometry, and apply the area-weighted mean dhdt at the same normalized hypsometry to gap-fill. This is at least a bit more representative of the regional signal independently of the elevation location at the scale of each glacier, but can still be off due to high inter-regional variability.
2/ Best: loop through all local hypsometric pd.DataFrames, normalize by both hypsometry and elevation change, and derive a mean normalized signal for the region; then, for glaciers with a local hypsometric estimate (maybe with a criterion of covering at least X% of the area), de-normalize by an estimated range of elevation change rate. If the rate at high elevation is not available (accumulation area full of voids), you can use the signal at half of the normalized elevation range, or maybe best would be to use the mean elevation change rate at the highest elevation captured in the region (essentially, you scale the regional hypsometric signal only by the dhdt in the ablation area of each glacier, keeping the accumulation dhdt similar to that measured for other glaciers in the region).

For 2/, the best would be a Bayesian approach, but you don't have time for this right now.
@adehecq I think that the detailed steps should be:
a) Normalize all local hypsometric estimates with more than X% coverage in the accumulation area by both hypsometry and elevation change rate (see the sketch after this list).
b) Derive the area-weighted mean regional normalized hypsometric signal from everything in a).
c) Derive the area-weighted mean regional elevation change rate at the highest elevation from everything in a).
d) For glaciers with more than X% coverage in the ablation area, de-normalize the regional signal of b) with the maximum dhdt of the glacier and the dhdt at the highest elevation from c).
e) For glaciers with less than X% coverage in the ablation area (glaciers with pretty much no coverage), use the mean elevation change rate in the region? (Or, for a more beautiful visual rendering, fill with the mean hypsometric signal without normalization.)
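
A minimal sketch of the normalization in step a), assuming each glacier's hypsometric bins sit in a pd.DataFrame with columns "z" (bin centre elevation) and "dhdt" (the column names are assumptions):

import pandas as pd

def normalize_hypsometric(bins: pd.DataFrame) -> pd.DataFrame:
    """Normalize elevation to 0-1 and dhdt by its range for one glacier."""
    normalized = bins.copy()
    z_min, z_max = bins["z"].min(), bins["z"].max()
    normalized["z_norm"] = (bins["z"] - z_min) / (z_max - z_min)
    dhdt_min, dhdt_max = bins["dhdt"].min(), bins["dhdt"].max()
    normalized["dhdt_norm"] = (bins["dhdt"] - dhdt_min) / (dhdt_max - dhdt_min)
    return normalized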

Move `coreg.BiasCorr` and `coreg.Deramp` to `biascorr.py`?

To better fit the names of the script files, these may fit better in biascorr.py.

I strongly suggest that they still inherit from the Coreg class, however, to be fully compatible with each other. I suggest the same for e.g. biascorr.AlongTrack().

Function to merge multiple rasters

Raster mosaicking in GDAL only allows for "first" or "last" operations. Mean and median operations would make for much better mosaicking, and it would be nice to have this integrated with GeoUtils Rasters.
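
Assuming the rasters have already been reprojected onto a common grid, a mean or median mosaic is then just a stacked, NaN-aware reduction (a sketch; the reprojection step is omitted):

import numpy as np

def merge_rasters(arrays: list[np.ndarray], method: str = "mean") -> np.ndarray:
    """Merge same-grid arrays pixel-wise, ignoring NaNs."""
    stack = np.stack(arrays)
    if method == "mean":
        return np.nanmean(stack, axis=0)
    if method == "median":
        return np.nanmedian(stack, axis=0)
    raise ValueError(f"Unknown method: {method}")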

`silent=True` possibly not working properly for `xdem.DEM`

Minimal example:

In [1]: import xdem

In [2]: dem = xdem.DEM(xdem.examples.FILEPATHS["longyearbyen_ref_dem"], silent=True)
No metadata could be read from filename.

Should the silent flag not silence the "No metadata could be read from filename." text?

dem_coreg.py fails: wrong shape for reference and to be aligned DEM

Running 'python DemUtils/coreg.py' fails with error:

File "DemUtils/coreg.py", line 196, in icp_coregistration
assert reference_dem.shape == dem_to_be_aligned.shape
AssertionError

This is because of several issues. Be aware that rasterio dataset shapes will be (nbands, height, width), so even for a single-band raster, the shape will be of length 3. Hence the following issues:

Also, at https://github.com/GlacioHack/DemUtils/blob/95cf00be26499b09830e300f273d25ceef4a3b57/DemUtils/coreg.py#L629 (and after), I would suggest to:

Standardize input dimensions for DEMs

Should xdem allow DEMs with the shape (1, height, width), (height, width) or either of the two? Right now, there's no real consensus on which one works (or at least the first might not always work, but will in some cases).

  1. If we only want to allow (height, width), all functions should have:
assert len(dem.shape) == 2, f"Given DEM has shape '{dem.shape}'. Expected 2D array"
  2. If we want to allow both, but it should conform to (height, width):
demc = dem.squeeze()  # Create a squeezed copy and work on the copy

assert len(demc.shape) == 2, f"Given DEM has shape '{dem.shape}'. Expected 2D array or 3D array with shape[0] == 1"
  3. Finally, if we want both but it should conform to (1, height, width):
if len(dem.shape) == 3 and dem.shape[0] == 1:
    demc = dem.copy()
elif len(dem.shape) == 2:
    demc = dem.reshape((1,) + dem.shape)
else:
    raise AssertionError(f"Given DEM has shape '{dem.shape}'. Expected 2D array or 3D array with shape[0] == 1")

I vote for either 1 or 2. What do you guys think?

It could be implemented in a def _check_dem(dem) -> None: to avoid repeating code everywhere.
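
A sketch of the suggested _check_dem() helper, here accepting either shape (i.e. option 2 or 3):

import numpy as np

def _check_dem(dem: np.ndarray) -> None:
    """Raise an error unless the DEM is 2D, or 3D with a single band."""
    if dem.ndim == 2:
        return
    if dem.ndim == 3 and dem.shape[0] == 1:
        return
    raise ValueError(f"Given DEM has shape '{dem.shape}'. Expected 2D array or 3D array with shape[0] == 1")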

Make the coregistration approaches more modular.

There is no consensus on the optimal coregistration pipeline, so adding modularity would help explore this.

A structure suggestion

It could be done in scikit-learn's fashion for regression models:

from sklearn.linear_model import LinearRegression, RANSACRegressor
import numpy as np

X = np.linspace(0, 1, num=50).reshape(-1, 1)
y = np.linspace(0, 2, num=50)

for regressor in (LinearRegression, RANSACRegressor):
    model = regressor()
    model.fit(X, y)
    print(model.score(X, y))

The point of the above example is to show how interchangeable the different regression models are.

The function sklearn.pipeline.make_pipeline() allows custom regression pipelines to be created. If, for example, the samples should be normalized first, this is easily done:

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

model = make_pipeline(StandardScaler(), LinearRegression(fit_intercept=True))

model.fit(X, y)
# etc...

A similar approach could be done for coreg.py:

from xdem import coreg

reference_dem = ....
dem_to_be_aligned = ....
other_dem = ....

model = coreg.AmauryCoreg()
model.fit(reference_dem, dem_to_be_aligned)

print(model.nmad)

model.transform(other_dem)

... or something similar.
If a custom pipeline is sought:

model = coreg.make_pipeline([coreg.BiasCorr(), coreg.OutlierRemoval(), coreg.ICPCoreg()])

model.fit(reference_dem, dem_to_be_aligned)

Basically, all coreg approaches should be subclasses of a BaseCoreg class or similar, and all should have the methods .fit() and .transform() (or similar terminology). The make_pipeline() could be adapted from the equivalent in scikit-learn.
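
A minimal sketch of what such a base class could look like (the names and signatures are only suggestions):

import numpy as np

class BaseCoreg:
    """Skeleton for interchangeable coregistration approaches."""

    def fit(self, reference_dem: np.ndarray, dem_to_be_aligned: np.ndarray) -> "BaseCoreg":
        """Estimate the coregistration parameters; implemented by each subclass."""
        raise NotImplementedError

    def transform(self, dem: np.ndarray) -> np.ndarray:
        """Apply the estimated coregistration to any DEM on the same grid."""
        raise NotImplementedError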

Improve `CoregPipeline.apply()` and `.fit()` performance by merging matrices

If some or all steps of a pipeline can be described as matrices, the current apply loop is inefficient and might induce resampling-related errors:

dem_mod = coreg.apply(dem_mod, transform)

A better approach would be to merge transformation matrices (when possible) and therefore do fewer resampling iterations, potentially lowering the error and the processing time.

A conceptual example is:

  • A) BiasCorr (matrixable)
  • B) ICP (matrixable)
  • C) NuthKaab (matrixable)
  • D) Deramp(degree=3) (not matrixable)

Currently, the pipeline would modify the DEM four times. By combining the matrices of A, B and C, only two modifications are needed:

  1. A-C
  2. D

This could be done by first implementing a property of Coreg, for example called is_rigid which is True if it can be represented as a rigid transform (4x4 matrix). Then, the following pseudocode could work:

dem_mod = dem.copy()

matrices: list[np.ndarray] = []
for coreg in self.pipeline:
    if coreg.is_rigid:
        # Append the matrix to the list of matrices to be applied at the same time.
        matrices.append(coreg.to_matrix())
    else:
        # Matrices can no longer be accumulated; apply them before the non-rigid step.
        if len(matrices) > 0:
            # Use matrix-magic to combine the matrices
            combined_matrix = combine_matrix(matrices)
            # Use more matrix-magic to transform the DEM
            dem_mod = apply_matrix(dem_mod, combined_matrix)
            # Reset the matrices list
            matrices.clear()
        # Apply the non-rigid transform
        dem_mod = coreg.apply(dem_mod, transform=transform)

if len(matrices) > 0:
    # Apply any remaining combined matrix, as above.
    dem_mod = apply_matrix(dem_mod, combine_matrix(matrices))

where the combine_matrix() and apply_matrix() functions need to be created.
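
Combining rigid transforms is just matrix multiplication; a minimal sketch (the order convention, first-applied rightmost in the product, is an assumption that must match apply_matrix()):

import functools
import numpy as np

def combine_matrix(matrices: list[np.ndarray]) -> np.ndarray:
    """Combine 4x4 matrices so that the first matrix in the list is applied first."""
    # (M_n @ ... @ M_2 @ M_1) @ point applies M_1 first.
    return functools.reduce(lambda combined, matrix: matrix @ combined, matrices, np.eye(4))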

Publish package on conda/conda-forge

I give up on my attempt to do this for now... I can package xdem and use it in a fresh conda env, but scikit-gstat and geoutils are not on conda/conda-forge, so they can't be included. I don't know what to do at this point. I'll dump my progress here and then move on to something else for now...

meta.yaml:

{% set setup_data = load_setup_py_data(setup_file='setup.py', from_recipe_dir=True) %}

package:
        name: xdem
        version: "{{ setup_data.get('version') }}"
build:
        noarch: python
        number: 3

requirements:
        run:
                - python
                - numpy
                - scipy
                - rasterio
                - geopandas
                - pyproj
                - tqdm
about:
        summary: Tools for DEM comparisons and registration

I've read that conda-forge packages can be published automatically from the PyPI version. That would be great!

Right now, my (probably inefficient) way of building and testing is:

conda build .
anaconda upload "$(conda build . --output)"  # Upload the result to the logged in channel (eriksh for me)

# Steps for setting up an environment
conda create -n clean_env python
conda activate clean_env
conda install -c eriksh xdem
pip install scikit-gstat git+https://github.com/GlacioHack/GeoUtils.git

# Now the command below should not result in an ImportError
python -c "import xdem"

Add function to get a transformation matrix between two PCs

For now, the deramping and nuth_kaab algorithms do not return a transformation matrix. Returning a transformation matrix would make it easier to:

  • chain with other approaches like ICP, by simply multiplying the two transformations
  • update the associated cameras
  • save the applied transformation and possibly apply it to other PC/DEMs

So far, the best solution I found is this library: https://github.com/jewettaij/superpose3d
There is a lot of unused code there, so it could be worth just rewriting some of the code.
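
For the rigid (rotation + translation) part, the estimation itself is compact; a sketch of the standard Kabsch/SVD solution between two matched point clouds (no scaling, and it assumes the points are already paired one-to-one):

import numpy as np

def rigid_transform(points_ref: np.ndarray, points_tba: np.ndarray) -> np.ndarray:
    """Estimate the 4x4 rigid transform mapping points_tba onto points_ref (Kabsch/SVD)."""
    centroid_ref = points_ref.mean(axis=0)
    centroid_tba = points_tba.mean(axis=0)
    # Cross-covariance of the centred point clouds.
    H = (points_tba - centroid_tba).T @ (points_ref - centroid_ref)
    U, _, Vt = np.linalg.svd(H)
    # Correct for a possible reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = centroid_ref - R @ centroid_tba
    matrix = np.eye(4)
    matrix[:3, :3] = R
    matrix[:3, 3] = t
    return matrix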

Convert coregistration results to 4x4 transformation matrices

@adehecq and I have discussed at multiple points in time that this would be a nice addition. For example, if one would want to use the results of the coregistration in another tool (ASP / Metashape etc.), 4x4 transformation matrices are the way to go.

For bias corrections, it would be as simple as:

[
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, bias],
    [0, 0, 0, 1]
]

for translations:

[
    [1, 0, 0, xcorr],
    [0, 1, 0, ycorr],
    [0, 0, 1, bias],
    [0, 0, 0, 1]
]
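
As a sketch, such matrices are trivial to build as np.ndarrays (a hypothetical helper, not an existing xdem function):

import numpy as np

def translation_matrix(xcorr: float = 0.0, ycorr: float = 0.0, bias: float = 0.0) -> np.ndarray:
    """Build a 4x4 matrix for a horizontal shift plus a vertical bias."""
    matrix = np.eye(4)
    matrix[0, 3] = xcorr
    matrix[1, 3] = ycorr
    matrix[2, 3] = bias
    return matrix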

We also discussed that 1st-degree deramps can be described as a bias, rotation and scale in the horizontal direction. 2nd-degree and higher deramps are, however, impossible to describe as a 4x4 matrix, and could raise a NotImplementedError or similar.

PDAL and opencv already provide matrices from their ICP approaches, so this would be simple to return as a np.ndarray.

This could be part of the coreg.py revision initiative (#24).

DEM should be possible to create from a Raster object

I could submit a PR, but my aggressive formatting would mess up your code, @rhugonnet ;)

If the user has a Raster and wants to convert this to a DEM, there is currently no easy way to do it. One hacky way is:

dem = xdem.dem.DEM.from_array(raster.data, raster.transform, raster.crs, raster.nodata)

A quite simple conversion could be done in a .from_raster() static method. I recently learned this trick, and it's great because no copying is done in memory whatsoever:

class DEM(...):
    # all the old stuff

    @staticmethod
    def from_raster(raster: gu.georaster.Raster, vref_name=None, vref_grid=None, silent=False):
        # Create a DEM without calling __init__, then reuse the Raster's attribute dict,
        # so nothing is copied in memory.
        dem = DEM.__new__(DEM)
        dem.__dict__ = raster.__dict__

        # user input
        dem.vref = vref_name
        dem.vref_grid = vref_grid
        dem.ccrs = None

        # trying to get vref from product name (priority to user input)
        # dem.__parse_vref_from_fn(silent=silent)
        return dem

This kind of works for me, but I just now realised that it won't work perfectly because it's supposed to be initialised through a SatelliteImage, so it doesn't have all the right attributes.

Anyway, this would be a really nice addition!

Add `.error()` method or similar to `Coreg` class

There should be a quick way to figure out how well the coregistration went. NMAD is one alternative, but it could hide potential biases. I like RMSEs, but they often look dauntingly big!

The .error() (or similar) method could alternatively take an argument to choose (metric="nmad").

Opinions?
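
A sketch of what the method could look like on the Coreg class (the metric names are suggestions):

import numpy as np

def error(self, reference_dem: np.ndarray, aligned_dem: np.ndarray, metric: str = "nmad") -> float:
    """Summarize the residuals between the reference and the coregistered DEM."""
    diff = reference_dem - aligned_dem
    residuals = diff[np.isfinite(diff)]
    if metric == "nmad":
        return 1.4826 * float(np.median(np.abs(residuals - np.median(residuals))))
    if metric == "rmse":
        return float(np.sqrt(np.mean(residuals ** 2)))
    raise ValueError(f"Unknown metric: {metric}")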

Add weights in coreg functions

Romain's presentation just made me think that we should make sure that, by design, all our coreg functions allow passing weights. This way, we can later implement weighted coregistration.
