rosepearson / geofabrics

A package for generating hydrologically conditioned DEMs and roughness maps from LiDAR and other infrastructure data. Check the wiki for install and usage instructions, and documentation at https://rosepearson.github.io/GeoFabrics/

License: GNU General Public License v3.0

Python 98.07% Jupyter Notebook 1.59% nesC 0.33%
Topics: automated, conditioned, dems, hydrologically, lidar, roughness

geofabrics's People

Contributors

github-actions[bot], kinow, lukeparky, rosepearson, xandercai


geofabrics's Issues

Bugfix: correct depth calculations

Correct the river depth calculations such that area is properly preserved.

I was failing to remove the 'flood water area' when moving from using the threshold elevation to the water surface elevation.

Make NN optional

Currently, Nearest Neighbour (NN) interpolation is performed as the final step of the DEM generation process, prior to saving the DEM raster. Make this an optional step, with NN interpolation performed by default.

Run the tests in GH actions?

Since the code is hosted here on GH, and GH actions is free, maybe it could be used to run the tests in the tests folder?

Probably a simple workflow: the checkout action to clone the repo, then setup-python, then either pip install the requirements, or conda env create -f environment.yml (might need another step to install conda, or use a Docker container)... then another step that runs the tests. A workflow sketch is included at the end of this issue.

This way PRs can be validated automatically by GH Actions.

Cheers
Bruno
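
A minimal workflow sketch along the lines suggested above (the file path, Python version and requirements file name are assumptions):

```yaml
# .github/workflows/tests.yml (hypothetical path)
name: tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: pip install -r requirements.txt  # or install conda and run: conda env create -f environment.yml
      - name: Run tests
        run: python -m unittest discover tests  # or: pytest tests
```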

Dask lazy compute to reduce memory overhead

Currently compute is called in the dem module after the chunked DEM is created (code screen capture not reproduced here).
We could reduce the memory load by not calling compute and instead calling dense_dem.to_netcdf(...). This would mean that instead of all of the dense_dem contents being loaded into memory in one go, chunks could be loaded, processed and saved individually, reducing the overall memory footprint (particularly important for large catchments at fine resolutions). A minimal sketch follows the list below.

Considerations - saving the dense DEM before generating the offshore DEM values.

Areas to address from @rosepearson:

  • Move to lazy compute
  • Discuss how to print out to log at compute time (instead of when queuing the function calls)
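
A minimal sketch of the lazy alternative (the file names and chunk sizes are illustrative assumptions):

```python
import xarray

# Open (or construct) the dense DEM with dask-backed chunks rather than in-memory arrays.
dense_dem = xarray.open_dataset("dense_dem.nc", chunks={"x": 1000, "y": 1000})

# No dense_dem.compute() here: to_netcdf evaluates the dask graph chunk by chunk,
# so only a few chunks are resident in memory at any one time.
dense_dem.to_netcdf("conditioned_dem.nc")
```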

Ensure any overlap between adjacent tiles is properly taken into account when binning

Currently LiDAR tiles are binned even when they only have partial coverage over a DEM bin, and the last tile binned provides the value in the final DEM. We need to either explore a way of merging bin values together - or, ideally, combine the raw LiDAR and bin properly.

The OpenTopography LiDAR datasets all contain a GDAL-style tile index file (a Shapefile defining the name and extents of each LiDAR tile).

There is a general blog about tiling here. It doesn't use PDAL - but it does reference LAX files (not supported by PDAL) and goes through the concepts quite well.

Another option is to track the boundary values myself and accumulate all values before performing IDW at the end. Could consider using the all_touched=True option on rioxarray.clip.

Roughness calculation - vegetation density

Calculate the vegetation density in vertical voxels based on the paper 'Estimation of Vegetation-Induced Flow Resistance for Hydraulic Computations Using Airborne Laser Scanning data'.

Links to the paper:

General approach:
Divide space vertically into several voxels in each pixel cell. Count the LiDAR points in each voxel and use these counts to calculate the vegetation density in each voxel using the equations from the paper (equation images not reproduced here).

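A minimal sketch of the point-counting step, assuming 1 m voxels (the density equations themselves are the paper's and are not reproduced here):

```python
import numpy as np

def count_points_per_voxel(z: np.ndarray, z_ground: float,
                           voxel_height: float = 1.0, n_voxels: int = 10) -> np.ndarray:
    """Count the LiDAR returns falling in each vertical voxel above ground for one pixel cell."""
    heights = z - z_ground
    edges = np.arange(n_voxels + 1) * voxel_height
    counts, _ = np.histogram(heights, bins=edges)
    return counts

# Returns at various heights above ground within a single pixel cell (illustrative values)
counts = count_points_per_voxel(np.array([0.2, 0.4, 1.3, 2.8, 2.9, 5.1]), z_ground=0.0)
print(counts)  # per-voxel counts to feed into the paper's vegetation density equations
```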

Python IDW implementation using a scipy.KDTree

Move from rasterising LiDAR tiles with IDW via PDAL writers.gdal to a Python implementation based on scipy.spatial.KDTree (a minimal sketch is included at the end of this issue).

  • See idw_function.py for example implementations.
  • See the comments in issue for comparative timings for PDAL writers.gdal vs scipy KDTree and SciKit-Learn KDTree.

Updating tests as I go. May move to clipping LiDAR extents to within the survey tiles as part of this work - or this may be tackled in a followup issue.

Specific changes:

  1. Added an IDW Python function - pass in the xy pairs to evaluate at
  2. Only perform IDW within the extents of the tile that is currently being added (a time saving over the previous approach)
  3. Update extents to remove any holes and gaps around the edges
  4. When a tile index file exists, use this to clip the PDAL-generated extents
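
A minimal sketch of IDW built on scipy.spatial.KDTree (the function name and parameters are illustrative, not the contents of idw_function.py):

```python
import numpy as np
from scipy.spatial import KDTree

def idw(xy_points: np.ndarray, z_points: np.ndarray, xy_out: np.ndarray,
        k: int = 8, power: float = 2.0, eps: float = 1e-12) -> np.ndarray:
    """Inverse distance weighted interpolation of z_points onto the xy_out pairs."""
    tree = KDTree(xy_points)
    distances, indices = tree.query(xy_out, k=k)
    weights = 1.0 / (distances ** power + eps)  # eps guards against zero distances
    return (weights * z_points[indices]).sum(axis=1) / weights.sum(axis=1)

# Interpolate three LiDAR points onto one DEM cell centre (illustrative values)
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
z = np.array([1.0, 2.0, 3.0])
print(idw(xy, z, np.array([[0.25, 0.25]]), k=3))
```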

Pull LINZ data down from their data portal

Add support for pulling vector data down from the LINZ data portal.

Details at: https://help.koordinates.com/query-api-and-web-services/vector-query/

  1. Query LiDAR tile info for when LiDAR info is not in OpenTopography but is in LINZ. Not done - see issue 7
  2. Query bathymetry contours. Done - with a test added
  3. Query coastline. Done - with a test added
  4. Update processor to utilise the ability to pull down LINZ vector data

This will require multiple searches for large catchments to ensure the full catchment area is covered.
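
A hedged sketch of pulling a vector layer with geopandas; the WFS URL pattern and layer id are assumptions that should be checked against the Koordinates documentation linked above:

```python
import geopandas

LINZ_KEY = "YOUR_API_KEY"  # hypothetical key
LAYER_ID = 12345           # hypothetical LINZ layer id (e.g. a coastline layer)

url = (
    f"https://data.linz.govt.nz/services;key={LINZ_KEY}/wfs"
    f"?service=WFS&version=2.0.0&request=GetFeature"
    f"&typeNames=layer-{LAYER_ID}&outputFormat=json"
)
features = geopandas.read_file(url)  # GeoDataFrame of the requested vector layer
print(features.head())
```

In practice a bounding-box filter would also be added so only features within the catchment are returned.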

Bathymetry river estimates

Description: Bring the functionality in Extract river channel shapes.ipynb and extract_river_channel_widths_rec2_3.ipynb into geofabrics.

Aim: extend geofabrics so it can generate width and slope estimates along a river channel. The bathymetry estimates are based on Neal et al. and Rupp and Smart.

  1. Catchment main channel identification (given an ocean segment to start from)
  2. DEM generation to some width - and resolution - and interpolation
  3. Cross sections and sampling
  4. Slope estimation and save result
  5. Width estimation and save results
  6. Regularisation of the estimates
  7. Create a fan to transition from the river mouth into the ocean
  8. Allow estimates to be incorporated back into the DEM generation process.
  9. Maintain cross-sectional area during bathymetry estimation - note the assumption on the shape of the channel (see the sketch below). This assumption can be addressed later for the interpolation of any channel.
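
For item 9, a worked sketch of the rectangular-channel assumption (the numbers are illustrative):

```python
# Preserve the measured cross-sectional area A when moving to a uniform depth,
# assuming a rectangular channel: A = width * depth  =>  depth = A / width.
area = 45.0   # m^2, estimated cross-sectional area
width = 30.0  # m, estimated channel width
depth = area / width
print(f"uniform depth = {depth:.2f} m")  # 1.50 m below the water surface
```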

Detection of blocked river channels

Detect blocked channels either through image processing applied to DEMs and other imagery to find culverts and bridges that will block water flow, or in combination with BGFLOOD to detect bridges/culverts that block the flow of water. This may require manual oversight to confirm whether the culvert/bridge does pass water.

Alan and Paulo of the data science team have created a tool (Computer Vision Annotation Tool) to help with segmentation and model generation for supervised ML for image recognition. I have been added as a beta tester and will use this to explore ML-based drain detection or possibly blocked river channels.

Next step is to sit down with @CyprienBosserelle again and map out some suitable training tiles, and the scale of drain we want to try to detect.

Setup a formatter

Another one from @jennan

  • Use a formatter (https://github.com/psf/black for example, no options = no bikeshedding), put it in your CI/pre-commit/IDE config and never think again about how to format your code :-). This helps with consistency (even if there is only one contributor), eases reading (once you get used to the formatter style) and you don't need to tell people how to format their code, they just have to comply with the one you set up.

Adding mean and median interpolation options in DEM generation process

Hi Rose,

It would be great if the DEM generation process could offer a choice of interpolation options among Inverse Distance Weighting (IDW), mean, median, and maybe some others. The reason is that if there is a data problem (noise/outliers, for example), the other options, which will not be affected by the issue, are still available for use.

Kind regards

Martin Nguyen

Create a DEM from many LAZ tiles

The initial pipeline uses only a single LAZ file to generate the DEM. We need to add support for using many LAZ tile files - perhaps all LAZ files within one or more folders.

River bathymetry followup

MVP for this branch:

  1. Download the relevant Open Street Map waterway and use for initial alignment.
    i. Add an additional parameter - OSM id of river feature
    ii. Try to maintain the old approach of aligning a channel as a side option
  2. Split rivers to a separate source classification from drains, and split culverts from drains as well.

There are various issues to address in future:

  • Add support for specifying the gauged flow at the time LiDAR was flown instead of estimated river flow
  • Add support for creating a fan that is 10x the river mouth width long. Will need an estimate of the depth at 10x the river width offshore.
  • Add support for dealing with no river width estimate at the river mouth.
  • Add support for different river mouth behaviour depending on the river mouth classification.
  • Add support for braided rivers

Revisit asserts

Revisit the use of asserts because they:

  1. might be removed during optimisation if compiled
  2. may be better replaced by raising an error in some places.

@luke noted that the following warning from SonarLint applies to all assert statements: "In production environments, the python -O optimization flag is often used, which bypasses assert statements."
So, ensure asserts are still covered by a check if moving to compiled code with optimisation.

@jennan noted I have been using asserts where raising an error may be more appropriate! https://towardsdatascience.com/practical-python-try-except-and-assert-7117355ccaab#:~:text=The%20try%20and%20except%20blocks,an%20example%20of%20defensive%20programming.
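
A minimal before/after sketch of the suggested change:

```python
def set_resolution(resolution: float) -> float:
    # Before: silently stripped out when Python runs with the -O optimisation flag
    # assert resolution > 0, "resolution must be positive"

    # After: always enforced, and callers can catch a specific exception
    if resolution <= 0:
        raise ValueError(f"resolution must be positive, got {resolution}")
    return resolution
```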

NetCDF preparation for BGFLOOD

This is a bit of a catch-all issue for work relating to the requirements associated with the NetCDF output from GeoFabrics (and also input to BGFLOOD).

Requirements:

  • Store each geofabric dataset in one NetCDF file with each layer as different variables.
    • Could also do multiple resolutions with each elevation/roughness set in a different group
    • Could also include forcing information in the same netCDF file if we want all inputs in a single file
  • Include spatial_ref information like geotransform and crs - see section below.
    • Done using rioxarray's write_crs() and write_transform()
  • Include appropriate variable and coordinates attributes:
    • units information: See link for conventions.
    • long_name: A description of the variable/coordinate.
    • standard_name: Must be included in the NUG table.
    • vertical_datum: Non-standard attribute name to record the vertical datum of any elevation data.
  • Record the parameters of a run:
    • Data sources should be documented in the attributes (i.e. land layer x, revision y)
    • Capture the parameter information in a variable, or perhaps a json dump into a group attribute.
  • Different resolution geofabrics should be aligned and evenly divisible (to ensure that alignment)
  • DEMs and roughness should be defined over a full rectangular grid with no NaN values (NaNs have since been deemed OK!)

NetCDF conventions

There are standards defining the conventions for attributes in netCDF files.

  • cfconventions.org - The recommended standard
  • UCAR - A repository with links to all netCDF standards

See CF conventions for some details about conventions around netCDF files.


The spatial_ref coordinate

This is where information associated with the coordinate system and projection (CRS and transform) is encoded.


Coordinate systems CF-1.6 <--> CRS

There is an optional grid mapping attribute called crs_wkt that may be used to specify multiple coordinate system properties in so-called well-known text format (usually abbreviated to CRS WKT or OGC WKT), as detailed on the cfconventions.org page, with example mappings at the GitHub page.

It looks like this information is sometimes encoded within a "spatial_ref" coordinate (see issue).

Python Libraries

There are various Python libraries for interacting with NetCDF files including netCDF4, xarray, and rioxarray. netCDF4 is an engine used by xarray to read and write netCDF files. xarray has some powerful constructs for constructing and interacting with data stored in netCDF files. rioxarray combines xarray with rasterio by providing access to the rasterio engine with the rio accessor.

xarray supports two main objects - DataArray and Dataset. A DataArray works well for a single layer of data (possibly across many bands), and the Dataset class should be used for multiple variables that may have different dimensions (i.e. different resolutions, or x,y vs time).
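
A minimal sketch tying these pieces together (the layer names, EPSG code, vertical datum and resolution are assumptions):

```python
import numpy as np
import rioxarray  # noqa: F401 - registers the .rio accessor on xarray objects
import xarray

x = np.arange(0.0, 100.0, 10.0)
y = np.arange(0.0, 100.0, 10.0)
elevation = xarray.DataArray(np.zeros((len(y), len(x))), coords={"y": y, "x": x}, dims=("y", "x"))
roughness = xarray.DataArray(np.full((len(y), len(x)), 0.05), coords={"y": y, "x": x}, dims=("y", "x"))

# One Dataset holding each geofabric layer as a separate variable
geofabric = xarray.Dataset({"elevation": elevation, "roughness": roughness})
geofabric = geofabric.rio.write_crs("EPSG:2193")  # adds the spatial_ref coordinate
geofabric["elevation"].attrs.update(
    {"units": "m", "long_name": "ground elevation", "vertical_datum": "NZVD2016"}
)
geofabric.to_netcdf("geofabric.nc")
```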

Other

There is potential for a translation layer between GeoFabrics and BG-FLOOD, or between the catchment generation code and either GeoFabrics and/or BG-FLOOD. This would be contained in a separate repository from either.

Refactor code so LiDAR can be loaded in parallel

Changes in preparation for using Dask in a distributed environment; these changes will not alter the end results.

  • Remove the separate LiDAR module and instead read in each LiDAR file within a single function in the dem module.

Allow offshore LiDAR to be excluded

When using GeoFabrics to generate a DEM around Westport township, it becomes apparent that the offshore LiDAR is of the ocean surface - you can see the ocean waves. Add logic to the GeoFabricsGenerator so that all LiDAR information away from the foreshore is discarded.

Support selection of LiDAR dataset(s) to use

  1. It would be great to select which dataset to use in the case of multiple datasets covering an area
  2. We also will need to be able to generate the DEM from multiple LiDAR datasets :)

Roughness length followup

Revisit the roughness length calculations to make the estimated values more physically realistic.

Selected for this issue:

  1. Revisit the equations - grass is too smooth. Also ensure appropriate limits on values (>0, and not too big) - add these limits as instruction file inputs

Options:

  • Set the river zo values based on the reach classification - planning to take from the river network + flow + friction file
  • Adjust low roughness values based on the substrate - set for roads
  • Allow the offshore and waterway zo values to be set in the instruction file
  • Include the elevation gradient to correct the std across sloped grid cells

Multiple LiDAR datasets - filtering

Add support for downloading multiple LiDAR datasets within a region with a set of rules defining what order the datasets are combined in.

Currently the dataset must be specified by name and only one dataset per DEM is supported. There are instances where the desired LiDAR dataset only covers some of the ROI, but other LiDAR datasets cover the remaining areas. Add support for combining LiDAR datasets in the case that the primary LiDAR dataset does not cover the full region.

Initially allow the user to specify the datasets to consider and the order to consider them. In time, consider adding some other filtering (e.g. by date) in addition to name.

Performance - make use of CPU clusters

Make use of Dask to parallelise the rasterisation of many LiDAR tile files into non-overlapping chunks that will form a single xarray. Each chunk will contain several LiDAR tile files - all that partially overlap with the chunk will be loaded. This will mean that some files are loaded 2-4 times, but overall we can expect a speed-up.

Changes required:

  1. Switch from processing by LiDAR files to processing by chunks each containing many LiDAR files (this requires a tile index file)
  2. Generate a map of dense data extents after chunking all LiDAR data prior to offshore calculations
  3. Make use of the dask.delayed decorator, dask arrays and dask.array.concatenate (a sketch follows this list)
  4. Consider using the dask distributed back-end for a dashboard and more control
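
A minimal sketch of the delayed-per-chunk pattern (the rasterise_chunk stub and the chunk/tile bookkeeping are stand-ins for the real logic):

```python
import dask
import dask.array
import numpy as np

@dask.delayed
def rasterise_chunk(tile_files: list, bounds: tuple, resolution: float) -> np.ndarray:
    """Load every LiDAR tile overlapping this chunk and rasterise onto the chunk grid (stub)."""
    nx = int((bounds[2] - bounds[0]) / resolution)
    ny = int((bounds[3] - bounds[1]) / resolution)
    return np.zeros((ny, nx))  # placeholder for the real binning/IDW of the overlapping tiles

# One delayed task per chunk; nothing runs until the combined array is evaluated.
chunks = [(["tile_1.laz", "tile_2.laz"], (0, 0, 1000, 1000), 10.0),
          (["tile_2.laz", "tile_3.laz"], (1000, 0, 2000, 1000), 10.0)]
blocks = [dask.array.from_delayed(rasterise_chunk(*chunk), shape=(100, 100), dtype=float)
          for chunk in chunks]
dem = dask.array.concatenate(blocks, axis=1)  # stitch the chunks into a single array
print(dem.compute().shape)
```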

Resources:

  1. Example dask ipynb - https://github.com/rosepearson/Hydrologic-DEMs-scripts/blob/main/jupyter_notebooks/dask_example.ipynb
  2. Overlapping chunks - https://docs.dask.org/en/stable/array-overlap.html
  3. Back-end schedulers - https://docs.dask.org/en/stable/scheduling.html

Performance and crash recovery

Investigate approaches to:

  • Improving performance when processing large or high resolution DEMs
    • Tile the overall nc file?
    • Only read in some of the nc file?
  • Save out LiDAR DEM prior to generating offshore results
    • Split into tiles and offshore portion of DEM

Aim to:

  • Support generating higher resolution DEMs
  • Support populating the offshore DEM separately from the on-land DEM.

Revisit setup.py and specify a package entry point

Comments from @jennan

More a question than a recommendation: your package is declared in the "src" folder. I personally like to do the same, i.e. have a "src/package_name" organisation. Now, if you change your setup.py (I need to check how I do this) you can avoid importing from "src" (e.g. in your test code) and do it directly from geofabrics (assuming your tests run in an environment where geofabrics is installed, for example via pip install -e .).

  • As you are using a pyproject.toml, you can move all the information in setup.py and setup.cfg there (and keep a nearly empty setup.py to allow pip install -e . to work). This would centralize the configuration of your package and related tools (black, flake8, etc.) as, as far as I know, they all work with the pyproject.toml file configuration. A minimal sketch is given below.
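
A minimal sketch of what the centralised configuration could look like (the values are illustrative):

```toml
# pyproject.toml (illustrative contents)
[build-system]
requires = ["setuptools>=61", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "geofabrics"
version = "0.0.0"
description = "Hydrologically conditioned DEMs and roughness maps from LiDAR"

[tool.setuptools.packages.find]
where = ["src"]

[tool.black]
line-length = 88
```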

Improve offshore interpolation

Revisit, in a systematic way, the selection of appropriate interpolation techniques in these offshore regions. There is a notebook looking into this that could be reviewed.

Filter lidar point clouds by classification

The LiDAR point clouds have usually been classified into categories like "land", "water", etc - we need to filter the point clouds by these classifications prior to conversion to raster.

  • The classification codes are specified as part of the LAS format (see specification).
  • I am assuming the dimension name for the classification code is also specified, as it always seems to be 'Classification'.

So we just need to add logic to the instruction file to only take the ground classifications (a sketch of the corresponding PDAL filter is included below). I will do this under the general section for now, but note that in the future we may want to reclassify (see #10) by dataset.
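
A hedged sketch of the corresponding PDAL pipeline via the Python bindings (the file name is illustrative; classification 2 is the LAS ground class):

```python
import json
import pdal

# Keep only ground returns (LAS classification code 2)
pipeline_json = json.dumps([
    {"type": "readers.las", "filename": "tile.laz"},
    {"type": "filters.range", "limits": "Classification[2:2]"},
])
pipeline = pdal.Pipeline(pipeline_json)
pipeline.execute()
points = pipeline.arrays[0]  # structured array including the 'Classification' dimension
```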

(LAS classification code table from https://www.asprs.org/wp-content/uploads/2010/12/LAS_1_4_r13.pdf not reproduced here.)

Westport fixes - name, indexing, offshore chunking

When producing a DEM for Westport, several hacks needed to be made to the code before it could be used by BGFlood. This issue aims to address these. Namely:

  1. Give a name to the produced DEM
  2. Ensure positive indexing of the produced DEM
  3. Only try to incorporate a reference DEM or offshore bathymetry data if it exists.
  4. Apply chunking to offshore data to avoid memory performance issues.

Open street map drains

We could add support for pulling drains down from OpenStreetMap within a catchment area and incorporating them into a DEM. The assumption is they would be well aligned and pretty small. A uniform width would then be assumed, and any areas where the drains go back up would be removed.

Tasks:

  • Use OSMPythonTools to download streams and drains within a catchment (a query sketch is included at the end of this issue)
  • Keep waterways that aren't where other river/ocean bathymetry are. See decision below - deferred
  • Convert drain centrelines into polygons
  • Sample along drain and transform into bathymetry estimate
    i. Figure out upriver direction
    ii. Take minimums then fit some sort of monotonically decreasing curve downriver
  • Save results so they can be incorporated into DEM generation process

Note that the first three tasks were initially explored in a Jupyter Notebook found here.
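
A sketch of the OSMPythonTools query for the first task; the builder arguments are from memory and should be checked against the library's documentation, and the bounding box is illustrative:

```python
from OSMPythonTools.overpass import Overpass, overpassQueryBuilder

# Bounding box (south, west, north, east) for the catchment - illustrative values
bbox = [-41.78, 171.55, -41.72, 171.65]
query = overpassQueryBuilder(bbox=bbox, elementType="way",
                             selector='"waterway"="drain"', includeGeometry=True)
result = Overpass().query(query)
for drain in result.ways():
    print(drain.id(), drain.geometry())  # centreline geometry to buffer into a drain polygon
```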
