rosepearson / geofabrics

A package for generating hydrologically conditioned DEMs and roughness maps from LiDAR and other infrastructure data. Check the wiki for install and usage instructions, and documentation at https://rosepearson.github.io/GeoFabrics/

License: GNU General Public License v3.0

Python 98.07% Jupyter Notebook 1.59% nesC 0.33%
Topics: automated, conditioned, dems, hydrologically, lidar, roughness

geofabrics's People

Contributors

github-actions[bot], kinow, lukeparky, rosepearson, xandercai


geofabrics's Issues

Bugfix: correct depth calculations

Correct the river depth calculations such that area is properly preserved.

I was failing to remove the 'flood water area' when moving from using the threshold elevation to the water surface elevation.

Make NN optional

Currently, Nearest Neighbour (NN) interpolation is performed as the final step of the DEM generation process, prior to saving the DEM raster. Make this an optional step, with NN interpolation performed by default.

Run the tests in GH actions?

Since the code is hosted here on GH, and GH actions is free, maybe it could be used to run the tests in the tests folder?

Probably a simple workflow: the checkout action to clone the repo, then setup-python, then either pip install the requirements, or conda env create -f environment.yml (might need another step to install conda, or use a Docker container)... then another step that runs the tests. A workflow sketch is included at the end of this issue.

This way PRs can be validated automatically by GH Actions.

Cheers
Bruno
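
A minimal workflow sketch along the lines suggested above (the file path, Python version and requirements file name are assumptions):

```yaml
# .github/workflows/tests.yml (hypothetical path)
name: tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.10"
      - name: Install dependencies
        run: pip install -r requirements.txt  # or install conda and run: conda env create -f environment.yml
      - name: Run tests
        run: python -m unittest discover tests  # or: pytest tests
```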

Dask lazy compute to reduce memory overhead

Currently compute is called in the dem module after the chunked DEM is created (code screen capture not reproduced here).
We could reduce the memory load by not calling compute and instead calling dense_dem.to_netcdf(...). This would mean that instead of all of the dense_dem contents being loaded into memory in one go, chunks could be loaded, processed and saved individually, reducing the overall memory footprint (particularly important for large catchments at fine resolutions). A minimal sketch follows the list below.

Considerations - saving the dense DEM before generating the offshore DEM values.

Areas to address from @rosepearson:

  • Move to lazy compute
  • Discuss how to print out to log at compute time (instead of when queuing the function calls)
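
A minimal sketch of the lazy alternative (the file names and chunk sizes are illustrative assumptions):

```python
import xarray

# Open (or construct) the dense DEM with dask-backed chunks rather than in-memory arrays.
dense_dem = xarray.open_dataset("dense_dem.nc", chunks={"x": 1000, "y": 1000})

# No dense_dem.compute() here: to_netcdf evaluates the dask graph chunk by chunk,
# so only a few chunks are resident in memory at any one time.
dense_dem.to_netcdf("conditioned_dem.nc")
```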

Ensure any overlap between adjacent tiles is properly taken into account when binning

Currently LiDAR tiles are binned even when they only have partial coverage over a DEM bin, and the last tile binned provides the value in the final DEM. We need to either explore a way of merging bin values together - or, ideally, combine the raw LiDAR and bin properly.

The OpenTopography LiDAR datasets all contain a GDAL-style tile index file (a Shapefile defining the name and extents of each LiDAR tile).

There is a general blog about tiling here. It doesn't use PDAL - but it does reference LAX files (not supported by PDAL) and goes through the concepts quite well.

Another option is to track the boundary values myself and accumulate all values before performing IDW at the end. Could consider using the all_touched=True option on rioxarray.clip.

Roughness calculation - vegetation density

Calculate the vegetation density in vertical voxels based on the paper 'Estimation of Vegetation-Induced Flow Resistance for Hydraulic Computations Using Airborne Laser Scanning data'.

Links to the paper:

General approach:
Divide space vertically into several voxels in each pixel cell. Count the LiDAR points in each voxel and use these counts to calculate the vegetation density in each voxel using the equations from the paper (equation images not reproduced here).

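A minimal sketch of the point-counting step, assuming 1 m voxels (the density equations themselves are the paper's and are not reproduced here):

```python
import numpy as np

def count_points_per_voxel(z: np.ndarray, z_ground: float,
                           voxel_height: float = 1.0, n_voxels: int = 10) -> np.ndarray:
    """Count the LiDAR returns falling in each vertical voxel above ground for one pixel cell."""
    heights = z - z_ground
    edges = np.arange(n_voxels + 1) * voxel_height
    counts, _ = np.histogram(heights, bins=edges)
    return counts

# Returns at various heights above ground within a single pixel cell (illustrative values)
counts = count_points_per_voxel(np.array([0.2, 0.4, 1.3, 2.8, 2.9, 5.1]), z_ground=0.0)
print(counts)  # per-voxel counts to feed into the paper's vegetation density equations
```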

Python IDW implementation using a scipy.KDTree

Move from rasterising LiDAR tiles with IDW via PDAL writers.gdal to a Python implementation based on scipy.spatial.KDTree (a minimal sketch is included at the end of this issue).

  • See idw_function.py for example implementations.
  • See the comments in issue for comparative timings for PDAL writers.gdal vs scipy KDTree and SciKit-Learn KDTree.

Updating tests as I go. May move to clipping LiDAR extents to within the survey tiles as part of this work - or this may be tackled in a followup issue.

Specific changes:

  1. Added an IDW Python function - pass in the xy pairs to evaluate at
  2. Only perform IDW within the extents of the tile that is currently being added (a time saving over the previous approach)
  3. Update extents to remove any holes and gaps around the edges
  4. When a tile index file exists, use this to clip the PDAL-generated extents
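
A minimal sketch of IDW built on scipy.spatial.KDTree (the function name and parameters are illustrative, not the contents of idw_function.py):

```python
import numpy as np
from scipy.spatial import KDTree

def idw(xy_points: np.ndarray, z_points: np.ndarray, xy_out: np.ndarray,
        k: int = 8, power: float = 2.0, eps: float = 1e-12) -> np.ndarray:
    """Inverse distance weighted interpolation of z_points onto the xy_out pairs."""
    tree = KDTree(xy_points)
    distances, indices = tree.query(xy_out, k=k)
    weights = 1.0 / (distances ** power + eps)  # eps guards against zero distances
    return (weights * z_points[indices]).sum(axis=1) / weights.sum(axis=1)

# Interpolate three LiDAR points onto one DEM cell centre (illustrative values)
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
z = np.array([1.0, 2.0, 3.0])
print(idw(xy, z, np.array([[0.25, 0.25]]), k=3))
```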

Pull LINZ data down from their data portal

Add support for pulling vector data down from the LINZ data portal.

Details at: https://help.koordinates.com/query-api-and-web-services/vector-query/

  1. Query LiDAR tile info for when LiDAR info is not in OpenTopography but is in LINZ. Not done - see issue 7
  2. Query bathymetry contours. Done - with a test added
  3. Query coastline. Done - with a test added
  4. Update processor to utilise the ability to pull down LINZ vector data

This will require multiple searches for large catchments to ensure the full catchment area is covered.
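
A hedged sketch of pulling a vector layer with geopandas; the WFS URL pattern and layer id are assumptions that should be checked against the Koordinates documentation linked above:

```python
import geopandas

LINZ_KEY = "YOUR_API_KEY"  # hypothetical key
LAYER_ID = 12345           # hypothetical LINZ layer id (e.g. a coastline layer)

url = (
    f"https://data.linz.govt.nz/services;key={LINZ_KEY}/wfs"
    f"?service=WFS&version=2.0.0&request=GetFeature"
    f"&typeNames=layer-{LAYER_ID}&outputFormat=json"
)
features = geopandas.read_file(url)  # GeoDataFrame of the requested vector layer
print(features.head())
```

In practice a bounding-box filter would also be added so only features within the catchment are returned.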

Bathymetry river estimates

Description: Bring the functionality in Extract river channel shapes.ipynb and extract_river_channel_widths_rec2_3.ipynb into geofabrics.

Aim: extend geofabrics so it can generate width and slope estimates along a river channel. The bathymetry estimates are based on Neal et al. and Rupp and Smart.

  1. Catchment main channel identification (given an ocean segment to start from)
  2. DEM generation to some width - and resolution - and interpolation
  3. Cross sections and sampling
  4. Slope estimation and save result
  5. Width estimation and save results
  6. Regularisation of the estimates
  7. Create a fan to transition from the river mouth into the ocean
  8. Allow estimates to be incorporated back into the DEM generation process.
  9. Maintain cross-sectional area during bathymetry estimation - note the assumption on the shape of the channel (see the sketch below). This assumption can be addressed later for the interpolation of any channel.
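
For item 9, a worked sketch of the rectangular-channel assumption (the numbers are illustrative):

```python
# Preserve the measured cross-sectional area A when moving to a uniform depth,
# assuming a rectangular channel: A = width * depth  =>  depth = A / width.
area = 45.0   # m^2, estimated cross-sectional area
width = 30.0  # m, estimated channel width
depth = area / width
print(f"uniform depth = {depth:.2f} m")  # 1.50 m below the water surface
```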

Detection of blocked river channels

Detect blocked channels either through image processing applied to DEMs and other imagery to find culverts and bridges that will block water flow, or in combination with BGFLOOD to detect bridges/culverts that block the flow of water. This may require manual oversight to confirm whether the culvert/bridge does pass water.

Alan and Paulo of the data science team have created a tool (Computer Vision Annotation Tool) to help with segmentation and model generation for supervised ML for image recognition. I have been added as a beta tester and will use this to explore ML-based drain detection or possibly blocked river channels.

Next step is to sit down with @CyprienBosserelle again and map out some suitable training tiles, and the scale of drain we want to try to detect.

Setup a formatter

Another one from @jennan

  • Use a formatter (https://github.com/psf/black for example, no options = no bikeshedding), put it in your CI/pre-commit/IDE config and never think again about how to format your code :-). This helps with consistency (even if there is only one contributor), eases reading (once you get used to the formatter style) and you don't need to tell people how to format their code, they just have to comply with the one you set up.

Adding mean and median interpolation options in DEM generation process

Hi Rose,

It would be great if the DEM generation process could offer a choice of interpolation options among Inverse Distance Weighting (IDW), mean, median, and maybe some others. The reason is that if there is a data problem (noise/outliers, for example), the other options, which will not be affected by the issue, are still available for use.

Kind regards

Martin Nguyen

Create a DEM from many LAZ tiles

The initial pipeline uses only a single LAZ file to generate the DEM. We need to add support for using many LAZ tile files - perhaps all LAZ files within one or more folders.

River bathymetry followup

MVP for this branch:

  1. Download the relevant Open Street Map waterway and use for initial alignment.
    i. Add an additional parameter - OSM id of river feature
    ii. Try to maintain the old approach of aligning a channel as a side option
  2. Split rivers to a separate source classification from drains, and split culverts from drains as well.

There are various issues to address in future:

  • Add support for specifying the gauged flow at the time LiDAR was flown instead of estimated river flow
  • Add support for creating a fan that is 10x the river mouth width long. Will need an estimate of the depth at 10x the river width offshore.
  • Add support for dealing with no river width estimate at the river mouth.
  • Add support for different river mouth behaviour depending on the river mouth classification.
  • Add support for braided rivers

Revisit asserts

Revisit the use of asserts because they:

  1. might be removed during optimisation if compiled
  2. may be better replaced by raising an error in some places.

@luke noted that the following warning from SonarLint applies to all assert statements: "In production environments, the python -O optimization flag is often used, which bypasses assert statements."
So, ensure asserts are still covered by a check if moving to compiled code with optimisation.

@jennan noted I have been using asserts where raising an error may be more appropriate! https://towardsdatascience.com/practical-python-try-except-and-assert-7117355ccaab#:~:text=The%20try%20and%20except%20blocks,an%20example%20of%20defensive%20programming.
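
A minimal before/after sketch of the suggested change:

```python
def set_resolution(resolution: float) -> float:
    # Before: silently stripped out when Python runs with the -O optimisation flag
    # assert resolution > 0, "resolution must be positive"

    # After: always enforced, and callers can catch a specific exception
    if resolution <= 0:
        raise ValueError(f"resolution must be positive, got {resolution}")
    return resolution
```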

NetCDF preparation for BGFLOOD

This is a bit of a catch-all issue for work relating to the requirements associated with the NetCDF output from GeoFabrics (and also input to BGFLOOD).

Requirements:

  • Store each geofabric dataset in one NetCDF file with each layer as different variables.
    • Could also do multiple resolutions with each elevation/roughness set in a different group
    • Could also include forcing information in the same netCDF file if we want all inputs in a single file
  • Include spatial_ref information like geotransform and crs - see section below.
    • Done using rioxarray's write_crs() and write_transform()
  • Include appropriate variable and coordinates attributes:
    • units information: See link for conventions.
    • long_name: A description of the variable/coordinate.
    • standard_name: Must be included in the NUG table.
    • vertical_datum: Non-standard attribute name to record the vertical datum of any elevation data.
  • Record the parameters of a run:
    • Data sources should be documented in the attributes (i.e. land layer x, revision y)
    • Capture the parameter information in a variable, or perhaps a json dump into a group attribute.
  • Different resolution geofabrics should be aligned and evenly divisible (to ensure that alignment)
  • DEMs and roughness should be defined over a full rectangular grid with no NaN values (NaNs have since been deemed OK!)

NetCDF conventions

There are standards defining the conventions for attributes in netCDF files.

  • cfconventions.org - The recommended standard
  • UCAR - A repository with links to all netCDF standards

See CF conventions for some details about conventions around netCDF files.


The spatial_ref coordinate

This is where information associated with the coordinate system and projection (CRS and transform) is encoded.


Coordinate systems CF-1.6 <--> CRS

There is an optional grid mapping attribute called crs_wkt that may be used to specify multiple coordinate system properties in so-called well-known text format (usually abbreviated to CRS WKT or OGC WKT), as detailed on the cfconventions.org page, with example mappings at the GitHub page.

It looks like this information is sometimes encoded within a "spatial_ref" coordinate (see issue).

Python Libraries

There are various Python libraries for interacting with NetCDF files including netCDF4, xarray, and rioxarray. netCDF4 is an engine used by xarray to read and write netCDF files. xarray has some powerful constructs for constructing and interacting with data stored in netCDF files. rioxarray combines xarray with rasterio by providing access to the rasterio engine with the rio accessor.

xarray supports two main objects - DataArray and Dataset. A DataArray works well for a single layer of data (possibly across many bands), and the Dataset class should be used for multiple variables that may have different dimensions (i.e. different resolutions, or x,y vs time).
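
A minimal sketch tying these pieces together (the layer names, EPSG code, vertical datum and resolution are assumptions):

```python
import numpy as np
import rioxarray  # noqa: F401 - registers the .rio accessor on xarray objects
import xarray

x = np.arange(0.0, 100.0, 10.0)
y = np.arange(0.0, 100.0, 10.0)
elevation = xarray.DataArray(np.zeros((len(y), len(x))), coords={"y": y, "x": x}, dims=("y", "x"))
roughness = xarray.DataArray(np.full((len(y), len(x)), 0.05), coords={"y": y, "x": x}, dims=("y", "x"))

# One Dataset holding each geofabric layer as a separate variable
geofabric = xarray.Dataset({"elevation": elevation, "roughness": roughness})
geofabric = geofabric.rio.write_crs("EPSG:2193")  # adds the spatial_ref coordinate
geofabric["elevation"].attrs.update(
    {"units": "m", "long_name": "ground elevation", "vertical_datum": "NZVD2016"}
)
geofabric.to_netcdf("geofabric.nc")
```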

Other

There is potential for a translation layer between GeoFabrics and BG-FLOOD, or between the catchment generation code and either GeoFabrics and/or BG-FLOOD. This would be contained in a separate repository from either.

Refactor code so LiDAR can be loaded in parallel

Changes in preparation for using Dask in a distributed environment; these changes will not alter the end results.

  • Remove the separate LiDAR module and instead read in each LiDAR file within a single function in the dem module.

Allow offshore LiDAR to be excluded

When using GeoFabrics to generate a DEM around Westport township, it becomes apparent that the offshore LiDAR is of the ocean surface - you can see the ocean waves. Add logic to the GeoFabricsGenerator so that all LiDAR information away from the foreshore is discarded.

Support selection of LiDAR dataset(s) to use

  1. It would be great to select which dataset to use in the case of multiple datasets covering an area
  2. We also will need to be able to generate the DEM from multiple LiDAR datasets :)

Roughness length followup

Revisit the roughness length calculations to make the estimated values more physically realistic.

Selected for this issue:

  1. Revisit the equations - grass is too smooth. Also ensure appropriate limits on values (>0, and not too big) - add these limits as instruction file inputs

Options:

  • Set the river zo values based on the reach classification - planning to take from the river network + flow + friction file
  • Adjust low roughness values based on the substrate - set for roads
  • Allow the offshore and waterway zo values to be set in the instruction file
  • Include the elevation gradient to correct the std across sloped grid cells

Multiple LiDAR datasets - filtering

Add support for downloading multiple LiDAR datasets within a region with a set of rules defining what order the datasets are combined in.

Currently the dataset must be specified by name and only one dataset per DEM is supported. There are instances where the desired LiDAR dataset only covers some of the ROI, but other LiDAR datasets cover the remaining areas. Add support for combining LiDAR datasets in the case that the primary LiDAR dataset does not cover the full region.

Initially allow the user to specify the datasets to consider and the order to consider them. In time, consider adding some other filtering (e.g. by date) in addition to name.

Performance - make use of CPU clusters

Make use of Dask to parallelise the rasterisation of many LiDAR tile files into non-overlapping chunks that will form a single xarray. Each chunk will contain several LiDAR tile files - all that partially overlap with the chunk will be loaded. This will mean that some files are loaded 2-4 times, but overall we can expect a speed-up.

Changes required:

  1. Switch from processing by LiDAR files to processing by chunks each containing many LiDAR files (this requires a tile index file)
  2. Generate a map of dense data extents after chunking all LiDAR data prior to offshore calculations
  3. Make use of the dask.delayed decorator, dask arrays and dask.array.concatenate (a sketch follows this list)
  4. Consider using the dask distributed back-end for a dashboard and more control
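
A minimal sketch of the delayed-per-chunk pattern (the rasterise_chunk stub and the chunk/tile bookkeeping are stand-ins for the real logic):

```python
import dask
import dask.array
import numpy as np

@dask.delayed
def rasterise_chunk(tile_files: list, bounds: tuple, resolution: float) -> np.ndarray:
    """Load every LiDAR tile overlapping this chunk and rasterise onto the chunk grid (stub)."""
    nx = int((bounds[2] - bounds[0]) / resolution)
    ny = int((bounds[3] - bounds[1]) / resolution)
    return np.zeros((ny, nx))  # placeholder for the real binning/IDW of the overlapping tiles

# One delayed task per chunk; nothing runs until the combined array is evaluated.
chunks = [(["tile_1.laz", "tile_2.laz"], (0, 0, 1000, 1000), 10.0),
          (["tile_2.laz", "tile_3.laz"], (1000, 0, 2000, 1000), 10.0)]
blocks = [dask.array.from_delayed(rasterise_chunk(*chunk), shape=(100, 100), dtype=float)
          for chunk in chunks]
dem = dask.array.concatenate(blocks, axis=1)  # stitch the chunks into a single array
print(dem.compute().shape)
```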

Resources:

  1. Example dask ipynb - https://github.com/rosepearson/Hydrologic-DEMs-scripts/blob/main/jupyter_notebooks/dask_example.ipynb
  2. Overlapping chunks - https://docs.dask.org/en/stable/array-overlap.html
  3. Back-end schedulers - https://docs.dask.org/en/stable/scheduling.html

Performance and crash recovery

Investigate approaches to:

  • Improving performance when processing large or high resolution DEMs
    • Tile the overall nc file?
    • Only read in some of the nc file?
  • Save out LiDAR DEM prior to generating offshore results
    • Split into tiles and offshore portion of DEM

Aim to:

  • Support generating higher resolution DEMs
  • Support populating the offshore DEM separately from the on-land DEM.

Revisit setup.py and specify a package entry point

Comments from @jennan

More a question than a recommendation: your package is declared in the "src" folder. I personally like to do the same, i.e. have a "src/package_name" organisation. Now, if you change your setup.py (I need to check how I do this) you can avoid importing from "src" (e.g. in your test code) and do it directly from geofabrics (assuming your tests run in an environment where geofabrics is installed, for example via pip install -e .).

  • As you are using a pyproject.toml, you can move all the information in setup.py and setup.cfg there (and keep a nearly empty setup.py to allow pip install -e . to work). This would centralize the configuration of your package and related tools (black, flake8, etc.) as, as far as I know, they all work with the pyproject.toml file configuration. A minimal sketch is given below.
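
A minimal sketch of what the centralised configuration could look like (the values are illustrative):

```toml
# pyproject.toml (illustrative contents)
[build-system]
requires = ["setuptools>=61", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "geofabrics"
version = "0.0.0"
description = "Hydrologically conditioned DEMs and roughness maps from LiDAR"

[tool.setuptools.packages.find]
where = ["src"]

[tool.black]
line-length = 88
```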

Improve offshore interpolation

Revisit, in a systematic way, the selection of appropriate interpolation techniques in these offshore regions. There is a notebook looking into this that could be reviewed.

Filter lidar point clouds by classification

The LiDAR point clouds have usually been classified into categories like "land", "water", etc - we need to filter the point clouds by these classifications prior to conversion to raster.

  • The classification codes are specified as part of the LAS format (see specification).
  • I am assuming the dimension name for the classification code is also specified, as it always seems to be 'Classification'.

So we just need to add logic to the instruction file to only take the ground classifications (a sketch of the corresponding PDAL filter is included below). I will do this under the general section for now, but note that in the future we may want to reclassify (see #10) by dataset.
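
A hedged sketch of the corresponding PDAL pipeline via the Python bindings (the file name is illustrative; classification 2 is the LAS ground class):

```python
import json
import pdal

# Keep only ground returns (LAS classification code 2)
pipeline_json = json.dumps([
    {"type": "readers.las", "filename": "tile.laz"},
    {"type": "filters.range", "limits": "Classification[2:2]"},
])
pipeline = pdal.Pipeline(pipeline_json)
pipeline.execute()
points = pipeline.arrays[0]  # structured array including the 'Classification' dimension
```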

(LAS classification code table from https://www.asprs.org/wp-content/uploads/2010/12/LAS_1_4_r13.pdf not reproduced here.)

Westport fixes - name, indexing, offshore chunking

When producing a DEM for Westport, several hacks needed to be made to the code before it could be used by BGFlood. This issue aims to address these. Namely:

  1. Give a name to the produced DEM
  2. Ensure positive indexing of the produced DEM
  3. Only try to incorporate a reference DEM or offshore bathymetry data if it exists.
  4. Apply chunking to offshore data to avoid memory performance issues.

Open street map drains

We could add support for pulling drains down from OpenStreetMap within a catchment area and incorporating them into a DEM. The assumption is they would be well aligned and pretty small. A uniform width would then be assumed, and any areas where the drains go back up would be removed.

Tasks:

  • Use OSMPythonTools to download streams and drains within a catchment (a query sketch is included at the end of this issue)
  • Keep waterways that aren't where other river/ocean bathymetry are. See decision below - deferred
  • Convert drain centrelines into polygons
  • Sample along drain and transform into bathymetry estimate
    i. Figure out upriver direction
    ii. Take minimums then fit some sort of monotonically decreasing curve downriver
  • Save results so they can be incorporated into DEM generation process

Note that the first three tasks were initially explored in a Jupyter Notebook found here.
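
A sketch of the OSMPythonTools query for the first task; the builder arguments are from memory and should be checked against the library's documentation, and the bounding box is illustrative:

```python
from OSMPythonTools.overpass import Overpass, overpassQueryBuilder

# Bounding box (south, west, north, east) for the catchment - illustrative values
bbox = [-41.78, 171.55, -41.72, 171.65]
query = overpassQueryBuilder(bbox=bbox, elementType="way",
                             selector='"waterway"="drain"', includeGeometry=True)
result = Overpass().query(query)
for drain in result.ways():
    print(drain.id(), drain.geometry())  # centreline geometry to buffer into a drain polygon
```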
