
noaa-owp / gval


A high-level Python framework to evaluate the skill of geospatial datasets by comparing candidates to benchmark maps, producing agreement maps and metrics.

Home Page: https://noaa-owp.github.io/gval/

License: Other

Languages: Jupyter Notebook 81.94%, Python 17.97%, Dockerfile 0.09%
Topics: earth-science, environment, evaluation-framework, flood-inundation, gdal, geospatial, hydrology, python, research, science

gval's People

Contributors

fernando-aristizabal, gregorypetrochenkov-noaa, nickchadwick-noaa

Stargazers: 22

Watchers: 3

gval's Issues

Create categorical contingency table data structure

Implement a standard schema for contingency table structure.

This issue depends on #38.

Current behavior

  • Some post processing is implemented to index cross-tabulated counts by unique values in the candidate map (rows) and the benchmark map (columns).
  • There is currently no way to keep track of candidate or benchmark attributes or processing parameters.

Expected behavior

  • The above structure is great for readability but does not account for the various hierarchies of samples including sub-samples, band/features, maps, and catalogs.

  • A contingency table structure is required that reports cross-tabulated counts for each sample as well as their associated attributes.

  • Methods to aggregate/groupby crosstab counts by hierarchy level or associated attributes are important.

  • Ways of melting and pivoting this structure into something more human readable, like a cross-tabulation table indexed by unique values in the candidate map (rows) and benchmark map (columns), would be nice to have as well (see the sketch below).

  • Some method of tagging each column with metadata is necessary, e.g., the metadata source (candidate, benchmark, or process) and the hierarchy level (catalog, map, band/feature, sub-sample).

  • What data structures are necessary to account for this?

  • What classes should be created for this?

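A minimal sketch of one possible long-format layout, assuming pandas; the column names and hierarchy levels below are illustrative placeholders, not a final schema:

    import pandas as pd

    # Long-format contingency table: one row per candidate/benchmark value pair,
    # with hierarchy identifiers (catalog, map_id, band) carried as columns.
    crosstab_df = pd.DataFrame(
        {
            "catalog": ["cat_a"] * 4,
            "map_id": ["map_1"] * 4,
            "band": [1, 1, 1, 1],
            "candidate_values": [0, 0, 1, 1],
            "benchmark_values": [0, 1, 0, 1],
            "counts": [120, 30, 11, 13],
        }
    )

    # Aggregate counts at a chosen hierarchy level.
    counts_by_band = crosstab_df.groupby(["catalog", "map_id", "band"])["counts"].sum()

    # Pivot back to the human-readable cross-tabulation:
    # candidate values as rows, benchmark values as columns.
    readable = crosstab_df.pivot_table(
        index="candidate_values", columns="benchmark_values", values="counts", aggfunc="sum"
    )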

Update Test coverage to 95% or greater

Raise automated test coverage of the codebase to at least 95%.

Current behavior

Coverage is currently only around 60%.

Expected behavior

Coverage of at least 95% of the codebase.

Constrain netcdf args in `xr.open_rasterio()`

NetCDF args in open_rasterio.

Current behavior

  • NetCDF-related open arguments are currently exposed to the user even though they are not supported.

Expected behavior

  • We are not currently supporting the NetCDF-related open args: variable, group, decode_times, and decode_timedeltas.
  • We should avoid exposing these arguments to the user (see the wrapper sketch below).

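One way to avoid exposing the unsupported arguments is a thin wrapper that rejects them before delegating to rioxarray; the wrapper below is purely illustrative and not part of gval:

    import rioxarray as rxr

    # NetCDF-related keyword arguments that are not currently supported.
    _UNSUPPORTED_NETCDF_ARGS = {"variable", "group", "decode_times", "decode_timedeltas"}

    def _open_raster(path, **kwargs):
        """Hypothetical wrapper around rioxarray.open_rasterio that blocks unsupported NetCDF args."""
        blocked = _UNSUPPORTED_NETCDF_ARGS & kwargs.keys()
        if blocked:
            raise ValueError(f"NetCDF-related arguments are not supported: {sorted(blocked)}")
        return rxr.open_rasterio(path, **kwargs)
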
Dealing with element-wise comparisons that have NDV and a value pairing.

Dealing with element-wise comparisons that have NDV and a value pairing.

Current behavior

  • Element-wise pairing with value-value pairs yields a value.
  • Element-wise pairing with NDV in either candidate or benchmark yields NDV.
  • This functionality is done in gval.compare.compute_agreement_xarray() along with the comparison_function arg.

Expected behavior

  • Users may want to denote situations where the candidate or benchmark has a value and the other is NDV.
  • This represents a "value misalignment" that users may want represented in the agreement map (see the sketch below).

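A simplified, equality-based sketch of an element-wise comparison that flags pairs where exactly one side is the no-data value; the sentinel values and function name are illustrative and not gval's implementation:

    import numpy as np

    AGREEMENT_NDV = -9999      # hypothetical NDV for the agreement map
    MISALIGNMENT_CODE = -1     # hypothetical code for value/NDV mismatches

    def compare_with_misalignment(candidate, benchmark, candidate_ndv, benchmark_ndv):
        """Element-wise comparison returning a misalignment code when exactly one input is no-data."""
        # The isnan checks cover the case where the NDV itself is np.nan.
        candidate_is_ndv = (candidate == candidate_ndv) | np.isnan(candidate)
        benchmark_is_ndv = (benchmark == benchmark_ndv) | np.isnan(benchmark)

        agreement = np.where(candidate == benchmark, 1, 0)  # simple value-value comparison
        agreement = np.where(candidate_is_ndv ^ benchmark_is_ndv, MISALIGNMENT_CODE, agreement)
        agreement = np.where(candidate_is_ndv & benchmark_is_ndv, AGREEMENT_NDV, agreement)
        return agreement
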
Reading test data from S3.

Reading test data from S3.

Related to #12 and #14.

Current behavior

  • Test data is read from ./data

Expected behavior

  • Read test data from S3.

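Reading directly from S3 could look roughly like the following, assuming the object is a GeoTIFF and AWS credentials are available to GDAL/rasterio (e.g., via environment variables); the bucket and key are placeholders:

    import rioxarray as rxr

    # Placeholder S3 URL; rasterio/GDAL handles the s3:// scheme given valid credentials.
    candidate = rxr.open_rasterio("s3://example-bucket/path/to/candidate.tif", masked=True)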

Automate Test Data Creation

Automate Test Data Creation

Current behavior

  • All test data is being generated manually
  • All test data is being uploaded to S3 manually
  • All test data is being loaded into a root directory in a flat structure

Expected behavior

  • Test data will be generated from base cases or directly from parameterized tests
  • Test data will automatically be uploaded to s3
  • Data will be organized in directories based on function
  • Metadata will be listed for each dataset

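Generating small rasters directly inside parameterized tests could be done with a fixture along these lines (a sketch; the shapes, values, and fixture name are illustrative):

    import numpy as np
    import pytest
    import xarray as xr

    @pytest.fixture
    def candidate_benchmark_pair():
        """Hypothetical fixture that builds a small candidate/benchmark raster pair in memory."""
        data = np.array([[0, 1, 2], [2, 1, 0], [0, 0, 1]], dtype=np.float64)
        coords = {"y": [2.0, 1.0, 0.0], "x": [0.0, 1.0, 2.0]}
        candidate = xr.DataArray(data, coords=coords, dims=("y", "x"))
        benchmark = xr.DataArray(np.flipud(data).copy(), coords=coords, dims=("y", "x"))
        return candidate, benchmark
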
DF Schemas to handle multiple possible dtypes for a given column.

Current behavior

  • Certain columns defined in the schemas, such as band, tp, counts, candidate_values, etc., could take more than one possible dtype, including str, int, or float.
  • These are all currently set to str or float, with only one dtype allowed per column.

Expected behavior

  • Allow multiple dtypes for these columns.
  • This is likely done by subclassing pandera.dtypes and creating custom datatypes for pandera.

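Another possible approach, if custom pandera datatypes prove heavier than needed, is to leave the column dtype unconstrained and validate the allowed Python types with a Check (illustrative only; the schema and column below are placeholders):

    import pandera as pa

    # Leave dtype unset and validate that every element is a str, int, or float instead.
    crosstab_schema = pa.DataFrameSchema(
        {
            "band": pa.Column(
                checks=pa.Check(
                    lambda s: s.map(lambda v: isinstance(v, (str, int, float))).all(),
                    error="band must contain only str, int, or float values",
                )
            ),
        }
    )
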
Installation Instructions on Readme.md

Make the installation instructions in readme.md accurate.

Current behavior

  • Current installation instructions in the readme rely on PyPI- and Docker Hub-based services.

Expected behavior

  • Since we are currently not supporting PyPI or Docker Hub, we should either:
  1. Support these, or
  2. Remove them and move the contents under the "packaging" section to installation, which would enable users to install the package for end-user purposes. We can add git clone to these instructions.


Extend contingency table to accept band/variable names/numbers from both candidate and benchmarks.

Extend the bands/variables supported in the contingency table.

Current behavior

  • Only a single band number or variable name is taken from either the candidate or benchmark map.
  • Candidates or benchmarks could have different bands with different numbers or names.

Expected behavior

  • Allow for the creation of two columns in the contingency table that denote the band name/value for both the candidate and benchmark maps.

`_is_not_natural_number()` changes

Current behavior

  • _is_not_natural_number() returns -2 or raises a ValueError.
  • The value -2 has no actual purpose other than being forced so that a None type is not returned from Numba.

Expected behavior

  • Return a bool or raise a custom exception based on a raise_exception: bool argument (see the sketch below).
  • Make sure there are test cases for both the exception and bool cases.

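A rough sketch of the proposed signature; the definition of "natural number" (non-negative whole number) and the exception type are assumptions, and Numba compatibility of the raise path would still need to be verified:

    import math

    def _is_not_natural_number(value: float, raise_exception: bool = False) -> bool:
        """Return True if value is not a natural number; optionally raise instead."""
        not_natural = math.isnan(value) or value < 0 or value != int(value)
        if not_natural and raise_exception:
            raise ValueError(f"{value} is not a natural number")
        return not_natural
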
Create predicates for DataArrays and Datasets

Current behavior

  • There are several informal predicates within gval/compare.py that test for matching array sizes and dimensionality.

Expected behavior

  • These checks should all be consolidated into some sort of module(s) or class(es).
  • If relevant, predicates can have a flag that will raise an exception, whether built-in or custom, if necessary.
  • This is expected to reduce code duplication and better organize predicate behavior in general.
  • Examples of predicates with associated, optional custom exceptions that could be implemented are _is_2d_DataArray, _is_3d_DataArray, _is_DataArray, _is_Dataset, _has_rasterio_accessor, and _DataArrays_of_same_shape.

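A minimal sketch of one such predicate with an optional raise flag (the name follows the examples above; the exception type is an assumption):

    import xarray as xr

    def _is_2d_DataArray(obj, raise_exception: bool = False) -> bool:
        """Return True if obj is a 2D xr.DataArray; optionally raise instead of returning False."""
        result = isinstance(obj, xr.DataArray) and obj.ndim == 2
        if not result and raise_exception:
            raise TypeError("Expected a 2D xarray.DataArray")
        return result
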
New Value Introduced with Reproject Match

Current behavior

  • A new value is being introduced when reprojecting candidate_map_1 to benchmark_map_1.

Steps to Reproduce

  1. candidate_map_1.rio.reproject_match(benchmark_map_1)

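A quick diagnostic is to compare the unique values before and after reprojection; the new value is likely the fill/nodata value applied by reproject_match to areas outside the original extent. Here candidate_map_1 and benchmark_map_1 are the rasters from the step above:

    import numpy as np

    before = set(np.unique(candidate_map_1.values).tolist())
    reprojected = candidate_map_1.rio.reproject_match(benchmark_map_1)
    after = set(np.unique(reprojected.values).tolist())
    print(after - before)  # the newly introduced value(s)
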
Map Counts of Agreement Encodings to Statistical Metrics

As of now there is no mapping of permuted agreement encodings to categorical statistics.

Current behavior

Functionality not available yet

Expected behavior

For all available parameters in the metrics, e.g. (tp, tn, fp, fn), generate permutations over all available arguments.

Example of doing so:

    import itertools

    import numpy as np

    from gval.statistics.categorical_statistics import CategoricalStatistics

    # Example higher abstraction calling all metric functions
    cat_stat = CategoricalStatistics
    params = cat_stat.get_all_parameters()
    len_params = len(params)

    # Hypothetical counts
    counts = [120, 30, 11, 13, 20]

    # Permute through all combinations of tp, tn, fp, and fn given counts.
    # Given 4 params and 5 choices there will be 120 different combinations.
    tiled_counts = np.tile(counts, (len_params, 1))
    arg_dicts = [dict(zip(params, combo))
                 for combo in itertools.product(*tiled_counts)
                 if len(np.unique(combo)) == len_params]

Design Jupyter Notebook for second tier user functionality

There is no public-facing API for user functionality. While this issue does not encompass the creation of a gval accessor, it does cover the creation of a Jupyter notebook aimed to be released for preliminary user testing and feedback.

Current behavior

  • Key components of the pipeline are fragmented private functions.
  • The pipeline is not complete, as metric tables, agreement maps, and other outputs are not fully defined.

Expected behavior

  • We need to create a Jupyter Notebook that shows how user interaction is completed with a second tier of public functions.
  • The public API should be an Xarray accessor called gval.
  • Current examples include (see the sketch after this list):
    • rxr.open_rasterio() (for candidate and benchmark)
    • candidate.gval.spatial_alignment(benchmark, ...)
    • candidate.gval.compute_agreement_map(benchmark)
    • crosstab_table = candidate.gval.crosstab(benchmark)
    • crosstab_table.gval.compute_metrics(metrics, etc) (needs a pandas accessor)
    • any relevant plotting
    • lastly, how to serialize this information
  • Try to include plenty of comments to make user aware of what is happening.
  • Have this notebook ready for in-house user feedback no later than (NLT) April 15.

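A rough outline of what the notebook could walk through, using the accessor-style calls listed above. These calls describe the intended public API from this issue rather than functions that exist yet; the assumptions that spatial_alignment returns aligned copies and compute_metrics returns a DataFrame are illustrative:

    import rioxarray as rxr

    # Load candidate and benchmark maps (placeholder paths).
    candidate = rxr.open_rasterio("candidate.tif", masked=True)
    benchmark = rxr.open_rasterio("benchmark.tif", masked=True)

    # Proposed gval xarray accessor calls (not yet implemented).
    candidate, benchmark = candidate.gval.spatial_alignment(benchmark)
    agreement = candidate.gval.compute_agreement_map(benchmark)
    crosstab_table = candidate.gval.crosstab(benchmark)

    # Proposed pandas accessor for metrics (not yet implemented).
    metrics_table = crosstab_table.gval.compute_metrics(metrics="all")

    # Serialize outputs.
    agreement.rio.to_raster("agreement.tif")
    metrics_table.to_csv("metrics.csv")
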
Investigate the feasibility of developing a GVAL accessor for xarray.

Investigate how a GVAL accessor for Xarray would work.

  • Investigate xarray's documentation on extending xarray with accessors.
  • How would accessor methods work with respect to their return types? Under what circumstances would we lose the gval accessor?
  • What methods and attributes would we want the accessor to have?
  • Should we let the xarray developers know, per their suggestion ("To help users keep things straight, please let us know if you plan to write a new accessor for an open source library.")?
  • Investigate extended projects that have accessors already to see how they are written.
  • Investigate tools that make documenting accessors easier.

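For reference, xarray's documented extension mechanism registers an accessor class under a namespace; a minimal skeleton of what a gval accessor could look like (the method shown is a placeholder):

    import xarray as xr

    @xr.register_dataarray_accessor("gval")
    class GVALDataArrayAccessor:
        """Skeleton gval accessor; the method below is a placeholder."""

        def __init__(self, xarray_obj):
            self._obj = xarray_obj

        def compute_agreement_map(self, benchmark):
            raise NotImplementedError

Note that registration is global, but any state stored on an accessor instance is tied to the object it was created from, so objects returned by xarray operations start with a fresh accessor instance.
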
Szudzik and Cantor Pairing Function Warnings with np.nan

Current behavior

  • Szudzik and Cantor pairing functions raise RuntimeWarnings when given np.nan.

Expected behavior

  • These functions should not raise warnings for np.nan inputs.

Steps to replicate behavior (include URLs)

>>> from gval.comparison import pairing_functions
>>> import numpy as np
>>> pairing_functions.szudzik_pair_signed(1,np.nan)
<stdin>:1: RuntimeWarning: invalid value encountered in szudzik_pair_signed
nan

Compare functions should be parametrized with strings

Assigning comparison functions with strings.

Current behavior

  • gval.compare.compute_agreement_xarray() currently accepts the comparison_function argument as a callable, or as a string only for the "pairing_dict" case.

Expected behavior

  • This argument should also accept strings for the cantor and szudzik cases.
  • This would involve creating a registry of pairing functions (see the sketch below).

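A registry could be as simple as a module-level mapping from string identifiers to the existing pairing callables. Only szudzik_pair_signed is confirmed elsewhere in this document; the cantor function name below is assumed:

    from gval.comparison import pairing_functions

    # Hypothetical registry; the cantor function name is an assumption.
    PAIRING_FUNCTION_REGISTRY = {
        "szudzik": pairing_functions.szudzik_pair_signed,
        "cantor": pairing_functions.cantor_pair_signed,
    }

    def resolve_comparison_function(comparison_function):
        """Accept either a callable or a registered string identifier."""
        if callable(comparison_function):
            return comparison_function
        return PAIRING_FUNCTION_REGISTRY[comparison_function]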

Docker Compose

Consider using docker compose for stability and convenience.

Current behavior

  • Mount points for docker run -v <src>:<trgt> have to be defined by the user.

Expected behavior

- Consider using a `docker compose` yaml file to provide easy runtime instructions, to avoid having users/developers do this with every `docker run` invocation, and to enforce target mount points inside containers.
- See [this](https://docs.docker.com/get-started/08_using_compose/) for more information on docker compose.
- Account for any changes to bind points that may arise from #12. 

Decide how to encode NDV in agreement map

Decide how to encode the NDV in the agreement map given candidate and benchmark NDVs that may or may not differ.

Related to #22 because it deals with np.nans as NDV.

Current behavior

  • No NDV management is currently in place for the output of gval.compare.compute_agreement_xarray().
  • How does one use rxr.rio.encode_nodata() to write the NDV to the output xarray or file?

Expected behavior

  • Decide on a method, whether a user parameter or use of the pairing functionality (see the sketch below).

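Whatever value is chosen, rioxarray can record it on the output; a minimal sketch, assuming agreement_map is the DataArray returned by the agreement computation and the NDV shown is a placeholder:

    # Record the chosen no-data value on the agreement map and carry it into the written file.
    agreement_ndv = -9999  # placeholder choice
    agreement_map = agreement_map.rio.write_nodata(agreement_ndv)
    agreement_map.rio.to_raster("agreement_map.tif")
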
AWS Config Keys

Consider redoing AWS config behavior.

Current behavior

  • The _set_aws_environment_variables function within tests/utils.py is set to run prior to every test script, reading keys from a CSV.
  • Keys are stored within the repo in the data dir, which is kept out of version control by .gitignore.

Expected behavior

  • Consider using an officially formatted AWS config file.
  • Consider requiring users to put this file outside of the repository to avoid accidental commits.
    • Developers can access the file by using a separate volume (-v) defined at docker run time.
    • Consider using the AWS_CONFIG_FILE environment variable, set within the Dockerfile, to point to the volume mount location.
  • The above procedure could avoid security issues and deprecate the _set_aws_environment_variables function.

Revisit `rasters_intersect()`

Current behavior

  • rasters_intersect() currently checks for intersection without using shapely geometry, which may be causing problems.
  • Additionally, it assumes the rasters have the same projection.

Expected behavior

  • Verify whether this is causing problems, then patch.
  • It might be necessary to combine transform_bounds and rasters_intersect to avoid issues with different projections (see the sketch below).

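One way to make the check projection-aware is to transform one raster's bounds into the other's CRS and test intersection with shapely boxes; this is a sketch, not gval's current implementation:

    from rasterio.warp import transform_bounds
    from shapely.geometry import box

    def rasters_intersect_sketch(candidate, benchmark) -> bool:
        """Check bounds intersection after transforming benchmark bounds to the candidate CRS."""
        benchmark_bounds = transform_bounds(
            benchmark.rio.crs, candidate.rio.crs, *benchmark.rio.bounds()
        )
        return box(*candidate.rio.bounds()).intersects(box(*benchmark_bounds))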

Make Repository Public

Make repo public.

Current behavior

Currently our repository is private, only being accessible to internal users at OWP.

Expected behavior

The repository will be public, and GitHub Actions/GitHub Pages will be available.

Default masking and scaling behavior.

Current behavior

  • Default behavior in xr.open_rasterio() does not mask and scale data.

Expected behavior

  • Let's constrain opening to keep mask_and_scale=True.

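For example, opening could always be done with masking and scaling enabled; rioxarray exposes this via the mask_and_scale argument, which converts nodata to np.nan and applies any scale/offset:

    import rioxarray as rxr

    # Placeholder path; nodata values become np.nan and scale/offset are applied.
    candidate = rxr.open_rasterio("candidate.tif", mask_and_scale=True)
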
Dask array test coverage

Coverage of Dask arrays.

Current behavior

  • There is currently no testing for Dask arrays in place.

Expected behavior

  • Dask arrays should be covered in tests.
  • Dask arrays are employed through the chunks argument of rxr.open_rasterio()
  • Related arguments within rxr.open_rasterio() that could affect test status include cache and lock.
  • If we believe that these arguments could affect test status for functionality within gval, they should be covered within tests.

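Coverage could be added by parametrizing existing tests over the chunks argument (and later cache and lock); the test below is a sketch with a placeholder path and assertion:

    import pytest
    import rioxarray as rxr

    @pytest.mark.parametrize("chunks", [None, 256, {"x": 256, "y": 256}])
    def test_open_with_chunks(chunks):
        """Exercise dask-backed arrays through the chunks argument of open_rasterio."""
        candidate = rxr.open_rasterio("candidate.tif", chunks=chunks)  # placeholder path
        assert candidate.shape  # replace with real assertions on gval functionality
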
Multi-band support within `crosstab_xarray()`

Current behavior

  • crosstab_xarray() does not currently allow for multiple bands either as 3D arrays or as xr.Dataset.

Expected behavior

  • Implement the ability to allow for an xr.DataArray with multiple bands or an xr.Dataset with multiple variables.
  • Only support 3D datasets, so a check must be put in place that enforces this for both DataArrays and Datasets (see the sketch below).
  • Some ability to preserve bands or not in the computation of cross tabs must be included.

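A sketch of the kind of dimensionality check described above (simplified; per-band crosstabs could then be computed by selecting each band in turn):

    import xarray as xr

    def _check_3d(obj):
        """Require a 3D DataArray or a Dataset whose variables are each 3D."""
        if isinstance(obj, xr.DataArray):
            if obj.ndim != 3:
                raise ValueError("Expected a 3D DataArray (band, y, x)")
        elif isinstance(obj, xr.Dataset):
            if any(var.ndim != 3 for var in obj.data_vars.values()):
                raise ValueError("Expected all Dataset variables to be 3D (band, y, x)")
        else:
            raise TypeError("Expected an xr.DataArray or xr.Dataset")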

Implement library of categorical statistics.

Generate a sub-package, module, or class of basic level categorical statistics.

This issue is related to but independent of #17.

Current behavior

  • No functions to compute categorical statistics exist yet.

Expected behavior

  • We need to have a sub-package, module, or class of basic level categorical statistics.
  • These functions should be agnostic to two-class or multi-class use, accepting only TP, TN, FP, and FN=None as input.
  • Some basic examples include MCC, CSI, TPR, F1-Score, Precision, Recall, Accuracy, FAR, etc.
  • How do we account for functions that are mathematically equivalent but go by differing names?
  • How do we account for cases where users can pass strings to denote which statistics to compute?
  • Include references for every statistic.
  • Further functionality should be later generated to apply these functions to the contingency table structures proposed in #17.

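For illustration, a couple of these statistics written as plain functions of the contingency counts; the formulas are standard, and the uniform signature with fn defaulting to None mirrors the bullet above:

    import math

    def critical_success_index(tp, tn, fp, fn=None):
        """CSI = TP / (TP + FP + FN); tn is unused but kept for a uniform signature."""
        fn = 0.0 if fn is None else fn
        return tp / (tp + fp + fn)

    def matthews_correlation_coefficient(tp, tn, fp, fn=None):
        """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
        fn = 0.0 if fn is None else fn
        denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return (tp * tn - fp * fn) / denominator
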
Implement Pipeline on GitActions

Develop a build process to test the build, run unit tests, lint, and assess coverage.

Current behavior

No such pipeline exists

Expected behavior

Develop pipeline and have badges for successful build, passing unit tests, and unit test coverage.

Apply categorical metrics to contingency tables.

The categorical metrics developed in #18 need to be applied to the contingency table structure developed in #17.

Current behavior

  • There is currently no way to apply the metrics to, or compute them on, contingency tables.

Expected behavior

  • How do we apply the functions developed in #18 to compute agreement metrics at the varying sample hierarchies while preserving hierarchy identifiers and their attributes?
  • How do we account for two-class vs multi-class cases?
  • How do we account for one-vs-one and one-vs-all methods of computing metrics for multi-class cases?

Redo testing functionality

Current Behavior

  • Current testing functionality is very limited.
  • It does not use pytest very well.
  • It does not have a good structure.
  • It does not begin to test all of the variations of test cases that could be presented to the implemented functionality.

Proposed Work

  1. Do in-depth research for PyTest functionality.
  2. Restructure tests to better mimic package structure.
  3. Make more test data.
    • Features and nuances to test:
      • xr.open_rasterio() args:
        • xr.Datasets with singular and multiple variables
        • xr.DataArrays with multiple bands or Variables with multiple bands.
        • xarray with masking and/or mask_and_scale (np.nan).
        • Accounting for sub-objects of xarrays in testing, including coordinates, attrs, encodings, etc.
          • Testing with xr.testing.assert_equal, xr.testing.assert_identical, or np.testing.assert_equal
        • Investigate how using dask, chunking, caching, and windowing affects fidelity of tests.
      • Parameters passed to gval.compare.crosstab_xarray() and gval.compare.compute_agreement_xarray()
        • No testing is available for these parameters.
        • No capability to account for allow or deny listing of values is in place for gval.compare.compute_agreement_xarray().
      • Benchmarking
        • General computational performance across files with varying numbers of samples, to understand performance limitations.

Decide how to manage np.nans

np.nans can take float or float64 form

Current behavior

  • When loading unmasked arrays, there are no np.nans.
  • When loading masked arrays via the open_rasterio arguments masked and mask_and_scale, np.nan takes np.float64 form.

Expected behavior

  • Need to deal with the data type discrepancy, as tests fail with masked arrays.

Investigate behavior of `xr.open_rasterio()`

Investigate behavior of xr.open_rasterio().

  • xr.open_rasterio() args:
  • xr.Datasets with singular and multiple variables
  • xr.DataArrays with multiple bands or Variables with multiple bands.
  • xarray with masking and/or mask_and_scale (np.nan).
  • Investigate how using dask, chunking, caching, and windowing affects fidelity of tests

Datatype Issues between 32 bit types and 64 bit types

When using pairing functions, RuntimeWarnings arise from overflow and invalid values in the math between datasets with int32 or float32 dtypes and those with int64 or float64 dtypes.

Current behavior

RuntimeWarnings are thrown and questionable output is produced.

Expected behavior

Align datatypes to 64-bit (highest precision) to avoid these RuntimeWarnings so none are thrown.

Location of references.bib

What is the best location for references.bib?

Current behavior

  • Current file is located in root directory.

Expected behavior

  • Is a bib file able to render with a sphinx/readthedocs project?
    • If so, can we move this to docs so that it eventually turns into a references page within the documentation website?

Add additional datasets to S3

Current behavior

  • There is only one pair of candidate and benchmark maps on S3.

Expected behavior

  • Need to account for a variety of test cases:
    • multi-band rasters
    • adding COGs and compressed datasets for performance
    • adding non-homogenized and homogenized datasets
    • varying datatypes (int, float, etc)
    • grouping datasets by use case and adding meta-data. Try to leverage current module structure.
    • generating datasets with all possible combinations of parameters.

Example candidate dataset: s3://fernandoa-bucket/foss_fim/test_cases/ble_test_cases/12100202_ble/testing_versions/20210902_C892f8075_allBle_n_6_MS/100yr/inundation_extent_12100202_MS.tif

Example benchmark dataset: s3://fernandoa-bucket/foss_fim/test_cases/ble_test_cases/validation_data_ble/12100202/100yr/ble_huc_12100202_extent_100yr.tif

Live Documentation GitPages or ReadTheDocs

Live documentation is necessary for a public release to keep interested users informed on how to use the package.

Current behavior

None exist

Expected behavior

Live documentation

Pip Install Permissions

Pip installs inside the running container encounter permission issues.

Current behavior

  • Dockerfile conducts pip installs as root.
  • Running pip installs as root in a virtual environment seems to generate permission issues.

Expected behavior

  • Users should be able to install with pip in active containers without permissions issues.

Steps to replicate behavior (include URLs)

  1. Enter interactive session of current image, docker exec -it gval bash
  2. Run $VENV/bin/pip install xarray-spatial
  3. The screenshot below shows how permissions are being managed.

Screenshots

(Screenshot dated 2023-02-17 showing how permissions are managed; image not reproduced here.)

Package everything in src directory

Packaging and unit tests will work more seamlessly with the code in a src directory. The local package is not accessible as-is, pending updates to the Jupyter notebooks.

Current behavior

Local installs are not always successful and are not accessible from notebooks

Expected behavior

Local installs will be successful and accessible from notebooks

Allow listing within `gval.compare.compute_agreement_xarray()`

Allow & deny listing within gval.compare.compute_agreement_xarray()

Current behavior

  • No functionality exists to permit allow listing within gval.compare.compute_agreement_xarray().
  • Some functionality for this is included in gval.compare.crosstab_xarray().

Expected behavior

  • Expose the ability to do allow listing within gval.compare.compute_agreement_xarray(), similar to what's available in gval.compare.crosstab_xarray() (see the sketch below).

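Allow listing could be implemented by masking out any values that are not in a user-supplied list before the agreement computation; the helper name, argument names, and fill value below are illustrative:

    import numpy as np
    import xarray as xr

    def apply_allow_list(raster: xr.DataArray, allow_list, fill_value=np.nan) -> xr.DataArray:
        """Keep only values present in allow_list; everything else becomes fill_value."""
        return raster.where(raster.isin(list(allow_list)), fill_value)
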
Make a pairing function module and registry of pairing functions.

Current behavior

  • The pairing functions supported are currently in gval/compare.py. This module has grown too large and is not specific enough for pairing functions.
  • There is no clear way for users to access pairing functions.

Expected behavior

  • Users need a specific location to access pairing functions from.
  • Users should also be able to access pairing functions using a string identifier that accesses a registry of pairing functions.
  • Additionally, users should be able to register new pairing functions as needed.

Explore dictionary based pairing function

Explore alternative pairing functions, such as a dictionary-based pairing function.

Related to #24 and #25.

Current behavior

  • The only supported pairing functions for categorical data types are Cantor and Szudzik.

Expected behavior

  • Explore an alternative pairing function based on a dictionary of pairings and their expected outputs, such as:
{
    (c0, b0) : a0, 
    (c0, b1) : a1, 
    (c1, b0) : a2, 
    (c1, b1) : a3, 
}
  • where cX represents a unique value in the candidate map, bX a unique value in the benchmark map, and aX a unique value in the agreement map.
  • This should be a numba vectorized/jit function for performance.

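A plain-Python sketch of the dictionary-based pairing; a production version would need a Numba-compatible structure (e.g. numba.typed.Dict or an array lookup) to be vectorized/jit-compiled as noted above, and the dictionary contents are placeholders:

    import numpy as np

    # Hypothetical pairing dictionary: (candidate value, benchmark value) -> agreement value.
    pairing_dict = {
        (0, 0): 0,
        (0, 1): 1,
        (1, 0): 2,
        (1, 1): 3,
    }

    def pairing_dict_fn(candidate_value, benchmark_value):
        """Look up the agreement encoding for a single candidate/benchmark pair."""
        return pairing_dict[(candidate_value, benchmark_value)]

    # Element-wise application over arrays (slow; for illustration only).
    apply_pairing = np.vectorize(pairing_dict_fn)
    agreement = apply_pairing(np.array([0, 1, 1]), np.array([0, 0, 1]))  # -> [0, 2, 3]
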
Change data source to remote S3

Change test data source to S3 and no longer pull down files locally.

Current behavior

Pulls down files from S3 if not present locally

Expected behavior

Read directly from S3
