nlmod's Introduction

nlmod


Python package to build, run and visualize MODFLOW 6 groundwater models in the Netherlands.

nlmod was built to allow users to write scripts to quickly download relevant data from publicly available sources, and build and post-process groundwater flow and transport models at different spatial and temporal scales to answer specific geohydrological questions. Scripting these steps, from downloading data to building groundwater models, makes models more reproducible and transparent.

The functions in nlmod have four main objectives:

  1. Create and adapt the temporal and spatial discretization of a MODFLOW model using an xarray Dataset (nlmod.dims).
  2. Download and read data from external sources, project these data onto the model grid and add them to an xarray Dataset (nlmod.read).
  3. Use the data in an xarray Dataset to build MODFLOW packages for both groundwater flow and transport models using FloPy (nlmod.sim, nlmod.gwf and nlmod.gwt for MODFLOW 6, and nlmod.modpath for MODPATH).
  4. Visualize model data in Python (nlmod.plot) or GIS software (nlmod.gis).

More information can be found on the documentation website: https://nlmod.readthedocs.io/.
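For example, a minimal model-building script could look like this (a sketch based on the getting-started documentation; the exact function names may differ between nlmod versions, and the extent and dates are arbitrary):

import nlmod

extent = [95000.0, 105000.0, 494000.0, 500000.0]  # xmin, xmax, ymin, ymax (RD)

layer_model = nlmod.read.regis.get_regis(extent)     # 2: download layer data
ds = nlmod.to_model_ds(layer_model,                  # 1: spatial discretization
                       model_name="example", model_ws="./example")
ds = nlmod.time.set_ds_time(ds, start="2020-01-01", time="2021-01-01")

sim = nlmod.sim.sim(ds)                              # 3: build MODFLOW 6 objects
tdis = nlmod.sim.tdis(ds, sim)
gwf = nlmod.gwf.gwf(ds, sim)
dis = nlmod.gwf.dis(ds, gwf)

nlmod.plot.data_array(ds["top"], ds=ds)              # 4: visualize model data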

Installation

Install the module with pip:

pip install nlmod

nlmod has the following required dependencies:

  • flopy
  • xarray
  • netcdf4
  • rasterio
  • rioxarray
  • affine
  • geopandas
  • owslib
  • hydropandas
  • shapely
  • pyshp
  • rtree
  • matplotlib
  • dask
  • colorama
  • joblib

There are some optional dependencies, only needed (and imported) in a single method. Examples are bottleneck (used in calculate_gxg), geocube (used in add_min_ahn_to_gdf), h5netcdf (used for the hdf5 file backend in xarray) and scikit-image (used in calculate_sea_coverage). To install nlmod with the optional dependencies use:

pip install nlmod[full]

When using pip the dependencies are automatically installed. Some dependencies are notoriously hard to install on certain platforms. Please see the dependencies section of the hydropandas package for more information on how to install these packages manually.

Getting started

If you are using nlmod for the first time you need to download the MODFLOW executables. You can easily download these executables by running this Python code:

import nlmod
nlmod.download_mfbinaries()

After you've downloaded the executables you can run the Jupyter Notebooks in the examples folder. These notebooks illustrate how to use the nlmod package.

nlmod's People

Contributors

artesiawater, bdestombe, dbrakenhoff, marcovanbaar, martinvonk, onnoebbens, rubencalje


nlmod's Issues

Function to go from `model_ds` to `gwf`/`sim` and backwards

I would like a function to create a simulation object (sim) from a model_ds object. The model_ds is easy to store and back up, while the gwf/sim is needed to use the plot functions of flopy. I can also imagine plenty of use cases for the reverse. Probably a lot of the code is already part of the package. Thanks!

color logs based on log level

In Jupyter Notebooks all logs are highlighted in red. This makes the logs look like errors or warnings, while often they are just information about what is happening. It would be nice if we could color the logs based on the log level. It might be possible to do this using the coloredlogs package -> https://coloredlogs.readthedocs.io/en/latest/
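For example (a sketch; attaching to the root "nlmod" logger is an assumption):

import logging
import coloredlogs

# color nlmod's log records by level, so INFO is no longer shown in red
coloredlogs.install(level="INFO", logger=logging.getLogger("nlmod"))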

Tasks

Some stuff that I still want to do:

  • add dev branch, protect master branch.
  • automate the testing of the master branch.
  • add test function to create a small unstructured model, use this model to test anything unstructured instead of the big model that is used now.
  • create different levels of testing, 1. create full models, 2. test existing models.
  • add dependencies and version numbers to the requirements

nlmod can't be imported in the new structure.

Add this to the example notebooks before import nlmod:

import os
import sys

module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

to make sure "import nlmod" works in the current folder structure.

change default directory of mfbinaries

The default directory to save mfbinaries when calling nlmod.util.download_mfbinaries() is the parent directory of the nlmod package. This makes sense when you have the nlmod github repo because the parent directory of the nlmod package is still part of the repo. When you install nlmod using pip, the parent directory of the nlmod package is the site-packages directory. It is not a good idea to store the mfbinaries there. Therefore we should add the mfbinaries to the nlmod package directory instead.
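For example (a sketch; the "bin" folder name is an assumption):

import os
import nlmod

# proposed default: a "bin" folder inside the installed package directory,
# instead of the package's parent directory
bindir = os.path.join(os.path.dirname(nlmod.__file__), "bin")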

Add optional arguments to functions besides model_ds

Many functions have model_ds (a model dataset) as an argument. In these functions data arrays and attributes from the model dataset are used. When you call this function it is very unclear which data arrays or attributes are necessary to run the function. We should change these functions to include all the data variables and/or attributes as function arguments. Then each function can be called using model_ds or by providing all the other function arguments.

Current function ghb_from_model_ds in nlmod.mfpackages.mfpackages

def ghb_from_model_ds(model_ds, gwf, da_name):
    """get general head boundary from model dataset.

    Parameters
    ----------
    model_ds : xarray.Dataset
        dataset with model data.
    gwf : flopy ModflowGwf
        groundwaterflow object.
    da_name : str
        name of the ghb files in the model dataset.

Proposal for new function

def ghb_from_model_ds(gwf, 
                      model_ds=None, 
                      ghb_cond=None, 
                      ghb_stage=None,
                      gridtype=None, 
                      only_first_active_layer=True, 
                      first_active_layer=None,
                      only_active_cells=False, 
                      idomain=None):
    """get general head boundary from model dataset.

    Parameters
    ----------
    gwf : flopy ModflowGwf
        groundwaterflow object.
    model_ds : xarray.Dataset or None, optional
        dataset with model data. If model_ds is None the function arguments
        ghb_cond, ghb_stage and gridtype should be defined. Default is None
    ghb_cond : xarray.DataArray or None, optional
        dataarray with the conductance of the ghb. All non-zero cells are used
        as ghb cells. If None the function argument model_ds should be defined.
        Default is None
    ghb_stage : xarray.DataArray or None, optional
        dataarray with the stage of the ghb. If None the function argument
        model_ds should be defined. Default is None
    gridtype : str or None, optional
        can be 'structured' or 'vertex'. If None the function argument model_ds
        should be defined. Default is None
    only_first_active_layer : bool, optional
        if True the ghb is only added to the first active layer in the
        model. Default is True
    first_active_layer : xarray.DataArray or None, optional
        dataarray with the first active layer for each cell. Only necessary
        when only_first_active_layer is True. Default is None
    only_active_cells : bool, optional
        if True ghb cells are only added to cells with an idomain of 1.
        Default is False.
    idomain : xarray.DataArray or None, optional
        dataarray with the idomain. Only necessary
        when only_active_cells is True. Default is None

Improve `add_geotop_to_regis_hlc`

The function to combine regis and geotop, add_geotop_to_regis_hlc, is very rigid. It assumes there are no layers above the Holocene in regis. This usually works, but I think a more general function, in which you can choose which layers you want to replace with geotop, would be better.

Make explicit where the data comes from

The function nlmod.read.rws.get_surface_water reads a shapefile with large surface water bodies. When calling the function it is not clear where this data comes from. Make sure that it is explicitly stated which data is read when the function is called. Also provide the option to supply your own shapefile.

This also applies, for example, to the conversion table used for geotop.

nan values in surface water

In the notebook 02_surface_water the river stress period data is built from a geodataframe with:
riv_spd = nlmod.mfpackages.surface_water.build_spd(celldata, "RIV", model_ds)

The logs show that there are a lot of nan values in the rbot data. It does not result in an error, but I was wondering whether this is expected behavior. If it is, should we remove the warning? Maybe we should only warn if cells were actually removed because of missing/erroneous data.
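If it is not, a workaround in the notebook would be to drop those rows first (a sketch; the "rbot" column name in celldata is an assumption):

# drop surface-water features without a bottom elevation before building
# the stress period data
celldata = celldata.dropna(subset=["rbot"])
riv_spd = nlmod.mfpackages.surface_water.build_spd(celldata, "RIV", model_ds)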

Scaling the extent

In the functions regis.get_regis_dataset and geotop.get_geotop_dataset you can pass an arbitrary extent. However, with some extents things go completely wrong and some cells get a bot but no kh or anything else.

This can be solved by first snapping the extent to the regis grid with the function regis.fit_extent_to_regis, but that has to be done manually; if you don't, you get errors. For now I have built a check into the functions mentioned above, but this is a bit makeshift.

Ideally you should be able to pass any arbitrary extent to regis.get_regis_dataset and geotop.get_geotop_dataset.
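A minimal sketch of such automatic snapping (assuming 100 m REGIS cells; the function name is illustrative):

import numpy as np

def fit_extent_to_grid(extent, cellsize=100.0):
    # snap [xmin, xmax, ymin, ymax] outward to multiples of cellsize
    xmin, xmax, ymin, ymax = extent
    return [np.floor(xmin / cellsize) * cellsize,
            np.ceil(xmax / cellsize) * cellsize,
            np.floor(ymin / cellsize) * cellsize,
            np.ceil(ymax / cellsize) * cellsize]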

Tasks

Some stuff that came up and I want to fix later:

  • rename add_knmi_to_model_dataset to get_recharge.
  • move find_sea_cells from jarkus module to rws module
  • Use uniform names for all the get dataset methods in the read module.
  • rename _check_model_ds function in the cache module to make the dimension and coordinate check more uniform
  • There are two functions to calculate thickness from the top and bottom: get_thickness_from_topbot and calculate_thickness. Remove one and make the other more uniform.
  • the function gdf_to_bool_data_array contains the code da = xr.zeros_like(model_ds['top']). This is pretty ugly, as it assumes that top is a 2d variable, and all the attributes are copied to the new dataarray.
  • the function nlmod.mgrid.create_vertex_grid(), the ahn module and the knmi module have their own caching implementation. Move all caching logic to the caching decorators in the cache module.

rename 'time_units' and 'start_time' attributes of time coordinate to 'units' and 'start'

It seems easy, but the 'units' attribute of the time coordinate seems to be locked and I get this error when I try to write it to a netcdf file: ValueError: failed to prevent overwriting existing key units in attrs on variable 'time'. This is probably an encoding field used by xarray to describe how a variable is serialized. To proceed, remove this key from the variable's attributes manually. It is also mentioned here: pydata/xarray#3739. I don't have time now to dive into this, so I will revert the commit for now and create an issue for this.

Originally posted by @OnnoEbbens in #58 (comment)

Add units to DataArray

It would be nice if the DataArrays that are part of model_ds had a units attribute. The units are then automagically used in the labels when plotting the DataArray.
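For example:

import numpy as np
import xarray as xr

head = xr.DataArray(np.random.rand(5, 5), dims=("y", "x"), name="head")
head.attrs["units"] = "m NAP"
head.plot()  # xarray labels the colorbar with "head [m NAP]"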

first create model_ws and mfversion, then add time discretization

Now you first create a model_ds_time object, and then use that to initialize an MFSimulation.

More logical to separate those steps somehow?
model_ds = nlmod.get_empty_model_ds()

followed by:
model_ds = nlmod.set_model_ds_time(...)

I'll commit something along these lines today.

move constant head logic out of mfpackages module

The mfpackages module is only meant to create modflow packages from the model dataset. Therefore chd_at_model_edge_from_model_ds should be split into a part that creates the 'chd' data variable for the model dataset and a part that creates the chd mf6 package.
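A sketch of the proposed split for a structured grid (function names and edge detection are simplified assumptions):

import xarray as xr

def get_chd_at_model_edge(model_ds, idomain):
    # part 1: add a boolean 'chd' data variable that marks the active
    # cells on the model edge
    chd = xr.zeros_like(idomain)
    chd.values[..., [0, -1], :] = 1  # first and last row
    chd.values[..., :, [0, -1]] = 1  # first and last column
    model_ds["chd"] = chd.where(idomain == 1, other=0)
    return model_ds

def chd_from_model_ds(model_ds, gwf):
    # part 2: build the stress period data from model_ds['chd'] and
    # create the mf6 package, e.g. flopy.mf6.ModflowGwfchd(gwf, ...)
    ...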

Separate downloading and regridding

At the moment the download and regrid functions are often called in one go, e.g. in the functions nlmod.read.ahn.get_ahn() and nlmod.read.knmi.get_recharge(). It would be useful to separate these steps: download the data you need once per project, then refer to that data and regrid it when building a specific model.

You could then, for example, download the AHN for your whole project area at fine resolution (once, slow) and regrid it for several models (fast).
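A sketch of the proposed workflow (project_extent and regrid_to_modelgrid are hypothetical placeholders):

import nlmod

# once per project: download the AHN for the whole area (slow)
ahn = nlmod.read.ahn.get_ahn_within_extent(project_extent)

# per model: project the stored data onto the model grid (fast)
model_ds["ahn"] = regrid_to_modelgrid(ahn, model_ds)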

Unexpected behavior for cachedir

If you call the util.get_cache_netcdf function with a function (as the get_dataset_func argument) that in turn calls util.get_cache_netcdf, unexpected errors may occur. The cachedir argument that was passed to the original util.get_cache_netcdf call is not passed on to the get_dataset_func function. Thus, when util.get_cache_netcdf is called for the second time, it assumes cachedir=None. If cachedir is None, an attempt is made to create a temporary cache directory. Creating or accessing a temporary cache directory is not always possible and can result in an error. It is also unexpected that a temporary directory is used for caching while a cache directory was specified in the original call of util.get_cache_netcdf.

some thoughts:

  • there is no good reason to call the util.get_cache_netcdf function with a function (as the get_dataset_func argument) that itself calls util.get_cache_netcdf. Why cache the same thing twice? However, it is also hard to detect when this occurs.
  • The same error occurs when you use cachedir=None when calling the util.get_cache_netcdf function. A solution would be to raise an error when cachedir=None (see the sketch below).
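A sketch of that guard (the signature is an approximation; the rest of the function is omitted):

import os

def get_cache_netcdf(get_dataset_func, cachedir, cache_name, *args, **kwargs):
    if cachedir is None:
        # fail early instead of silently falling back to a temporary directory
        raise ValueError("cachedir is None; pass an explicit cache directory")
    fname_netcdf = os.path.join(cachedir, cache_name)
    ...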

mf6: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.29' not found (required by mf6)

The tests in github actions that try to run a modflow model encounter the following error:

mf6: /lib/x86_64-linux-gnu/libm.so.6: version 'GLIBC_2.29' not found (required by mf6)

This probably has something to do with compiling the mf6 executable in the linux test environment. I try to use the same steps as flopy does in their automated testing, described here:
https://github.com/modflowpy/flopy/blob/develop/.github/workflows/ci.yml

Drop support for regular grid (dis)

I think we should choose a specific grid-type, so we do not have to maintain a lot of code to support multiple grid-types. For almost every calculation some sort of grid-refinement will be needed, for example a coarser grid at the boundaries of the model.

Therefore I would propose to only support the disv-package (so what we call unstructured), even if the grid has constant column width and row height. In this way we can optimize our code for this special case, and generate methods to quickly transform and display irregular data on a regular grid, and vice versa.

Add vertices to model_ds

The plotting functions of nlmod require vertices. It would be nice if those were added to model_ds during grid discretisation (via get_vertices).

Install requirements not working properly

Hi All,
It looks like the install requirements are not properly configured. Only the packages listed in setup.py > install_requires are installed when installing nlmod via `pip install -e .`. The list of packages in requirements.txt is much longer. Geopandas is one of the packages that is not installed with `pip install -e .`.

For a possible solution:
https://stackoverflow.com/questions/14399534/reference-requirements-txt-for-the-install-requires-kwarg-in-setuptools-setup-py

Seaborn not installed

The seaborn package is not in the setup.py install requirements, nor is it in requirements.txt.

wrong nan value in `get_heads_array`

If fill_nans=True in the function nlmod.util.get_heads_array, it is automatically assumed that the nan value is the maximum value in the heads array. Sometimes the nan value is -1.e+30. Make the nan value an optional argument of this function.
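A sketch of the proposed signature (reading the heads via flopy's HeadFile here; the fallback mirrors the current behaviour):

import flopy
import numpy as np

def get_heads_array(fname_heads, nan_value=None):
    heads = flopy.utils.HeadFile(fname_heads).get_alldata()
    if nan_value is None:
        nan_value = heads.max()  # current behaviour: assume the maximum is nodata
    heads[heads == nan_value] = np.nan
    return heads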

TypeError in example notebook 04_modifying_layermodels

When running the nlmod.mdims.mlayers.combine_layers_ds function in the example notebook 04_modifying_layermodels I get this error:

TypeError: Could not convert tuple of form (dims, data[, attrs, encoding]): (9, 10, 11, 12, 13) to Variable.

I think it has something to do with commit 17ed300 because the error is exactly in the modified line.

Functions for creating rec_list -> improve naming and clarify behavior

There are a number of functions in the mgrid module with names like data_array_3d_to_rec_list and data_array_2d_vertex_to_rec_list for which it is rather opaque what they do and why they have such long names. See whether the structure of these functions can be improved and the naming made more logical, possibly with an explanatory notebook.

As a general rule, abbreviate:

  • DataArray -> da
  • Dataset -> ds

Error getting executables with get_exes.py

When I run the get_exes.py script on windows with pymake = 1.2.1 I get the following error:

Updating MODFLOW 6 executables from the nightly-build repo
Attempting to download the file:
    MODFLOW-USGS/modflow6-nightly-build/win64.zip
open request attempt 1 of 10
open request attempt 2 of 10
open request attempt 3 of 10
open request attempt 4 of 10
open request attempt 5 of 10
open request attempt 6 of 10
open request attempt 7 of 10
open request attempt 8 of 10
open request attempt 9 of 10
open request attempt 10 of 10
Cannot open request from:
    MODFLOW-USGS/modflow6-nightly-build/win64.zip


Traceback (most recent call last):

  File "C:\Anaconda3\envs\dev\lib\site-packages\pymake\utils\download.py", line 208, in _request_get
    req = requests.get(

  File "C:\Anaconda3\envs\dev\lib\site-packages\requests\api.py", line 76, in get
    return request('get', url, params=params, **kwargs)

  File "C:\Anaconda3\envs\dev\lib\site-packages\requests\api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)

  File "C:\Anaconda3\envs\dev\lib\site-packages\requests\sessions.py", line 516, in request
    prep = self.prepare_request(req)

  File "C:\Anaconda3\envs\dev\lib\site-packages\requests\sessions.py", line 449, in prepare_request
    p.prepare(

  File "C:\Anaconda3\envs\dev\lib\site-packages\requests\models.py", line 314, in prepare
    self.prepare_url(url, params)

  File "C:\Anaconda3\envs\dev\lib\site-packages\requests\models.py", line 388, in prepare_url
    raise MissingSchema(error)

MissingSchema: Invalid URL 'MODFLOW-USGS/modflow6-nightly-build/win64.zip': No schema supplied. Perhaps you meant http://MODFLOW-USGS/modflow6-nightly-build/win64.zip?


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "C:\Users\onno_\02_git_repos\nlmod\tests\get_exes.py", line 160, in <module>
    test_download_nightly_build()

  File "C:\Users\onno_\02_git_repos\nlmod\tests\get_exes.py", line 131, in test_download_nightly_build
    pymake.download_and_unzip(url, exe_pth, verbose=True)

  File "C:\Anaconda3\envs\dev\lib\site-packages\pymake\utils\download.py", line 320, in download_and_unzip
    req = _request_get(

  File "C:\Anaconda3\envs\dev\lib\site-packages\pymake\utils\download.py", line 218, in _request_get
    req.raise_for_status()

UnboundLocalError: local variable 'req' referenced before assignment

However, the test_download_and_unzip function did run and the executables were installed on my machine. Therefore I was wondering whether we need the nightly-build function or whether we can just skip it?

Also, the list_exes() function in get_exes.py is only made for Linux and fails on Windows. I think we can remove this function as well?

Use OC-package info in run-store function

I would be interested in an additional function in the util.py file or maybe it can be an extension of write_and_run_model. From the OC-package we know whether head and budget are stored. The function would read and store the outputs of the model in model_ds.

Additionally, I would like to see an option in the write_and_run_model function to run the model in a temporary directory, i.e. within a with statement.

with tempfile.TemporaryDirectory() as tmpdirname:
    # alter model_ws to tmpdirname
    # write packages
    # run model
    # read and store outputs in model_ds using info from the OC-package
    # revert model_ws

`get_ahn_within_extent` writes file to an unexpected directory

The function get_ahn_within_extent writes a tif file to a temporary directory. It would be better to write to the cache directory. This is complex because the cache directory is only accessed via the cache decorator.

Some (suboptimal) ideas to solve this:

  • write the data to the current working directory and clean this up after the function is finished
  • add an extra cachedir_ahn argument to the get_ahn function

Not all cache is reused when running a notebook the second time

When I run the Bergen model twice, not all cached data is reused, as can be seen below.

Intersecting oppwater with grid: 100%|██████████| 9/9 [00:02<00:00,  4.17it/s]
Loading gridded surface water data from cache.
INFO:nlmod.mdims.mlayers:get active cells (idomain) from bottom DataArray
INFO:nlmod.mdims.mgrid:get first active modellayer for each cell in idomain
INFO:nlmod.mdims.mlayers:using top and bottom from model layers dataset for modflow model
INFO:nlmod.mdims.mlayers:replace nan values for inactive layers with dummy value
INFO:nlmod.mdims.mlayers:add kh and kv from model layer dataset to modflow model
INFO:nlmod.mdims.mlayers:nan values at the northsea are filled using the bathymetry from jarkus
INFO:nlmod.cache:cache was created using different numpy array values, do not use cached data
INFO:nlmod.cache:cache was created using different dictionaries, do not use cached data
INFO:nlmod.mfpackages.mfpackages:creating modflow SIM, TDIS, GWF and IMS
INFO:nlmod.cache:caching data -> sea_model_ds.nc
INFO:nlmod.cache:cache was created using different numpy array values, do not use cached data
INFO:nlmod.cache:cache was created using different dictionaries, do not use cached data
INFO:nlmod.cache:caching data -> bathymetry_model_ds.nc
INFO:nlmod.mdims.mgrid:get first active modellayer for each cell in idomain
INFO:nlmod.cache:cache was created using different numpy array values, do not use cached data
INFO:nlmod.cache:cache was created using different dictionaries, do not use cached data
INFO:nlmod.mfpackages.mfpackages:creating modflow SIM, TDIS, GWF and IMS
INFO:nlmod.cache:caching data -> surface_water.nc

Could it be that there are NaN values in the array and a different way of comparing the two arrays should be used? Is an optimization used to estimate certain values, so that there is always a small random error? Could we use np.isclose here?

Can't we easily make a test of this? Run a notebook twice and make sure nothing is downloaded from the internet the second time.
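For example, a tolerant, NaN-aware comparison:

import numpy as np

a = np.array([1.0, np.nan, 3.0])
b = np.array([1.0, np.nan, 3.0 + 1e-12])

np.array_equal(a, b)               # False: exact compare, and nan != nan
np.allclose(a, b, equal_nan=True)  # True: tolerant and NaN-aware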

Best regards,
Bas

Notebooks are not tested during CI

I thought this was the case before. Please have a look at the dtscalibration package if you experience any issues with the implementation.

DatasetCrossSection from art_tools

The package DatasetCrossSection from art_tools is not available through pip or conda. It looks like a tool from Deltares. Please provide a reference on how to access and install the package.

utils / read heads function

I think ml.output is one of the more recent additions to flopy. Using it, the code of get_heads_array and get_heads_dataarray can be written a lot leaner:

import numpy as np
import xarray as xr

ml = sim.get_model(sim.name)

# collect heads
heads = ml.output.head().get_alldata()[:, :, 0, :]
heads[heads == ml.hdry] = np.nan
heads[heads == ml.hnoflo] = np.nan

model_ds['heads'] = xr.DataArray(
    data=heads,
    dims=('time', 'layer', 'cid'),
    attrs={'units': 'mNAP'})

where ml is the groundwater flow model, which in other examples is denoted with gwf. For simple cases this also works for reading the budgetfile, but not, for example, for the Bergen case.

The units are also stored in the flopy objects:

length_unit = ml.disv.length_units.get_data()
time_unit = sim.tdis.time_units.get_data()

It would be nice to have all dataarrays with units. Especially for the budgetfile this is important.

Improve documentation

I added documentation, available via rtd. There are quite some improvements to be made:

  • There's a mix of Dutch and English in some docstrings. We should probably pick one and apply it everywhere.
  • Check if all functions have a docstring
  • Add introduction to index.rst
  • Add more info on dependencies and nlmod usage in getting_started.rst
  • Improve notebook titles for examples section in documentation
  • Maybe reorganize API-docs into sections instead of one long page
