
ngen's Introduction

Next Gen Water Modeling Framework Prototype

July 2020 webinar recording

July 2020 webinar slide deck

Next Gen Github Pages Documentation

Description:
As we attempt to apply hydrological modeling at different scales, the traditional organizational structure and algorithms of model software begin to interfere with the ability of the model to represent complex and heterogeneous processes at appropriate scales. While it is possible to do so, the code becomes highly specialized, and reasoning about the model and its states becomes more difficult. Model implementations are often the result of taking for granted the availability of a particular form of data and solution -- attempting to map the solution to that data. This framework takes a data centric approach, organizing the data first and mapping appropriate solutions to the existing data.

This framework includes an encapsulation strategy which focuses on the hydrologic data first, and then builds a functional abstraction of hydrologic behavior. This abstraction is naturally recursive, and unlocks a higher level of modeling and reasoning using computational modeling for hydrology. This is done by organizing model components along well-defined flow boundaries, and then implementing strict API’s to define the movement of water amongst these components. This organization also allows control and orchestration of first-class model components to leverage more sophisticated programming techniques and data structures.
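As a rough illustration of this encapsulation idea, the sketch below shows a minimal polymorphic interface behind which a catchment's hydrologic behavior could sit, so the framework can orchestrate heterogeneous models uniformly. The class and method names here are hypothetical, not the actual ngen API.

```cpp
#include <cassert>

// Hypothetical sketch (not the actual ngen API): a catchment realization
// exposes its hydrologic behavior behind a small polymorphic interface.
class CatchmentRealization {
public:
    virtual ~CatchmentRealization() = default;
    // Advance the model one time step and return outflow depth in meters.
    virtual double get_response(double precip_m, double dt_s) = 0;
};

// A trivial stand-in model: a fixed fraction of precipitation runs off.
class SimpleRunoffRealization : public CatchmentRealization {
public:
    explicit SimpleRunoffRealization(double runoff_fraction)
        : runoff_fraction_(runoff_fraction) {}
    double get_response(double precip_m, double /*dt_s*/) override {
        return precip_m * runoff_fraction_;
    }
private:
    double runoff_fraction_;
};
```

Because every realization satisfies the same interface, three marked catchments could transparently use three different models, three copies of the same model, or any mix.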

  • Technology stack: Core framework in C++ (minimum standard C++14), providing polymorphic interfaces with reasonable systems integration.
  • Status: Version 0.1.0 in initial development, including interfaces, logical data model, and framework structure. See CHANGELOG for revision details.

Structural Diagrams

Catchments

Catchments: Catchments represent arbitrary spatial areas. They are the abstraction used to encapsulate a model. The three marked catchments could use three different models, three copies of the same model, or some combination of the two.

Realizations

Realizations: Different kinds of catchment realizations can be used to encapsulate different types of models. These models will have different types of relations with neighbors. When a relation exists between two adjacent catchments, synchronization is necessary.

Complex Realizations

Complex Realizations: An important type of catchment realization is the complex catchment realization. This allows a single catchment to be represented by a network of higher detail catchment realizations and their relationships. This allows the modeled area to be represented at multiple levels of detail and supports dynamic high resolution nesting.

Dependencies

See the Dependencies.

Installation

See INSTALL.

Configuration

To view the compile-time configuration of a pre-compiled NextGen binary, use the --info flag, as in ngen --info. For more information, see #679.

Usage

To run the ngen engine, the following command line positional arguments are supported:

  • catchment_data_path -- path to catchment data geojson input file.
  • catchment subset ids -- comma-separated list of ids (no spaces) used to subset the catchment data, e.g. 'cat-0,cat-1'; an empty string or "all" will use all catchments in the hydrofabric
  • nexus_data_path -- path to nexus data geojson input file
  • nexus subset ids -- comma-separated list of ids (no spaces) used to subset the nexus data, e.g. 'nex-0,nex-1'; an empty string or "all" will use all nexus points
  • realization_config_path -- path to json configuration file for realization/formulations associated with the hydrofabric inputs
  • partition_config_path -- path to the partition json config file, when using the driver with distributed processing.
  • --subdivided-hydrofabric -- an explicit, optional flag, when using the driver with distributed processing, to indicate to the driver processes that they should operate on process-specific subdivided hydrofabric files.

An example of a complete invocation to run a subset of a hydrofabric:

./cmake-build-debug/ngen ./data/catchment_data.geojson "cat-27,cat-52" ./data/nexus_data.geojson "nex-26,nex-34" ./data/example_realization_config.json

If the realization configuration doesn't contain catchment definitions for the subset keys provided, the default global configuration is used. Alternatively, if the realization configuration contains definitions that are not in the subset (or hydrofabric) keys, a warning is produced and the formulation isn't created.

To simulate every catchment in the input hydrofabric, leave the subset lists empty, or use "all", i.e.:

ngen ./data/catchment_data.geojson "" ./data/nexus_data.geojson "" ./data/refactored_example_realization_config.json
ngen ./data/catchment_data.geojson "all" ./data/nexus_data.geojson "all" ./data/refactored_example_realization_config.json

Examples specific to running with distributed processing can be found here.

How to test the software

The project uses the Google Test framework for creating automated tests for C++ code.

To execute the full collection of automated C++ tests, run the test_all target in CMake, then execute the generated executable. Alternatively, replace test_all with test_unit or test_integration to run only those tests. For example:

cmake --build cmake-build-debug --target test_all -- -j 4
./cmake-build-debug/test/test_all

Or, if the build system has not yet been properly generated:

git submodule update --init --recursive -- test/googletest
cmake -DCMAKE_BUILD_TYPE=Debug -DNGEN_WITH_TESTS:BOOL=ON -B cmake-build-debug -S .
cmake --build cmake-build-debug --target test_all -- -j 4
./cmake-build-debug/test/test_all

See the Testing ReadMe file and wiki/Quickstart for a more thorough discussion of testing.

How to debug the software

The build is driven by CMake, so a specific setting must be active within the root CMakeLists.txt file:

target_compile_options(ngen PUBLIC -g)

This ensures that ngen, and all of the code compiled with it, has debugging symbols enabled. From there, the application may be run via gdb, lldb, or through your IDE.

If you do not have administrative rights on your workstation, there's a chance you do not have access to gdb or lldb, meaning that you cannot step through your code and inspect variables. To get around this, you can use GitPod to start an editor (based on VSCode) in your browser and edit and debug to your heart's content. You can access an individualized GitPod environment through: https://gitpod.io/#https://github.com//ngen. Entering it for the first time will generate a new git branch.

There are a few things required, however. When you first enter, gitpod will ask you if you want to set up your environment. Let it create a .yml configuration file. It will then ask if you want it to create a custom docker image. Say yes, then choose the default image. At the end, you should have a .gitpod.yml and .gitpod.dockerfile at the root of the project.

Next, you will need to add the above target_compile_options(ngen PUBLIC -g) just about anywhere in the CMakeLists.txt file within the root of your project.

Next, you will need to make sure that all dependencies are installed within your environment. The image GitPod supplies uses Homebrew to install dependencies. You will need to run:

brew install boost

to proceed further. Now clear all of your previously built binaries and build your application (ngen or any test routine that you're interested in, such as test_all).

A debugging extension should be installed into your workspace. Select the bottom icon on the left hand side of your screen; it should look like a box with a square in it. CodeLLDB is a good extension to use.

Lastly, a debugging configuration must be set up. There is an icon on the left hand side of your screen that should be a bug with a slash through it, somewhat like a 'No Parking' sign. If you click it, it will open a debugging tab on the left hand side of your screen. Within it, you should see a play button next to a drop down menu that says 'No Configurations'. Click on that, then click on the option named "Add Configuration...". This will create a file named launch.json. Within it, add a configuration so that it looks like:

{
  // Use IntelliSense to learn about possible attributes.
  // Hover to view descriptions of existing attributes.
  "version": "0.2.0",
  "configurations": [
    {
        "name": "ngen",
        "type": "lldb",
        "request": "launch",
        "program": "${workspaceFolder}/<your build directory>/ngen",
        "args": []
    }
  ]
}

You will now have the configuration named ngen after saving your launch.json file. You may now add a break point within your code by clicking to the left of the line number within your code. This should make a red circle appear. Now, when you run it by clicking the play button in the debugging window, your code will stop on the line where you put your break point, as long as it executes code. It will not stop on whitespace or comments.

Known issues

Document any known significant shortcomings with the software.

Getting help

Instruct users how to get help with this software; this might include links to an issue tracker, wiki, mailing list, etc.

Example

If you have questions, concerns, bug reports, etc., please file an issue in this repository's Issue Tracker.

Getting involved

This section should detail why people should get involved and describe key areas you are currently focusing on; e.g., trying to get feedback on features, fixing certain bugs, building important pieces, etc.

General instructions on how to contribute should be stated with a link to CONTRIBUTING.


Open source licensing info

  1. TERMS
  2. LICENSE

Credits and references

  1. Projects that inspired you
  2. Related projects
  3. Books, papers, talks, or other sources that have meaningful impact or influence on this project

ngen's People

Contributors

aaraney, adunkman, ajkhattak, ben-choat, champham, christophertubbs, dblodgett-usgs, donaldwj, hellkite500, jdmattern, jmframe, joshkotrous, madmatchstick, mattw-nws, philmiller, program--, robertbartel, snowhydrology, stcui007, trupeshkumarpatel, zacharywills


ngen's Issues

Request: Link GitHub.io page in readme

It would be beneficial as a user and potential contributor if a link to the project specific GitHub Pages were linked in the About or root level readme. I would be happy to open a PR adding this to the readme if you all see fit.

make_unique<>() not supported on gcc 4.8.5

The RHEL/CentOS 7 GCC toolchain is 4.8.5, and even with --std=c++1y (for C++14), make_unique is not supported.

Current behavior

Build fails using gcc 4.8.5 with c++14 standards

Expected behavior

Buildable on RHEL/CentOS 7

Proposed behavior

Add a custom make_unique<>() implementation in utilities with a macro guard to enable it for gcc < 4.9.
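A minimal sketch of such a guarded backport (the version check and namespace are illustrative, not necessarily what the project adopted):

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Provide make_unique ourselves when compiling with GCC < 4.9, guarded so
// it doesn't clash with the standard library version on newer compilers.
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ < 4 || (__GNUC__ == 4 && __GNUC_MINOR__ < 9))
namespace detail {
template <typename T, typename... Args>
std::unique_ptr<T> make_unique(Args&&... args) {
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}
}
#else
namespace detail {
using std::make_unique;  // forward to the standard implementation
}
#endif
```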

Follow up for formulation manager

Need to review comments on #147 and open issues to address some of the refactoring and cleanup that was mentioned in that PR. The PR was merged to allow other work to continue in development, but some of the noted changes need to be addressed.

basin_id?

I don't want to butt into #142 so I'll ask over here.

What's a basin? I didn't think we were working with that concept.

Latest updates do not compile on GCC 4.8.5

I have GCC 4.8.5. I get this error at build:

ngen/src/NGen.cpp:229:33: error: use of deleted function ‘std::basic_ofstream& std::basic_ofstream::operator=(const std::basic_ofstream&)’
nexus_outfiles[feat_id] = std::ofstream("./"+feature->get_id()+"_output.csv", std::ios::trunc);

Expected behavior

My build worked fine with Donald's recent PR before pulling the latest updates from June 30 and July 1.
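One workaround (an assumption here, not necessarily the merged fix) is to avoid stream move assignment entirely, since GCC 4.8.5's libstdc++ does not implement it: default-construct the stream inside the map, then open it in place.

```cpp
#include <fstream>
#include <map>
#include <string>

// `map[key] = std::ofstream(...)` needs stream move assignment, which
// GCC 4.8.5's libstdc++ lacks. Default-constructing via operator[] and
// then calling open() avoids any copy or move of the stream.
void open_nexus_outfile(std::map<std::string, std::ofstream>& outfiles,
                        const std::string& feat_id) {
    std::ofstream& out = outfiles[feat_id];  // default-constructs in place
    out.open("./" + feat_id + "_output.csv", std::ios::trunc);
}
```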

Refactor Pdm03

In PR #27 a new flux/physics kernel module was added, Pdm03.h
This kernel needs to be updated to comply with naming conventions and the file itself should get a more descriptive name.

Current behavior

Pdm03.h is not up to standard for naming conventions.

Expected behavior

More clear file, interface, and variable names.

Variable naming consistency in schaake_partitioning

Schaake_parenthetical_term = (1.0 - exp ( - Schaake_adjusted_magic_constant_by_soil_type * timestep_d));

Ic = column_total_soil_moisture_deficit_m * Schaake_parenthetical_term;

Px=water_input_depth_m;

To be consistent with variable naming conventions, all variables should be lowercase.
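For illustration, the parenthetical term above with all-lowercase names might look like this (a rename-only sketch; the constants and function boundary are placeholders, not the actual kernel code):

```cpp
#include <cmath>

// Same Schaake partitioning term, with lowercase variable names throughout.
double schaake_parenthetical_term(
        double schaake_adjusted_magic_constant_by_soil_type,
        double timestep_d) {
    return 1.0 - std::exp(-schaake_adjusted_magic_constant_by_soil_type
                          * timestep_d);
}
```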

Utility.hpp needs to be updated or removed.

The code in model/kernel/utility.hpp is legacy code used to support the testing of some translated kernels. It allocates memory using malloc(), which should be replaced with more C++-centric memory management, e.g. std::vector for arrays of double/float.

Make PR template more appropriate

Opening this issue to draw attention to the PR template. The template needs better alignment with the project; the template copied at repo init don't appropriately capture the nature of this work. Specific attention needed in the PR checklist, Accessibility, and Other sections.

Update README.md to reflect first sprint changes

Short description explaining the high-level reason for the new issue.

Current behavior

README.md exists as per the template but doesn't reflect first sprint changes.

Expected behavior

README.md should reflect the project at the end of each sprint.

Steps to replicate behavior (include URLs)

Screenshots

Unnecessary int* in Forcing object

int *forcing_vector_index_ptr = new int;

This slipped by in review, but the index pointer here doesn't need to be a literal pointer, just the int index that keeps track of where in the forcing vector the object is looking at the moment.

Just make this a stack int (int forcing_vector_index;), initialize it to 0 in the constructor, and increment it as designed (minus the pointer dereference).
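A minimal sketch of the suggested change (the class shape and member names are illustrative, not the real Forcing.h):

```cpp
// The index is a plain int member, initialized to 0 in the constructor and
// incremented as before, with no heap allocation or pointer dereference.
class Forcing {
public:
    Forcing() : forcing_vector_index(0) {}
    void advance() { ++forcing_vector_index; }
    int current_index() const { return forcing_vector_index; }
private:
    int forcing_vector_index;  // was: int* allocated with `new int`
};
```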

The catchment/waterbody/COMID mapping and relationships need cleanup

std::string example_catchment_id = "wat-88";

Here is an example where a waterbody ID is being treated as a catchment id in order to associate a COMID with a catchment to read its GIUH data. This is technically the wrong semantics, and should not propagate further.

One potential solution is to add the catchment id to the crosswalk.json file. This has been raised on hygeo as a potential fix.

dblodgett-usgs/hygeo#15

Bugs with excess calculation in Reservoir::response_meters_per_second

The response_meters_per_second function in the Reservoir class does not handle calculating excess water properly in all cases.

Current behavior

  • In some cases, the function does not initialize the excess_water_meters parameter (passed in as a reference and used to indirectly return the excess amount), which it appears to be designed to do, even if the value should be 0.0.
  • In cases when there is excess - i.e., when the reservoir's state.current_storage_height_meters is greater than parameters.maximum_storage_meters - the function calculates the excess amount after resetting state.current_storage_height_meters to be equal to the max, which will thus always make the excess value 0.0.

Expected behavior

The passed excess reference should always be set to either 0.0 or the amount of the reservoir's current storage that exceeds its maximum capacity, at the end of each call to response_meters_per_second.
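A sketch of the intended ordering (names follow the issue, but the function boundary is simplified for illustration): compute the excess before clamping storage to the maximum, and always write through the reference, even when the excess is zero.

```cpp
#include <algorithm>

// Compute excess first, then clamp storage; the output reference is always
// assigned, so callers never see an uninitialized value.
void apply_storage_limit(double& current_storage_height_meters,
                         double maximum_storage_meters,
                         double& excess_water_meters) {
    excess_water_meters =
        std::max(0.0, current_storage_height_meters - maximum_storage_meters);
    current_storage_height_meters =
        std::min(current_storage_height_meters, maximum_storage_meters);
}
```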

More Reservoir Functional Unit Testing Needed

More reservoir unit testing needed to directly test particular reservoir functions.

Expected behavior

We should have one or more TEST_F(ReservoirKernelTest, Test****) for each function in a class. More than one test function should be used to cover different assumptions/code paths, e.g. TEST_F(ReservoirKernelTest, TestReservoirAddOutlet_1) might test that all assertions hold when reservoirs are added in already sorted order, while TEST_F(ReservoirKernelTest, TestReservoirAddOutlet_2) might test that the outlet states are all valid when an outlet with too large an activation threshold is added.

We might want to refactor the testing at some point so we have something like:

TEST_F(ReservoirKernelTest, TestReservoirAddOutlet)
{
    NoOutletReservoir2->add_outlet(ReservoirLinearOutlet);
    NoOutletReservoir2->add_outlet(0.3, 0.5, 0.0, 100.0);
    ASSERT_TRUE(true);
}

@jdmattern-noaa will address.

HYMOD README.md should reflect changes made to HYMOD

HYMOD changes should be described here:
https://github.com/NOAA-OWP/ngen/blob/master/models/hymod/include/Hymod.h

Current behavior

I copied the HYMOD README.md from the repository, but we changed some things in our implementation. The README.md should be updated from the original to reflect those changes.

Expected behavior

HYMOD should have a README that describes our version used for ngen

Steps to replicate behavior (include URLs)

Screenshots

Tests failing for last merge

Short description explaining the high-level reason for the new issue.

Current behavior

Tests run but fail

[----------] 1 test from ForcingTest
[ RUN      ] ForcingTest.TestForcingDataRead
/home/Z/actions-runner/_work/_temp/1ac2add5-d50d-471a-994c-51763b638d59.sh: line 1:  5313 Segmentation fault (core dumped) ./cmake_build/test/test_all
##[error]Process completed with exit code 139.

Expected behavior

Tests should pass

Steps to replicate behavior (include URLs)

https://github.com/ZacharyWills/ngen/runs/671741082

Screenshots

Add instructions for gitpod

Since NGen is being initially targeted for unix machines, developing on windows can be troublesome. A workaround for this is gitpod. Gitpod isn't perfect, but can provide a space to develop, build, and run NGen from a convenient location.

Instructions for how to get started should be added to the contributing documentation and possibly the readme.

Support realizations instantiated from different "hydrofabric" definitions

Support for dynamic hydrofabric realization construction.
Relative to (tentative) sprint 5 task 4.

Current behavior

Hydrofabric definitions are statically linked to a single input file/representation.

Expected behavior

A realization is defined by a type in realization_config.json input. Realizations will have to be connected to parameters, either in this file or in another file linked by identity.

One parameter of a realization could be its hydro-fabric definition, so when the driver factory creates a realization, it passes the realization the location of its hydro-fabric allowing each realization to be instantiated from from potentially different input.

Simple use case: two realizations reading the same hydro-fabric defined in two different independent input files.

Index needs bounds checks

return this->outlets[outlet_index]->get_previously_calculated_velocity_meters_per_second();

This function must bounds-check the index at minimum to prevent segfaults. It might also be worth implementing an unordered map of named outlets, i.e. unordered_map<string, int> outlet_map with outlet_map[name] = index. The velocity could then be looked up by name, e.g. outlet_map["lateral_flow"], so this function could take a string name argument.
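A sketch combining both suggestions (types are simplified; plain velocities stand in for the outlet objects): .at() gives the bounds check, and the name map layers a string lookup on top.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Bounds-checked access by index, plus optional lookup by outlet name.
struct Outlets {
    std::vector<double> velocities;                 // stand-in for outlet objects
    std::unordered_map<std::string, int> outlet_map; // name -> index

    double velocity_at(std::size_t outlet_index) const {
        return velocities.at(outlet_index);  // throws std::out_of_range
    }
    double velocity_for(const std::string& name) const {
        return velocity_at(outlet_map.at(name));  // also checks the name
    }
};
```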

Improve CSV Reader Class Handling of Missing Files

Improve how the CSV reader class (and associated usages of it) handle cases when an expected file doesn't exist. Right now the CSV reader itself doesn't fail, but in such cases it doesn't read anything either, leading to inconsistent behavior downstream in program execution.

A particular example was seen in unit testing for forcings. If the unit test executable is run from the main directory, rather than when inside the build directory, then a relative path needed for the ForcingTest.TestForcingDataRead test will not be correct, leading to a crash of the unit test executable.

Adjustments getting reservoir response in Tshirt model

The usage of the Reservoir::response_meters_per_second function in the tshirt_model class is not entirely correct/consistent, especially regarding parameters obtained from Schaake partitioning. These need to be corrected.

More generally, it should be considered whether the function's parameter for how much new water is being introduced should be a depth-type value in the form of meters (thereby implicitly meters per timestep), or a volumetric value with units of meters per second (as is currently the case).

Simplify the process of creating realizations from configuration

The current process for creating a set of realizations from the configuration involves reading the JSON, loading specifications into Realization_Config objects, then later calling get_realization() from the config object, which then calls a separate get_X function for the proper realization type, which then reads a static array of required fields, then constructs the parameters and calls the constructor for the appropriate realization.

Instead, the JSON reading should yield the JSON objects that can then be used to create the realizations and bypass Realization_Config altogether.

The arrays of parameter names can then be moved to the realizations themselves and each realization can have a constructor that takes the JSONProperties as a parameter.

404 on github page

In order for the github pages to work, a _config.yml must be provided. It's in the master branch, but not gh-pages.

Without it, I believe that it tries to read the source documents from an incorrect directory, preventing data from being properly deployed and resulting in a 404.

FeatureBase.geometry() is not implemented

Each geojson class has a function called geometry() that returns its respective geometry from within its variant. It's not implemented on the parent class, however. As a result, if you try to call geometry() on a FeatureBase instance (not a properly cast PolygonFeature, PointFeature, etc.), you get a compilation error because the symbol can't be found.

Refactor nexus

Need to have HY_HydroNexus as a purely topological entity that manages neighboring relationship/identities.

NexusRealizations use these relationships, implementing also HY_HydroLocation. Specific transfer of fluxes/information become formulations attributed to these classes.

@donaldwj @christophertubbs @robertbartel @dblodgett-usgs

Boost minimum version should be 1.72

Builds fail with versions of boost < 1.72.

Current behavior

1.72 is only recommended, but builds fail with 1.69

Expected behavior

Minimum version should be enforced in cmake as 1.72 and explicitly required in dependencies.md

Uninitialized basin_id

Forcing.h gets repeated warnings on build because one of the constructors sets basin_id(basin_id), but, since no basin_id is passed, the constructor just sets the basin_id as the value it already was (which wasn't anything).

Would anyone be opposed to me setting that as 0 to ensure that it's at least getting the default value or just taking it out of the signature?
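The simpler of the two options might look like this (a sketch; the class shape is illustrative, not the real Forcing.h): give basin_id a default member initializer, so every constructor leaves it with a defined value.

```cpp
// A default member initializer guarantees basin_id is never read
// uninitialized, regardless of which constructor runs.
class Forcing {
public:
    Forcing() = default;                        // basin_id stays 0
    explicit Forcing(int id) : basin_id(id) {}  // callers may still set it
    int get_basin_id() const { return basin_id; }
private:
    int basin_id = 0;
};
```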

Supporting HY_FlowPath concepts in the framework to connect hydrologic dependencies

The Nexus read/inspection API can be used to couple FIM/Waterbody models to FlowPaths to provide information to a waterbody.

Need to differentiate Realization stuff from Formulation stuff in the NGEN class hierarchy, and build an explicit HY_FlowPath realization -> formulation middle ware.

This will help connect hydrologic dependencies in a consistent, automatic way.

We may need a hybrid realization type which does area stuff and flow path stuff, i.e. is aware of hydrologic channel routing properties and connected explicitly with hydrologic land surface components.

This may also require implementation of some/all of the HY_HydroLocation concepts.

@dblodgett-usgs @christophertubbs @BrianAvant-NOAA any additional thoughts or comments are welcome!

Request to remove attributes from GIUH.json file

Request to simplify GIUH output file.

Current behavior

The GIUH.json file contains attributes that may not be used in ngen and removing them could greatly reduce file size. Current attributes include cumulative frequency distribution ('CumulativeFreq'), incremental runoff per hour ('hrHydro') and incremental runoff per minute ('minHydro') time series. It looks like only 'CumulativeFreq' is being used in the demo. Do we need these other attributes? We should be able to derive incremental runoff for any time step from the cumulative frequency time series. @hellkite500 @robertbartel

Expected behavior

Remove redundant attributes from GIUH.json

Steps to replicate behavior (include URLs)

Screenshots

Forcing.h is unguarded

Forcing.h is unguarded, so a translation unit that includes it more than once (e.g. via a subclass header) will hit redefinition errors.

There should just be an #ifndef guard around it.
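The guard would follow the usual pattern (the macro name here is a common convention, not necessarily what the project chose; the struct is a placeholder for the header body):

```cpp
// Contents of a guarded Forcing.h: the preprocessor skips the body on any
// inclusion after the first, preventing redefinition errors.
#ifndef FORCING_H
#define FORCING_H

struct ForcingExample {  // placeholder for the real header contents
    int basin_id = 0;
};

#endif  // FORCING_H
```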

Automatically Run NGen on PR

We currently have the data needed to run the driver packed into the repo. As of this writing, if you clone the repo and try to run it, it should be able to run. Unfortunately, some of the sample data was moved, invalidating some of the paths in the driver. It's a small, easy change, but it would have been nice if a message had been created on PR generation stating "Hey, this will error out".

It will probably be helpful to run this in the testing action and assert that the exit code was correct in order to make sure it was able to run until completion.

There's no way to naturally link IDs to geojson features

The only way to link features to their ids and neighbors is to parse the locations, iterate through them and read the ids from their proper attributes or properties, set the id to that value, then call the FeatureCollection function to link everything. If you're lucky and have a geojson with regular id member values, a third of the work may be done during parsing. You'll still need to call functions afterwards.

Instead, reading and linking everything should be more natural, requiring only one or two functions to parse, ID, and link everything.

Compilation Bug for Unit Tests in Certain Environments

Current behavior

Compiler version: g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)
CMake version: cmake3 version 3.14.6

Bug not present for other users with previous compiler versions.

Errors from:
ngen/test/geojson/JSONGeometry_Test.cpp
ngen/test/geojson/Feature_Test.cpp

Multiple instances of this line in both files cause compilation errors:
stream = std::stringstream();

Compiles fine when corrected to:
stream.str("");
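For completeness, a fully portable reset also clears the stream's state flags, not just its buffer; a sketch:

```cpp
#include <sstream>

// Reset a stringstream without move-assigning a fresh one (which GCC
// 4.8.5's libstdc++ does not support).
void reset_stream(std::stringstream& stream) {
    stream.str("");   // empty the underlying buffer
    stream.clear();   // reset error/eof state flags so I/O keeps working
}
```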

Expected behavior

PR to follow with this fix.

Steps to replicate behavior (include URLs)

  1. Clone repo: git clone https://github.com/NOAA-OWP/ngen.git
  2. cd ngen
  3. Update googletest: git submodule update --init --recursive -- test/googletest
  4. Create build directory: cmake -DCMAKE_BUILD_TYPE=Debug -B cmake-build-debug -S .
  5. Compile with CMake: cmake --build cmake-build-debug --target test_unit -- -j 4
