eagles-project / haero Goto Github PK

A toolbox for constructing performance portable aerosol packages

License: Other

CMake 26.84% C++ 56.09% Shell 17.08%

haero's Introduction

Haero: A High-Performance Aerosol Library

Haero is a library that contains parameterizations that describe the dynamics of aerosols in the atmosphere. Rather than providing an aerosol package to be coupled in a specific way with a host models, it provides direct access to individual aerosol parameterizations tied to specific governing equations. This low-level approach allows an atmospheric "host model" to use its own coupling and time integration logic with these parameterizations.

The short-term goal of Haero is to provide the capabilities of the MAM4 package to E3SM's state-of-the-science cloud-resolving atmospheric model, SCREAM.

Supported Platforms

You can build and run Haero on a Mac (Intel, not M1) or Linux laptop or workstation. We also support a limited number of platforms on which the Haero model is built and tested:

NERSC Cori
Compy, Constance, Deception at PNNL

Required Software

To build Haero, you need:

CMake v3.12+
GNU Make
reliable C and C++ compilers
a working MPI installation (like OpenMPI or Mpich), if you're interested in multi-node parallelism.

You can obtain all of these freely on the Linux and Mac platforms. On Linux, just use your favorite package manager. On a Mac, you can get the Clang C/C++ compiler by installing XCode, and then use a package manager like Homebrew or MacPorts to get the rest.

For example, to download the relevant software on your Mac using Homebrew, type

brew install cmake openmpi

Building the Model

To configure Haero:

Make sure you have the latest versions of all the required submodules:
```
git submodule update --init --recursive
```
Create a build directory by running the setup script from the top-level source directory:
```
./setup build
```
Change to your build directory and edit the config.sh file to select configuration options. Then execute config.sh to configure the model. If you're on a machine that requires modules to get access to compilers, etc, use source config.sh to make sure your environment is updated.
From the build directory, type make -j to build the library. (If you've configured your build for a GPU, place a number after the -j flag, as in make -j 8).
To run tests for the library (and the driver, if configured), type make test.
To install the model to the location indicated by PREFIX in your config.sh script, type make install. By default, products are installed in include, lib, bin, and share ѕubdirectories within your build directory.

Making code changes and rebuilding

This project uses build trees that are separate from source trees. This is standard practice in CMake-based build systems, and it allows you to build several different configurations without leaving generated and compiled files all over your source directory. However, you might have to change the way you work in order to be productive in this kind of environment.

When you make a code change, make sure you build from the build directory that you created in step 1 above:

cd /path/to/haero/build
make -j

You can also run tests from this build directory with make test.

This is very different from how some people like to work. One method of making this easier is to use an editor in a dedicated window, and have another window open with a terminal, sitting in your build directory.

The build directory has a structure that mirrors the source directory, and you can type make in any one of its subdirectories to do partial builds. In practice, though, it's safest to always build from the top of the build tree.

Generating Documentation

Documentation for Haero can be built using mkdocs. In order to build and view the documentation, you must download mkdocs and its Material theme:

pip3 install mkdocs mkdocs-material

Then, run mkdocs serve from the root directory of your Haero repo, and point your browser to http://localhost:8000.

At this time, Haero's documentation includes an extensive design document describing the design approach used by Haero, including high-level descriptions of its aerosol parameterizations.

FAQ

Building and Rebuilding

When I run config.sh, I see an error complaining about a bad fd number! You probably typed sh config.sh to run the configuration script. It's actually a bash script. Just type ./config.sh.
How do I "reconfigure my build"? If you want to change a compile-time parameter in your model, you must reconfigure and rebuild it. To do this, edit your config.sh and change the parameter as needed. Then rerun it with ./config.sh. After the script finishes, executing you can type make -j to rebuild the model.
A pull request has the reconfig required label. What does this mean? A pull request with the reconfig required label has made a change to the structure of the config.sh script, so you must rerun setup <build_dir> to regenerate your config.sh script. Once you've regenerated this script, you can reconfigure and build as usual.

Testing

Where are testing results stored when I type make test? All testing results are logged to Testing/Temporary/LastTest.log within your build directory. A list of tests that failed is also written to LastTestsFailed.log in that same directory.

Source Control and Repository

Git thinks I have modifications in my submodules? Git submodules are annoying! Help! We agree. These warnings about "dirty" modifications are irritating and useless. You can get rid of them by setting the following config parameter for Git:
```
git config --global diff.ignoreSubmodules dirty
```
Why must I clone the submodules for libraries that I already have installed locally? Git submodules are annoying! Help! We agree. The submodule mechanism is a leaky abstraction and doesn't allow us to easily select which submodules should be cloned, so we just clone them all to keep things simple. We'll address this issue when a solution becomes more obvious.

haero's People

Contributors

Stargazers

Watchers

Forkers

omarkahmed pclaeroparams mahf708

haero's Issues

Create init, run, finalize subroutines for new nucleation module

This issue tracks discussion and progress related to the transplant of MAM4's nucleation process to the Haero library.

The MAM4 approach to nucleation has been documented in the Processes chapter of the Haero design document.
Source code for MAM4's nucleation module can be found here: https://github.com/eagles-project/mam_refactor/blob/master/core/modal_aero_newnuc.F90
The nucleation subroutines are called by subroutines in MAM's microphysics module: https://github.com/eagles-project/mam_refactor/blob/master/core/modal_aero_amicphys.F90
Configuration options that control various MAM4 processes are here: https://github.com/eagles-project/mam_refactor/blob/master/core/modal_aero_microp_control.F90

This module has been mostly refactored and transplanted into Haero, in the mam_nucleation module (in haero/processes/mam_nucleation.F90).

Working branch: https://github.com/jeff-cohere/haero/tree/jeff-cohere/nucleation-transplant

Remaining work

Address any changes in the representations of quantities that differ between MAM are HAERO
Verify that units are consistent within this process
Set up some verification tests to make sure that nucleation does essentially what we think it should

Verification Tests

Initialize an atmospheric column that stretches from a planetary surface to above the planetary boundary layer, and initialize an aerosol system containing no sulfate. Initialize gas species including sulfuric acid. Run a time step, and verify that nucleation has occurred, producing sulfate aerosol at the expense of sulfuric acid gas. Check that no mass was created, and that mass destroyed is "reasonable". Check also that nucleation above and below the planetary boundary layer differs by the PBL adjustment factor or something close to it.
Same test, but with NH4 and NH3 instead of sulfate.
Same test, with both sulfate and NH4/NH3, to check that both kinds of nucleation occur.

Create a unit test to demonstrate host model nested parallel dispatches

As a first step to prove to ourselves that our single-column library approach can work on the machines we're targeting, it would be nice to demonstrate a nested parallel dispatch for a GPU build. Specifically:

The outer loop dispatches a parallel for loop that calls a virtual function on some polymorphic object with a specific "column index"
The virtual function performs a parallel for loop, using the column index to do something

We believe that the outer loop here executes on the host (CPU) and not, say, on the device (GPU). If this is true, we don't need to worry about instantiating aerosol process objects on GPUs--they can live on the CPU, and their run/update methods will perform work on a GPU, as long as the views they manipulate are stored there.

This task requires that we can configure Haero to build on GPU-equipped systems properly.

Define driver dynamics as State

Implement the 1d dynamics model in code to provide atmospheric state and tendencies to parameterizations.

Add Kokkos device parameters to the build system

To date, we've been focusing exclusively on CPU code, to straighten out basic stuff--data structures, interfaces, etc. Now that we're starting to think more seriously about how Haero works on GPUs and the like, it's time to teach our build system to handle cases where our "device" (DefaultDevice in the code) is, for example, a CUDA GPU.

I'm adding the following build options (which will be available in any config.sh newly generated by the top-level setup script):

DEVICE (HAERO_DEVICE in CMakeLists.txt): indicates where the number crunching happens. Can be set to CPU or CUDA.
DEVICE_ARCH (HAERO_DEVICE_ARCH in CMakeLists.txt): indicates the architecture of the device. Many options will be listed in config.sh--you just need to uncomment the one that you want.

This is a prerequisite for #69.

FYI @overfelt

Define Haero's units and state variables

To take advantage of the fact that Haero is a library that provides a well-defined interface to a host model, let's lay down some rules for host models that use Haero. Here's a proposal based on a conversation between @jeff-cohere , @huiwanpnnl , and @singhbalwinder .

Haero Units

All quantities in Haero are expressed in SI units, the same system used in a contemporary undergraduate physics course. We may choose to make some exceptions for certain quantities, but for now, let's keep it simple.

Haero Aerosol Prognostic State Variables

Haero defines its aerosol populations in terms of:

q_{m, s}: The mass mixing ratio of aerosol species s within mode m [kg species s/kg dry air]
n_m: The number concentration of mode m [# particles/kg dry air]

Meanwhile, gases (which don't belong to modes) are defined by

q_g: The mass mixing ratio of gas species g [kg species g/kg dry air]

Haero Aerosol Diagnostic State Variables

Since Haero's diagnostic variables depend on the specific processes selected for a given model, the only thing we can say about them is that they use SI units (which you already knew, of course!).

Haero Non-Aerosol Atmospheric State Variables

This is under discussion in #57 . We'll move the discussion here as the material there matures.

How Is This Information Used?

When a host model calls a Haero process, it is assumed that the correct input quantities are expressed in Haero units, and the output quantities are also in Haero units. If a host model uses a different representation of an atmospheric quantity (or an aerosol, if that's ever the case), it must convert these (input) quantities to Haero units before calling any Haero process. Likewise, it must convert output back from Haero units to its own units after the call.

Many of the aerosol processes we have inherited from MAM use various units and quantities that differ from those above. A process implementation is free to use any representations it likes, as long as input is converted from Haero units first, and output is converted to Haero units afterward.

Adopting a consistent system of units/quantities for Haero allows us to simplify existing process code and make it clearer where unit conversions and transformations between thermodynamic variables occur.

@huiwanpnnl @singhbalwinder @pbosler @mschmidt271 @overfelt

Describe the PrognosticProcess and DiagnosticProcess interfaces in design doc

Redo indexing of aerosol data to simplify the use of Views and Packs

A conversation with @overfelt on Tuesday got me thinking. Our current indexing scheme for referencing modal aerosol data is a rank-4 indexing space (m, i, k, s), where

m identifies a particular aerosol mode
i identifies an atmospheric column of aerosols
k identifies a vertical level within the ith column
s identifies an aerosol species within mode m in the ith column and kth vertical level

One of the difficulties we face in representing modal aerosol systems is that different modes can have different numbers of aerosol species. This means that a View that stores modal aerosol data looks like a "ragged array", in which different modal indices (m and m1, say) refer to subviews with different dimensions.

Kokkos views are rectangular by necessity. Accordingly, we have represented the above "rank-4" indexing scheme by a vector of rank-3 rectangular arrays, sized appropriate for a mode m. If we want to manipulate aerosol data using rectangular arrays, it might be nice to adopt a different indexing scheme in which (m, s) can be mapped to a single index in a rectangular rank-3 array with the following indexing scheme:

(m, s) identify a species s within mode m in a single index (l, say)
i and k identify the ith column and kth vertical level as above

Then, given that aerosol physics is spatially local, we would change the ordering of the indexes to move the spatial indices i and k to "slower-changing" positions:

u(i, k, l) -> data for the modal species l -> (m, s) within the ith column at the kth vertical level.

We could map (m, s) to l using e.g. a compressed row representation and add helper methods to our Prognostics and Diagnostics containers. This would let us use rectangular arrays for modal aerosol data, no matter how much the number of species varied in different modes, and without wasting memory.

Perhaps we can even think of a way to use this index mapping to make Packs/SIMD operations more transparent. But that's speculative at this point.

@pbosler @overfelt @singhbalwinder : your thoughts are welcome. If you're interested in seeing what I mean by a "compressed row representation", I can put something together.

1D Dynamics to Design Doc

Plots of solutions, plus function definitions into design doc.

Add aerosol parameterizations

Decoupled processes

Aerosol nucleation & transport, unrelated to microphysics

Coupled to microphysics

Droplet activation to cloud condensation nuclei (interact with condensation parameterization)

etc...

Dependencies:

1D dynamics #14
Simple microphysics #15

Possible issues

May need a transport scheme
May need vertical mixing

Define Driver dynamics tendencies

Driver dynamics is implemented to provide state data, but does not provided tendencies yet.

They will be necessary if we want to test parameterizations' time integration.

Keep design docs in-sync with source code changes

Summary

The current design doc includes source code listings to help illustrate its discussion. These source code snippets are written within the .tex files because the source code files are too long (with private functions, doxygen comment markups, etc.) to include in the design doc.

This issue marks as a to-do task to find an automated way to keep the source code snippets up-to-date with source code changes.

Update NcWriter to new indexing

Now that we have (relatively) stable indexing strategy, we need to update NcWriter to work with the following classes:

Diagnostics: with view types ColumnView, SpeciesColumnView, and ModalColumnView
Prognostics: with view types SpeciesColumnView, ModalColumnView

new netcdf dimensions: species (corresponding to index s), mode (corresponding to index m)

Assumption: All of these data are defined at level midpoints, not level interfaces.

@jeff-cohere : let me know if I've missed anything.

Translate 4-mode aerosol chemical mechanism input file to ChemKin format and run using TChem

Here's an interesting aerosol chemistry mechanism:

https://github.com/E3SM-Project/E3SM/blob/master/components/eam/chem_proc/inputs/modal_aerosols_4mode_mom.in

TChem uses the Chemkin format, and (hopefully soon) will provide a YAML-based input format as well. For now, let's

Translate the above input file into Chemkin format
Run the Chemkin-formatted version with TChem's driver
Work through any issues encountered that prevent the successful run of the mechanism
Produce diagnostics (plots, etc) to convince ourselves that TChem can handle a system like this one.

Describe Mode, Species, and Model types in design doc

Create an input specification for the haero driver.

The haero driver is crucial to our testing approach, so it needs to be sophisticated enough for us to set up relevant test problems for verification.

We've got a start on an initial input specification, but it's not recorded in our design doc. Time to write it up.

Describe Prognostics and Diagnostics containers in design doc

Identify a KPP-based testbed to use for evaluating chemistry solvers

At this point, it looks like we can't use KPP as part of our library (since it's licensed with the GPL). However, we can and probably should find a tool that uses it that can produce output for comparison with CAMP and TChem.

"KPP for GEOS-Chem" is an implementation of KPP that works within the GEOS-Chem framework, so it's been cleaned up and customized for that purpose. We're probably more interested in using the code in the first link for the "KPP box model." This task involves figuring out how to use KPP in that way.

From Kai

(Imported from an issue in the mam_refactor repo)
I feel the chemical-preprocessor-generated Fortran files are probably not as mighty as we fear. I am not an atmosphere chemistry expert, but looking at the code under

https://github.com/kaizhangpnl/E3SM_20190426/tree/master/components/cam/src/chemistry/pp_linoz_mam4_resus_mom_soag

which is the set used for E3SMv1, I find most of the subroutines are only less 100 lines and most of them are readable (although some of them look like “stupid” human-made code). The only one that is pretty long (~600 lines) is

https://github.com/kaizhangpnl/E3SM_20190426/blob/master/components/cam/src/chemistry/pp_linoz_mam4_resus_mom_soag/mo_imp_sol.F90

but I think it’s readable (a lot of get-index type operations) and with full_ozone_chem = .false. and reduced_ozone_chem = .false. (to double check with LLNL Linoz expert), a bunch of code can be removed.

Add documentation for condensation process to Processes chapter of design doc

The design document lives in docs/design, and is built whenever you build Haero. In this task, we move any existing documentation to the Processes section of this document (and add any new terms to the glossary, etc). See the code branch mentioned in #49 for an example of how to do this.

Add a warm-rain (Kessler 1969) simple cloud microphysics parameterization to 1d model

Required params:

Evaporation of cloud water to water vapor
Condensation of water vapor to cloud water
Autoconversion of cloud water to rain water
Accretion of cloud water by rain water
Rain fall speed
Sedimentation of rain

Dependencies:

1D dynamics #14

Tests:

Cumulus cloud
No-cloud

Construct a TChem AerosolProcess for aerosol chemistry

When we've satisfied ourselves that TChem can properly obtain solutions to aerosol chemical kinetics systems, the next step in incorporating TChem into Haero is to attempt to construct a C++ class.

Add aerosol config support in NcWriter

Unpack views before writing, or write maps to .nc files?

The Model class defines the maps between mode and species names and indices. Should our netcdf files write these maps, and the rank-2 mode/species data? Or should the maps be unpacked so that each mode has its own variable in a netcdf file?

@jeff-cohere , @huiwanpnnl , @singhbalwinder : Preferences?

Implement the terminator "toy model" test using TChem

We've established a working relationship with some folks at Sandia that develop the TChem software package. As a next step, we'll implement the simple chemistry model described in the the following reference:

Geosci. Model Dev., 8, 1299–1313, 2015
geosci-model-dev.net
doi:10.5194/gmd-8-1299-2015

This test can be implemented outside of the haero repo--this issue just tracks progress on the issue and provides a context for discussion.

Investigate CAMP as a workbench for investigating atmospheric chemical mechanisms and evaluating solvers.

CAMP is an aerosol chemistry package embedded in partmc, a project associated with some folks we're interested in collaborating with. For this task, we want to demonstrate that we can successfully build CAMP and run its built-in tests.

Matt Dawson has pointed us at MusicBox, a box model used by the MUSICA project that uses CAMP under the hood. @mschmidt271 is going to try to throw some EAGLES-related chemistry problems at it.

Fully implement the ratified input spec for the driver.

We've got a draft for a driver input spec (#19). When we've discussed it and ratified it, we must implement the creation of SimulationInput objects that store the relevant information. Currently, the driver partially implements a good deal of this spec, but the initial conditions need some work, and we haven't discussed perturbations yet.

Incorporate TChem as a third-party library for Haero (part 1)

As part of our evaluation of TChem, we need to be able to build it within the Haero project. For starters, let's just provide a directory in which TChem is already built (using build arguments and/or machine files). We can get fancier later when we're more sure of TChem as a solution.

For details on how this approach works, see (for example) the NETCDF_INCLUDE_DIR, NETCDF_LIBRARY_DIR, NETCDF_LIBRARY items in the setup shell script at the top of the Haero repo, and the related logic in cmake/set_up_platform.cmake. If TChem can't be found, we bypass building the related PrognosticProcess subclass.

Redo the modes_and_species section in the design doc

When we figure out more about our data structures, we need to update this section.

Add documentation for calcsize process to the design document

The design document lives in docs/design, and is built whenever you build Haero. In this task, we move any existing documentation to the Processes chapter of this document (and add any new terms to the glossary, etc). See the code branch mentioned in #49 for an example of how to do this.

Make a 1D Euler solver based on the theta model

Tests:

Stationary steady state
Prescribed velocity, hydrostatic balance
Active dynamics, with and without thermodynamic static stability

Refactor the Condensation parameterization Fortran code

Create init, run, finalize subroutines for new Condensation module

Once the code is refactored, most of it can be moved to the "run" subroutine as defined in the FPrognosticProcess class in haero/process.hpp.

Haero install does not send .hpp files to include target

low priority, but eventually we'll want make install to put the header files in ${CMAKE_INSTALL_PREFIX}/include

Verify CAMP's software license

At the moment, PartMC (CAMP's host software package) uses the GNU Public License, which makes it tricky for us to incorporate it into Haero as a third party library in the usual manner. Matt Dawson has offered to suggest to the authors of PartMC that they change to another open source software license to make it easier for collaborators to work with them.

PartMC's GitHub repo is here: https://github.com/compdyn/partmc

Evaluate existing aerosol chemistry models within CAMP

CAMP is an aerosol chemistry package, so it would be good to understand what it can do, and how its capabilities compare to our chemistry needs. When this task is completed, we'll have a written evaluation of CAMP as a chemical mechanism candidate.

Physical constants should use ekat::units

Single-column Haero: demonstrate parallel dispatches from host model

If we adopt the single-column approach advocated for in #65 , we must allow a host model to perform a parallel dispatch of a "league" of thread teams across multiple columns. Each thread team is then dedicated to a single column and uses the Haero library interface to run a process or update a relevant diagnostic variable (etc).

In order to call a virtual function on a process from the GPU, we have to engage in some shenanigans when we create the instance for the class in question. In this case, we call virtual functions on aerosol processes only. So perhaps our design can hide said shenanigans from library users altogether.

We have to demonstrate that this process works, as well as how it works. Here's what I envision:

If Haero is configured with a GPU architecture as the "device", CMake variables are set that indicate a) that Haero is configured for a GPU, and b) the architecture of the GPU.
If Haero is built for a GPU, a model instantiates its processes on the GPU during its construction time.
If Haero is built for a GPU, a unit test is built and run with a model backed by C++ processes instantiated on the GPU. The unit test mimics a parallel dispatch on a host model from a CPU.

Maybe we could try this with some simple test processes that are designed to work on GPUs.

Finish stub diagnostic process

As part of our proof of concept, we're implementing some phony "stub" prognostic and diagnostic processes in Fortran, to illustrate that data can be transferred between processes and the model correctly.

Finish stub prognostic process

Create init, run, finalize subroutines for new calcsize module

Once the code is refactored, most of it can be moved to the "run" subroutine as defined in the FPrognosticProcess class in haero/process.hpp.

Implement a basic representation of non-aerosol atmospheric state variables

Atmospheric temperature, pressure, density, etc are commonly used in aerosol physics. I'm wondering how we might best store them in calls to the aerosol processes.

Currently, we store prognostic aerosol state variables in Prognostics objects, and "diagnostic" state variables (interpreted various ways) in Diagnostics objects. Non-aerosol atmospheric state variables clearly don't belong in Prognostics objects because Haero doesn't "own" them and therefore doesn't evolve them. These variables could be shoehorned into Diagnostics objects, but arguably they're not "diagnostic" variables, either.

Do we need one more container for non-aerosol atmospheric state variables like pressure, temperature, relative humidity, etc? This would be passed to the run/update methods for processes as a const entity, to be used but not modified by each aerosol process.

@huiwanpnnl, @pbosler, @singhbalwinder : you've all got experience with E3SM and have expressed opinions on state variables before, so I'm tagging you on this question.

Atmospheric State Variables (incomplete--please add more!)

Temperature at level midpoints [K]
Pressure at level midpoints [Pa]
Relative humidity at level midpoints [-]
Height at level interfaces [m] (?)

Haero (via EKAT) is stuck on an old version of Kokkos

Those who have been following along might be aware that Haero uses the EKAT library, which provides a vetted version of Kokkos also used by SCREAM. Unfortunately, the version of Kokkos that EKAT provides is old (v2.9.99 from January 2020, as of the time this issue was created).

Meanwhile, we are evaluating TChem as a chemistry solver, which also needs Kokkos to achieve its parallelism. However, TChem seems to require a more recent version of Kokkos than we have. From my attempt to build TChem has a third-party library (in the mschidt271/tchem branch, with at least one workaround in place):

HDF5, netcdf-c, zlib submodules should be pulled only when necessary.

Haero only needs to do I/O when it builds its driver. And even when the driver is enabled, it only needs to pull submodules for libraries that aren't present on a system. In particular:

If you have edited your config.sh file or are using a machine file that provides one of these libraries, the build system should not update the corresponding submodule.

Implement a strategy for physical constants

MAM doesn't have a recognizable strategy for using physical constants--it embeds into its equations numbers based on constants, with varying degrees of accuracy. This makes it difficult to reformulate the equations, and it also makes them mathematically inconsistent with equations that use different values of constants.

I think Haero should provide its own set of physical constants for use when a host model is not specified, or when a host model is specified that doesn't provide its own physical constants. When a host model comes with its own set of constants, it should be possible to have Haero use that set instead of its own.

NOTE: There's actually a C++ project that provides NIST values of constants: NISTConst

We can support such a system by adding a couple of parameters to the build system (to be set in config.sh):

The name of a C++ header file that defines all physical constants required by Haero, with names that match Haero's specification.
The name of a C++ namespace in which these constants are embedded. Can be blank, if they are in the global namespace (if that's how the host model wants to do its business).

This implies that we need to make a comprehensive list of physical constants needed by Haero. I think we need this anyway. Seems like this would be a good section in our design document.

NOTE: #20 mentions using EKAT's units to annotate these physical constants. I guess we'll have to think about how we would do that for host-supplied physical constants.

Add documentation for the nucleation process to the design doc

This task is completed when the Haero design document contains a section on nucleation that includes a description of the MAM4 implementation of the process.

Decide whether to support PACK_SIZE > 1 for Fortran

Currently, we don't impose the Pack abstraction upon people implementing processes in Fortran. This abstraction is by far the hardest for people to understand, so we're treading lightly for the first round of Fortran implementations.

This means that we must build with PACK_SIZE == 1 to enable Fortran implementations. This is a bit restrictive, but it doesn't prevent us from comparing Fortran and C++ implementations. It does prevent us from supporting fully-optimized C++ implementations in the same build as Fortran implementations, though.

If we want to lift this constraint, we need to decide how we support the use of packs with size > 1. This issue tracks this conversation.

Simplify build process and document in README.md

As @huiwanpnnl has observed, our build process is still too complicated for people who aren't software engineers. Her comments appear below. My thoughts on this are:

We should get rid of ImageMagick as a dependency. It's a flaw in the CMake LateX module to require it.
The --recursive flag to git clone shouldn't be required. It's not part of a standard workflow and will only confuse people. In our case, submodules are handled within the configuration process in our generated config.sh script. FIXED
Detailed instructions for installing software (e.g. Homebrew or MacPorts) on Macs don't work for more than a few configurations, and can be a cause of a lot of wasted time if folks aren't willing to familiarize themselves a bit with these systems. Neither Homebrew nor Macports is good enough as is to "just work" as expected (though Homebrew is in a much better state of repair than the moribund Macports).

This issue tracks our attempts to make the installation process for non experts as simple as possible, without going into details that might cause more confusion.

Hui's proposed additions to README:

I suggest adding the following contents:

To compile the documentation, you will also need LaTex and ImageMagick. Mac users can get LaTex by downloading and installing MacTex from https://www.tug.org/mactex/ and installing ImageMagick using

brew install imagemagick

Also, to help dumb and lazy people like me, I suggest adding the following little subsection:

Cheatsheet for Mac users (tested on macOS 10.15, Catalina):

The following steps can help set up the software needed for building and testing HAERO on your Mac:

Install XCode through App Store
Install homebrew using

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)”

Install CMake, GFortran, and MPI using

brew install cmake gfortran openmpi

Install ImageMagick using

brew install imagemagick

Download MacTex.pkg from https://www.tug.org/mactex/ and follow their straightforward instructions to install

Create a Docker container for Haero to ease testing

Question: Should Haero offer a single-column interface?

As we start getting closer to exposing aerosol processes that can be used within an atmospheric host model, it occurs to me that we want to pay some attention to how such a host model might make use of an aerosol process.

Because aerosol processes interact with several other physical processes in the atmosphere, we decided to create a library instead of implementing these processes directly in, say, SCREAM. The interface we expose now lets a host model execute an aerosol process on a set of columns within multidimensional views. I wonder if a host model might benefit from the ability to execute such a process on a single column instead of several at once.

Maybe we should come up with some specific cases for which we expect a host model to invoke an aerosol process. We started our conversation about the library structure of Haero in the context of "clear-sky" processes, "cloudy" processes, and radiation-aerosol interactions. If we can write some "pseudocode" that accurately reflects how the host model expects to provide input to and get output from Haero, it would let us know whether a single-column interface would be useful.

If we do decide to expose a single-column interface, it wouldn't necessarily make our lives more difficult. We could always implement a multi-column interface on top of the single-column interface, using a parallel dispatch to send thread teams to each column in the set. In fact, we could even allow developers to override the parallel dispatch if the host model is doing something fancy. The big question I'd like to answer for now is: "does it makes sense for us to implement aerosol processes in terms of a single atmospheric column instead of multiple columns?"

Refactor the nucleation Fortran code into a module

Refactor the Calcsize Fortran code into a module

Input variables, dependencies, and redundancies

Summary

@huiwanpnnl, @singhbalwinder As I work on the dynamics driver, I encounter the issue of converting from the dynamics prognostic variables to the input quantities required by our aerosol parameterizations.

Different parameterizations require different input variables that may be related, for example, one parameterization may be best expressed in terms of the water vapor mixing ratio qv, while another may be best expressed in terms of relative humidity, s.

Both qv and s express the same information --- the local quantity of water vapor --- but are calculated very differently. And, in the case of qv, it's a prognostic variable of HOMME, the main dycore we need to support.

In this case, qv is an "exact" input from the perspective of the physical parameterization. Relative humidity requires extra calculation, and potentially additional approximations, to compute from the mixing ratio.

Objective 1: List all input variables required by aerosols

We need to make a list of all the input variables that our aerosol parameterizations require, and identify relationships between them.
Then, we need to decide how to handle the implied redundancies. For example, do we keep both qv and relative humidity? Do we write a kernel that computes one from the other, so that we only keep one?

(@jeff-cohere has started a Wiki page for this)

Objective 2: Document the implicit approximations, and possible inconsistencies

What approximations are used to convert from HOMME's prognostic variables to the input variables required by a parameterization (these need to be documented in the Haero design doc).
Are the same approximations used across all parameterizations? If not, we need to document these inconsistencies.

Objective 3:

Are there opportunities to improve the parameterizations, as they relate to SCREAM, by removing some of these inconsistencies, or replacing a parameterization with a different one that uses input variables better matched to HOMME's prognostic variables?

eagles-project / haero Goto Github PK

haero's Introduction

Haero: A High-Performance Aerosol Library

Supported Platforms

Required Software

Building the Model

Making code changes and rebuilding

Generating Documentation

FAQ

Building and Rebuilding

Testing

Source Control and Repository

haero's People

Contributors

Stargazers

Watchers

Forkers

haero's Issues

Remaining work

Verification Tests

Haero Units

Haero Aerosol Prognostic State Variables

Haero Aerosol Diagnostic State Variables

Haero Non-Aerosol Atmospheric State Variables

How Is This Information Used?

Decoupled processes

Coupled to microphysics

Dependencies:

Possible issues

Summary

From Kai

Required params:

Dependencies:

Tests:

Unpack views before writing, or write maps to .nc files?

Tests:

Atmospheric State Variables (incomplete--please add more!)

Hui's proposed additions to README:

Cheatsheet for Mac users (tested on macOS 10.15, Catalina):

Summary

Objective 1: List all input variables required by aerosols

Objective 2: Document the implicit approximations, and possible inconsistencies

Objective 3:

Recommend Projects

Recommend Topics

Recommend Org