mam4xx's Issues

calcsize: use conversions functions where possible

Our conversion functions (mam4xx/conversions.hpp) can be used in a few spots in calcsize.hpp (see #72):

  1. Around line 131, in the calculation of dgncur
  2. Around line 525, in the calculation of dgncur and v2ncur
  3. Around line 929, in the calculation of common_factor_nmodes

Cuda build fails

GCC 10.1 / Cuda 11.4.2 and Cuda 11.7

Haero builds, installs, and tests pass. But mam4xx fails to build, with these two errors. @overfelt, @jeff-cohere, what compiler/CUDA versions are working for you?

/home/pabosle/mam4xx/src/gasaerexch.hpp(244): error: identifier "mam4::modes" is undefined in device code

/home/pabosle/mam4xx/src/gasaerexch.hpp(98): error: identifier "mam4::GasAerExch::num_mode" is undefined in device code
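For context, errors of this kind usually mean that a namespace-scope host variable (here, mam4::modes) is being read inside device code. Below is a minimal sketch of one common Kokkos-style workaround, with illustrative values; it is not necessarily the fix mam4xx adopted.

// Sketch only: nvcc cannot read a host namespace-scope variable like
// mam4::modes inside device code. One workaround is to expose the data
// through a host/device function so that no host global is referenced.
#include <Kokkos_Core.hpp>

namespace mam4 {
KOKKOS_INLINE_FUNCTION
double mode_nominal_diameter(int m) {
  // illustrative nominal geometric mean diameters [m] for the four modes
  constexpr double d[4] = {1.1e-7, 2.6e-8, 2.0e-6, 5.0e-8};
  return d[m];
}
} // namespace mam4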

Figure out a strategy for emissions tagging

(Imported from an issue in the mam_refactor repo)

This feature was brought up in the context of our recent discussion with the aerosol folks. To date, the tagging mechanism has used preprocessor macros (NBC, NPOA, NSOA), which don't seem to be present in our box model. See eagles-project/haero#47 for our original conversation about these macros.

Here are two papers which used tagging for NBC, where NBC is greater than 1 (default is 1):
https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2014JD022297
https://agupubs.onlinelibrary.wiley.com/doi/full/10.1002/2017JD027244

These manuscripts do not say much about implementation details, but they do describe how to use this capability for scientific investigations.

When we implemented tagging in the code, we deliberately chose not to litter these CPP macros throughout the code. We use them only once, where they are set as compile-time constants (in modal_aero_data.F90), so they are essentially compile-time constants. In the rest of the code, we use loops that run from 1 to nbc, for example (a minimal sketch follows below).

Again, nbc, nsoa, or npoa > 1 means we are adding new species, and we have to run the chemistry preprocessor for each new configuration.
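A minimal sketch of that pattern (variable and array names here are illustrative, not the actual modal_aero_data.F90 or box model code): the tag count is a compile-time constant set in one place, and the rest of the code loops over it.

// Sketch of the tagging pattern described above (names are illustrative).
constexpr int nbc = 1; // number of tagged black-carbon species (default 1)

void zero_bc_emissions(double bc_emissions[nbc]) {
  for (int i = 0; i < nbc; ++i) {
    bc_emissions[i] = 0.0; // loop over tagged species instead of branching on CPP macros
  }
}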

Switch to env CC and CXX

In build-haero.sh, the C and C++ compilers are specified explicitly, which leads to MPI build failures on my Linux box (CMake attempts to build an MPI-C program, and it fails).

unit_tests (Failed) in Mac

Hi all,

All unit tests are failing on my Mac.
1 - utils_unit_tests (Failed)
2 - mam4_nucleation_unit_tests (Failed)
3 - mam4_gasaerexch_unit_tests (Failed)
4 - mam4_coagulation_unit_tests (Failed)
5 - mam4_calcsize_unit_tests (Failed)
6 - mam4_rename_unit_tests (Failed)
7 - mam4_aging_unit_tests (Failed)
9 - kohler_unit_tests (Failed)
10 - aero_modes_unit_tests (Failed)
11 - aero_config_unit_tests (Failed)
12 - mode_averages (Failed)
13 - conversions_unit_tests (Failed)

With this error:
/bin/sh: build/mam4xx/bin/test-launcher: Permission denied

I am using the main branch.

My mac:
macOS Monterey version: 12.6.1
compiler: AppleClang 14.0.0.14000029

Consolidated set of test utilities

Now that we're getting more processes converted, we're starting to ask more of our unit tests than just compiling, building, and running without crashing --- we're beginning to want actual results for regression, and if possible, verification.

PRs #42, #56, #70 and #72 all introduce specified initial states for both aerosols and atmosphere for the purpose of testing nonzero values. It might be worthwhile to extract these into a single test AeroConfig, Prognostics, Diagnostics, and Atmosphere source for all tests to use.

building mam4xx in weaver

Hi all,

I am trying to build mam4xx on weaver (a Sandia machine) using the following modules:

Currently Loaded Modulefiles:

  1. cuda/11.4
  2. binutils/2.30.0
  3. gcc/9.3.0
  4. cmake/3.24.2
  5. git/2.31.1

I am able to build haero (and the other libraries) on both CPUs and GPUs. Mam4xx builds fine on CPUs. However, I am getting compilation errors with mam4xx for GPUs.

Here is for example one compilation error:

mam4xx/src/mam4xx/mode_wet_particle_size.hpp(174): error: identifier "mam4::modes" is undefined in device code

And warning messages:

mam4xx/src/mam4xx/mode_wet_particle_size.hpp(176): warning: a host variable "mam4::modes" cannot be directly read in a device function.

mam4xx/src/tests/mode_averages_unit_tests.cpp(188): warning: calling a host function("mam4::mode_hygroscopicity(const ::mam4::Diagnostics &, const ::mam4::Prognostics &, int)") from a host device function("C_A_T_C_H_T_E_S_T_0()::[lambda(int) (instance 4)]::operator () const") is not allowed

Has anyone built mam4xx on weaver?

Aerosol diagnostics: compute once and store, or recompute when needed?

@pbosler raised this issue in the review for PR #72 , which I think is another thing we should discuss:

Redundant data storage. Within the Atmosphere class, we've made the decision to record only one measure of water vapor content (its dry mixing ratio); when other measures (specific humidity, relative humidity) are needed, we compute them from the dry mixing ratio. We don't store multiple sources of the same information; we use conversions. Should we do the same within mam4xx, for example, with particle size (store either diameter or volume, but not both)? I vote yes. GPUs are supposed to make the computations required by conversions "free" and, more importantly, then we don't have to worry about keeping the two values consistent with each other.
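A minimal sketch of the convert-on-the-fly approach for spherical particles (function names are illustrative, not the conversions.hpp API; the modal version of this relation also involves the mode's geometric standard deviation):

// Sketch: store the mean particle diameter and derive the mean volume on
// demand (or vice versa), rather than carrying both and keeping them in sync.
#include <cmath>

constexpr double pi = 3.141592653589793;

inline double mean_particle_volume_from_diameter(double d) {
  return pi * d * d * d / 6.0;    // volume of a sphere of diameter d
}

inline double mean_particle_diameter_from_volume(double v) {
  return std::cbrt(6.0 * v / pi); // inverse relation
}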

Single precision build fails in Kohler verification test.

PR #30 is just about ready to be merged, but first we need to fix a single-precision build issue, which results from a static assertion that the Kohler solver only be built in double precision:

In file included from /home/runner/work/mam4xx/mam4xx/src/tests/kohler_verification.hpp:4,
                 from /home/runner/work/mam4xx/mam4xx/src/tests/kohler_verification.cpp:1:
/home/runner/work/mam4xx/mam4xx/src/kohler.hpp: In instantiation of ‘struct mam4::KohlerPolynomial<float>’:
/home/runner/work/mam4xx/mam4xx/src/tests/kohler_verification.hpp:33:33:   required from here
/home/runner/work/mam4xx/mam4xx/src/kohler.hpp:127:29: error: static assertion failed: double precision required.
  127 |                    double>::value,
      |                             ^~~~~
make[2]: *** [src/tests/CMakeFiles/mam4xx_tests.dir/build.make:76: src/tests/CMakeFiles/mam4xx_tests.dir/kohler_verification.cpp.o] Error 1

verification/validation tests for ndrop explmix

After #172 goes in, we'll have ndrop ported and mostly tested, but we'll still need a validation test for explmix.

For this particular function, there's also a verification test that can be done (from @pbosler):

This function can actually be verified -- I believe it should solve the 1D heat equation with 2nd-order accuracy as the vertical level spacing is decreased (nlev is increased). We can build a verification test around that.

@jaelynlitz , I leave it to you to decide whether you want to verify that explmix solves the heat equation accurately (with whatever help/support you need from other team members).
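As a self-contained sketch of what such a convergence check could look like (this uses a simple stand-in centered discretization of the 1D heat equation, not explmix itself, and all names are illustrative):

// Sketch of the verification idea: a 2nd-order centered discretization of
// u_t = u_zz should show errors shrinking by ~4x when the level count doubles.
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Advance u_t = u_zz on z in [0,1] with u = 0 at both ends to t ~ 0.1 and
// return the max error against the exact solution exp(-pi^2 t) sin(pi z).
double heat_error(int nlev) {
  const double pi = 3.141592653589793;
  const double dz = 1.0 / (nlev - 1);
  const double dt = 0.25 * dz * dz; // stable explicit time step
  std::vector<double> u(nlev), unew(nlev);
  for (int k = 0; k < nlev; ++k) u[k] = std::sin(pi * k * dz);
  double t = 0.0;
  while (t < 0.1) {
    for (int k = 1; k < nlev - 1; ++k)
      unew[k] = u[k] + dt * (u[k + 1] - 2.0 * u[k] + u[k - 1]) / (dz * dz);
    unew[0] = unew[nlev - 1] = 0.0;
    u.swap(unew);
    t += dt;
  }
  double err = 0.0;
  for (int k = 0; k < nlev; ++k)
    err = std::max(err, std::abs(u[k] - std::exp(-pi * pi * t) * std::sin(pi * k * dz)));
  return err;
}

int main() {
  const double e1 = heat_error(65), e2 = heat_error(129);
  std::printf("observed convergence rate: %g (expect ~2)\n", std::log2(e1 / e2));
  return 0;
}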

Printing quantities for debugging

@kaizhangpnl mentioned at our monthly meeting today the importance of printing out diagnostic quantities for helping us understand problems we encounter. We haven't really discussed how to do this yet. I have some ideas about how we might do this in a GPU environment. If anyone else is interested in this topic, please add your thoughts to this issue.

@kaizhangpnl, maybe we should schedule a meeting to discuss this. I can put together some simple examples of how we might support "debugging prints", and you can see whether you think they are sufficient.

Consolidate modal averages

Computations like modal volume, hygroscopicity, and particle size require averages over modal mass and/or number mixing ratios. These quantities are used in many aerosol processes (they have already shown up in gasaerexch, calcsize, and water uptake, for example). Instead of repeating the same computations in several places, we should consolidate them into a single set of functions.
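As a sketch of one such consolidated helper (this is the standard volume-weighted mixing rule; names and argument layout are illustrative, not the mam4xx API):

// Sketch: mode-mean hygroscopicity as a volume-weighted average over the
// species in a mode.
inline double mode_mean_hygroscopicity(int nspec,
                                       const double q[],       // species mass mixing ratios [kg/kg air]
                                       const double density[], // species material densities [kg/m^3]
                                       const double hygro[]) { // species hygroscopicities [-]
  double vol = 0.0, hvol = 0.0;
  for (int s = 0; s < nspec; ++s) {
    const double v = q[s] / density[s]; // species volume mixing ratio [m^3/kg air]
    vol += v;
    hvol += hygro[s] * v;
  }
  return (vol > 0.0) ? hvol / vol : 0.0;
}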

Names of prognostic and diagnostic quantities

@pbosler has been urging us to consider better variable names for the quantities we store in our Prognostics and Diagnostics types.

From #40:

For the time being, we should consider renaming the variables in Prognostics. In atmospheric texts, the variable q is used for specific humidity, while w is used for vapor mixing ratio, so using q for aerosol mixing ratio in Prognostics is doubly ambiguous: it is unclear both whether aerosols or water are meant, and whether a mixing ratio or a specific concentration is meant. Using w is also problematic, since it's also used for vertical velocity.

To be more descriptive, e.g., instead of q_aero_i, perhaps aerosol_mass_mixing_ratio_interstitial, aero_mass_mix_ratio_clear, with similar renames for clouds and gases?

From #72:

Variable names. In our main data structures, AeroConfig, Prognostics, and Diagnostics, I'd prefer to keep variable names readable. Within a process impl, it's less important and we can stay closer to the original fortran. This means that we all know/agree on the translation of variables from fortran to mam4xx and can keep abreast of changes to our main data structures as more processes are added... Is that realistic?

Let's discuss names of prognostic and diagnostic quantities here instead of in pull requests. We can proceed with the "suboptimal" names we've been using until we agree on better ones, and then do one (or more) renaming PRs.

@pbosler @overfelt @singhbalwinder @odiazib @mjs271

Add distinction between interstitial and cloudborne dry geometric mean diameter to `Diagnostics` class

In calcsize, some of the computed quantities are the dry geometric mean diameters for the modes in both the interstitial and cloudborne cases. To handle this, we added the variables dgn_cur_i and dgn_cur_c to the Diagnostics class, but it looks like a better option would be either to replace Diagnostics.dry_geometric_mean_diameter with two variables indicating _i and _c, or to add interstitial and cloudborne sub-variables to Diagnostics.dry_geometric_mean_diameter. @singhbalwinder, @pbosler: does anyone have thoughts on this?

In favor of not holding up #72, we chose not to do this immediately but will fix it in a subsequent PR. Pete's PR comment about this is below.

  1. Double-check that dgncur_i and dgncur_c need to be added to Diagnostics; I think they're already there as Diagnostics.dry_geometric_mean_diameter. We may need to update Diagnostics to account for the c/i separation for these diagnostics. If so, let's make another issue and do that in a separate PR since this one is already large.

Originally posted by @pbosler in #72 (review)
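One possible shape for the split, shown only as a sketch (member names and layout are assumptions for illustration, not the current Diagnostics code):

// Sketch of one option: replace the single per-mode field with interstitial
// and cloudborne variants.
struct Diagnostics {
  static constexpr int num_modes = 4; // MAM4 has four modes
  // dry geometric mean diameter per mode, interstitial aerosols [m]
  haero::ColumnView dry_geometric_mean_diameter_i[num_modes];
  // dry geometric mean diameter per mode, cloudborne aerosols [m]
  haero::ColumnView dry_geometric_mean_diameter_c[num_modes];
};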

Compilation issues due to `-Werror` flag in `tests/CMakeLists.txt`

While working on the current (Jan 10, 23) main branch, my builds have been failing with a handful of errors of this flavor:

error: 'static void Kokkos::OpenMP::partition_master(const F&, int, int)' is deprecated [-Werror=deprecated-declarations]

This appears to come from the target_compile_options(mam4_<test>_unit_tests PRIVATE -Werror) lines in tests/CMakeLists.txt: my current workaround is to comment those lines out, after which my builds succeed.

I'll note that it appears the intent is for these -Werror flags to be scoped only to the tests/ directory and below, but it's affecting my entire build for whatever reason.

I've hit this on Pete's RHEL workstation (gcc 10.1.0 / cmake 3.21.1), and also when working from my Intel Mac with gcc 12.? and cmake somewhere around 3.25 (I don't currently have access to check the exact versions).

Should Diagnostics split into interstitial and cloud, to match Prognostics?

Recent PRs #44 and #36 split Prognostics fields for number and mass mixing ratios into distinct cloud (c) and interstitial (i) categories. The same likely needs to be done for Diagnostics variables such as particle size and hygroscopicity, but it may be trickier than it looks at first.

  1. Wet particle size should likely be separated, as it depends on two different processes: water uptake for interstitial aerosols, droplet activation for clouds. On the other hand, the water microphysics package might take over all-things "activated," so maybe we don't need to track a separate size for wet cloud particles.

  2. Hygroscopicity is an immutable property of the species that make up a mode, so maybe only one value is needed. However, the modal mass mixing ratios change the modal average hygroscopicity, and that average will differ depending on whether the interstitial or cloudborne mass mixing ratios are used to compute it, so perhaps two values are needed. If a previously wet cloud particle dries out, its composition will differ from that of aerosols that were always interstitial, which is another argument for two hygroscopicity categories.

  3. Dry particle size seems at first to be an interstitial-only property, but the droplet activation parameterization includes water evaporation from previously activated aerosols. Since these would be cloudborne before they dry out, their composition will differ from that of aerosols that were never activated, and they will have a different dry size.

The answers likely depend on how such things are used by other processes in MAM4; as we learn about and convert more of these processes, hopefully the answers will become clear.

MAM4 box model assumes nsoa == nsoag == npoa == 1

There is dynamic indexing logic that rearranges the gas and aerosol array indexing when different numbers of secondary aerosols and gases are present (see modal_aero_microp_species.F90 in mam_refactor, lines 154-173), but none of this is tested in the box model. In fact, the box model driver fails if any of the variables mentioned in this issue's heading are not 1 (see driver_all.F90 in mam_refactor, line 159).

Given that variable numbers of secondary organic aerosols/gases are not tested in the MAM4 box model, I believe our first pass at porting the MAM4 microphysics need not support a variable number of these aerosols/gases.

Unit mismatch in mam4xx gases

In aero_modes.hpp, the MAM4 gases are specified with molecular weights in units of [g/mol], which is how MAM4 defines things. However, the Haero convention is to use SI units everywhere, and leave it to the client (in this case, mam4xx) to convert as needed for its parameterizations.

Currently, the instantiated haero::GasSpecies in aero_modes.hpp's static gas_species array don't match the units of the Haero documentation for haero::GasSpecies, which expects molecular weights in units of [kg/mol].

The fix is to multiply the molecular weights by 1e-3 in their constructor calls, but this may also require adding the inverse conversion to parameterizations that use gases, such as gasaerexch and nucleation. @overfelt @jeff-cohere
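A sketch of the proposed fix (the GasSpecies constructor arguments are shown schematically and may not match the actual haero signature):

// Sketch only: convert molecular weights from [g/mol] to [kg/mol] at the
// point of construction in aero_modes.hpp.
constexpr double g_per_mol_to_kg_per_mol = 1e-3;

// before: GasSpecies("sulfuric acid", "H2SO4", 98.079)                           // [g/mol]
// after:  GasSpecies("sulfuric acid", "H2SO4", 98.079 * g_per_mol_to_kg_per_mol) // [kg/mol]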

Aerosol microphysics process coupling: how to handle intermediate quantities

This issue is related to the porting of the high-level aerosol microphysics interface routine from Fortran to C++.

From #106 (edited for clarity):

James: Well, hmmm. It looks like tendencies are created by gas/aerosol exchange and used by other processes. But it also looks like Coagulation computes its own tendencies for the same prognostic variables, even though both gas/aerosol exchange and Coagulation use the same prognostic arrays, which they modify in place.

The Aging process then takes as input the tendencies from both gas/aerosol exchange and Coagulation, along with the one set of prognostic values that has been updated by both processes. See the call to mam_pcarbon_aging_1subarea in mam_refactor. It would appear that we need a different Tendency instance for each process, and that aging needs to take multiple tendency instances. Might need to think about this more.

Jeff: I still think we might want to store these "process-specific" tendencies as diagnostics so they can be passed between processes. I don't mind the Diagnostics type being used as a porting dumping ground, because it's essentially a scratch pad for writing down intermediate calculations.

This means we'll likely end up with quite a few diagnostic fields, but in the future when we are allowed to think about better process coupling, we can figure out how we'd rather do things. The processes currently assume these tendencies have already been computed because they assume a fixed process ordering. To be clear, I am advocating that we store these specific tendencies as diagnostics so that we can separate the process logic a bit more effectively.

Kyle: I thought about diagnostics as well, but in my mind that should be used for physically well defined quantities that are derived from the model state (prognostic variables).

As an example, the droplet effective radius in most cloud microphysical schemes is not a prognostic variable of the model, and is diagnosed based on microphysical assumptions of a particular microphysics scheme, but is required input for radiative transfer, so this would be something I would put in Diagnostics. However, these tendencies we are talking about (which in my mind are temporaries) are only needed locally by other aerosol processes so I don't feel like they should go into diagnostics. I worry once something gets put into Diagnostics no one will ever take it out.

One caveat though, I do wonder if these intermediate process rate tendencies aren't something aerosol process scientists would love to have as part of standard output? If that's the case, maybe diagnostics is best? Maybe one view of Diagnostics would be that it is for stuff that you may want to one day output from the model?

Jeff: I agree with you about the definition of a "diagnostic" variable. I don't think these tendencies are diagnostics by that definition either. I'm just trying to figure out how we untangle these dependencies, which reflect a fixed process ordering that we would like not to assume for all time.

Our call signature to a "process" is fixed (see the compute_tendencies method), so we can't just pass whatever we want willy-nilly to each aerosol process. The Diagnostics type lets us get away with passing these tendencies as temporaries. When we figure out how to "do things right", we can move the tendencies to where they belong. Otherwise, we lose our assumption of aerosol process independence, and we devolve back into the "pile of stuff" design pattern, where everything is welded to everything else.

Kyle: In my mind a third data type may be necessary then. So we have a set of data types like

  • Prognostic Variables (variables that are time integrated and are passed back to E3SM)
  • Diagnostic Variables (variables that are not time integrated and may be used for IO and passed back to E3SM)
  • "Scratch" (possibly a terrible name) variables that are local and not passed back to E3SM and are only available to the aerosol processes

Jeff: Your "scratch" variables would have to be available to all processes in order for us to pass intermediates between them. I would prefer not to etch a transient pattern into our architecture, but I'm not going to fight super hard if you don't want to contaminate diagnostics with temporaries. But I do worry that we won't be able to get rid of the "scratch" variables once we introduce them, and this is not a great pattern if we are trying to untangle aerosol processes. It will definitely create confusion down the line.

Kyle: Fair enough. I guess the difficulty is that we are porting microphysics that are entangled. Although, is anyone tracking the IO fields? There is a fair chance the process-specific tendencies are also being computed for output purposes and this is all a moot point.

James: modal_aero_amicphys.F90 has all of the logic to call the aerosol processes correctly, calculate tendencies, and perform the other calculations needed to interpret the results of the aerosol processes. Even better, it works on a single box/cell at a time, so it only needs small, statically sized arrays to hold the intermediate results between processes. I think we have ported to C++ all of the grid-cell-level functions it calls. If the driver is written this way, then Kokkos scratch views for intermediate results between processes are not needed.

Autotest Build Failing for Double/Debug Configuration--Unclear Cause

After spending a while debugging the autotesting for #139, I still see the double/debug configuration failing during build time for no clear reason. Rerunning with the debug logging on produces the output below (full report here).

[ 70%] Built target mam4xx_tests
[ 71%] Building CXX object src/tests/CMakeFiles/kohler_unit_tests.dir/kohler_unit_tests.cpp.o
[ 73%] Building CXX object src/tests/CMakeFiles/mode_averages.dir/mode_averages_unit_tests.cpp.o
[ 74%] Building CXX object src/tests/CMakeFiles/conversions_unit_tests.dir/conversions_unit_tests.cpp.o
[ 74%] Built target utils_unit_tests
g++: fatal error: Killed signal terminated program cc1plus
compilation terminated.
make[2]: *** [src/tests/CMakeFiles/aero_config_unit_tests.dir/build.make:76: src/tests/CMakeFiles/aero_config_unit_tests.dir/aero_config_unit_tests.cpp.o] Terminated
make[2]: *** [src/tests/CMakeFiles/mam4_gasaerexch_unit_tests.dir/build.make:76: src/tests/CMakeFiles/mam4_gasaerexch_unit_tests.dir/mam4_gasaerexch_unit_tests.cpp.o] Terminated
make[2]: *** [src/tests/CMakeFiles/mam4_nucleation_unit_tests.dir/build.make:76: src/tests/CMakeFiles/mam4_nucleation_unit_tests.dir/mam4_nucleation_unit_tests.cpp.o] Terminated
make[2]: *** [src/validation/gasaerexch/CMakeFiles/exe_skywkr_gasaerexch_timestepping.dir/build.make:76: src/validation/gasaerexch/CMakeFiles/exe_skywkr_gasaerexch_timestepping.dir/skywkr_gasaerexch_timestepping.cpp.o] Terminated
make[2]: *** [src/tests/CMakeFiles/mam4_coagulation_unit_tests.dir/build.make:76: src/tests/CMakeFiles/mam4_coagulation_unit_tests.dir/mam4_coagulation_unit_tests.cpp.o] Terminated
make[1]: *** [CMakeFiles/Makefile2:1261: src/tests/CMakeFiles/mam4_nucleation_unit_tests.dir/all] Terminated
make: *** [Makefile:146: all] Terminated
make[2]: *** [src/tests/CMakeFiles/mam4_aging_unit_tests.dir/build.make:76: src/tests/CMakeFiles/mam4_aging_unit_tests.dir/mam4_aging_unit_tests.cpp.o] Terminated
make[2]: *** [src/tests/CMakeFiles/conversions_unit_tests.dir/build.make:76: src/tests/CMakeFiles/conversions_unit_tests.dir/conversions_unit_tests.cpp.o] Terminated
make[2]: *** wait: No child processes.  Stop.
make[2]: *** Waiting for unfinished jobs....
make[2]: *** wait: No child processes.  Stop.
make[2]: *** [src/tests/CMakeFiles/mam4_nucleate_ice_unit_tests.dir/build.make:76: src/tests/CMakeFiles/mam4_nucleate_ice_unit_tests.dir/mam4_nucleate_ice_unit_tests.cpp.o] Terminated
make[2]: *** [src/tests/CMakeFiles/mam4_calcsize_unit_tests.dir/build.make:76: src/tests/CMakeFiles/mam4_calcsize_unit_tests.dir/mam4_calcsize_unit_tests.cpp.o] Terminated
make[2]: *** [src/tests/CMakeFiles/mam4_rename_unit_tests.dir/build.make:76: src/tests/CMakeFiles/mam4_rename_unit_tests.dir/mam4_rename_unit_tests.cpp.o] Terminated
make[2]: *** [src/tests/CMakeFiles/kohler_unit_tests.dir/build.make:76: src/tests/CMakeFiles/kohler_unit_tests.dir/kohler_unit_tests.cpp.o] Terminated
make[2]: *** [src/validation/nucleation/CMakeFiles/nucleation_driver.dir/build.make:76: src/validation/nucleation/CMakeFiles/nucleation_driver.dir/nucleation_driver.cpp.o] Terminated
make[2]: *** [src/tests/CMakeFiles/aero_modes_unit_tests.dir/build.make:76: src/tests/CMakeFiles/aero_modes_unit_tests.dir/aero_modes_tests.cpp.o] Terminated
make[2]: *** [src/validation/aging/CMakeFiles/aging_driver.dir/build.make:76: src/validation/aging/CMakeFiles/aging_driver.dir/aging_driver.cpp.o] Terminated
make[2]: *** [src/validation/gasaerexch/CMakeFiles/gasaerexch_driver.dir/build.make:76: src/validation/gasaerexch/CMakeFiles/gasaerexch_driver.dir/gasaerexch_driver.cpp.o] Terminated
make[2]: *** [src/validation/calcsize/CMakeFiles/calcsize_driver.dir/build.make:76: src/validation/calcsize/CMakeFiles/calcsize_driver.dir/calcsize_driver.cpp.o] Terminated
make[2]: *** [src/validation/coagulation/CMakeFiles/coagulation_driver.dir/build.make:76: src/validation/coagulation/CMakeFiles/coagulation_driver.dir/coagulation_driver.cpp.o] Terminated
make[2]: *** [src/validation/nucleate_ice/CMakeFiles/nucleate_ice_driver.dir/build.make:76: src/validation/nucleate_ice/CMakeFiles/nucleate_ice_driver.dir/nucleate_ice_driver.cpp.o] Terminated
make[2]: *** [src/validation/rename/CMakeFiles/rename_driver.dir/build.make:76: src/validation/rename/CMakeFiles/rename_driver.dir/rename_driver.cpp.o] Terminated
Error: Process completed with exit code 143.
##[debug]Re-evaluate condition on job cancellation for step: 'Building MAM4xx (Debug, double precision)'.
##[debug]Skip Re-evaluate condition on runner shutdown.
##[debug]Finishing: Building MAM4xx (Debug, double precision)

Decision: moving away from Packs

After discussions with the SCREAM/EAMxx team and amongst several of our own team members, we have decided not to use Packs for CPU vectorization within MAM4xx. The MAM code has too many branches in it for Packs to deliver optimum performance, and these same branches make Packs miserable to use. If we find "hotspots" within specific parameterizations that can benefit from the use of Packs, we can always reintroduce Packs in those places.

The Bad News

A few of us have to go back and redo some Pack code in terms of Reals. In particular, Haero has to be modified so that a ColumnView contains floating point numbers and not Packs. In mam4xx, the nucleation, gas-aerosol exchange, and calcsize parameterizations have to be reworked, and some utility functions have to be narrowed to use only Reals (arguably).

The Good News

All of the above work probably won't take too long, because Packs are difficult to use and understand in this kind of technical setting, and using floating point numbers is much more straightforward. And for our trouble, we'll have a much easier time porting MAM4 Fortran code to C++.

I'm starting no-pack branches in Haero and mam4xx to take care of this work. I'll be in touch with others of you who are affected. In the meantime, please let me know if you have questions or concerns.

mode_averages unit test broken with single-precision builds.

This test was broken in single-precision builds during the Great Unpacking of mam4xx. Here's the failure:

5/30 Test: mode_averages
Command: "/usr/bin/sh" "-c" "/home/jeff/projects/sandia/mam4xx/single/bin/test-
Directory: /home/jeff/projects/sandia/mam4xx/single/src/tests
"mode_averages" start time: Nov 09 11:49 PST
Output:
----------------------------------------------------------
Calling initialize_kokkos
 ExecSpace name: OpenMP
 ExecSpace initialized: yes
 active avx set:
 compiler id: GCC
 FPE support is enabled, current FPE mask: 0 (NONE)
 #host threads: 1

Starting catch session on rank 0 out of 1
[2022-11-09 11:49:00.359] [modal averages tests] [info] accumulation mode has m
[2022-11-09 11:49:00.359] [modal averages tests] [info] aitken mode has mean pa
[2022-11-09 11:49:00.359] [modal averages tests] [info] coarse mode has mean pa
[2022-11-09 11:49:00.359] [modal averages tests] [info] primary_carbon mode has
[2022-11-09 11:49:00.359] [modal averages tests] [info] dry particle size tests
[2022-11-09 11:49:00.360] [modal averages tests] [info] accumulation mode has h
[2022-11-09 11:49:00.360] [modal averages tests] [info] aitken mode has hygrosc
[2022-11-09 11:49:00.360] [modal averages tests] [info] coarse mode has hygrosc
[2022-11-09 11:49:00.360] [modal averages tests] [info] primary_carbon mode has
[2022-11-09 11:49:00.361] [modal averages tests] [info] hygroscopicity tests co
[2022-11-09 11:49:00.362] [modal averages tests] [info] relative humidity range
KERNEL CHECK FAILED:
   counter <= max_iter
   NewtonSolver: max iterations

Backtrace:
                  [0x55a3f0fee465]
                  [0x55a3f0fe536e]
                  [0x55a3f0fe540e]
                  [0x55a3f0e88435]
                  [0x55a3f0eab538]
                  [0x55a3f0e9e69e]
                  [0x55a3f0e936c6]
                  [0x55a3f0e8cb9c]
                  [0x55a3f0e8cfe2]
                  [0x55a3f0e82d3a]
                  [0x55a3f0e877a0]
                  [0x55a3f0e87e91]
    GOMP_parallel [0x7fdd3a356a16]
                  [0x55a3f0e87568]
                  [0x55a3f0e868dc]
                  [0x55a3f0e85a2c]
                  [0x55a3f0e8537a]
                  [0x55a3f0e8440c]
                  [0x55a3f0f0ebce]
                  [0x55a3f0f0de09]
                  [0x55a3f0f081c6]
                  [0x55a3f0f07eeb]
                  [0x55a3f0f06977]
                  [0x55a3f0f09b30]
                  [0x55a3f0f0af39]
                  [0x55a3f0f0ac2f]
                  [0x55a3f0f22ba4]
                  [0x7fdd3a123d90]
__libc_start_main [0x7fdd3a123e40]
                  [0x55a3f0e82835]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mode_averages is a Catch v2.13.8 host application.
Run with -? for options

-------------------------------------------------------------------------------
modal_averages
  wet particle size
-------------------------------------------------------------------------------
/home/jeff/projects/sandia/mam4xx/src/tests/mode_averages_unit_tests.cpp:135
...............................................................................

/home/jeff/projects/sandia/mam4xx/src/tests/mode_averages_unit_tests.cpp:135: F
due to a fatal error condition:
  SIGABRT - Abort (abnormal termination) signal

===============================================================================
test cases:   1 |   0 passed | 1 failed
assertions: 577 | 576 passed | 1 failed

Aborted (core dumped)
NO RESOURCE SPECS
RUN: OMP_PROC_BIND=spread OMP_PLACES=threads ./mode_averages --use-colour no
FROM: /home/jeff/projects/sandia/mam4xx/single/src/tests
<end of output>
Test time =   0.22 sec
----------------------------------------------------------
Test Failed.
"mode_averages" end time: Nov 09 11:49 PST
"mode_averages" time elapsed: 00:00:00

Since I'm not super familiar with the structure of this test and the particle size diagnostics, I'm assigning this to @pbosler to take a look (sorry for the extra work, Pete!). It's not urgent, but it would be nice to get this working again in the new code without Packs.

Default pack constructor initializes to nan

summary

To protect users from accessing pack padding, the default ekat::Pack constructor initializes its values to ekat::ScalarTraits<scalar>::invalid(), which is usually a quiet nan. Hence, our use of default ColumnView constructors causes all new views to be initialized to nan, rather than 0, as is the Kokkos custom.

This can cause problems with unit tests.

fix

This will be fixed by Haero PR #408 and some unit testing additions to come after that Haero PR.
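In the meantime, a minimal workaround in unit tests is to zero-fill freshly created views explicitly rather than relying on the default fill (shown here with a plain Kokkos view; the same idea applies to ColumnViews):

// Workaround sketch for unit tests: explicitly zero a new view instead of
// assuming its default fill is zero.
const int nlev = 72; // illustrative column size
Kokkos::View<double *> w("tendency", nlev);
Kokkos::deep_copy(w, 0.0); // guarantees zeros regardless of the default fill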

SIMD (Pack) logic in conversions.hpp needs to be hardened/handled

From @overfelt :

For those little functions in conversions.hpp where division takes place, there will be a divide by zero for SIMD types that do not fill all of their lanes and therefore contain some garbage. When trying to track down the origin of a NaN, this can be inconvenient. If there were some way, in a debug build where efficiency is not so critical, to avoid these NaNs, it would be useful.

Testing discrepancies in mode/species indexing and the need for a Testing Czar

@odiazib wrote in Slack this morning:

In the commit "Add unit test for gas_aerosol_uptake_rates_1box (#85)" from 12/16/2022 the order of aerosol species in src/mam4xx/aero-modes.hpp was modified. This change affected the validation tests from calcsize because the yaml input and python output files produced by @singhbalwinder followed the previous order, which I believe is the order of the aerosol species in the e3sm code.
For example, in the accumulation mode:
Previous aerosol species order:
AeroId::SO4, AeroId::POM, AeroId::SOA, AeroId::BC, AeroId::DST, AeroId::NaCl, AeroId::MOM
New aerosol species order:
AeroId::SOA, AeroId::SO4, AeroId::POM, AeroId::BC, AeroId::NaCl, AeroId::DST, AeroId::MOM
When we read an input file from @singhbalwinder, the array of values uses the species order from E3SM, where species 0 corresponds to AeroId::SO4; but in the current aerosol mode configuration, mam4xx assigns this value to AeroId::SOA, which is now species 0.
We could fix this issue by changing the order of the values when they are assigned in our validation tests, but I believe it would be easier if we keep the same order for the aerosol species in mam4 and mam4xx.
What is the best way of fixing this issue? We could modify our validation tests to match the new configuration, or we could keep using the former configuration. Is there a reason for the change of the order of aerosol species in aero-modes.hpp?

A couple of weeks back, @overfelt, @huiwanpnnl and I (@jeff-cohere) met to discuss some differences we had discovered with mode-species indexing in the box model and mam4xx. The changes that @odiazib refers to were changes we decided to make to bring mam4xx in line with the box model, given the set of reduced gas and aerosol species.

Unfortunately, we were not thinking "holistically" about this, so we didn't anticipate that this change would disrupt our E3SM-based testing workflow. For a little while, I have been advocating for a Testing Czar to keep track of the inputs/outputs stored in our mam_x_validation repo (an "owner" for that repo, really). Such a person would be responsible for keeping these kinds of issues from occurring. I think it's time for us to appoint a Testing Czar.

In addition, I think it would be good for all of the affected parties to meet in the new year to discuss how we want to proceed: to decide upon a blessed mode/species and gas species indexing scheme for mam4xx, and to incorporate that indexing into our validation tests.

@overfelt @odiazib @mjs271 @pressel @huiwanpnnl @singhbalwinder

Add code coverage analysis to CI builds

We have been talking/worrying a lot recently about the code coverage in our MAM4 parameterizations, and it turns out there are some simple tools we can use to analyze this. I will add support for codecov.io to our CI workflow.

Dry air vs. moist air and tracer units

DECISION: we are using dry air mixing ratios as prognostics in mam4xx, just like mam4 and many other aerosol packages. Conversions will be done between dry and moist air quantities in EAMxx.

Summary

The dynamics of EAMxx are defined with respect to moist air, so that, for example, the water vapor tracer qv is given as water content per unit of total air, e.g., density of water vapor / density of total air. Many of our aerosol parameterizations are defined with respect to dry air, so we have to be alert to this inconsistency, both for bug detection in the existing code and to make sure we don't introduce new bugs in mam4xx.

Fortunately, atmospheric scientists give us two terms that distinguish between definitions for water vapor:

  1. Water vapor mixing ratio, usually given the symbol w, is defined with respect to dry air as kg h2o vapor / kg dry air or density h2o vapor / density dry air.
  2. Specific humidity, usually given the symbol q, is defined with respect to moist air as kg h2o vapor / kg moist air (or densities).

For other tracers, such as aerosol mass and number concentrations/mixing ratios, we'll need to be careful to know whether they are defined with respect to dry air or moist air.

The file conversions.hpp contains functions that convert to and from each (moist or dry) representation for water vapor; for other tracers we'll need to write analogous methods.

This web page matches the definitions from standard meteorology texts and shows how to convert between moist and dry water vapor content definitions.

To-do:

  • Write a conversion function for a mass mixing ratio defined with respect to moist air to the same mass mixing ratio with respect to dry air (a sketch follows this list)
  • Write the inverse conversion
  • Write unit tests
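A sketch of the arithmetic behind the first two to-do items (these are not the actual conversions.hpp signatures); here q is specific humidity [kg vapor / kg moist air] and w is the vapor mixing ratio [kg vapor / kg dry air]:

// Sketch only; names and signatures are illustrative, not the mam4xx API.
inline double vapor_mixing_ratio_from_specific_humidity(double q) {
  return q / (1.0 - q); // w = q / (1 - q)
}

inline double specific_humidity_from_vapor_mixing_ratio(double w) {
  return w / (1.0 + w); // q = w / (1 + w)
}

// A tracer mass mixing ratio defined per unit moist air converts to a dry-air
// basis via the moist-to-dry mass ratio of the parcel, 1 / (1 - q):
inline double mmr_moist_to_dry(double mmr_moist, double q) {
  return mmr_moist / (1.0 - q);
}

inline double mmr_dry_to_moist(double mmr_dry, double q) {
  return mmr_dry * (1.0 - q);
}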

`calcsize` Give the ratio of number mixing ratio to volume (N/V) a more descriptive name

Note: I'm creating this issue because I don't want to step on toes by unilaterally changing the name of a primary variable.

Currently in calcsize, one of the most commonly used variables is v2n{min,nom,max}. This quantity is defined to be a ratio of number mixing ratio to volume (e.g., N/V) and I would argue, based on personal experience, that the variable name invites confusion. It is described in the mam_refactor repo as:

! note
! v2nyy = voltonumblo_amode is proportional to dgnumlo**(-3),
! and produces the maximum allowed number for a given volume
! v2nxx = voltonumbhi_amode is proportional to dgnumhi**(-3),
! and produces the minimum allowed number for a given volume

and in the E3SM repo as "volume to number" or "volume".

Given this, I propose giving it a more descriptive/intuitive name, for example:

num2vol_ratio_{min,nom,max}, n2v_ratio_{min,nom,max}, {min,nom,max}_number4volume, num_over_volume_{min,nom,max}

I would argue that, for those familiar with the old code, any of these would call to mind the original variable name, and as such this would not be much of a divergence from the original code.

So, I'm interested in hearing the thoughts of others.

Single Precision Overflow in `calcsize` Unit Test

Oscar and I recently ran across failing single precision unit tests from the autotester for PR #72. We tracked it down to the calculation of the geometric mean of volume-to-number ratios for aitken and accumulation modes:

auto v2n_geomean = haero::sqrt(voltonum_ait * voltonum_acc);

Therein, voltonum_ait and voltonum_acc are O(10^22) and O(10^20), respectively; their product overflows single precision, and the square root ends up NaN.

Speaking with Balwinder, these numbers are not out of the ordinary, so we wanted to put this on the radar for others. @odiazib is currently implementing a fix involving scaling the numbers into feasible ranges and then inverting the scaling, and we'd welcome any input/alternatives.
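One alternative, shown only as a sketch (the fix in progress may well differ): avoid forming the O(10^42) intermediate at all, either by taking square roots before multiplying or by working in logarithms.

// Sketch: mathematically equivalent forms that never form the overflowing product.
const Real v2n_geomean = haero::sqrt(voltonum_ait) * haero::sqrt(voltonum_acc);
// or equivalently: exp(0.5 * (log(voltonum_ait) + log(voltonum_acc)))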

Determine a strategy for evaluating differences between C++ and Fortran code

We have been talking about using Python scripts that import data modules from our C++ and Fortran Skywalker-style tests and return a PASS or FAIL result indicating whether the results "match".

What does it mean for a C++ dataset to "match" its Fortran counterpart? It depends.

@odiazib pointed out that, from a numerical perspective, we're not changing any algorithms (with a handful of possible exceptions), so defining a machine-level-precision tolerance might be a good starting point.

@pressel argued that from a physics perspective, it would be wonderful if we could engage some scientists to do some "error propagation" for quantities of interest in tests, taking into account the structural uncertainties.

@overfelt has some code (likely in Haero, but perhaps in the mam-x-validation repo) that computes L norms of differences in quantities. This would be a good starting point, with the two above points informing our selection of error tolerances.

PNNL CI Kokkos Cuda arch compatibility

Currently, our PNNL CI pipelines have three GPU partitions available (dl, dlt, dlv), each with a different CUDA arch:

  • P100 (cuda_arch 60) (dl/dl_shared)
  • V100 (cuda_arch 70) (dlv/dlv_shared)
  • RTX 2080 Ti (cuda_arch 75) (dlt/dlt_shared)

Kokkos errors out if different partitions are selected for building haero vs running mam4xx

Example: haero built on dl (cuda_arch 60), then tested on dlt (cuda_arch 75), failing: https://code.pnnl.gov/e3sm/eagles/mam4xx/-/jobs/8678

Suggested fix:
The rebuild-haero stage should build for all three partitions; then, at the testing stage, the haero path should be selected based on the current partition.

Workaround for now:
Hardcode the partition to dl so the tests can run.

cc @cameronrutherford @jeff-cohere

Failure to find mpi.h

I see this compilation issue on an M1 Mac and a Linux box:

$ make -j
[ 25%] Building CXX object src/tests/CMakeFiles/mam4xx_tests.dir/kohler_verification.cpp.o
[ 25%] Building CXX object src/tests/CMakeFiles/mam4_nucleation_unit_tests.dir/mam4_nucleation_unit_tests.cpp.o
[ 37%] Building CXX object src/tests/CMakeFiles/mam4_gasaerexch_unit_tests.dir/mam4_gasaerexch_unit_tests.cpp.o
[ 50%] Linking CXX static library libmam4xx_tests.a
[ 50%] Built target mam4xx_tests
[ 62%] Building CXX object src/tests/CMakeFiles/mam4_kohler_unit_tests.dir/mam4_kohler_unit_tests.cpp.o
In file included from /home/jeff/haeroic/include/ekat/logging/ekat_log_mpi_policy.hpp:4,
                 from /home/jeff/haeroic/include/ekat/logging/ekat_logger.hpp:5,
                 from /home/jeff/projects/sandia/mam4xx/src/tests/mam4_kohler_unit_tests.cpp:13:
/home/jeff/haeroic/include/ekat/mpi/ekat_comm.hpp:6:10: fatal error: mpi.h: No such file or directory
    6 | #include <mpi.h>
      |          ^~~~~~~
compilation terminated.

To be clear, the need to accommodate MPI in column physics is, to me, a bug. No MPI is ever needed in a column physics software package unless the columns need to exchange data. The fact that we must care about MPI in a software environment that doesn't require it is a clear failure to separate design concerns. Unfortunately, I have not convinced others of this fact, so EKAT still requires us to have an MPI strategy within HAERO and MAM4xx, which imposes a large maintenance burden.

I will keep pressing on this issue (and maybe even prepare a fix myself for EKAT, just to show people that it can be done correctly), but in the meantime I'll try to chase this down. Until this is fixed, we might have to disable this Kohler solver test to put out the fire.

CI Testing

It would be great to have basic testing of mam4xx in GitHub Actions CI on an Ubuntu image, along with testing in a mirrored GitLab instance at PNNL for GPU-based testing.

https://github.com/LLNL/hiop/blob/develop/.github/workflows/test.yml - this is an example of building with CMake in CI, although it might not be the easiest to implement since we require haero to build at the moment. Perhaps it would be doable with some steps to also configure and build haero with OpenMP as a backend.

https://github.com/LLNL/hiop/blob/develop/.github/workflows/push_mirror.yml - if a mirror is set up at PNNL, we would have a GitHub action that triggers the pipeline at PNNL, and then queries the pipeline while it is running for the result. This integration works automatically with some tiers of GitLab, and some documentation is also outlined here https://ecp-ci.gitlab.io/docs/guides/build-status-gitlab.html

This will probably have to wait until haero and other submodules are made open source and publicly available, as it would probably otherwise be a nightmare to manage.

cc @jeff-cohere @pressel @singhbalwinder

code coverage analysis on Macs

Hi all,

I am able to compile mam4xx/jeff-cohere/coverage (PR #125) with COVERAGE=ON (after installing LCOV and Capture::Tiny as dependencies); however, I am getting an error after running make coverage:

Capturing coverage data from .
geninfo cmd: '/usr/local/bin/geninfo . --output-filename unfiltered_coverage.info --parallel 1 --memory 0'
Found LLVM gcov version 14.0.0, which emulates gcov version 4.2.0
Using intermediate gcov format
Writing temporary data to /tmp/geninfo_datnWwc
Scanning . for .gcda files ...
Found 30 data files in .

Processing ./src/validation/aging/CMakeFiles/aging_driver.dir/mam_pcarbon_aging_1subarea.cpp.gcda
geninfo: Error: unknown line '64' in /eagles-project/work/mam4xx/src/validation/gasaerexch/gasaerexch_driver.cpp: there are only 55 lines in file.
Use 'geninfo --filter range' to remove out-of-range lines.
(use "geninfo --ignore-errors gcov ..." to bypass this error)
make[3]: *** [CMakeFiles/coverage] Error 255
make[2]: *** [CMakeFiles/coverage.dir/all] Error 2
make[1]: *** [CMakeFiles/coverage.dir/rule] Error 2
make: *** [coverage] Error 2

I am able to bypass this error with:

geninfo --ignore-errors gcov ...

and I got this file:
unfiltered_coverage.info

My mac:
macOS Monterey version: 12.6.1
compiler: AppleClang 14.0.0.14000029

PNNL CI Ubuntu image needs upgrade

We're currently experiencing failures in our PNNL Push Mirror action (see, for example, this PR). The failures seem to be related to an "Ubuntu-18.04 brownout," which sounds like a stick to encourage us to update to a more recent Ubuntu image like 20.04 or 22.04.

@cameronrutherford , I know we already discussed this, and again, there's no hurry. But here's an issue you can close out when you have a chance to update the Ubuntu image used by the PNNL CI system.

build-haero.sh

Hi,
I am getting this error:

./build-haero.sh: line 76: ${BUILD_TYPE^}: bad substitution

After running the build-haero.sh script:

./build-haero.sh build

I am using a mac.

conversions_unit_tests fail in single-precision builds on PNNL CI machine(s)

Briefly:

The PNNL CI pipeline set up by @cameronrutherford in PR #64 seems to be working properly, but we have a failing test when building with single precision. We decided to merge the PR and log this test failure so we could benefit from having a GPU machine in our automated test setup.

Here's the error (with CI kibble removed):

...
/people/svceagles/gitlab/3430/haero_Debug_single/src/tests/conversions_unit_tests.cpp:23: FAILED:
due to unexpected exception with message:
/people/svceagles/gitlab/3430/haero_Debug_single/src/tests/atmosphere_utils.cpp:50: FAIL: FloatingPoint<Real>::equiv( psum, p0, std::numeric_limits<float>::epsilon())
...

Evidently in this environment, psum and p0 are not equivalent within machine precision in a single-precision build. From @jaelynlitz:

This is what the atmosphere_utils test is putting out in single precision GPU Debug mode:
psum = 99999.992188 p0 = 100000.000000
And the tolerance needed for the test to pass:
tol = 65550 * std::numeric_limits::epsilon(); epsilon = 0.000000 tol = 0.007814

See #64 (in particular, click on the red X for the failed pipeline at the bottom of the page) for more details.

EAMxx integration: MAM4AerosolMicrophysics AtmosphereProcess

This issue tracks the development of an AtmosphereProcess subclass that evolves aerosols using mam4xx's parameterizations. We may end up having more than one of these AtmosphereProcess classes related to aerosols.

What's written here is fairly technical, since I'm using it to gather and organize my thoughts on this process. If you're interested in understanding what something means, or if you see an issue with something here, please don't hesitate to comment.

Prior Art

Notes and Questions

  • Do we need to use EKAT's WorkspaceManager to get memory for column variables? So far, I don't think so. Most operations can be performed at a single vertical level, which means our parameterizations probably don't have to create temporary ColumnViews. This is a good thing worth defending a bit, because managing memory (on GPUs specifically, but also in general) is tricky.
  • Do we need to store any local variables using the Buffer and AtmBufferManager? Answering this question probably requires looking at how aerosol data is stored in and extracted from the physics buffer in E3SM.
  • Do we need to implement set_grids? Yes. Because each of our aerosol prognostic variables is stored as a tracer in SCREAM's tracers field group (see how SHOC extracts tracer info here), we need to keep track of the physics grid.
  • How can we handle the conversion from wet to dry mixing ratios? SCREAM uses "moist" air mass, etc, whereas MAM4 uses dry air mass. It turns out that SHOC also uses dry air mass, so we can look at how SHOC performs this conversion. #40 also touches on this topic.
  • Should we define MAM4Preprocess and MAM4Postprocess types as is done in e.g. SHOC? This seems like a good idea. SHOC uses these types to perform the conversion between moist and dry mixing ratios, and it's nice to keep that calculation separate from others.
  • How do we select only the specific aerosol tracers from the qtracers array? SHOC operates on all tracers (which, by the way, may or may not be what we want for the treatment of vertical mixing for aerosols[!]). MAM4, on the other hand, only cares about aerosol number and mass mixing ratios.

Anatomy of a Time Step

Someday, this will be a hot topic! For now, we'll do staged development to convince ourselves that we know what we're doing. These steps are only sketched here--I'll probably track them each in their own separate issue.

Stage 1: A do-nothing atmosphere process

We start by implementing a very basic AtmosphereProcess subclass that extracts aerosol and related gas tracer quantities from the tracers field on the physics grid, does nothing to them, and places them unmodified back into the tracers field.

We can make use of the mam4::AeroConfig and mam4::Prognostics types for this task, extracting fields to views within a set of columns, and then copying the values back into place.

Stage 2: Add nucleation

As a foray into aerosol dynamics, we devise a test problem in which we seed the atmosphere with localized sulphuric acid gas, let it nucleate into sulphate aerosol (to be placed in the Aitken mode), and watch the nucleation process deplete the gas. It would probably be interesting to introduce a velocity field that moves things around in a quantifiable (and verifiable, if possible) manner.

To execute this test problem, we add the nucleation process to our implementation, initializing it in the appropriate place and running it within the run_impl method of the MAM4Aerosols class. We can integrate the tendencies computed for the aerosols and gases directly into the aerosol prognostics for each column. Those values then get copied back into the tracers field using code written for Stage 1.

Stage 3: ???

Revise the GasIds enum to match that in the MAM box model

Currently, the GasId enum in aero_modes.hpp has gases that aren't used by MAM4, and they are also in a different order from how they're stored in MAM4. We need to make sure this set matches that in the MAM4 box model, because we suspect that MAM4's logic makes assumptions about these IDs in certain loops.

@overfelt , do you want me to handle this?

Investigate dropmixnuc subroutine

(This issue has been migrated from the Haero repo)

From @pbosler:

Currently, it doesn't scale as it should with the number of vertical levels.

Add e3sm timers, run standard compset & report timings

We expect SYPD to scale nearly exactly as SYPD_72_levels * 72/nlev, but throughputs fall off superlinearly with nlev.

MAM4 physical constants

First, let's articulate our strategy for using physical constants in mam4xx:

Every C++ function in mam4xx should use exactly the same values for physical constants that are used by the corresponding subroutine in mam4.

The above statement should be interpreted literally. We are using whatever values MAM4 uses, even if they are inconsistent across parameterizations, within a parameterization, or just plain wrong. In general, this means a C++ function can have its own set of constants, depending on the situation in the corresponding Fortran subroutine.

This strategy may seem crazy, but it allows us to confine our attention to porting errors when we compare output from C++ and Fortran implementations.

Tracking Inconsistencies

When you encounter an inconsistent value of a physical constant while porting MAM4 Fortran source to C++, please add a comment next to it in the C++ file with the tag BAD_CONSTANT and describe the problem if it's easy to summarize. For example:

constexpr Real pi = 4.13; // BAD_CONSTANT!! Very bad accuracy--no circles possible

This allows us to quickly search for BAD_CONSTANTs throughout the code so we can return one day and fix them. The procedure for fixing these constants is, sadly, outside the scope of this issue.

Bump Required CMake to v3.17?

I wanted to get the team's thoughts on moving to a minimum required CMake 3.17.

Currently, we require CMake 3.12, but 3.17 has some nice functionality. My immediate interest is in ZIP_LISTS, which provides the same functionality as a Python-style zipped list. I am interested in using it to iterate in parallel over validation tests when a single default passing tolerance does not work for all tests. This uses the add_test() syntax introduced by Oscar, e.g.:

foreach(input tol IN ZIP_LISTS TEST_LIST ERROR_THRESHOLDS)
  [...]
  add_test(validate_${input} python3 compare_mam4xx_mam4.py mam4xx_${input}.py mam_${input}.py True ${tol})
endforeach()

CMake availability on our machines:
  • Weaver: 3.12 is module default but available versions range from 3.6 to 3.24.
  • Blake: 3.22 is module default but available versions range from 3.4 to 3.25.
  • Chrysalis: 3.20 pre-loaded by default, but modules for 3.19 and 3.24.
  • Compy: has 3.19 (thanks, Jeff!)
  • Mappy: I don't have an account--would someone mind checking?

@jeff-cohere @pressel @overfelt @pbosler @singhbalwinder @odiazib

calcsize: replace non-boolean switching statements with logic statements

As mentioned in the below-included comment on #72:

In the porting of calcsize, we encountered a place in which a multiplication by 1 million is employed to enforce behavior that would be better handled with logical statements, namely here:

{
  /*
   * Relaxation factor is currently assumed to be a factor of 3 in diameter
   * which makes it 3**3=27 for volume.  i.e. dgnumlo_relaxed = dgnumlo/3 and
   * dgnumhi_relaxed = dgnumhi*3; therefore we use 3**3=27 as a relaxation
   * factor for volume.
   *
   * \see get_relaxed_v2n_limits
   */

  const Real relax_factor = 27.0; // BAD_CONSTANT!!

  // factor to artificially inflate or deflate v2nmin and v2nmax
  const Real szadj_block_fac = 1.0e6; // BAD_CONSTANT!!

  // default relaxation:
  v2nminrl = v2nmin / relax_factor;
  v2nmaxrl = v2nmax * relax_factor;
  // if do_aitacc_transfer is turned on, we will do the ait<->acc transfer
  // separately in aitken_accum_exchange subroutine, so we are effectively
  // turning OFF the size adjustment for these two modes here by artificially
  // inflating (or deflating) v2min and v2nmax using "szadj_block_fac" and then
  // computing v2minrl and v2nmaxrl based on newly computed v2min and v2nmax.

  if (do_aitacc_transfer) {
    // for aitken mode, divide v2nmin by 1.0e6 to effectively turn off the
    //          adjustment when number is too small (size is too big)
    if (is_aitken_mode)
      v2nmin /= szadj_block_fac;
    // for accumulation, multiply v2nmax by 1.0e6 to effectively turn off the
    //          adjustment when number is too big (size is too small)
    if (is_accum_mode)
      v2nmax *= szadj_block_fac;

    // Also change the v2nmaxrl/v2nminrl so that
    // the interstitial<-->activated number adjustment is effectively turned
    // off
    v2nminrl = v2nmin / relax_factor;
    v2nmaxrl = v2nmax * relax_factor;
  }

This will be added in a future minor PR, so as not to hold up the rather large #72.
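A sketch of the bool-based alternative (variable names follow the snippet above; this is not the final implementation):

// Sketch only: record explicitly that the size adjustment is disabled for
// these modes, instead of inflating/deflating v2nmin and v2nmax by 1e6.
const bool size_adjust_off_for_small_num = do_aitacc_transfer && is_aitken_mode;
const bool size_adjust_off_for_large_num = do_aitacc_transfer && is_accum_mode;

// Downstream, adjust_num_sizes would test these flags directly rather than
// relying on v2nmin/v2nmax having been pushed out of reach:
//   if (!size_adjust_off_for_small_num) { /* apply the lower-bound adjustment */ }
//   if (!size_adjust_off_for_large_num) { /* apply the upper-bound adjustment */ }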

PR Comment:

@pbosler pbosler 4 days ago
Instead of artificially turning off the relaxation by multiplying or dividing by one million, could we actually turn it off by using a bool or an if?

Author
@odiazib odiazib 4 days ago
@singhbalwinder, can you comment on this?

Author
@odiazib odiazib 3 days ago
This change will require modifying a few parts of the adjust_num_sizes function, which already has many if statements. @pbosler, may we make this change after we get additional validation tests for adjust_num_sizes? I could also create an issue for this change.

@jeff-cohere jeff-cohere 7 hours ago
I think we can do this in a subsequent PR. @mjs271 , would you mind creating an issue to track this?

@singhbalwinder singhbalwinder 7 hours ago
It is a good idea to add if statements to be more explicit. Sometimes folks use these methods to avoid if conditions. Like @jeff-cohere suggested, we can do it in a subsequent PR.

@pbosler pbosler 1 hour ago
A separate issue is a good way to handle this for now.

Can't find haero

Used the included build-haero.sh with haero install dir = <mam4xx-build-dir>/haero. CMake can't find it:

[pabosle@s1024454 Tue Aug 30 11:54 ~/mam4xx/build-default (pbosler/build-haero-use-env-cc)]$ ./config.sh
-- Configuring with build type: Debug
CMake Error at CMakeLists.txt:19 (include):
  include could not find requested file:

    haero

Added some debug prints to CMakeLists.txt around line 17:

message(STATUS "haero = ${haero}")
message(STATUS "MAM4XX_HAERO_DIR = ${MAM4XX_HAERO_DIR}")

output is:

-- haero =
-- MAM4XX_HAERO_DIR = /home/pabosle/mam4xx/build-default/haero

Also: ls /home/pabosle/mam4xx/build-default/haero shows that haero is built/installed correctly:

~/mam4xx/build-default$ ls haero/
bin  include  lib  lib64  share

Using Kokkos to select the device type in build-haero.

Currently in the build-haero script (https://github.com/eagles-project/mam4xx/blob/main/build-haero.sh) we pass an argument with the device type (DEVICE=$2). However, Kokkos has a feature that automatically selects the device type. I have used this feature with NVIDIA GPUs, so I do not know whether it works on other types of devices.

To do this, we must pass nvcc_wrapper as the KOKKOS_CXX_COMPILER flag in the CMake configuration of Kokkos and of the library that uses Kokkos.

One example using this feature is at https://github.com/sandialabs/TChem/blob/main/scripts/master_bld.sh. In particular, check lines 12, 94, 147, and 246 to 252.

In this script, we choose whether to build the library (TChem) with or without CUDA; Kokkos then detects the device type.

Unable to build with haero and in-tree CMake based EKAT build

See #64 for the context where the first failure was noticed.

HAERO's exported CMake configuration installs an nvcc wrapper into install/bin for ease of use.

However, when performing an in-tree CMake build of EKAT, the CMAKE_CXX_COMPILER ends up pointing to the nvcc_wrapper in the HAERO source directory, rather than the one that is installed in install/bin.

This means that HAERO depends on its own source directory after installation; to resolve this issue, HAERO installations should be independent of the source code.

To observe this issue, simply install HAERO per the default mam4xx configuration using the GPU, delete the cloned HAERO source code, and then attempt to install mam4xx. You will see an error that no CMAKE_CXX_COMPILER can be found, where the error contains the path where the HAERO source directory was.

@jeff-cohere also witnessed this issue.
