
ams's Introduction

Autonomous MultiScale Library (AMS)

A library (under construction) to simplify machine learning surrogate model integration in HPC codes.

Getting Involved

AMS is an open-source project, and we welcome contributions from the community.

Contributions

We welcome all kinds of contributions: new features, bug fixes, documentation edits; it's all great!

To contribute, make a pull request, with develop as the destination branch.

Authors

Thanks to all AMS contributors.

AMS was created under the AMS LDRD-SI project (22-SI-004).

Citation

If you use this software, please cite it as below:

  • Bhatia, Harsh, Patki, Tapasya A., Brink, Stephanie, Pottier, Loïc, Stitt, Thomas M., Parasyris, Konstantinos, Milroy, Daniel J., Laney, Daniel E., Blake, Robert C., Yeom, Jae-Seung, Bremer, Peer-Timo, and Doutriaux, Charles. Autonomous MultiScale Library. Computer Software. https://github.com/LLNL/AMS. US DOE National Nuclear Security Administration (NNSA). 01 May. 2023. Web. doi:10.11578/dc.20230721.1.

or get the format you prefer from here

Release

AMSLib is released under the Apache License (Version 2.0) with LLVM exceptions. For more details, please see the LICENSE file.

LLNL-CODE-851455

Installation Instructions

See INSTALL.md for full instructions.

ams's People

Contributors

bhatiaharsh, doutriaux1, ggeorgakoudis, koparasy, lpottier, milroy, slabasan, tomstitt, tpatki

ams's Issues

Allow surrogate-model inference using single precision.

We currently always expect the model to use the same precision as the physics code, which can limit the performance gains of the model. We should allow casting the inputs to single precision before inference and casting the outputs back to double precision afterwards.
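
The proposed cast-down, infer, cast-up flow could look roughly like the sketch below. It is only an illustration: FloatModel and infer_in_single_precision are hypothetical names, not part of the AMS API, and the model is represented by a plain callable on flat float buffers.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for a single-precision surrogate model: it maps a
// flat float buffer of inputs to a flat float buffer of outputs.
using FloatModel = std::function<std::vector<float>(const std::vector<float>&)>;

// Cast double inputs down to float, run the model, and cast the results back.
std::vector<double> infer_in_single_precision(const FloatModel& model,
                                              const std::vector<double>& inputs)
{
  // Narrowing double -> float may round; this is the accepted trade-off.
  std::vector<float> inputs_f32;
  inputs_f32.reserve(inputs.size());
  for (double v : inputs) {
    inputs_f32.push_back(static_cast<float>(v));
  }

  const std::vector<float> outputs_f32 = model(inputs_f32);

  // Widening float -> double is exact.
  std::vector<double> outputs;
  outputs.reserve(outputs_f32.size());
  for (float v : outputs_f32) {
    outputs.push_back(static_cast<double>(v));
  }
  return outputs;
}
```

With this shape the physics code stays in double precision; the cost is two extra buffer copies and float rounding of the inputs.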

AMS workflow python package installation warning

Running make install prints this:

Installing collected packages: argparse, ams-wf
  Attempting uninstall: ams-wf
    Found existing installation: ams-wf 1.0
    Uninstalling ams-wf-1.0:
      Successfully uninstalled ams-wf-1.0
  DEPRECATION: ams-wf is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559

Besides the warning, I also observed unnecessary re-installs of the package.

The miniapp ams-example segfaults when run with 10 elements

Running as

LIBAMS_VERBOSITY_LEVEL=-1 /p/vast1/ggeorgak/projects/ams/AMS/build_ruby/examples/ams_example --precision single --uqtype deltauq-mean -db ./db -S /p/vast1/ggeorgak/projects/ams/AMS/tests/tuple-single.torchscript -e 10

Segfaults with output:

--> cycle: 1
 material 0: using sparse packing for -5 elems
[ Workflow ] Entering Evaluate with problem dimensions [(-320, 2, -320, 4)]
[ Workflow ] Memory usage at Start is VM:1.37981e+06 RS:141504
[ ResourceManager ] Requesting to allocate -320 values using allocator :mmp-host-quickpool
terminate called after throwing an instance of 'umpire::util::Exception'
  what():  ! Umpire Exception [/var/tmp/blake14/spack-stage/spack-stage-umpire-2022.03.1-zzq5wck6qqis42cbreyywthqbw2vcb2j/spack-src/src/umpire/alloc/MallocAllocator.hpp:43]:  allocate malloc( bytes = 18446744073709551312 ) failed
    Backtrace: 10 frames
    0 0x1555550c6a07 No dladdr: /p/vast1/ggeorgak/projects/ams/AMS/build_ruby/src/libAMS.so(_ZN6umpire5alloc15MallocAllocator8allocateEm+0x527) [0x1555550c6a07]
    1 0x1555550c744e No dladdr: /p/vast1/ggeorgak/projects/ams/AMS/build_ruby/src/libAMS.so(_ZN6umpire8resource21DefaultMemoryResourceINS_5alloc15MallocAllocatorEE8allocateEm+0x4e) [0x1555550c744e]
    2 0x1555550ad940 No dladdr: /p/vast1/ggeorgak/projects/ams/AMS/build_ruby/src/libAMS.so(_ZN6umpire8strategy6mixins17AlignedAllocation16aligned_allocateEm+0x20) [0x1555550ad940]
    3 0x15555505f036 No dladdr: /p/vast1/ggeorgak/projects/ams/AMS/build_ruby/src/libAMS.so(+0x3b036) [0x15555505f036]
    4 0x40cd44 No dladdr: /p/vast1/ggeorgak/projects/ams/AMS/build_ruby/examples/ams_example(_ZN6umpire9Allocator8allocateEm+0x304) [0x40cd44]
    5 0x15555506c355 No dladdr: /p/vast1/ggeorgak/projects/ams/AMS/build_ruby/src/libAMS.so(_ZN3ams15ResourceManager8allocateIbEEPT_m15AMSResourceType+0x65) [0x15555506c355]
    6 0x15555507cd2d No dladdr: /p/vast1/ggeorgak/projects/ams/AMS/build_ruby/src/libAMS.so(_ZN3ams11AMSWorkflowIfE8evaluateEPviPPKfPPfiii+0x1cd) [0x15555507cd2d]
    7 0x407976 No dladdr: /p/vast1/ggeorgak/projects/ams/AMS/build_ruby/examples/ams_example() [0x407976]
    8 0x15554ac1dd85 No dladdr: /lib64/libc.so.6(__libc_start_main+0xe5) [0x15554ac1dd85]
    9 0x4097ee No dladdr: /p/vast1/ggeorgak/projects/ams/AMS/build_ruby/examples/ams_example() [0x4097ee]

Debugger (lldb):

(lldb) 
frame #9: 0x000015555507cd2d libAMS.so`ams::AMSWorkflow<float>::evaluate(this=0x000000000227fcd0, probDescr=0x000000000226fd00, totalElements=-320, inputs=<unavailable>, outputs=<unavailable>, inputDim=2, outputDim=4, Comm=1140850688) at workflow.hpp:333:65
   330        return;
   331      }
   332      // The predicate with which we will split the data on a later step
-> 333      bool *p_ml_acceptable = ams::ResourceManager::allocate<bool>(totalElements);
   334 
   335      // -------------------------------------------------------------
   336      // STEP 1: call the hdcache to look at input uncertainties
(lldb) p totalElements
(const int) $0 = -320
(lldb) up
frame #10: 0x0000000000407976 ams_example`main(argc=<unavailable>, argv=<unavailable>) at main.cpp:636:32
   633 
   634  #ifdef USE_AMS
   635  #ifdef __ENABLE_MPI__
-> 636            AMSDistributedExecute(workflow[mat_idx],
   637                                  MPI_COMM_WORLD,
   638                                  static_cast<void *>(eoses[mat_idx]),
   639                                  num_elems_for_mat * num_qpts,
(lldb) 
frame #11: 0x000015554ac1dd85 libc.so.6`__libc_start_main + 229
libc.so.6`__libc_start_main:
->  0x15554ac1dd85 <+229>: movl   %eax, %edi
    0x15554ac1dd87 <+231>: callq  0x15554ac34380            ; exit
    0x15554ac1dd8c <+236>: movq   0x8(%rsp), %rax
    0x15554ac1dd91 <+241>: leaq   0x14b07d(%rip), %rdi
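
The trace above shows totalElements = -320 reaching ams::ResourceManager::allocate<bool>, whose mangled name in frame 5 takes an unsigned long size (the `m` parameter). Converting a negative int to an unsigned size wraps to a value near 2^64, which is consistent with the failed malloc request. A minimal sketch of a guard, assuming a hypothetical helper checked_element_count that is not part of AMS:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of AMS): validate a signed element count
// before it is converted to the unsigned size an allocator expects.
std::size_t checked_element_count(int total_elements)
{
  if (total_elements < 0) {
    throw std::invalid_argument("negative element count: " +
                                std::to_string(total_elements));
  }
  return static_cast<std::size_t>(total_elements);
}
```

A check like this would turn the failure into an early, descriptive error at the call site. It does not explain why the miniapp computes a negative count in the first place.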

Simplify cmake file for tests

Currently the CMake file builds identical test binaries more than once. We should update the build to avoid this duplication and speed it up.

Proper error handling in RMQ DB.

Currently, when an error occurs in the AMS library's RMQ database, we tend to fail completely. We need to continue execution reliably and re-establish the connection.

This should be done after #30 is merged.
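
One possible shape for "continue and reconnect" is a bounded retry loop with backoff. The sketch below is not the AMS RMQ interface: publish and reconnect are hypothetical callables supplied by the caller, and the retry policy (attempt count, doubling delay) is an assumption chosen for illustration.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical sketch, not the AMS RMQ API: `publish` sends one message and
// returns false on failure; `reconnect` tries to re-establish the connection.
bool publish_with_retry(const std::function<bool()>& publish,
                        const std::function<bool()>& reconnect,
                        int max_attempts,
                        std::chrono::milliseconds initial_delay)
{
  std::chrono::milliseconds delay = initial_delay;
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (publish()) {
      return true;
    }
    if (attempt == max_attempts) {
      break;  // out of attempts; do not sleep or reconnect again
    }
    std::this_thread::sleep_for(delay);
    delay *= 2;   // exponential backoff between attempts
    reconnect();  // if this fails, the next publish attempt fails too
  }
  return false;  // the caller decides how to degrade (buffer, drop, log)
}
```

Returning false instead of aborting leaves the decision to the caller, so the simulation can keep running even when the database stays unreachable.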

Adiak is not installed in our CI container

Our CI container does not include Adiak, and the currently requested Adiak version is not available in the Spack packages. Please update the container, and correct both the CMake files and the example code so that they use the same macro guard (__AMS_ENABLE_ADIAK__).
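
For illustration, code guarded consistently by __AMS_ENABLE_ADIAK__ might look like the sketch below. The function record_metadata is hypothetical, and the adiak::value call is assumed from Adiak's C++ interface (which also expects Adiak to have been initialized elsewhere); only the macro name comes from this issue.

```cpp
#include <string>

#ifdef __AMS_ENABLE_ADIAK__
#include <adiak.hpp>
#endif

// Record a run-level name/value pair if Adiak support was compiled in.
// Returns true when the value was forwarded to Adiak, false otherwise.
bool record_metadata(const std::string& name, const std::string& value)
{
#ifdef __AMS_ENABLE_ADIAK__
  // Assumed Adiak C++ call; requires Adiak to be initialized beforehand.
  return adiak::value(name, value);
#else
  (void)name;   // unused when Adiak support is disabled
  (void)value;
  return false;
#endif
}
```

When the same macro gates both the include and every call site, a build without Adiak compiles cleanly and the calls become no-ops.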

Update model

As pointed out by @ggeorgakoudis, we are not passing strings by reference when updating a model. We should do so.

Additionally, when an update is requested for a FAISS UQ, we should throw an error.
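
A minimal sketch of both points, under stated assumptions: UQKind and update_model are hypothetical names rather than the AMS API, and the sketch reads the second request as "a FAISS-based UQ cannot be updated, so report an error".

```cpp
#include <stdexcept>
#include <string>

// Hypothetical UQ kinds and function name, for illustration only.
enum class UQKind { DeltaUQ, Faiss };

// Takes the path by const reference, so the caller's string is not copied
// on the way in. `active_path` stands in for whatever state the update sets.
void update_model(UQKind kind,
                  const std::string& model_path,
                  std::string& active_path)
{
  if (kind == UQKind::Faiss) {
    throw std::runtime_error("updating a FAISS-based UQ model is not supported");
  }
  active_path = model_path;
}
```

Throwing before any state changes means a rejected update leaves the previously active model untouched.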

AMSspack install issue with git repo

Hi all

I am trying to install AMS using spack 0.23.0. It gets through its dependencies but halts on an error about openssl when trying to pull down AMS itself. In the package.py file you have

git = "git@github.com:LLNL/AMS.git"

When I try replacing this with

git = "https://github.com/LLNL/AMS.git"

it gets as far as downloading the module, but then fails during compilation because CMake cannot find Umpire (which was installed earlier with Spack).

Is this a known issue? Is there some sort of permissions problem I might be having with github? Are the repo permissions set ok for people outside the dev team to be able to pull it down?

Implement CUDA run tests

The proposed approach is to hook into LLNL's internal GitLab to run the tests on LLNL GPU-enabled runners.

Workflow orchestrator

We are missing a mechanism to bootstrap Flux and correctly connect all the components. There are bash scripts under ./scripts/ that perform some of these actions and can serve as a guideline. We need to port them to Python and abstract out the specifics of the mini-app.
