
spack-manager's Introduction

Spack-Manager


Spack-Manager is a lightweight extension to Spack intended to streamline the software development and deployment cycle for individual software projects on specific machines.

The main goals of Spack-Manager are to:

  1. Create a uniform framework for developing, testing, and deploying specific software stacks for projects across multiple platforms while utilizing Spack as much as possible.
  2. Organize Spack extensions, machine configurations, project customizations, and tools for features that are mostly project-specific; customizations that prove broadly useful will be generalized and merged into Spack when appropriate.
  3. Provide a proving ground for new commands and features that can be upstreamed to Spack.

Although not strictly necessary, it is recommended that those utilizing this tool also become familiar with the features of Spack and consult Spack's documentation.

spack-manager's People

Contributors

ajpowelsnl, alanw0, ddement, djglaze, eugeneswalker, frobnitzem, itopcuoglu, jrood-nrel, marchdf, mbkuhn, ndevelder, paulmullowney, psakievich, tasmith4


spack-manager's Issues

create_machine_spack_environment.py can't find "find_machine"

Maybe this is something I did, but the "find_machine" python script isn't visible to "create_machine_spack_environment.py", or anywhere else I can find. Commenting out the import and passing "-m" with the machine designation from the "configs" directory works around this.

nvcc_wrapper not picked up in builds with externals on ascicgpu

After quick-develop -n develop -s 'nalu-wind@master+cuda cuda_arch=70' trilinos@develop, spack install produces errors like this during the nalu-wind build:

g++: error: unrecognized command line option '--relocatable-device-code=true'

Also, one of the build lines shows that spack is building with mpic++, not nvcc_wrapper:

/projects/wind/system-spack/opt/spack/linux-rhel7-x86_64/gcc-9.3.0/mpich-3.4.2-4h2muy6jlgfdahehxmxa4ybndfdb6gx2/bin/mpic++ -DUSE_STK_SIMD_NONE -I/scratch/tasmit/spack-manager-test/spack-manager/environments/develop/nalu-wind/include -I/scratch/tasmit/spack-manager-test/spack-manager/environments/develop/nalu-wind/spack-build-khf4ao2/include -isystem /scratch/tasmit/spack-manager-test/spack-manager/spack/opt/spack/linux-rhel7-x86_64/gcc-9.3.0/trilinos-developjeus3vtaczns5wytutya65qwntxa5yvf/include -isystem /projects/wind/spack-manager/views/exawind/snapshots/ascicgpu/2022-02-14/gcc-cuda/include -isystem /projects/wind/system-spack/opt/spack/linux-rhel7-x86_64/gcc-9.3.0/cuda-11.2.2-wlah7an4q7rej4uylqlimczaa6z3zlq7/include -isystem /projects/wind/spack-manager/views/exawind/snapshots/ascicgpu/2022-02-14/gcc-cuda/lib/cmake/yaml-cpp/../../../include -O3 -DNDEBUG -fPIC --relocatable-device-code=true -expt-extended-lambda -Wext-lambda-captures-this -arch=sm_70 --expt-relaxed-constexpr -Wall -Wextra -pedantic -faligned-new -std=c++14 -MD -MT CMakeFiles/nalu.dir/src/AssemblePNGElemSolverAlgorithm.C.o -MF CMakeFiles/nalu.dir/src/AssemblePNGElemSolverAlgorithm.C.o.d -o CMakeFiles/nalu.dir/src/AssemblePNGElemSolverAlgorithm.C.o -c /scratch/tasmit/spack-manager-test/spack-manager/environments/develop/nalu-wind/src/AssemblePNGElemSolverAlgorithm.C

However, after quick-create-dev -n create-dev -s 'nalu-wind@master+cuda cuda_arch=70' trilinos@develop, spack install succeeds. I've verified this behavior on ascicgpu057, and @ldh4 has also reported it on ascicgpu22, so it seems to not be machine-specific. Based on the fact that quick-develop fails while quick-create-dev succeeds, my initial assessment is that something is broken with the ascicgpu externals.

@psakievich we are getting successful builds without externals, so this is not urgent, but wanted you to be aware.

Document machine configuration steps

We should document the steps to set up a new machine configuration in the configs/ directory, and possibly include a location for user-added configs and repos.

Ideally we can get this documented and added before the AMR-Wind tutorial, so users on other machines will be able to take advantage of spack-manager right away.


spack manager create-dev-env documentation error

When running spack manager create-dev-env with no args, it fails asking for a spec, but the help text marks the spec as optional (the square brackets around all options imply they are optional). Running with spack-manager checked out at 9c2fd5e:

$ quick-develop
+ spack-start
==> Removing cached information on repositories
+ spack manager create-dev-env
==> Error:
ERROR: specs are a required argument for spack manager create-dev-env.


ERROR: Exiting quick-develop prematurely

$ quick-develop --help
+ spack-start
==> Removing cached information on repositories
*************************************************************
HELP MESSAGE:
quick-develop sets up a developer environment and installs it

This command is designed to require minimal arguments and simple setup
with the caveat of accepting all the defaults for:

- repo and branch cloned for your packages
- latest external snapshot with the default compilers/configurations

Please note that for specifying multiple specs with spaces you need to
wrap them in quotes as follows:

"'amr-wind@main build_type=Debug' nalu-wind@master 'exawind@master build_type=Debug'"

The base command and it's help are echoed below:


+ spack manager create-dev-env --help
usage: spack manager create-dev-env [-h] [-m MACHINE] [-d DIRECTORY | -n NAME]
                                    [-y YAML] [-s SPEC [SPEC ...]]

optional arguments:
  -d DIRECTORY, --directory DIRECTORY
                        Directory to copy files
  -h, --help            show this help message and exit
  -m MACHINE, --machine MACHINE
                        Machine to match configs
  -n NAME, --name NAME  Name of directory to copy files that will be in $SPACK_MANAGER/environments
  -s SPEC [SPEC ...], --spec SPEC [SPEC ...]
                        Specs to populate the environment with
  -y YAML, --yaml YAML  Reference spack.yaml to copy to directory
*************************************************************

Externals include headers superseding dev-build headers

@PaulMullowney noticed the hypre headers from his dev-build are being ignored in favor of the snapshot headers.

The path forward is to use projections for the views so each package has its own directory. We need to verify that a single module for the exawind suite can still be created when we do this; if so, we can add a basic projection scheme to the view definition in the snapshot creation, as sketched below.
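For concreteness, a projection of the sort described might look like this in the snapshot's spack.yaml (a minimal sketch; the view name, root path, and projection pattern are assumptions, not the current configuration):

  spack:
    view:
      snapshot:
        root: $SPACK_MANAGER/views/exawind/snapshot
        projections:
          all: '{name}/{version}'

With a projection like this, each package lands in its own {name}/{version} subdirectory, so a dev-build's headers can no longer be shadowed by another package's copy at a shared include root.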

Build failure on ascicgpu

On ascicgpu057, I was trying to build the full stack with the clingo concretizer (i.e. no externals) as follows:

$ cd /scratch/tasmit/spack-manager-test                           # empty directory
$ git clone --recursive git@github.com:psakievich/spack-manager   # checked out 9c2fd5e
$ export SPACK_MANAGER=$(pwd)/spack-manager
$ source $SPACK_MANAGER/scripts/useful_bash_functions.sh
$ quick-create-dev -s nalu-wind@master
$ spack install

This fails with numerous errors in the TPL builds, e.g. libsigsegv:

  >> 1714    /tmp/cczmYN1T.s:1123: Error: unknown .loc sub-directive `view'
  >> 1715    /tmp/cczmYN1T.s:1123: Error: unknown pseudo-op: `.lvu378'

and also readline, which fails with numerous compile errors, e.g.

     368    In file included from vi_mode.c:35:
  >> 369    ./config.h:30:17: error: two or more data types in declaration specifiers
     370       30 | #define ssize_t int
     371          |                 ^~~
     372    In file included from funmap.c:25:
  >> 373    ./config.h:28:16: error: duplicate 'unsigned'
     374       28 | #define size_t unsigned int
     375          |                ^~~~~~~~

@psakievich is it possible this is an issue with the compilers you recently installed?

Add CI

We need to have CI that provides

  1. unit-tests for scripts
  2. regression test for some basic workflow
  3. concretization checking for updating spack

Items 2 and 3 can probably be combined; a minimal workflow covering item 1 is sketched below.
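As a sketch of item 1, a minimal GitHub Actions workflow might look like the following (the workflow file name, triggers, and test path are assumptions):

  # .github/workflows/ci.yaml (hypothetical)
  name: ci
  on: [push, pull_request]
  jobs:
    unit-tests:
      runs-on: ubuntu-latest
      steps:
        - uses: actions/checkout@v3
          with:
            submodules: true
        - run: python3 -m pytest tests/   # assumed location of the script unit tests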

Specifying SPACK_MANAGER in environment

I was trying to use the new reduced workflow commands as follows, starting from empty directory spack-manager-test, in a fresh shell:

spack-manager-test$ git clone --recursive git@github.com:psakievich/spack-manager
# a bunch of git output, spack-manager is checked out at 9c2fd5e
spack-manager-test$ source spack-manager/scripts/useful_bash_functions.sh
spack-manager-test$ quick-develop
+ spack-start
-bash: /start.sh: No such file or directory

ERROR: Exiting quick-develop prematurely

This error goes away if I run export SPACK_MANAGER=$(pwd)/spack-manager. Is the fact that the useful bash functions fail without that environment variable a workflow feature or a bug?

If it's considered a feature, I'd like to request that all these functions fail immediately when SPACK_MANAGER is unset, with a clear message asking the user to set it; see the sketch below.
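For reference, a guard at the top of useful_bash_functions.sh could be as simple as the following sketch (the exact message is illustrative):

  if [ -z "${SPACK_MANAGER}" ]; then
    echo "ERROR: SPACK_MANAGER is not set." >&2
    echo "Please run: export SPACK_MANAGER=/path/to/spack-manager" >&2
    return 1   # the file is sourced, so return rather than exit
  fi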

Docs on create_machine_spack_environment.py

The developer tutorial discusses the script create_machine_spack_environment.py. I believe this usage has been superseded by the spack manager commands, which are not discussed in the developer tutorial. The tutorial should probably be updated to reflect the current recommended usage.

Migrate ExaWind nightly testing to use spack-manager

  • Update gold file mechanisms in nalu-wind
  • Update gold file mechanisms in amr-wind
  • Update gold file mechanisms in spack-manager
  • Update cron mechanisms in spack-manager
  • Nalu-wind on a cron job on rhodes
  • AMR-wind on a cron job on rhodes
  • Exawind-driver on a cron job on rhodes
  • Minimize cloning each night by updating stage directories
  • Add nightly gcc builds for nalu-wind and amr-wind
  • Add nightly intel builds for nalu-wind and amr-wind
  • Add nightly clang builds with asan for nalu-wind and amr-wind
  • Add nightly amr-wind with latest amrex
  • Add nightly nalu-wind with trilinos master
  • Add nightly gcc builds for exawind-driver
  • Add nightly intel builds for exawind-driver
  • Add nightly clang builds with asan for exawind-driver
  • Try to remove need for signal trapping in install-exawind.sh and nightly testing scripts
  • Fix cmake warnings in nightly tests
  • Fix nalu-wind unit tests

quick-* commands add text to prompt

Is it possible for the quick-* commands to not add text to the prompt by default? I've discovered it doesn't play well with my custom prompt.

Confusing phrase in docs

The developer tutorial contains the sentence "You may also wonder why we are using nalu-wind instead of nalu-wind". I believe one or both of these specs has a typo, but am unsure what the original intent was.

Documentation: Advanced topics

I don't think we need to get all of these done; this is mainly a list of ideas:

  • editing cmake
  • parallel builds
  • concretization: together

Build is broken on ascicgpu22

While working with @ldh4 yesterday on ascicgpu22, we observed the following:

  • Check out latest spack-manager (probable SHA ed0d592) -- got link errors relating to libgfortran (we interpreted this to mean the compiler environment was messed up)
  • Check out spack-manager SHA d08300b -- got the same link errors
  • Remembered that you need to run git submodule update to get the spack version associated with that spack-manager SHA -- after that, it builds fine

Strangely, I was unable to reproduce this on ascicgpu057. From these facts I suspect a regression bug in spack sometime between those two spack-manager SHAs that somehow only affects ascicgpu22.

Spec is nalu-wind+cuda cuda_arch=70 ^trilinos+cuda+stk_unit_tests.

Evaluate ninja generator

We had some problems with the ninja generator for cuda builds when changing TMP_DIR. We would also like to test it with nalu-wind-nightly to make sure all the logic is there.

Add environment variable to point to system golds

I'd like us to add an environment variable, like we did for externals, that points to the system gold files. I'd also like the packages to check for this environment variable and auto-populate it for users if it exists and is valid.
This way people will be able to automatically run the tests against the same golds we use for nightly tests on the machines.
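Something along these lines, mirroring the externals variable (the variable name and path are hypothetical):

  # hypothetical name, following the pattern used for externals
  export SPACK_MANAGER_GOLDS_DIR=/projects/wind/golds
  # packages would check this variable and, if the directory exists and
  # is valid, point their regression-test golds at it automatically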

Make nightly tests call the snapshot_creator.py

Currently nightly tests create an environment and populate a few dev specs to test.
It would be even better if we created a snapshot so these binaries could be used by developers with the externals framework we've already put together.

To do this we need to

  • Refactor snapshot_creator.py to be able to either read specs, view id, and exclusions from a yaml file, or take them at the command line. The lowest-hanging fruit is probably a preconfigured yaml file, like what is currently done in the run-exawind-nightly-tests.sh script (sketched below).
  • Refactor run-nightly-tests.sh to use the updated snapshot_creator.py script.
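The preconfigured yaml file might look something like this (all field names are hypothetical; snapshot_creator.py does not read such a file today):

  # hypothetical snapshot.yaml consumed by snapshot_creator.py
  view_id: gcc-cuda
  specs:
    - nalu-wind@master+cuda cuda_arch=70
    - amr-wind@main
  exclusions:
    - +rocm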

Standardize view deployment for archived binaries

One of the intents of spack-manager is to allow for archives of binaries on a local machine. The goals of these archives are to:

  1. Be accessible for modules
  2. Be easily defined as externals in spack environments to limit the rebuilds required

These should support the developer and analyst workflows and be independent of spack in case things need to be deleted or updated. The current plan is to use copy views to create these caches. The other idea is that they should be stored in the local $SPACK_MANAGER/views directory.

However, we need a standardized naming convention and archival procedure. Some of the fields that need to be captured are:

  • compiler/mpi combo
  • build date
  • software project (exawind/pele/etc)
  • product name (?)

These are a lot of fields and potential logic, and there are binary path relocation issues to consider. Perhaps a hash might be acceptable for handling some of this data, but that will also need an API so devs can easily access the archives they want. One candidate layout is sketched below.
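One candidate convention, inferred from the snapshot paths already in use on ascicgpu (the exact layout is a suggestion, not an agreed standard):

  $SPACK_MANAGER/views/<project>/snapshots/<machine>/<build-date>/<compiler-mpi-combo>
  # e.g. $SPACK_MANAGER/views/exawind/snapshots/ascicgpu/2022-02-14/gcc-cuda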

Migrate scripts to be extensions of spack

We'd like to have all of our scripts become extensions of spack so the syntax would change to something like:
spack manager create-env
spack manager find-machine
etc.

This will consolidate unit testing and formatting and bring us closer to contributing back to spack.
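For reference, Spack discovers custom commands from extension directories registered in config.yaml, so the end state would look roughly like this (directory and file names follow Spack's extension convention but are otherwise illustrative):

  spack-manager/
    manager/
      cmd/
        manager.py   # implements `spack manager` and its subcommands

  # registered in config.yaml:
  config:
    extensions:
    - /path/to/spack-manager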

trap command keeps ascicgpu from reporting results to CDash

The ascicgpu platform has been consistently unreliable in reporting results to CDash (usually failing to report), even though it consistently reports successfully installing/running the nightly test package.

Yesterday and today, I ran the nightly scripts multiple times, each time they failed to report until I commented out the trap command at the beginning of exawind-tests-script.sh:

https://github.com/psakievich/spack-manager/blob/aada732cb3e2a311d8c97af25e464a087932be8b/scripts/run-exawind-nightly-tests.sh#L34

While I don't really understand what this command is doing, I suspect it is somehow killing whatever process is reporting to the dashboard.

@jrood-nrel @psakievich what's the best path forward here? The easiest path is to simply delete that line, but I suspect it exists for a reason -- what was the original motivation for putting it in? Do we need to make this line platform-specific?
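For illustration only, a cleanup trap of this general shape would explain the symptom; this is not the actual line from the script, just a sketch of how a trap can take down a reporting process:

  trap 'kill 0' EXIT         # on exit, signal every process in the group
  ctest -D NightlySubmit &   # a backgrounded CDash submission started here...
  exit 0                     # ...would be killed by the trap before it reports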

Spack manager bisect

Create a way to do a bisection based on date, where every develop spec moves to git commits from the same day.
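A sketch of the core operation using git's own date filtering (the repo path, branch, and date are illustrative):

  # move one develop spec's clone to its last commit on or before a date
  DATE="2022-02-14"
  REPO=$SPACK_MANAGER/environments/develop/nalu-wind
  git -C "$REPO" fetch origin
  git -C "$REPO" checkout "$(git -C "$REPO" rev-list -1 --before="$DATE 23:59" origin/master)"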

Exclusions in the snapshot logic has a bug

When there are a +cuda and a ~cuda build in the same snapshot, the exclusions for the ~cuda spec look like:

exclude:
- +cuda
- +rocm
- ~cuda
- +rocm

when it should really be:

exclude:
- +cuda
- +rocm

This is most likely because of nvcc-wrapper, so we will need to think about how to fix it.
A whole bunch of specs get omitted from the views because of this; most notably, tioga is missing from all the views since it is always ~cuda.

Standardize module deployment

We want standard modules that users can load to run the software stack. Currently spack creates an architecture flag, which is a bit annoying since some instances of spack-manager serve multiple machines. Watching spack/spack#24156 to hopefully see that issue resolved; otherwise we will need to module use multiple directories for these cases.

But I guess we don't need spack to create all our modules for us either. If we are just pointing to views, we could write our own module definitions and add the logic to update where they point. That could be simpler.

The purpose of this issue is similar to #23 in that we want:

  • a standard naming convention for modules
  • a standard template in the repo
  • an automated way of creating/refreshing them

Some of the modules I think we need are:

  • nightly: the executables for the nightly tests
  • FYXXQXX: a module for each quarter until we get a stable release (thinking exawind)
  • release modules, at some point

I'm debating weekly modules as well, but I'm not sure if that is too granular.
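If we write the module definitions ourselves, one simple way to handle the retargeting logic is to have each modulefile reference a stable symlink and retarget the symlink on refresh (a sketch; the "latest" link name is an assumption):

  # refresh step: retarget the link that the hand-written modulefile points at
  ln -sfn "$SPACK_MANAGER/views/exawind/snapshots/ascicgpu/2022-02-14/gcc-cuda" \
          "$SPACK_MANAGER/views/exawind/latest"
  # the modulefile then only ever prepends $SPACK_MANAGER/views/exawind/latest/bin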

spack-start may be skipped in a subshell if spack-start was already called in the spawning shell

If one calls spack-start and then drops into a new shell (e.g. via salloc on a compute node) some variables are carried through (notably SPACK_MANAGER_MACHINE) but not others. This means that spack-start is skipped and one can get an error when trying to activate an env in the new shell.

This doesn't have to be a new shell on a compute node via salloc but that's a common use case. One could just spawn a subshell and get the same problem.

Steps to replicate on NREL Eagle:

el2 $ export SPACK_MANAGER=$(pwd)
el2 $ source start.sh
el2 $ spack-start
el2 $ salloc -t 1:00:00 -N 1 -A hfm -p debug
salloc: Pending job allocation 8389054
salloc: job 8389054 queued and waiting for resources
salloc: job 8389054 has been allocated resources
salloc: Granted job allocation 8389054
salloc: Waiting for resource configuration
salloc: Nodes r2i7n35 are ready for job
r2i7n35 $ z /projects/hfm/mhenryde/debug-avatar-sstlr/
r2i7n35 $ ./submit.batch
==> Error: `spack env activate` requires Spack's shell support.
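Until this is fixed, a workaround is to re-run the initialization inside the new shell or batch script before activating the environment (a sketch built from the commands above; the environment path is hypothetical):

  # at the top of submit.batch, or after salloc drops you into the new shell:
  export SPACK_MANAGER=/path/to/spack-manager
  source $SPACK_MANAGER/start.sh
  spack-start
  spack env activate -d $SPACK_MANAGER/environments/exawind   # hypothetical env directory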
