sandialabs / spack-manager
A project and machine deployment model using Spack
Home Page: https://sandialabs.github.io/spack-manager/
License: Other
We'd like to have all of our scripts become extensions of spack so the syntax would change to something like:
spack manager create-env
spack manager find-machine
etc.
This will consolidate unit testing and formatting and bring us closer to contributing back to spack.
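Assuming spack's custom-extension mechanism is the route taken, registration would amount to a small config.yaml fragment roughly like the following (the directory name is illustrative, not spack-manager's actual layout; extension command modules then live under a `cmd/` directory inside the extension):

```yaml
# config.yaml fragment (sketch): register spack-manager as a spack extension
# so its scripts become `spack manager ...` subcommands.
config:
  extensions:
  - $SPACK_MANAGER/spack-scripting
```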
Things that are external in the snapshot view should be forwarded, not have their paths point to the view
Currently the copying of compile_commands.json
has caused failures for @jrood-nrel and @PaulMullowney when using exawind
as a develop spec. This should not be happening. A quick fix, which we should adopt, is to check for the file before copying (see #43).
The bigger question is why isn't the file being created. We should answer that as well.
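The guard-before-copy fix could look like the following sketch (function and argument names are illustrative, not spack-manager's actual code):

```python
import os
import shutil

def copy_compile_commands(build_dir, dest_dir):
    """Copy compile_commands.json out of a build tree only if the build
    actually produced one; return whether a copy happened."""
    src = os.path.join(build_dir, "compile_commands.json")
    if not os.path.isfile(src):
        # Nothing to copy; skip quietly instead of raising on a missing file.
        return False
    shutil.copy2(src, os.path.join(dest_dir, "compile_commands.json"))
    return True
```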
Allow fork and branch specification and clone
spack manager develop fork branch
Maybe this is something I did, but the "find_machine" python script isn't showing up for "create_machine_spack_environment.py", or anywhere else I can find. Commenting out the import and using "-m" with the machine designation from the "configs" directory works around this.
Currently nightly tests create an environment and populate a few dev specs to test.
It would be even better if we created a snapshot so these binaries could be used by developers with the externals framework we've already put together.
To do this we need to
snapshot_creator.py
to be able to either read specs, view id, and exclusions from a yaml file, or take them at the command line. Probably the lowest-hanging fruit is a preconfigured yaml file like what is currently done in the run-exawind-nightly-tests.sh script.
run-nightly-tests.sh
to use the updated snapshot_creator.py
script.
spack manager external spec path
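The yaml-or-command-line intake for snapshot_creator.py described above could be sketched like this (flag names are illustrative, not the script's actual interface; command-line values override the yaml file):

```python
import argparse

def parse_snapshot_args(argv=None):
    """Read specs, view id, and exclusions from a preconfigured yaml file
    and/or the command line, with the command line taking precedence."""
    parser = argparse.ArgumentParser(prog="snapshot_creator.py")
    parser.add_argument("--yaml", help="preconfigured yaml holding the fields below")
    parser.add_argument("--specs", nargs="+")
    parser.add_argument("--view-id")
    parser.add_argument("--exclusions", nargs="+")
    args = parser.parse_args(argv)

    config = {"specs": [], "view_id": "default", "exclusions": []}
    if args.yaml:
        import yaml  # PyYAML, only needed for the file-based path
        with open(args.yaml) as f:
            config.update(yaml.safe_load(f) or {})
    for field in ("specs", "view_id", "exclusions"):
        value = getattr(args, field)
        if value is not None:
            config[field] = value
    return config
```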
Is it possible for the quick-*
commands to not add text to the prompt by default? I've discovered it doesn't play well with my custom prompt.
The developer tutorial discusses the script create_machine_spack_environment.py
. I believe this usage has been superseded by the spack manager
commands, which are not discussed in the developer tutorial. The tutorial should probably be updated to reflect the current recommended usage.
How to use the command.
How to find externals.
How to let other people external your work.
Known shortcomings and limitations.
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType
linked to #162
When running spack manager create-dev-env
with no args, it fails, asking for a spec, but the help text says the spec is optional. Running with spack manager checked out at 9c2fd5e (note that in the help text it has square brackets around all options, implying they are optional):
$ quick-develop
+ spack-start
==> Removing cached information on repositories
+ spack manager create-dev-env
==> Error:
ERROR: specs are a required argument for spack manager create-dev-env.
ERROR: Exiting quick-develop prematurely
$ quick-develop --help
+ spack-start
==> Removing cached information on repositories
*************************************************************
HELP MESSAGE:
quick-develop sets up a developer environment and installs it
This command is designed to require minimal arguments and simple setup
with the caveat of accepting all the defaults for:
- repo and branch cloned for your packages
- latest external snapshot with the default compilers/configurations
Please note that for specifying multiple specs with spaces you need to
wrap them in quotes as follows:
"'amr-wind@main build_type=Debug' nalu-wind@master 'exawind@master build_type=Debug'"
The base command and it's help are echoed below:
+ spack manager create-dev-env --help
usage: spack manager create-dev-env [-h] [-m MACHINE] [-d DIRECTORY | -n NAME]
[-y YAML] [-s SPEC [SPEC ...]]
optional arguments:
-d DIRECTORY, --directory DIRECTORY
Directory to copy files
-h, --help show this help message and exit
-m MACHINE, --machine MACHINE
Machine to match configs
-n NAME, --name NAME Name of directory to copy files that will be in $SPACK_MANAGER/environments
-s SPEC [SPEC ...], --spec SPEC [SPEC ...]
Specs to populate the environment with
-y YAML, --yaml YAML Reference spack.yaml to copy to directory
*************************************************************
If one calls spack-start
and then drops into a new shell (e.g. via salloc
on a compute node) some variables are carried through (notably SPACK_MANAGER_MACHINE
) but not others. This means that spack-start
is skipped and one can get an error when trying to activate an env in the new shell.
This doesn't have to be a new shell on a compute node via salloc
but that's a common use case. One could just spawn a subshell and get the same problem.
Steps to replicate on NREL Eagle
el2 $ export SPACK_MANAGER=$(pwd)
el2 $ source start.sh
el2 $ spack-start
el2 $ salloc -t 1:00:00 -N 1 -A hfm -p debug
salloc: Pending job allocation 8389054
salloc: job 8389054 queued and waiting for resources
salloc: job 8389054 has been allocated resources
salloc: Granted job allocation 8389054
salloc: Waiting for resource configuration
salloc: Nodes r2i7n35 are ready for job
r2i7n35 $ z /projects/hfm/mhenryde/debug-avatar-sstlr/
r2i7n35 $ ./submit.batch
==> Error: `spack env activate` requires Spack's shell support.
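One hedged workaround sketch for batch scripts run in a fresh subshell: SPACK_MANAGER survives the environment, but shell functions like spack-start do not, so re-source start.sh when they are missing (the function name here is illustrative):

```shell
# Guard for scripts that may run in a subshell where spack's shell support
# was not inherited. SPACK_MANAGER carries through; the functions do not.
ensure_spack_shell() {
  if ! type spack-start >/dev/null 2>&1; then
    # Re-source start.sh so `spack env activate` has shell support again.
    source "${SPACK_MANAGER:?SPACK_MANAGER must be set}/start.sh"
  fi
  spack-start
}
```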
We want to automate the creation of a branch/tag that will be the last commit before a change to a config on each supported machine in spack-manager
@PaulMullowney noticed hypre headers from his dev-build are being ignored for the snapshot headers.
Path forward is to use projections for the views so each package has its own directory. We need to verify that a single module for the exawind-suite can still be created when we do this. If that works then we can just add a basic projection scheme to the view definition in the snapshot creation.
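A minimal projection scheme for the snapshot view could look like the following spack.yaml fragment (view name and root are illustrative); giving each package its own `{name}` directory keeps one package's headers from shadowing another's:

```yaml
# spack.yaml fragment (sketch): per-package directories in the snapshot view.
spack:
  view:
    snapshot:
      root: views/snapshot
      projections:
        all: '{name}'
```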
We should document the steps to set up a new machine configuration in the configs/
directory, and possibly include a location for user-added configs and repos.
Ideally we can get this documented and added before the AMR-Wind tutorial, so users on other machines will be able to take advantage of spack-manager
right away.
Lawrence
See if it is possible to shrink all the includes files into one includes
Create a way to do a bisection based on date where every develop spec will move to git commits from the same day.
I don't think we need to get all of these done. Mainly a list of ideas
Had some problems with ninja generator for cuda builds when I change TMP_DIR
Also would like to test it with nalu-wind-nightly
to make sure all the logic is there
Spack-manager is not designed to run with Python 2, so we need to check and throw, probably in the start script.
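The check could be a small guard near the top of the start script, roughly like this sketch (function name and message are illustrative):

```shell
# Refuse to continue if the given interpreter is not Python 3.
check_python3() {
  if ! "$1" -c 'import sys; sys.exit(0 if sys.version_info[0] >= 3 else 1)' >/dev/null 2>&1; then
    echo "ERROR: spack-manager requires Python 3 (got $1)" >&2
    return 1
  fi
}
```

It would be invoked as `check_python3 python3` (or with whatever interpreter the start script resolves) before anything imports spack.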
I'd like us to add an environment variable like we did for externals that points to the system gold files. I'd also like the packages to check for this environment variable and auto populate it for users if it exists and is valid.
This way people will be able to run the tests against the same golds we use for nightly tests on the machines automatically.
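The package-side lookup could be as simple as the sketch below; SPACK_MANAGER_GOLDS is a hypothetical variable name chosen to mirror the externals variable, not an existing one:

```python
import os

def gold_files_dir(default="~/exawind-golds"):
    """Return the system gold files directory from the (hypothetical)
    SPACK_MANAGER_GOLDS variable when it exists and is a valid directory,
    otherwise fall back to a per-user default."""
    candidate = os.environ.get("SPACK_MANAGER_GOLDS")
    if candidate and os.path.isdir(candidate):
        return candidate
    return os.path.expanduser(default)
```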
I would like to be able to turn off UVM for Cuda builds with something like:
exawind+hypre ^nalu-wind ^trilinos~cuda_uvm
We need to have CI that provides
2 and 3 can probably be combined
So far I've been unsuccessful at compiling nalu-wind with avx512 simd instructions turned on.
Trilinos builds fine with it, but Nalu-Wind dies with SIMD errors.
spack-build-out.txt
The ascicgpu platform has been consistently unreliable in reporting results to CDash (usually failing to report), even though it consistently reports successfully installing/running the nightly test package.
Yesterday and today, I ran the nightly scripts multiple times, each time they failed to report until I commented out the trap command at the beginning of exawind-tests-script.sh
:
While I don't really understand what this command is doing, I suspect it is somehow killing whatever process is reporting to the dashboard.
@jrood-nrel @psakievich what's the best path forward here? The easiest path is to simply delete that line, but I suspect it exists for a reason -- what was the original motivation for putting it in? Do we need to make this line platform-specific?
We want usable binaries in a view from nightly testing. We can keep going the current way of custom packages or possibly explore spack pipelines.
I was trying to use the new reduced workflow commands as follows, starting from empty directory spack-manager-test
, in a fresh shell:
spack-manager-test$ git clone --recursive [email protected]:psakievich/spack-manager
# a bunch of git output, spack-manager is checked out at 9c2fd5e
spack-manager-test$ source spack-manager/scripts/useful_bash_functions.sh
spack-manager-test$ quick-develop
+ spack-start
-bash: /start.sh: No such file or directory
ERROR: Exiting quick-develop prematurely
This error goes away if I run export SPACK_MANAGER=$(pwd)/spack-manager
-- is the fact that the useful bash functions fail without that environment variable a workflow feature or bug?
If it's considered a feature, I'd like to request all these functions fail immediately if SPACK_MANAGER is unset with a clear message asking the user to set it.
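A fail-fast guard at the top of each function could be one line; this sketch uses a hypothetical helper name and message:

```shell
# Abort with a clear message instead of the confusing
# "-bash: /start.sh: No such file or directory" failure.
require_spack_manager() {
  : "${SPACK_MANAGER:?is not set -- export SPACK_MANAGER=/path/to/spack-manager and re-source the helper functions}"
}
```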
On ascicgpu057, I was trying to build the full stack with the clingo concretizer (i.e. no externals) as follows:
$ cd /scratch/tasmit/spack-manager-test # empty directory
$ git clone --recursive [email protected]:psakievich/spack-manager # checked out 9c2fd5e
$ export SPACK_MANAGER=$(pwd)/spack-manager
$ source $SPACK_MANAGER/scripts/useful_bash_functions.sh
$ quick-create-dev -s nalu-wind@master
$ spack install
This fails with numerous errors in the TPL builds, e.g. libsigsegv:
>> 1714 /tmp/cczmYN1T.s:1123: Error: unknown .loc sub-directive `view'
>> 1715 /tmp/cczmYN1T.s:1123: Error: unknown pseudo-op: `.lvu378'
and also readline, which fails with numerous compile errors, e.g.
368 In file included from vi_mode.c:35:
>> 369 ./config.h:30:17: error: two or more data types in declaration specifiers
370 30 | #define ssize_t int
371 | ^~~
372 In file included from funmap.c:25:
>> 373 ./config.h:28:16: error: duplicate 'unsigned'
374 28 | #define size_t unsigned int
375 | ^~~~~~~~
@psakievich is it possible this is an issue with the compilers you recently installed?
While working with @ldh4 yesterday on ascicgpu22, we observed the following:
git submodule update
to get the spack version associated with that spack-manager SHA -- builds fine.
Strangely, I was unable to reproduce on ascicgpu057. From these facts I suspect a regression bug in spack sometime between those two spack-manager SHAs that somehow only affects ascicgpu22.
Spec is nalu-wind+cuda cuda_arch=70 ^trilinos+cuda+stk_unit_tests
.
When there is a +cuda and a ~cuda build in the same snapshot, the exclusions for the ~cuda spec look like:
exclude:
- +cuda
- +rocm
- ~cuda
- +rocm
when it should really be:
exclude:
- +cuda
- +rocm
Pretty sure this is because of nvcc-wrapper
so we will need to think about how to fix this.
A whole bunch of specs get omitted from the views because of this. Most notably, tioga is missing from all the views since it is always ~cuda.
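The desired exclusion list can be recovered by deduplicating on the variant name while preserving order; this is an illustrative sketch, not spack-manager's actual exclusion logic:

```python
def dedup_exclusions(exclusions):
    """Drop repeated and contradictory variant exclusions, keeping the first
    occurrence of each variant name in order."""
    seen = set()
    result = []
    for variant in exclusions:
        name = variant.lstrip("+~")  # "+cuda" and "~cuda" share the name "cuda"
        if name in seen:
            continue  # skips both duplicates (+rocm twice) and contradictions (~cuda after +cuda)
        seen.add(name)
        result.append(variant)
    return result
```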
99.9% of the time we are developing off the default or highest-precedence infinity version of packages. So let's just add auto-detection of that and make things less confusing for users.
install-exawind.sh and nightly testing scripts
config.yaml to be outside of spack
One of the intents of spack-manager is to allow for archives of binaries on a local machine. The goal of these archives is to
These should support the developer and analyst workflows and be independent of spack in case things need to be deleted or updated. The current plan is to use copy views to create these caches. The other idea is that they should be stored in the local $SPACK_MANAGER/views
directory.
However, we need a standardized naming convention/archival procedure. Some of the fields that need to be captured are
These are a lot of fields and potential logic, and there are binary path relocation issues to consider. So perhaps a hash might be acceptable for handling some of this data, but that will also need an API so devs can easily access the archives they want.
The developer tutorial contains the sentence "You may also wonder why we are using nalu-wind
instead of nalu-wind
". I believe one or both of these specs has a typo, but am unsure what the original intent was.
After quick-develop -n develop -s 'nalu-wind@master+cuda cuda_arch=70' trilinos@develop
, spack install
produces errors like this during the nalu-wind
build:
g++: error: unrecognized command line option '--relocatable-device-code=true'
Also, one of the build lines shows that spack is building with mpic++
, not nvcc_wrapper
:
/projects/wind/system-spack/opt/spack/linux-rhel7-x86_64/gcc-9.3.0/mpich-3.4.2-4h2muy6jlgfdahehxmxa4ybndfdb6gx2/bin/mpic++ -DUSE_STK_SIMD_NONE -I/scratch/tasmit/spack-manager-test/spack-manager/environments/develop/nalu-wind/include -I/scratch/tasmit/spack-manager-test/spack-manager/environments/develop/nalu-wind/spack-build-khf4ao2/include -isystem /scratch/tasmit/spack-manager-test/spack-manager/spack/opt/spack/linux-rhel7-x86_64/gcc-9.3.0/trilinos-developjeus3vtaczns5wytutya65qwntxa5yvf/include -isystem /projects/wind/spack-manager/views/exawind/snapshots/ascicgpu/2022-02-14/gcc-cuda/include -isystem /projects/wind/system-spack/opt/spack/linux-rhel7-x86_64/gcc-9.3.0/cuda-11.2.2-wlah7an4q7rej4uylqlimczaa6z3zlq7/include -isystem /projects/wind/spack-manager/views/exawind/snapshots/ascicgpu/2022-02-14/gcc-cuda/lib/cmake/yaml-cpp/../../../include -O3 -DNDEBUG -fPIC --relocatable-device-code=true -expt-extended-lambda -Wext-lambda-captures-this -arch=sm_70 --expt-relaxed-constexpr -Wall -Wextra -pedantic -faligned-new -std=c++14 -MD -MT CMakeFiles/nalu.dir/src/AssemblePNGElemSolverAlgorithm.C.o -MF CMakeFiles/nalu.dir/src/AssemblePNGElemSolverAlgorithm.C.o.d -o CMakeFiles/nalu.dir/src/AssemblePNGElemSolverAlgorithm.C.o -c /scratch/tasmit/spack-manager-test/spack-manager/environments/develop/nalu-wind/src/AssemblePNGElemSolverAlgorithm.C
However, after quick-create-dev -n create-dev -s 'nalu-wind@master+cuda cuda_arch=70' trilinos@develop
, spack install
succeeds. I've verified this behavior on ascicgpu057
, and @ldh4 has also reported it on ascicgpu22
, so it seems to not be machine-specific. Based on the fact that quick-develop
fails while quick-create-dev
succeeds, my initial assessment is that something is broken with the ascicgpu
externals.
@psakievich we are getting successful builds without externals, so this is not urgent, but wanted you to be aware.
We want standard modules that users can load to run the software stack. Currently spack creates an architecture flag, which is a bit annoying since some instances of spack-manager serve multiple machines. Watching spack/spack#24156 to hopefully see that issue resolved. Otherwise we will need to module use
multiple directories for these cases.
But I guess we don't need spack to create all our modules for us either. If we are just pointing to views we could write our own module definitions and add the logic to update where they point. That could be simpler.
The purpose of this issue is similar to #23 in that we want:
Some of the modules I think we need are:
nightly
: the executables for the nightly tests
FYXXQXX
: a module for each quarter until we get a stable release (thinking exawind)
and release modules too at some point.
Debating on weekly but not sure if that is too granular.