
Comments (7)

cyrush commented on May 17, 2024

Thoughts + Notes on upgrading our developer TPL build process to a newer version of spack.

These notes apply not only to Conduit, but also to Axom and Alpine (https://github.com/Alpine-DAV/), since they use the same spack-based TPL build strategy. Since Axom and Alpine both depend on Conduit, we expect to flesh out the upgrade process in Conduit and then propagate it to those projects. Additionally, there are other LLNL codes that have similar requirements -- even if they aren't using uberenv.

Our ultimate goal is a sharable & reproducible TPL build process.

Once someone has paved the way for building TPLs on a platform (e.g. a specific HPC cluster or OSX) with a given set of compilers, other team members should be able to easily replicate this. "Easily" here means at most selecting which compiler to use, and possibly some feature variants -- the platform should be detected automatically.

Here are a few key things we want to do in support of this goal:

  • We want to explicitly specify compiler paths and prevent any settings in a user's environment from undermining the process.

  • Any system dependencies also need to be tied to a specific platform and compiler.
    For example, we need to select a blessed MPI or CUDA install for each platform and compiler. (ex: gcc 4.9x on toss3 will use mvapich-zzz located at /here)

  • After installation, the TPL installs need to be shareable, guarded via standard file system perms. Any user can build a shareable set of TPLs; we don't rely on a magic user to handle builds.

Towards this goal, spack does the heavy lifting for us -- however we also use a small python script named uberenv (https://github.com/LLNL/conduit/blob/master/scripts/uberenv/uberenv.py) to automate the process.

uberenv is a thin veneer around spack that helps automate TPL builds for Axom, Conduit, or Alpine.

Here is what uberenv currently does:

- checks out spack from GitHub (a specific hash, selected by an entry in a JSON file)

- optionally helps set up a spack mirror in a shared location

- patches spack to disable any user or system settings related to compilers

- copies in a compilers.yaml file with blessed compilers

- patches spack to limit the max number of build jobs to 8
(when Python's multiprocessing lib reports 48 CPUs, make -j 48 breaks many autoconf TPL builds)

- copies a set of custom spack packages over the built-in spack package repo files
(this allows us to add new packages and customize or override the default logic)

- launches the spack build of a special "uberenv-zzz" package for a specific spack spec
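The build-jobs cap above can be sketched roughly as follows. This is an illustrative snippet, not the actual uberenv patch:

```python
import multiprocessing

# Cap chosen because make -j 48 breaks many autoconf-based TPL builds.
MAX_BUILD_JOBS = 8

def build_jobs():
    """Parallel build jobs to use: the CPU count, capped at MAX_BUILD_JOBS."""
    return min(multiprocessing.cpu_count(), MAX_BUILD_JOBS)
```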

This special "uberenv-zzz" package specifies all of the dependencies needed to develop the desired software project zzz (e.g. Conduit, Axom, etc.)

It does not build these software projects; instead it generates a file that can be used to locate the compilers and all of the TPLs that spack built. This file is called a "host-config" file because we use the host name in the names of these files. The host-config file is used as a CMake initial cache file, which the build systems of all of these projects support. We revision-control these files, so that anyone on a system with a shared TPL install can use them to bootstrap a build.
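To illustrate, a host-config file is just a CMake initial cache; the variable names and install paths below are hypothetical:

```cmake
# hypothetical host-config: myhost-sys_type-gcc@4.9.3.cmake
set(CMAKE_C_COMPILER   "/usr/tce/packages/gcc/gcc-4.9.3/bin/gcc" CACHE PATH "")
set(CMAKE_CXX_COMPILER "/usr/tce/packages/gcc/gcc-4.9.3/bin/g++" CACHE PATH "")
set(HDF5_DIR "/shared/tpls/install/hdf5-1.8.16" CACHE PATH "")
```

A build then bootstraps with something like `cmake -C myhost-config.cmake <src-dir>`, which seeds the CMake cache before the project's own CMakeLists.txt runs.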

Long term, we hope to simplify uberenv and rely more on spack features to achieve the same process.

The version of spack we are using is quite old, but to update we need to address a few issues:


1. The version we are using allows us to easily craft a single compilers.yaml file that outlines details for a wide range of systems. We can do so generically (say for Linux or OSX) and provide more specific options for a known HPC cluster (we are using LLNL's SYS_TYPE var).

For a concrete example, see: https://github.com/LLNL/conduit/blob/master/scripts/uberenv/compilers.yaml
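To illustrate the pattern (compiler paths and platform keys below are hypothetical), the old-style file lets us mix generic defaults with SYS_TYPE-specific overrides, roughly like:

```yaml
# sketch of the old-style compilers.yaml, keyed by platform label
compilers:
  linux-x86_64:              # generic default for desktop Linux
    gcc@4.9.3:
      cc:  /usr/bin/gcc
      cxx: /usr/bin/g++
      f77: /usr/bin/gfortran
      fc:  /usr/bin/gfortran
  chaos_5_x86_64_ib:         # LLNL SYS_TYPE-specific override
    gcc@4.9.3:
      cc:  /usr/apps/gnu/4.9.3/bin/gcc
      cxx: /usr/apps/gnu/4.9.3/bin/g++
      f77: /usr/apps/gnu/4.9.3/bin/gfortran
      fc:  /usr/apps/gnu/4.9.3/bin/gfortran
```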

Newer versions of spack support different naming schemes to identify platforms.
I don't believe they support SYS_TYPE. As a result, desktop Linux systems running rhel-x-y appear to be the same as an HPC cluster that happens to be running the same version of RHEL, even though they are drastically different systems.

We would even be happy with host-name based solutions. But we need to understand what is supported, and how we can craft our new compilers.yaml file(s).
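The kind of fallback we'd want could be sketched as follows; detect_sys_type is a hypothetical helper for illustration, not a spack API:

```python
import os
import platform

def detect_sys_type():
    """Return LLNL's SYS_TYPE env var if set, else a generic platform label."""
    sys_type = os.environ.get("SYS_TYPE")  # e.g. "toss_3_x86_64_ib" on LLNL TOSS3
    if sys_type:
        return sys_type
    # generic fallback for desktop Linux / OSX boxes without SYS_TYPE
    return platform.system().lower()  # "linux", "darwin", ...
```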


2. We want to use external packages for a very small set of TPLs.

The version we are using lacks support for external packages.

In our automated build process for Conduit and Alpine on LLNL clusters, we rely on the proper MPI and CUDA being exposed in a user's PATH (MPI by looking for mpicc, CUDA by looking for nvcc).

Axom's TPLs don't require MPI yet. In our automated build process for Axom TPLs, we have manually created files that augment the spack-generated host-config to enable MPI. These manually edited files are revision controlled, and provide per-platform + compiler paths to MPI.

In both cases, we want to get away from our current solutions and instead use external package support via packages.yaml.

With a newer version of spack, I don't know the correct way to use packages.yaml to select a specific MPI for a given platform + compiler. We need to discuss what is supported.
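For reference, the external-package mechanism in spack of that era looks roughly like this in packages.yaml (the spec and install prefix below are hypothetical):

```yaml
packages:
  mpi:
    buildable: false    # never build MPI from source; always use the system install
  mvapich2:
    paths:
      mvapich2@2.2%gcc@4.9.3: /usr/tce/packages/mvapich2/mvapich2-2.2-gcc-4.9.3
```

The open question is how to provide a different mapping per platform + compiler without maintaining one file per machine.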

We could manage the platform specifics via uberenv spack tweaks (forcing a specific compilers.yaml + packages.yaml file based on SYS_TYPE), but we would like to do as much as possible using spack features.


3. We would like spack command line options that allow us to specify a "compilers.yaml" and "packages.yaml" file.

When these are passed, spack should use them and ignore any other user- or system-level spack settings. Once this exists, we can remove the uberenv step that patches spack to disable user settings.


4. We would like a spack command line option to limit the max number of build jobs, to replace our current patch. (Perhaps this already exists?)


5. In some cases, errors that occur when building packages aren't captured in log files.
Todd has a reproducer. The problem has to do with how spack captures the standard output streams in the install env.

Unfortunately -- this happens in LLNL batch jobs, which are a very important case for our automated builds. I have also seen it in a CI setting where I was building a Docker container inside of a Docker container.

This is a big issue b/c when things go wrong, we have to spawn another build by hand (outside of a batch job) to try to tackle the error.


6. How do we move our custom packages forward?

There are two issues here:

a.

Our current packages are tested heavily and frequently on LLNL's HPC clusters, and we have hardened them against the build horrors we experienced (for example on BG/Q). New packages have not been exposed to the same vetting. There will be many issues when we upgrade.

b.
We have concerns about the complexity of how optional dependencies have evolved in spack packages. Our goal for development is a minimal set of deps to develop our projects. As more deps are added to packages, the build process can get undermined by packages that we really don't need. Some policies on variants and default builds could help, but I don't think we will get agreement among the diverse set of spack developers and users.

We can address this with our own packages, but we want to use as many off the shelf packages as possible.

from conduit.

cyrush commented on May 17, 2024

spack v0.10.0 was released on 2017-01-17


cyrush commented on May 17, 2024

In addition to testing packages, this requires reworking our compilers.yaml and adding support for external packages via packages.yaml. In our current setup we use SYS_TYPEs to easily add compiler specifics for LLNL machines, while still allowing generic defaults for other Linux platforms.
We need to understand how to recreate that setup with the new compilers.yaml format, and hopefully do the same for external packages (e.g. MPI) in packages.yaml.
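In the new format, each compilers.yaml entry is a list item carrying its own operating_system field, so one sketch of an LLNL-specific entry might look like this (paths are hypothetical):

```yaml
compilers:
- compiler:
    spec: gcc@4.9.3
    operating_system: rhel7
    modules: []
    paths:
      cc:  /usr/tce/packages/gcc/gcc-4.9.3/bin/gcc
      cxx: /usr/tce/packages/gcc/gcc-4.9.3/bin/g++
      f77: /usr/tce/packages/gcc/gcc-4.9.3/bin/gfortran
      fc:  /usr/tce/packages/gcc/gcc-4.9.3/bin/gfortran
```

The open question is how to distinguish an HPC cluster from a desktop box when both report the same operating_system.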


cyrush commented on May 17, 2024

The output issue (5 above) is now resolved in spack develop, with the merge of spack/spack#5084.


cyrush commented on May 17, 2024

Regarding (4 above): there is a config option in newer versions of spack to limit the max number of build jobs.
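If so, this would replace our uberenv patch with a one-line config.yaml setting, roughly (assuming the build_jobs key):

```yaml
# config.yaml: cap parallel build jobs
config:
  build_jobs: 8
```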


cyrush commented on May 17, 2024

We met with @becker33; here are a few near-term things we will work on:

  • command line options for all yaml files (config, compilers, packages, etc.)
  • examples of how to wield the new config, compilers, and packages yaml files
    • including how to use an env var to select the target (to support SYS_TYPE magic)
  • variant forwarding
  • activate on install (for Python)
  • periodic testing of collections of packages vs blessed specs


cyrush commented on May 17, 2024

This is complete with #225, which enhanced uberenv to fill the gaps needed for us to update to a newer spack. However, this issue still outlines our wishlist for spack support that would simplify what uberenv requires.

