archer2-docs's Introduction

ARCHER2 Documentation

ARCHER2 is the next-generation UK National Supercomputing Service. You can find more information on the service and the research it supports on the ARCHER2 website.

This repository contains the source of the service documentation. A rendered version is currently hosted on GitHub Pages.

This documentation is drawn from the Cirrus documentation, Sheffield Iceberg documentation and the ARCHER documentation.

Rendered documentation

How to contribute

We welcome contributions from the ARCHER2 community and beyond. Contributions can take many different forms, for example:

  • Raising Issues if you spot a mistake or something that could be improved
  • Adding/updating material via a Pull Request
  • Adding your thoughts and ideas to any open issues

All people who contribute and interact via this GitHub repository undertake to abide by the ARCHER2 Code of Conduct so that we, as a community, provide a welcoming and supportive environment for all people, regardless of background or identity.

To contribute content to this documentation, first fork the repository on GitHub and clone your fork to your machine. See Fork a Repo in the GitHub documentation for details of this process.

Once you have the git repository on your computer, you will need to install Material for MkDocs to build the documentation. You can do this with a local installation or by using a Docker container.

Once you have made your changes and pushed them to your fork on GitHub, you will need to open a Pull Request.

Building the documentation on a local machine

Once Material for MkDocs is installed, you can preview the site locally by following the instructions in the Material for MkDocs documentation.

Making changes and style guide

The documentation consists of a series of Markdown files with the .md extension. MkDocs automatically converts these files to HTML and combines them into the web version of the documentation. When editing the files it is important to follow the Markdown syntax: if your changes contain any errors, the build will fail and the documentation will not update. You can test your build locally by running mkdocs serve. The easiest way to learn what the files should look like is to read the Markdown files already in the repository.
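A minimal sketch of a pre-submission check, assuming Material for MkDocs is already installed (for example with `pip install mkdocs-material`) and that it is run from the repository root, where `mkdocs.yml` lives. The helper name `check_docs_build` is hypothetical. It relies only on two standard `mkdocs build` options: `--strict`, which turns warnings such as links to missing pages into errors, and `--site-dir`, which writes the HTML to a throwaway directory instead of `./site`.

```shell
# Hypothetical helper: build the site once in strict mode before opening
# a Pull Request. Returns the exit status of mkdocs build.
check_docs_build() {
  local out_dir status
  out_dir=$(mktemp -d) || return 1
  # --strict: warnings (e.g. links to missing pages) make the build fail
  # --site-dir: write the HTML to a temporary directory, not ./site
  mkdocs build --strict --site-dir "$out_dir"
  status=$?
  rm -rf "$out_dir"
  return "$status"
}
```

For example, `check_docs_build && echo "build OK"`. For an interactive preview that reloads as you edit, `mkdocs serve` serves the site at http://127.0.0.1:8000 by default.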

A short list of style guidance:

  • Headings should be in sentence case
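As a rough aid for the sentence-case rule, the sketch below flags Markdown headings (lines starting with `#`, outside fenced code blocks) in which any word after the first begins with a capital letter. This is a heuristic, not an agreed check: proper nouns and acronyms such as ARCHER2, Slurm or MPI are flagged too, so the output is a list to review by hand. The function name is hypothetical.

```shell
# Hypothetical helper: print "file:line: heading" for Markdown headings
# that may not be in sentence case (a capitalised word after the first).
check_heading_case() {
  awk '
    FNR == 1 { in_code = 0 }                # reset state for each file
    /^```/ { in_code = !in_code; next }     # skip fenced code blocks
    !in_code && /^[#]+ / {
      text = $0
      sub(/^[#]+ +/, "", text)              # drop the leading #s
      n = split(text, words, " ")
      for (i = 2; i <= n; i++) {
        if (words[i] ~ /^[A-Z]/) {
          print FILENAME ":" FNR ": " $0
          break
        }
      }
    }' "$@"
}
```

For example, `check_heading_case docs/user-guide/*.md` lists candidate headings for review.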

archer2-docs's Issues

Add detailed hardware information in a new User and Best Practice Guide section

It would be useful to have a new section in the User and Best Practice Guide that covers the ARCHER2 hardware and architecture in more detail. This could go after Overview but before Connecting. This should cover:

  • System overview: node types, storage types, interconnect, external networking
  • Compute node details: layout, interconnect
  • Processor details: cores, core complexes, Infinity Fabric, NUMA regions, FP unit and instruction sets, cache
  • Memory details: type, speed, volume, bandwidth/latency (theoretical and measured)
  • Interconnect details: topology, features, bandwidth/latency (theoretical and measured)
  • Point to the I/O section for more details on storage
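Some of these figures can be checked directly on a node. As a sketch, `lscpu` and `numactl --hardware` both report the processor and NUMA layout. The hypothetical helper below extracts the logical CPU count and the NUMA node count from `lscpu` output so they can be compared with the documented values. Note that `lscpu` counts hardware threads, so on nodes with simultaneous multithreading enabled `CPU(s)` is larger than the number of physical cores.

```shell
# Hypothetical helper: summarise logical CPU and NUMA node counts.
# Reads lscpu output on stdin, e.g.  lscpu | cpu_summary
cpu_summary() {
  awk -F: '
    /^[ \t]*CPU\(s\):/       { cpus = $2 }   # total logical CPUs
    /^[ \t]*NUMA node\(s\):/ { numa = $2 }   # number of NUMA nodes
    END {
      gsub(/[ \t]/, "", cpus); gsub(/[ \t]/, "", numa)
      print cpus " logical CPUs, " numa " NUMA nodes"
    }'
}
```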

Performance tuning and best practice for OpenMP

We need a section on getting the most out of OpenMP: both generic (e.g. top ten tips for OpenMP, pointing to further documentation) and specific to ARCHER2 and AMD EPYC Zen2 (will need at least the TDS for this). It should also cover what functionality is available in each PrgEnv and what is not.

Modify MITgcm documentation for clarity on ECCOv4-r4 process

Based on user feedback, I need to:

  • Mention that after using 'wget' to obtain the forcing data, the files need to be copied from their default directory
  • For clarity and redundancy, copy the compilation instructions into the ECCOv4-r4 case

(Please feel free to assign this issue to me)

Document use of hybrid MPI+OpenMP

The current documentation has an example MPI+OpenMP script but no documentation describing the background of how to run these jobs, more advanced placement information and a description of the best layout to match onto the ARCHER2 NUMA structure. This should be added in the Scheduler chapter. Point to the Tuning chapter for more advanced information on OpenMP.
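As a starting point for that section, the arithmetic behind a NUMA-aware layout can be sketched as a small shell helper. It assumes a node with 128 cores split into 8 NUMA regions of 16 cores (the ARCHER2 compute-node layout) and accepts a thread count only if each MPI process's OpenMP threads either divide one NUMA region exactly or fill whole regions. The function name is hypothetical, and the rule it encodes is one reasonable choice rather than official guidance.

```shell
# Hypothetical helper: suggest Slurm options for an MPI+OpenMP job so that
# each process's OpenMP threads stay inside whole NUMA regions.
# Arguments: cores per node, NUMA regions per node, threads per MPI process.
hybrid_layout() {
  local cores=$1 regions=$2 threads=$3
  local per_region=$(( cores / regions ))
  local ok=0
  if (( threads > 0 && threads <= per_region && per_region % threads == 0 )); then
    ok=1   # several processes share one NUMA region evenly
  elif (( threads > per_region && threads % per_region == 0 && cores % threads == 0 )); then
    ok=1   # one process spans several whole NUMA regions
  fi
  if (( ! ok )); then
    echo "error: $threads threads do not map onto whole NUMA regions" >&2
    return 1
  fi
  echo "--ntasks-per-node=$(( cores / threads )) --cpus-per-task=$threads"
}
```

For example, `hybrid_layout 128 8 16` prints `--ntasks-per-node=8 --cpus-per-task=16`, one process per NUMA region. In a job script this would be paired with `export OMP_NUM_THREADS=16` and, typically, `export OMP_PLACES=cores`.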

Are "here" documents useful for job submission on ARCHER2?

The NERSC scheduler best-practice guide uses here documents to reduce load on the compute nodes and potentially make jobs more efficient. See:

https://docs.nersc.gov/jobs/best-practices/#improve-efficiency-by-preparing-user-environment-before-running

It describes creating a script such as:

#!/bin/bash -l

# Submit this script as: "./prepare-env.sh" instead of "sbatch prepare-env.sh"

# Prepare user env needed for Slurm batch job
# such as module load, setup runtime environment variables, or copy input files, etc.
# Basically, these are the commands you usually run ahead of the srun command 

module load cray-netcdf
export OMP_NUM_THREADS=4

# Generate the Slurm batch script below with a here document.
# When the generated script is submitted with sbatch later, the environment
# set up above has already run on the login node (instead of on the head
# compute node, as it would if included in the Slurm batch script) and is
# inherited by the batch job.

cat << EOF > prepare-env.sl 
#!/bin/bash
#SBATCH -t 30:00
#SBATCH -N 8
#SBATCH -q debug
#SBATCH -C haswell

srun -n 16 -c 32 --cpu_bind=cores ./myapp.exe 

# Other commands needed after srun, such as copying your output files,
# should still be included in the Slurm script.
cp <my_output_file> <target_location>/.
EOF

# Now submit the batch job
sbatch prepare-env.sl

@kevinstratford commented

Not sure I like that here document business; if the preparatory work is really significant, it could be a separate job, with the main job depending on it. This prevents conflating scripts (does the prepare-env.sh here document overwrite the submitted prepare-env.sh??)

What do people think, should we include this advice or not?
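A sketch of the alternative suggested in the comment above: submit the preparatory work as its own job and make the main job wait for it. `--parsable` (print only the job ID) and `--dependency=afterok:<jobid>` (start only if that job completed successfully) are standard `sbatch` options; the function name and the script names in the usage line are hypothetical. Unlike the here document approach, environment changes made in the preparatory job are not inherited by the main job, so this suits work such as staging input files.

```shell
# Hypothetical helper: submit a preparatory job, then a main job that
# only starts if the preparatory job finishes successfully.
# Arguments: preparatory job script, main job script.
submit_with_dependency() {
  local prep_script=$1 main_script=$2
  local prep_id
  # --parsable prints "jobid" or "jobid;cluster"
  prep_id=$(sbatch --parsable "$prep_script") || return 1
  prep_id=${prep_id%%;*}   # strip any ";cluster" suffix
  # afterok: run the main job only if the preparatory job exits with code 0
  sbatch --dependency=afterok:"$prep_id" "$main_script"
}
```

Usage: `submit_with_dependency prepare.slurm main.slurm`.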

archer-migration/data-migration

There is currently a duplication of material in

archer-migration/data-migration

and

user-guide/data-migration

This needs to be rationalised.

List of Issues and Items for review prior to main system going live

Review of ARCHER2 docs to identify issues with the move to the main system (7 July 2021)

Changes completed

https://docs.archer2.ac.uk/faq/index.html#archer-work-data
Add year (2021) to date that ARCHER /work was decommissioned - DONE (CB)

https://docs.archer2.ac.uk/user-guide/connecting/#logging-in
Order of password and SSH key passphrase prompts was reversed - DONE (ART)

https://docs.archer2.ac.uk/user-guide/data#work-file-systems - DONE (ART)
Update size to full /work

https://docs.archer2.ac.uk/user-guide/sw-environment - DONE (ART)
@aturner-epcc to look at this. See: #301

https://docs.archer2.ac.uk/user-guide/scheduler/#quality-of-service-qos - DONE (ART)

https://docs.archer2.ac.uk/user-guide/scheduler/#using-modules-in-the-batch-system-the-epcc-job-env-module
Need to review whether the epcc-job-env module will continue
This may break every user submit script if changed! - DONE (ART)
@aturner-epcc to look at this

https://docs.archer2.ac.uk/user-guide/scheduler/#bolt-job-submission-script-creation-tool
*** Julien check if bolt works - DONE (ART)

https://docs.archer2.ac.uk/user-guide/dev-environment/
@aturner-epcc to look at this #302 - DONE (ART)

Add information on resources on ARCHER2

Add information to docs on:

  • What a CU is and how it corresponds to time use on ARCHER2
  • How charging works: based on used time rather than requested time
  • You are charged for the nodes assigned to the job even if you do not use them all. For example, if you request 4 nodes and only use 2, you are charged for all 4 nodes, as they are not available to other users while assigned to your job
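The charging rule in the last point can be illustrated with a small calculation. This sketch assumes that 1 CU corresponds to one node-hour on a standard compute node; that assumption should be checked against the final charging policy. The inputs can be taken from the `AllocNodes` and `ElapsedRaw` (seconds) fields of `sacct`. The function name is hypothetical.

```shell
# Hypothetical helper: CUs charged for a job, ASSUMING 1 CU = 1 node-hour.
# Charging uses the nodes allocated and the time actually used.
# Arguments: allocated nodes, elapsed wall time in seconds.
cu_charged() {
  awk -v nodes="$1" -v secs="$2" 'BEGIN { printf "%.2f\n", nodes * secs / 3600 }'
}
```

For example, `cu_charged 4 7200` prints `8.00`: a job holding 4 nodes for 2 hours is charged for all 4 nodes even if it only runs processes on 2 of them. The two inputs for a finished job are available from `sacct -j 12345 -X --noheader --format=AllocNodes,ElapsedRaw`.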

Add info on ownership of data in subgroup directories

New data created in subgroup directories has the correct group ownership because of the setgid bit, but data copied or moved from elsewhere on /work (e.g. from the main project directories) keeps its existing ownership. Directories brought in this way also keep their own setgid bit, so new data created inside them continues to get the original group ownership rather than the subgroup's. We should document this issue and the use of the chown command to fix the ownership, as it does trip users up.
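A minimal sketch of the fix, assuming the data has already been moved into the subgroup directory and should now belong to the subgroup's Unix group. `chown -R ":group"` changes only the group owner, and `find ... -type d -exec chmod g+s` ensures every directory carries the setgid bit so new files inherit that group. The function name and arguments are hypothetical.

```shell
# Hypothetical helper: give a directory tree moved into a subgroup area
# the subgroup's group ownership.
# Arguments: directory, subgroup's Unix group name.
fix_subgroup_ownership() {
  local dir=$1 group=$2
  # ":group" with no user part changes only the group owner
  chown -R ":$group" "$dir" || return 1
  # setgid on directories: files created inside inherit the directory's group
  find "$dir" -type d -exec chmod g+s {} +
}
```

Usage: `fix_subgroup_ownership <moved-directory> <subgroup>`.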

Remove ARCHER to ARCHER2 part of docs

ARCHER has been decommissioned, so some of this material is no longer relevant. Some of the information may still be useful and should be moved to other sections as required.

Initial version of data analysis section

This section is going to be difficult until we see what is available via the collaboration platform.

Could include information on using the cray-R environment here.

Add research-software templates

I intend to add subdirectories with scraped template content under

research-software

The following are relevant, listed with the most recent existing source for each:

Cirrus -> CASTEP
ARCHER -> Code_Saturne
ARCHER -> PyChemShell/ChemShell
Cirrus -> CP2K
ARCHER -> ELK
ARCHER -> FEniCS
Cirrus -> GROMACS
Cirrus -> LAMMPS
New!!! -> Met Office Unified Model
New!!! -> MITgcm
Cirrus -> NAMD
New!!! -> Nektar++
New!!! -> NEMO
ARCHER -> NWChem
ARCHER -> ONETEP
Cirrus -> OpenFOAM
Cirrus -> Quantum Espresso
Cirrus -> VASP

Initial version of Python chapter

Before we can update the Python chapter, we need to decide on the approach to Python on ARCHER2. Initial proposal is:

  • For high-performance Python on the compute nodes: use the cray-python environment. Need to document how to use this and how to install further Python modules on top of it
  • For data analysis, serial Python: probably provide an Anaconda distribution. Should this be provided as a module or a container environment?
  • For self-installed Python: need to recommend a solution. Could be Miniconda, or could advise users to pull containers from Docker Hub
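For the first point, one common pattern (a sketch, not a decided policy) is a virtual environment layered on top of the centrally provided Python. Packages already installed in the base Python stay visible, and users add only the extra modules they need. On ARCHER2 this would follow `module load cray-python`. The helper name and the directory argument are hypothetical.

```shell
# Hypothetical helper: create a virtual environment on top of the Python
# currently on PATH (on ARCHER2, after "module load cray-python").
# Argument: directory for the new environment.
make_python_env() {
  local env_dir=$1
  # --system-site-packages: keep access to packages in the base installation
  python3 -m venv --system-site-packages "$env_dir" || return 1
  echo "source $env_dir/bin/activate"
}
```

After running the printed `source` command, `pip install <package>` installs into the environment rather than into the base Python.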

Add information on shared directories and their use

Should cover:

  • Shared directories on /home and /work
  • Sharing with subgroup, project, others - different directory hierarchies and unix permissions
  • Impact on quotas
  • What happens to data in shared directories when user accounts are removed
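For the Unix permissions part, a minimal sketch of granting the members of a directory's group (for example, the project group) read access to a tree. `g+rX` adds read permission everywhere, and adds execute (traversal) permission only on directories and on files that are already executable. Group members also need execute permission on every parent directory to reach the tree, so the sketch reports parents where the group execute bit is missing. The function name is hypothetical.

```shell
# Hypothetical helper: give the directory's group read access to a tree
# and report parent directories that the group cannot traverse.
# Argument: top-level directory to share.
share_with_group() {
  local dir=$1 parent
  # r on everything; X adds execute only on directories (and on files
  # that are already executable), so the group can list and enter them
  chmod -R g+rX "$dir" || return 1
  parent=$(dirname "$(realpath "$dir")")
  while [ "$parent" != "/" ]; do
    # -perm -g+x: group execute bit set (checks the bit, not which group owns it)
    if [ -z "$(find "$parent" -maxdepth 0 -perm -g+x)" ]; then
      echo "no group execute: $parent"
    fi
    parent=$(dirname "$parent")
  done
}
```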

Performance tuning and best practice for MPI

We need a section on getting the most out of MPI: both generic (e.g. top ten tips for MPI, pointing to further documentation) and specific to ARCHER2 and Slingshot (will need at least the TDS for this). It should also cover what functionality is available in Cray MPICH and what is not, as well as any limits that users should know about (maximum tag counts, eager message defaults, etc.).

Update library modules requiring new versions

  • adios 1.13.1
  • boost 1.72
  • glm 0.9.9.6
  • HYPRE 2.18.0
  • matio 1.5.18
  • metis/parmetis 5.1.0 / 4.0.3
  • mumps 5.3.5
  • petsc 3.14.2
  • scotch 6.1.0
  • slepc 3.14.1
  • superlu 5.2.2
  • superlu-dist 6.4.0
  • trilinos 13.18.1

New

  • ARPACK 3.8.0

Deferred

  • ADIOS 2.6.0

Other

  • Confirm status of cray-ga or remove

Make sure all job script examples have correct use of `epcc-job-env` module and document it

The epcc-job-env module ensures that a default PrgEnv is restored (unless users have modified the SBATCH_EXPORT environment variable). All example scripts should load it in the correct place.

We should also add a section to the Scheduler chapter covering the module and what it does, noting that it must be the first module loaded in a script if it is used.
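A sketch of how the example scripts could be checked automatically. It encodes the rule stated in this issue: it reports any job script whose first `module` command is not `module load epcc-job-env`, including scripts that issue no `module` command at all. The function name is hypothetical.

```shell
# Hypothetical helper: for each job script given, check that the first
# "module" command is "module load epcc-job-env"; print offending files.
check_job_env() {
  local f first
  local re='^[[:space:]]*module[[:space:]]+load[[:space:]]+epcc-job-env([[:space:]/]|$)'
  for f in "$@"; do
    # First line whose first word is "module" (comment lines are skipped)
    first=$(awk '$1 == "module" { print; exit }' "$f")
    if [[ ! $first =~ $re ]]; then
      echo "$f: first module command is '${first:-none}'"
    fi
  done
}
```

For example, `check_job_env examples/*.slurm` prints nothing when every script follows the rule.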

Slurm email notifications

Document that email notifications are disabled on Slurm. Several queries related to this matter have already been handled on the ARCHER2 Service Desk.

Add information on getting memory use data from Slurm

It would be useful to show the commands for extracting memory use information from Slurm in the profiling or tuning chapters.

For example, to get current memory use of a running job:

sstat --format=JobID,JobName,averss,maxrss,maxrsstask,avevms,maxvms,maxvmsize -j 12345

Or, to get memory use of a completed job:

sacct --format=JobID,JobName,averss,maxrss,maxrsstask,avevms,maxvms,maxvmsize -j 12345
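The raw values are reported per job step and carry unit suffixes (for example `K`), so a short filter can help when a job has many steps. This sketch reads `sacct` output produced with `--parsable2 --noheader` (pipe-separated, no header) and prints the largest MaxRSS across steps in MiB. It assumes suffixes `K`, `M` or `G` and treats a value without a suffix as bytes; both are assumptions about the output format. The function name is hypothetical.

```shell
# Hypothetical helper: largest MaxRSS over all job steps, in MiB.
# Input: "JobID|MaxRSS" lines from sacct --parsable2 --noheader.
peak_rss_mib() {
  awk -F'|' '
    function to_kib(v,    n, u) {
      n = v + 0                       # numeric prefix, e.g. "2048K" -> 2048
      u = substr(v, length(v), 1)     # unit suffix, if any
      if (u == "M") n *= 1024
      else if (u == "G") n *= 1024 * 1024
      else if (u != "K") n /= 1024    # assumption: no suffix means bytes
      return n
    }
    $2 != "" { k = to_kib($2); if (k > max) max = k }
    END { printf "%.1f\n", max / 1024 }'
}
```

Usage: `sacct -j 12345 --format=JobID,MaxRSS --parsable2 --noheader | peak_rss_mib`.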
