olcf / olcf-user-docs
Sources for the Oak Ridge Leadership Computing Facility User Documentation
Home Page: https://docs.olcf.ornl.gov
The (non-documentation) navigation at olcf.ornl.gov needs to point to the RTD docs.
See 'File Systems'
$ cat /etc/os-release | head -n +2
NAME="Red Hat Enterprise Linux Server"
VERSION="7.6 (Maipo)"
Currently listed as 7.5.
Include MAP for example, something like mpiP or mpitrace for MPI profiling.
It would be nice to catch syntax errors as early as possible.
Each top-level heading on the home page should be in the collapsible sidebar.
As per Jack's suggestion
We need to change the Tutorials link on the training/index.html page to point to https://github.com/olcf-tutorials instead of https://www.olcf.ornl.gov/for-users/training/tutorials/ since the latter is mostly old tutorials related to Titan.
The openshift build needs to be updated to remove the volume mounts and perform binary builds to pull in the necessary changes to the webserver configuration.
We also need to verify that the space is being managed correctly regarding access logs, etc.
Right now, the "Transferring Data" page only describes Globus. Should we have instructions about using command-line utilities with the DTNs, and/or other methods?
Right now, the Rhea Users Guide is the only place where Cross-submission is discussed.
I think it needs to be moved to a higher level and then referenced in the Summit guide and the Rhea guide (similar to "Connecting" #55).
(It also needs Slurm and LSF updates, and we need to verify that it all still works.)
Starting with the Sept. 24 outage, 100GB of each compute node's NVMe device will be dedicated to an NFS cache that holds default libraries. This should improve launch times. It also limits user-writable space on the burst buffer to 1400GB (down from 1500GB).
The batch-scripts ref introduced by #34 has no destination.
Per community request
See Spack docs for reference.
Body content is responsive for smaller screens / browser windows. Tables are somewhat responsive, but a table's responsiveness is limited by the amount of content in its cells. When tables cannot be dynamically made smaller, the rightmost columns become inaccessible. Horizontal scrolling is an option, but is not working.
Tested and confirmed a problem in:
It would be nice to have some useful screenshots, especially for submitting pull requests and submitting issues.
More detail elsewhere might be needed.
Instead of formatted blocks with bold "Note" or "Warning", we can use .. note:: a note
and .. warning:: a warning
to get colored blocks. We should update throughout.
References to Titan and Eos should be removed following the decommissioning of these systems.
There are a handful of references to these individual systems in our Policies, as well as in the System User Guides.
Currently, the OLCF internal K8 cluster polls every few minutes to check for changes.
I suspect it's possible to set up a webhook or some other CI/DevOps process to trigger a rebuild on the K8 cluster when a PR is merged into the master branch. We should look into those options and what sort of security and credential management we'd need to do that with a public GitHub repo and a closed-access internal K8 cluster.
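As a rough illustration of the webhook option, the sketch below shows the two checks a receiver would need before triggering a rebuild: verifying GitHub's HMAC signature header (`X-Hub-Signature-256`) against a shared secret, and confirming that the event is a pull request merged into the target branch. The function names and the way the secret is supplied are assumptions made for illustration; how the request would actually reach a closed-access cluster is not addressed here.

```python
import hashlib
import hmac
import json


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Compare GitHub's X-Hub-Signature-256 header with an HMAC of the raw body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), header.encode())


def should_rebuild(event: str, body: bytes, branch: str = "master") -> bool:
    """Return True only for a pull request that was merged into `branch`.

    `event` is the value of the X-GitHub-Event header.
    """
    if event != "pull_request":
        return False
    payload = json.loads(body)
    pr = payload.get("pull_request", {})
    return (
        payload.get("action") == "closed"
        and pr.get("merged") is True
        and pr.get("base", {}).get("ref") == branch
    )
```

A receiver would call `signature_is_valid` first and reject the request on failure, then call `should_rebuild` to decide whether to start the build. The secret itself would need to be stored with whatever credential management the cluster already uses.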
As the "connecting" and "data and transfers" sections will likely link back to stand-alone pages on those topics, we shouldn't have cross-ref destinations pointing to those sections; wherever these references are needed, they should point to the authoritative pages.
The wordpress-based Rhea User Guide at https://www.olcf.ornl.gov/for-users/system-user-guides/rhea/ is frozen as of 06 September 2019.
This means that the content in these new pages needs to be re-synced. Here is the procedure:
*these sections should just refer back to the general pages on the topic, possibly with exceptions mentioned for Rhea
We frequently see OLCF users get locked out after repeated network connection attempts when sshfs tries to auto-reconnect (which fails because of two-factor authentication, of course).
Don't run sshfs with reconnect options.
When running simple gpu codes with jsrun that do not also have MPI support, sometimes one might run into a warning such as:
CUDA Hook Library: Failed to find symbol mem_find_dreg_entries, ./a.out: undefined symbol: __PAMI_Invalidate_region
This can be solved in a few ways:
jsrun -E LD_PRELOAD=/opt/ibm/spectrum_mpi/lib/pami_451/libpami.so ...
jsrun --smpiargs="off" ...
jsrun --smpiargs="-disable_gpu_hooks" ...
Some more discussion can be found at kokkos/kokkos#1985; reports there say it has been reported to IBM.
Rhea user guide for the compiling section has broken links for the compilers.
Add a note to Connecting for the First Time for ORNL employees. If using an ORNL-distributed RSA fob, there's no need to set a new PIN.
The top-level Data pages need to be populated with the following sections/structure:
Looks like this section of the main index.rst is from the initial bare-bones reST setup, and can probably be removed.
Currently we have several ways of referring to this filesystem. Let's limit it to at most 2, and one if we could agree on it.
I am preparing the data section, removing Lustre, and organizing some images; I also need to redo some of the videos.
This is a top-level section, and also needs to be linked to in the relevant places in the system user guides, and elsewhere as appropriate.
Here is a preliminary list of offenders found with a simple grep (grep -r '</for-users/' *), but there are other forms of brokenness as well:
For in-page relative links, something like this is broken:
`NVIDIA Volta V100 </for-users/system-user-guides/summit/nvidia-v100-gpus/>`__
while something like this is fine:
`hardware threads <#hardware-threads>`__
The most robust way to link within the documentation (especially to other pages), which should survive page restructuring and section renaming, is to provide cross-references: https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html#ref-role
full URLs are okay:
`NVIDIA Volta Architecture White Paper <http://images.nvidia.com/content/volta-architecture/pdf/volta-architecture-whitepaper.pdf>`_
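The grep above could also be expressed as a small script that reports the line number and target of each offending link. This is only a sketch: it assumes the sources are `.rst` files under one directory and looks only for the `</for-users/...>` form shown above, so other kinds of breakage would still need separate checks. The function names are hypothetical.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches reST hyperlinks such as `text </for-users/...>`_ or `text </for-users/...>`__
SITE_RELATIVE_LINK = re.compile(r"`[^`<]+<(/for-users/[^>]*)>`__?")


def find_site_relative_links(text: str) -> list[tuple[int, str]]:
    """Return (line number, link target) for each old site-relative link in `text`."""
    results = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SITE_RELATIVE_LINK.finditer(line):
            results.append((lineno, match.group(1)))
    return results


def scan_tree(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .rst file under `root` and collect the offending links per file."""
    hits = {}
    for path in sorted(Path(root).rglob("*.rst")):
        found = find_site_relative_links(path.read_text(encoding="utf-8"))
        if found:
            hits[str(path)] = found
    return hits
```

Calling `scan_tree(".")` from the repository root would give a per-file list that can be worked through while converting the links to in-page anchors or cross-references.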
This should perhaps even be on the docs homepage. People have been asking about it.
Best practices for getting support:
-- see comments below
In some areas of the user guides, say Summit > Running Jobs, images can become distorted when resizing the browser window.
If possible, enforce preservation of aspect ratios site-wide. If not possible, this should be corrected on existing images.
With this migration of OLCF docs into version control, we will be dropping the "Software Pages".
However, there is a small set of software packages that have important documentation living in these pages. This issue is to identify which software packages should have their content migrated into this repository, and to make it happen.
Most likely, packages of interest are those that are vendor-provided and have vendor-provided site-specific accompanying documentation.
The wordpress-based Summit User Guide at https://www.olcf.ornl.gov/for-users/system-user-guides/summit/summit-user-guide/#system-overview is frozen as of 06 September 2019.
This means that the content in these new pages needs to be re-synced. Here is the procedure:
*these sections should just refer back to the general pages on the topic, possibly with exceptions mentioned for Rhea
It's in the summitdev user guide, so we can hopefully lift from there.
Per discussion on #57, version information for any software should be found by querying the system itself. The Compiling section of the user guide lists available compiler suites, options, and feature support, but should not be responsible for capturing default versions.
We should probably add to this repo:
Probably just detail the differences between Summit and Summitdev.
The Summit 'Software' section should direct users to Rhea for viz tools like VisIt and ParaView.
In some browsers, there is a flash or brief pause before the CSS/JS gets loaded. Other sites using Sphinx with the same theme do not seem to do this (at least not as badly). There may be several causes, and this probably needs some investigation. See e.g. https://stackoverflow.com/a/12358419
e.g. https://www.olcf.ornl.gov/for-users/documents-forms/olcf-account-application/ points to various policies, user agreement, applying for an account instructions, etc.
In the Common bsub Options section, there is a link to "see the documentation" that does not exist.
@grahamlopez We should tackle this ahead of the User concall this Wednesday going live.