openhackathons-org / gpubootcamp
This repository consists of GPU bootcamp material for HPC and AI
License: Apache License 2.0
The request is to enhance the current Jupyter notebook template with more details related to:
-- Structure
-- Flow
-- AI-specific changes if any
All the training materials currently cover converting sequential code to run on GPUs using N-Ways. This feature request is to add bootcamp material on multi-GPU programming. Ideally the tutorial should cover different methods like
-- NVSHMEM
-- NCCL
-- MPI+X
It should also cover the impact of system and network topology on scaling, and look into tools that help provide insights into scaling and bottlenecks.
Currently, all the lab containers demand exclusive access to a GPU. A 1:1 ratio of GPUs to bootcamp participants is fine for small groups but does not scale. The request is to change the lab setup to share GPUs behind the scenes by supporting submission scripts for prominent schedulers like SLURM.
This Bootcamp is designed to give NLP researchers an overview of the fundamentals of NVIDIA Megatron-LM (NVIDIA's open-source framework for training very large language models). The focus will be on training GPT Megatron models specifically.
It will consist of an intro to the Megatron-LM code base, converting data to mmap format, understanding model parallelism and data parallelism, how to configure your training, and then training and profiling GPT Megatron models.
Currently, the labs make use of Jupyter Notebook. Jupyter Notebook lacks some functionality that JupyterLab has, such as terminal access. This feature request is to move the labs to JupyterLab.
During the Bootcamp MD mini-challenge, attendees have no access to the lab Jupyter notebook content when they need to recall or revise a previously learned concept that could aid their attempt at the mini-challenge. Hence the need for its inclusion.
Slides are currently hosted under the OpenACC org Google Drive. Normally, we upload .ppt slides, and if one tries to edit or review them on the fly, the formatting is ruined because Google Slides does not handle the .ppt format well. Solutions would be to either create the slides in Google Slides or host them on OneDrive.
The current version of the Climate and CFD containers derives from nvcr.io/nvidia/tensorflow:20.01-tf2-py3.
These containers won't run on the latest architectures like A100. In order to run them, we need to use tf-21.04-tf2-py3.
When the container is derived from tf-21.04-tf2-py3, some Python code breaks, e.g. a serialization error with weight files due to the use of APIs like tf.keras.models.load_model. Instead, APIs like load_weights should be used.
The request is to upgrade the labs to use the latest container supporting A100 and fix the old TF API problems.
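The direction of the fix can be sketched as follows; the model architecture and file names here are hypothetical stand-ins, not the lab's actual code:

```python
import tensorflow as tf

def build_model():
    # Rebuild the same architecture the checkpoint was trained with.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

# Fragile across TF/container versions: deserializes the whole model object.
# model = tf.keras.models.load_model("model.h5")

# More portable: reconstruct the model in code, then load only the weights.
model = build_model()
model.build(input_shape=(None, 8))
model.save_weights("model.weights.h5")  # stand-in for the trained checkpoint
model.load_weights("model.weights.h5")
```

Because only the raw weight tensors are serialized, this sidesteps the version-dependent deserialization of custom objects that load_model performs.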
/usr/local/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64/
GPG error: https://developer.download.nvidia.com/devtools/repos/ubuntu2004/amd64 Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools <[email protected]>
rm -f arraymalloc.cpp boundary.o cfd.o cfdio.o jacobi.o cfd velocity.dat colourmap.dat cfd.plt core
Fortran is the most preferred language in HPC. This feature request is to add Fortran support to the N-Ways tutorial. It should have:
-- Add a footer to all slides pointing to our Bootcamp GitHub
-- A slide talking about programs (GitHub, Hackathon, Mentor Program)
-- For Jupyter, also add something similar to NVIDIA's CC
To create: exercise notebooks for SpeechToText, QuestionAnswering, TokenClassification, Bootcamp Challenge
Introduction notebook
Material list:
Intro to Jarvis (Slides, introductory notebook)
Intro to ASR (slides)
Intro to NLP (Slides)
SpeechToText (introductory notebook, example notebook, exercise notebook)
TokenClassification (introductory notebook, example notebook, exercise notebook)
QuestionAnswering (introductory notebook, example notebook, exercise notebook)
Suggestion to add a "General Troubleshooting" section to the top-level readme, for troubleshooting that applies to all the Docker images. Perhaps we could move the 16 GB GPU memory requirement there.
One additional troubleshooting item I'd like to see covers port conflicts when hosting multiple students running the workshop containers on the same cluster: they will clobber each other on port 8888, so anyone hosting this workshop should be prepared to assign ports to students. Are there any other solutions you've encountered?
At the moment, the notebooks/makefiles (particularly HPC) contain fixed paths that depend on the compiler versions and environment variables set inside the container. This causes issues if someone tries to run the material on bare metal.
These paths in cells should be independent of the container and ideally should work on any cluster by loading modules only. Many paths are absolute and assume a specific container configuration. This needs fixing.
The Nways-Python containers (both the challenge and the base, Singularity and Docker) are based on the cuda:11.2 container. They also have nsight-systems-2020.5.1 and nsight-compute-2020.2.1 installed. These are very old and should be removed.
Nways Fortran and C are up to date. CUDA 11.2 [1] does ship Nsight Systems (2021.3) and Nsight Compute (2021.2). I suggest testing and using those instead.
[1]. https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
The documentation in Nways stdpar refers to changing the loop to use an STL algorithm along with the execution policy std::execution::par. However, it has been observed that without converting the loop to use an execution policy, simply recompiling the code with -stdpar generates parallel GPU code.
TAO bootcamp material in progress.
Content would include:
Currently, the N-Ways lab covers only the introduction to OpenACC, OpenMP, CUDA, and ISO standards using Nsight Systems.
This feature request is to create material on optimization techniques with the help of Nsight Compute for the different methods given above.
The feature request is to integrate Physics Informed Neural Network into the AI for Science Training material.
-- The existing structure should be modified to show two approaches: data-driven and PINN
-- Add SimNet as the framework for PINN
The Makefile works with CUDA 11.0 but breaks with CUDA 11.2.
Sample error found when using the Makefile to run the CUDA C MD mini-challenge:
nvcc nvlink fatal : Input file
'/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64/libcudadevrt.a:cuda_device_runtime.o' newer than toolkit (112 vs 110)
Python is gaining traction in the HPC community and is among the most sought-after programming languages. The feature request is to add support for CUDA using CuPy and Numba to the N-Ways tutorial.
When attendees run make clean in the CUDA notebooks, it executes the command below:
rm -f arraymalloc.o boundary.o cfd.cu cfdio.o jacobi.cu cfd velocity.dat colourmap.dat cfd.plt core
which removes the .cu source files. The makefile needs to be updated so that the .cu files are not deleted.
It seems that the Fortran source code is missing a reference to nvtx, so the solution is to add use nvtx.
In every notebook, we need to give a brief overview of our repository so anyone going through any lab knows its contents and can explore the materials. We also need to provide relevant references so that anyone interested becomes aware of our Hackathons.
For example, at the end of this notebook I found the following lines with relevant hyperlinks:
Examples and tutorials
Here are some examples for using distribution strategy with custom training loops:
- Tutorial to train MNIST using MirroredStrategy.
- Guide on training MNIST using TPUStrategy.
- TensorFlow Model Garden repository containing collections of state-of-the-art models implemented using various strategies.
These made me curious to check out their repository and the collection of models they have.
So if we add a couple of lines about what we do in our repository and hackathons, more people will be aware of our work and interested people can contact us.
Currently, the description of the labs is documented on the main page, and the individual labs have no documentation of what each lab is about. Add documentation to the individual labs to remove any confusion for users of the bootcamp.
Suggestions related to the cell block in Resnets notebook and CNN:
# Reshape input data from (28, 28) to (28, 28, 1)
# w, h = 28, 28
#train_images = train_images.reshape(train_images.shape[0], w, h)
#test_images = test_images.reshape(test_images.shape[0], w, h)
Delete the entire lines in the "Making predictions" block. The block overwrites the test_images/train_images variable names, which prevents the previous cells from being re-run, and we are not using train_images in the following cells. Is this back-conversion necessary?
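A minimal sketch of the suggested cleanup (array sizes and names are stand-ins for the notebook's data): keep each reshaped array in its own variable instead of reassigning across names, so earlier cells stay re-runnable:

```python
import numpy as np

# Stand-ins for the notebook's MNIST-style arrays.
train_images = np.zeros((600, 28, 28), dtype="float32")
test_images = np.zeros((100, 28, 28), dtype="float32")

# Problematic pattern: reassigning across names clobbers a dataset, e.g.
# test_images = train_images.reshape(train_images.shape[0], 28, 28, 1)

# Safer: reshape each array into its own variable, leaving the originals intact.
train_images_4d = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images_4d = test_images.reshape(test_images.shape[0], 28, 28, 1)

print(train_images_4d.shape, test_images_4d.shape)  # (600, 28, 28, 1) (100, 28, 28, 1)
```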
Error reported for the stdpar implementation :
nvc++ -std=c++17 -stdpar=gpu -lm -I/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64 -lnvToolsExt -c jacobi.cpp
"/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include/thrust/system/detail/generic/for_each.h", line 48: error: static assertion failed with "unimplemented for this system"
THRUST_STATIC_ASSERT_MSG(
^
detected during:
instantiation of "InputIterator thrust::system::detail::generic::for_each(thrust::execution_policy<DerivedPolicy> &, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy=thrust::detail::execute_with_allocator<thrust::mr::allocator<char, thrust::mr::disjoint_unsynchronized_pool_resource<thrust::device_memory_resource, thrust::mr::new_delete_resource>>, thrust::cuda_cub::execute_on_stream_base>, InputIterator=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, UnaryFunction=lambda [](unsigned int)->void]" at line 44 of "/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include/thrust/detail/for_each.inl"
instantiation of "InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy=thrust::detail::execute_with_allocator<thrust::mr::allocator<char, thrust::mr::disjoint_unsynchronized_pool_resource<thrust::device_memory_resource, thrust::mr::new_delete_resource>>, thrust::cuda_cub::execute_on_stream_base>, InputIterator=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, UnaryFunction=lambda [](unsigned int)->void]" at line 1035 of "/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/compilers/include/nvhpc/algorithm_execution.hpp"
instantiation of "void std::__pstl::__algorithm_wrapper_struct<true>::for_each(_FIt, _FIt, _UF) [with _FIt=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, _UF=lambda [](unsigned int)->void]" at line 2136 of "/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/compilers/include/nvhpc/algorithm_execution.hpp"
instantiation of "std::__pstl::__enable_if_EP<_EP, void> std::for_each(_EP &&, _FIt, _FIt, _UF) [with _EP=const std::execution::parallel_policy &, _FIt=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, _UF=lambda [](unsigned int)->void]" at line 11 of "jacobi.cpp"
1 error detected in the compilation of "jacobi.cpp".
make: *** [Makefile:41: jacobi.o] Error 2
It looks like the issue comes from the part below:
void jacobistep(double *psinew, double *psi, int m, int n)
{
    std::for_each(std::execution::par, thrust::counting_iterator<unsigned int>(1u),
                  thrust::counting_iterator<unsigned int>(m),
                  [psinew, psi, m, n](unsigned int i) {
                      for (int j = 1; j <= n; j++)
                      {
                          psinew[i*(m+2)+j] = 0.25*(psi[(i-1)*(m+2)+j] + psi[(i+1)*(m+2)+j] + psi[i*(m+2)+j-1] + psi[i*(m+2)+j+1]);
                      }
                  });
}
This needs investigation to recreate the issue.
Materials on the raplab-hackathon cluster should be updated for future bootcamps.
The feature request is to train the participants on deployment-related scenarios. Deployment of models requires a different skill set than training them. This Bootcamp should cover aspects of deployment like latency, scaling, etc., with the help of the popular framework Triton.
The feature request is to make the users of this repository familiar with the concepts of:
-- Virtual environments
-- Types of virtual environments: Conda, containers (Docker, Singularity), VMs
-- Container usage: how to use containers like Singularity and Docker
The fundamentals here will help developers using this repository become familiar with containers and how to use them in their own environment.
Currently, all the labs explicitly state the path to open a file, and the user needs to manually traverse the folder structure, which causes confusion. Instead, there should be explicit hyperlinks to files, resulting in better readability of the material.
The material currently contains solutions for the challenge part to improve accuracy. The sample solution should be moved to the gpubootcamp-challenge repository instead.
In the stdpar notebook, it is not clear why we are using -DUSE_COUNTING_ITERATOR when compiling the code for the GPU. Having fixed paths to include the libraries and headers for NVTX at compile time confuses users as to why they are not needed when we compile for the GPU but are needed for multicore.
Add # Copyright (c) 2021 NVIDIA Corporation. All rights reserved. to the top of the container definition files.
Solutions are missing for the nways CFD labs (C and Fortran).
We need to come up with some structures for sharing the materials.
It is difficult to navigate inside the repo. For example, to use nways, one needs to clone the whole repo, navigate to the HPC folder, and then to nways, which can be confusing.
The feature request is to add hands-on training for using debugging tools for CPU and GPU.
Currently, the presentations with speaker notes for the labs are not visible to Bootcamp GitHub users. Add a description of where to access the presentations via Google Form submission.
While working with the AI Bootcamp, some of the teams are asking why randomization happens with the TensorFlow work.
Could we add tensorflow.random.set_seed() to the notebooks? The NumPy random seed does not control TensorFlow's randomness. Thanks!
https://www.tensorflow.org/api_docs/python/tf/random/set_seed
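A minimal sketch of what adding the seed could look like (the seed value is arbitrary); note that np.random.seed alone leaves TensorFlow ops unseeded:

```python
import numpy as np
import tensorflow as tf

def draw():
    # Seed both generators at the top of the notebook cell.
    np.random.seed(0)        # controls NumPy only
    tf.random.set_seed(0)    # controls TensorFlow's global RNG
    return tf.random.uniform([3]).numpy()

a = draw()
b = draw()
# With the seeds reset, both draws are identical.
print(np.allclose(a, b))  # True
```

For full determinism across every source of randomness (Python, NumPy, TF) the single call tf.keras.utils.set_random_seed(0) can be used instead in recent TF versions.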
The feature is to add support for automated continuous integration testing of the material. The testing should support features like:
-- Successful creation of both Singularity and Docker containers
-- Basic tests like compiling and running certain GPU benchmarks
Rename existing materials from Jarvis to Riva
Combined Container Solution
To create: exercise notebooks for Punctuation, TextClassification, SpeechToTextClassification, Sample Application Creation, Hackathon Challenge
A developer trying to adopt stdpar should ideally start with a C++ version. The current code shown is C-based, and users are then expected to convert it using templates and lambdas. Doing the following will make it simpler for a stdpar developer to understand:
-- Start with code that is already C++ based
-- Change from using a lambda to a functor, which is more easily understood
Integration of CUDA-Python into existing Numba and CuPy NWays Material.
A suggestion from an attendee for the file Countering_Data_Imbalance.ipynb:
(h, w) = (232,232)
center = (w / 2, h / 2)
angle90 = 90
angle180 = 180
angle270 = 270
scale = 1.0
These variables seem unused and can likely be deleted.
The feature request will expose developers to different methods and tools for training an AI model on multiple GPUs.
It will cover concepts like:
-- Data Parallelism vs Model Parallelism
-- System Topology
-- Framework support like Horovod
-- Tools for Multi GPU Profiling