openhackathons-org / gpubootcamp
This repository consists of GPU bootcamp material for HPC and AI
License: Apache License 2.0
The request is to enhance the current Jupyter notebook template with more details related to:
-- Structure
-- Flow
-- AI-specific changes if any
All the training materials currently cover converting sequential code to run on GPUs using N-Ways. This feature request is to add bootcamp material on multi-GPU programming. Ideally the tutorial should cover different methods like
-- NVSHMEM
-- NCCL
-- MPI+X
It should also cover the impact of system and network topology on scaling, and look into tools that help provide insights into scaling and bottlenecks.
Currently, all the lab containers demand exclusive access to a GPU. A 1:1 ratio of GPUs to bootcamp participants is fine for small groups but does not scale. The request is to change the lab setup to share GPUs behind the scenes by supporting submission scripts for prominent schedulers like SLURM.
This Bootcamp is designed to give NLP researchers an overview of the fundamentals of NVIDIA Megatron-LM (NVIDIA's open-source framework for training very large language models). The focus will be on training GPT Megatron models specifically.
It will consist of an intro to the Megatron-LM code base, converting data to mmap format, understanding model parallelism and data parallelism, how to configure your training, and then training and profiling GPT Megatron models.
Currently, the labs make use of Jupyter Notebook. Jupyter Notebook lacks some functionality that JupyterLab has, such as terminal access. This feature request is to move the labs to JupyterLab.
During the Bootcamp MD mini-challenge, attendees have no access to the lab Jupyter notebook content when they need to recall or revise a previously learned concept that could aid their attempt at the mini-challenge. Hence the need for its inclusion.
Slides are currently hosted under the OpenACC org Google Drive. Normally, we upload .ppt slides, and if one tries to edit or review them on the fly, the formatting is ruined because Google Slides does not handle the .ppt format well. Solutions would be to either create the slides in Google Slides or host them on OneDrive.
The current version of the Climate and CFD containers derives from nvcr.io/nvidia/tensorflow:20.01-tf2-py3.
These containers won't run on the latest architectures like A100. In order to run them, we need to use tf-21.04-tf2-py3.
When the container is derived from tf-21.04-tf2-py3, some Python code breaks, e.g. a serialization error with weight files due to the use of APIs like tf.keras.models.load_model. Instead, APIs like load_weights should be used.
The request is to upgrade the labs to use the latest container supporting A100 and fix the old TF API problems.
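The direction of the fix can be sketched as follows; the model architecture and file names here are hypothetical stand-ins, not the lab's actual code:

```python
import tensorflow as tf

def build_model():
    # Rebuild the same architecture the checkpoint was trained with.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(1),
    ])

# Fragile across TF/container versions: deserializes the whole model object.
# model = tf.keras.models.load_model("model.h5")

# More portable: reconstruct the model in code, then load only the weights.
model = build_model()
model.build(input_shape=(None, 8))
model.save_weights("model.weights.h5")  # stand-in for the trained checkpoint
model.load_weights("model.weights.h5")
```

Because only the raw weight tensors are serialized, this sidesteps the version-dependent deserialization of custom objects that load_model performs.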
/usr/local/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64/
GPG error: https://developer.download.nvidia.com/devtools/repos/ubuntu2004/amd64 Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools <[email protected]>
rm -f arraymalloc.cpp boundary.o cfd.o cfdio.o jacobi.o cfd velocity.dat colourmap.dat cfd.plt core
Fortran is the most preferred language in HPC. This feature request is to add Fortran support to the N-Ways tutorial. It should have:
-- Add a footer to all slides pointing to our Bootcamp GitHub
-- A slide talking about programs (GitHub, Hackathon, Mentor Program)
-- For Jupyter, also add something similar to NVIDIA's CC
To create: exercise notebooks for SpeechToText, QuestionAnswering, TokenClassification, Bootcamp Challenge
Introduction notebook
Material list:
Intro to Jarvis (Slides, introductory notebook)
Intro to ASR (slides)
Intro to NLP (Slides)
SpeechToText (introductory notebook, example notebook, exercise notebook)
TokenClassification (introductory notebook, example notebook, exercise notebook)
QuestionAnswering (introductory notebook, example notebook, exercise notebook)
Suggestion to add a "General Troubleshooting" section to the top-level readme, for troubleshooting that applies to all the Docker images. Perhaps we could move the 16 GB GPU memory requirement there.
One additional troubleshooting item I'd like to see covers port conflicts when hosting multiple students running the workshop containers on the same cluster: they will clobber each other on port 8888, so anyone hosting this workshop should be prepared to assign ports to students. Are there any other solutions you've encountered?
At the moment, the notebooks/makefiles (particularly HPC) contain fixed paths that depend on the compiler versions and environment variables set inside the container. This causes issues if someone tries to run the material on bare metal.
These paths in cells should be independent of the container and ideally should work on any cluster by loading modules only. Many paths are absolute and assume a specific container configuration. This needs fixing.
The Nways-Python containers (both the challenge and the base, Singularity and Docker) are based on the cuda:11.2 container. They also have nsight-systems-2020.5.1 and nsight-compute-2020.2.1 installed. These are very old and should be removed.
Nways Fortran and C are up to date. CUDA 11.2 [1] does ship Nsight Systems (2021.3) and Nsight Compute (2021.2). I suggest testing and using those instead.
[1]. https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
The documentation in Nways stdpar refers to changing the loop to use an STL algorithm along with the execution policy std::execution::par. However, it has been observed that without converting the loop to use an execution policy, simply recompiling the code with -stdpar generates parallel GPU code.
TAO bootcamp material in progress.
Content would include:
Currently, the N-Ways lab covers only the introduction to OpenACC, OpenMP, CUDA, and ISO standards using Nsight Systems.
This feature request is to create material on optimization techniques with the help of Nsight Compute for the different methods given above.
The feature request is to integrate Physics Informed Neural Network into the AI for Science Training material.
-- The existing structure should be modified to show two approaches: data-driven and PINN
-- Add SimNet as the framework for PINN
The Makefile works with CUDA 11.0 but breaks with CUDA 11.2.
Sample error found when using the Makefile to run the CUDA C MD mini-challenge:
nvcc nvlink fatal : Input file
'/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64/libcudadevrt.a:cuda_device_runtime.o' newer than toolkit (112 vs 110)
Python is gaining traction in the HPC community and is among the most sought-after programming languages. The feature request is to add support for CUDA using CuPy and Numba to the N-Ways tutorial.
When attendees run make clean in the CUDA notebooks, it executes the command below:
rm -f arraymalloc.o boundary.o cfd.cu cfdio.o jacobi.cu cfd velocity.dat colourmap.dat cfd.plt core
which removes the .cu source files. The makefile needs to be updated so that the .cu files are not deleted.
It seems that the Fortran source code is missing a reference to nvtx, so the solution is to add use nvtx.
In every notebook, we need to give a brief overview of our repository so anyone going through any lab knows its contents and can explore the materials. We also need to provide relevant references so that anyone interested becomes aware of our Hackathons.
For example, at the end of this notebook I found the following lines with relevant hyperlinks:
Examples and tutorials
Here are some examples for using distribution strategy with custom training loops:
- Tutorial to train MNIST using MirroredStrategy.
- Guide on training MNIST using TPUStrategy.
- TensorFlow Model Garden repository containing collections of state-of-the-art models implemented using various strategies.
These made me curious to check out their repository and the collection of models they have.
So if we add a couple of lines about what we do in our repository and hackathons, more people will be aware of our work and interested people can contact us.
Currently, the description of the labs is documented on the main page, and the individual labs have no documentation of what each lab is about. Add documentation to the individual labs to remove any confusion for users of the bootcamp.
Suggestions related to the cell block in Resnets notebook and CNN:
# Reshape input data from (28, 28) to (28, 28, 1)
# w, h = 28, 28
#train_images = train_images.reshape(train_images.shape[0], w, h)
#test_images = test_images.reshape(test_images.shape[0], w, h)
Delete the entire lines in the "Making predictions" block. The block overwrites the test_images/train_images variable names, which prevents the previous cells from being re-run, and we are not using train_images in the following cells. Is this back-conversion necessary?
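A minimal sketch of the suggested cleanup (array sizes and names are stand-ins for the notebook's data): keep each reshaped array in its own variable instead of reassigning across names, so earlier cells stay re-runnable:

```python
import numpy as np

# Stand-ins for the notebook's MNIST-style arrays.
train_images = np.zeros((600, 28, 28), dtype="float32")
test_images = np.zeros((100, 28, 28), dtype="float32")

# Problematic pattern: reassigning across names clobbers a dataset, e.g.
# test_images = train_images.reshape(train_images.shape[0], 28, 28, 1)

# Safer: reshape each array into its own variable, leaving the originals intact.
train_images_4d = train_images.reshape(train_images.shape[0], 28, 28, 1)
test_images_4d = test_images.reshape(test_images.shape[0], 28, 28, 1)

print(train_images_4d.shape, test_images_4d.shape)  # (600, 28, 28, 1) (100, 28, 28, 1)
```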
Error reported for the stdpar implementation :
nvc++ -std=c++17 -stdpar=gpu -lm -I/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64 -lnvToolsExt -c jacobi.cpp
"/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include/thrust/system/detail/generic/for_each.h", line 48: error: static assertion failed with "unimplemented for this system"
THRUST_STATIC_ASSERT_MSG(
^
detected during:
instantiation of "InputIterator thrust::system::detail::generic::for_each(thrust::execution_policy<DerivedPolicy> &, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy=thrust::detail::execute_with_allocator<thrust::mr::allocator<char, thrust::mr::disjoint_unsynchronized_pool_resource<thrust::device_memory_resource, thrust::mr::new_delete_resource>>, thrust::cuda_cub::execute_on_stream_base>, InputIterator=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, UnaryFunction=lambda [](unsigned int)->void]" at line 44 of "/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include/thrust/detail/for_each.inl"
instantiation of "InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy=thrust::detail::execute_with_allocator<thrust::mr::allocator<char, thrust::mr::disjoint_unsynchronized_pool_resource<thrust::device_memory_resource, thrust::mr::new_delete_resource>>, thrust::cuda_cub::execute_on_stream_base>, InputIterator=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, UnaryFunction=lambda [](unsigned int)->void]" at line 1035 of "/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/compilers/include/nvhpc/algorithm_execution.hpp"
instantiation of "void std::__pstl::__algorithm_wrapper_struct<true>::for_each(_FIt, _FIt, _UF) [with _FIt=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, _UF=lambda [](unsigned int)->void]" at line 2136 of "/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/compilers/include/nvhpc/algorithm_execution.hpp"
instantiation of "std::__pstl::__enable_if_EP<_EP, void> std::for_each(_EP &&, _FIt, _FIt, _UF) [with _EP=const std::execution::parallel_policy &, _FIt=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, _UF=lambda [](unsigned int)->void]" at line 11 of "jacobi.cpp"
1 error detected in the compilation of "jacobi.cpp".
make: *** [Makefile:41: jacobi.o] Error 2
It looks like the issue comes from the part below:
void jacobistep(double *psinew, double *psi, int m, int n)
{
    std::for_each(std::execution::par, thrust::counting_iterator<unsigned int>(1u),
                  thrust::counting_iterator<unsigned int>(m),
                  [psinew, psi, m, n](unsigned int i) {
                      for (int j = 1; j <= n; j++)
                      {
                          psinew[i*(m+2)+j] = 0.25*(psi[(i-1)*(m+2)+j] + psi[(i+1)*(m+2)+j] + psi[i*(m+2)+j-1] + psi[i*(m+2)+j+1]);
                      }
                  });
}
This needs investigation to recreate the issue.
Materials on the raplab-hackathon cluster should be updated for future bootcamps.
The feature request is to train the participants on deployment-related scenarios. Deployment of models requires a different skill set than training them. This Bootcamp should cover aspects of deployment like latency, scaling, etc., with the help of the popular framework Triton.
The feature request is to make the users of this repository familiar with the concepts of:
-- Virtual environments
-- Types of virtual environments: Conda, containers (Docker, Singularity), VMs
-- Container usage: how to use containers like Singularity and Docker
The fundamentals here will help developers using this repository become familiar with containers and how to use them in their own environment.
Currently, all the labs explicitly state the path to open a file, and the user needs to manually traverse the folder structure, which causes confusion. Instead, there should be explicit hyperlinks to files, resulting in better readability of the material.
The material currently contains solutions for the challenge part to improve accuracy. The sample solution should be moved to the gpubootcamp-challenge repository instead.
In the stdpar notebook, it is not clear why we are using -DUSE_COUNTING_ITERATOR when compiling the code for the GPU. Having fixed paths to include the libraries and headers for NVTX at compile time confuses users as to why they are not needed when we compile for the GPU but are needed for multicore.
Add # Copyright (c) 2021 NVIDIA Corporation. All rights reserved. to the top of the container definition files.
Solutions are missing for the nways CFD labs (C and Fortran).
We need to come up with some structures for sharing the materials.
It is difficult to navigate inside the repo. For example, to use nways, one needs to clone the whole repo, navigate to the HPC folder, and then to nways, which can be confusing.
The feature request is to add hands-on training for using debugging tools for CPU and GPU.
Currently, the presentations with speaker notes for the labs are not visible to Bootcamp GitHub users. Add a description of where to access the presentations via Google Form submission.
While working with the AI Bootcamp, some of the teams are asking why randomization happens with the TensorFlow work.
Could we add tensorflow.random.set_seed() to the notebooks? The NumPy random seed does not control TensorFlow's randomness. Thanks!
https://www.tensorflow.org/api_docs/python/tf/random/set_seed
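A minimal sketch of what adding the seed could look like (the seed value is arbitrary); note that np.random.seed alone leaves TensorFlow ops unseeded:

```python
import numpy as np
import tensorflow as tf

def draw():
    # Seed both generators at the top of the notebook cell.
    np.random.seed(0)        # controls NumPy only
    tf.random.set_seed(0)    # controls TensorFlow's global RNG
    return tf.random.uniform([3]).numpy()

a = draw()
b = draw()
# With the seeds reset, both draws are identical.
print(np.allclose(a, b))  # True
```

For full determinism across every source of randomness (Python, NumPy, TF) the single call tf.keras.utils.set_random_seed(0) can be used instead in recent TF versions.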
The feature is to add support for automated continuous integration testing of the material. The testing should support features like:
-- Successful creation of both Singularity and Docker containers
-- Basic tests like compiling and running certain GPU benchmarks
Rename existing materials from Jarvis to Riva
Combined Container Solution
To create: exercise notebooks for Punctuation, TextClassification, SpeechToTextClassification, Sample Application Creation, Hackathon Challenge
A developer trying to adopt stdpar should ideally start with a C++ version. The current code shown is C-based, and users are then expected to convert it using templates and lambdas. Doing the following will make it simpler for a stdpar developer to understand:
-- Start with code that is already C++ based
-- Change from using a lambda to a functor, which is more easily understood
Integration of CUDA-Python into existing Numba and CuPy NWays Material.
A suggestion from an attendee for the file Countering_Data_Imbalance.ipynb:
(h, w) = (232,232)
center = (w / 2, h / 2)
angle90 = 90
angle180 = 180
angle270 = 270
scale = 1.0
These variables seem unused and can likely be deleted.
The feature request will expose developers to different methods and tools for training an AI model on multiple GPUs.
It will cover concepts like:
-- Data Parallelism vs Model Parallelism
-- System Topology
-- Framework support like Horovod
-- Tools for Multi GPU Profiling