openhackathons-org / gpubootcamp

This repository consists of GPU bootcamp material for HPC and AI.

License: Apache License 2.0

Dockerfile 0.18% Jupyter Notebook 37.06% Makefile 0.45% C++ 11.95% Fortran 5.37% C 23.17% Python 17.30% Cuda 3.76% CMake 0.01% Singularity 0.17% AMPL 0.07% Shell 0.50%
machine-learning deep-learning data-science gpu cuda openacc mpi hpc deepstream rapidsai openmp ai4hpc

gpubootcamp's Issues

[HPC] [N-Ways] Multi-GPU Programming

All the training materials currently cover converting sequential code to run on the GPU using N-Ways. This feature request is to add bootcamp material on Multi-GPU programming. Ideally the tutorial should cover different methods such as:
-- NVSHMEM
-- NCCL
-- MPI+X
It should also cover the impact of system and network topology on scaling, and look into tools that give more insight into scaling and bottlenecks.

[General][All] Batch submission for labs

Currently, all the lab containers require exclusive access to a GPU. A 1:1 ratio of GPUs to bootcamp participants is fine for small groups but does not scale. The request is to change the lab setup to share GPUs behind the scenes by supporting submission scripts for prominent schedulers such as SLURM.
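
One way to picture this request is a small helper that renders an sbatch script for a lab, so that participants queue for a GPU instead of holding one exclusively. This is a minimal sketch, not the repository's setup: the function name `render_sbatch_script`, the job name, the time limit, and the lab command are all hypothetical, and only the standard `#SBATCH` options `--job-name`, `--gres`, `--time`, and `--output` are assumed.

```python
def render_sbatch_script(job_name, command, gpus=1, time_limit="00:30:00"):
    """Return the text of a SLURM batch script that requests `gpus` GPUs.

    All argument values are illustrative; adapt them to the local cluster.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --gres=gpu:{gpus}",      # request GPUs as a generic resource
        f"#SBATCH --time={time_limit}",    # wall-clock limit so GPUs are released
        "#SBATCH --output=%x-%j.out",      # %x = job name, %j = job id
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical lab command; the real labs may build and run differently.
script = render_sbatch_script("nways-lab", "make && ./cfd 64 500")
print(script)
```

The rendered text would be written to a file and handed to `sbatch`; whether GPUs are then shared or time-sliced depends on how the cluster's scheduler is configured.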

[AI][Megatron] Train GPT models with NVIDIA Megatron-LM

This Bootcamp is designed to give NLP researchers an overview of the fundamentals of NVIDIA Megatron-LM (NVIDIA's open-source framework for training very large language models). The focus will be on training GPT Megatron models specifically.

It will consist of an introduction to the Megatron-LM code base, converting data to mmap format, understanding model parallelism and data parallelism, configuring your training, and then training and profiling GPT Megatron models.

[General][All] Move to Jupyter Labs

Currently, the labs make use of Jupyter notebooks. Jupyter Notebook lacks some functionality that JupyterLab has, such as access to a terminal. This feature request is to move the labs to JupyterLab.

[General] [All] Slides Formatting

Slides are currently hosted on the OpenACC org Google Drive. Normally we upload .ppt slides, and if one tries to edit or review them on the fly, the formatting is ruined because Google Slides does not support .ppt well. Possible solutions are to create the slides in Google Slides or to host them on OneDrive.

[HPC-AI] [Climate_CFD] Execution fails with latest container on A100

The current Climate and CFD containers derive from nvcr.io/nvidia/tensorflow:20.01-tf2-py3.
These containers won't run on the latest architectures such as A100. To run on them, we need to use: tf-21.04-tf2-py3

When the container is derived from tf-21.04-tf2-py3, some Python code breaks, for example with serialization errors on weight files caused by the use of APIs such as tf.keras.models.load_model. APIs such as load_weights should be used instead.

The request is to upgrade the labs to use the latest container supporting A100 and fix the old TF API problems.

[HPC] [NWays] Container and notebook bug fixes

  1. The nways challenge container def file needs to be updated to reflect the correct LD_LIBRARY_PATH: /usr/local/lib:/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64/
  2. The nways challenge CUDA notebook has an issue with its paths. Every occurrence of cudac has to be replaced with cuda-c, including in the path used to download the profiler output.
  3. A separate installation of Nsight Systems and Nsight Compute is no longer needed. It can be safely removed, with paths updated accordingly, and the containers should be updated to reflect that (we also found a signature validation issue during the separate installation that needs looking into: GPG error: https://developer.download.nvidia.com/devtools/repos/ubuntu2004/amd64 Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools <[email protected]>)
  4. In the nways challenge, running make clean removes extra files in the CUDA notebook. This is what it runs: rm -f arraymalloc.cpp boundary.o cfd.o cfdio.o jacobi.o cfd velocity.dat colourmap.dat cfd.plt core. This needs fixing.

[HPC] [NWays] Fortran support

Fortran is the most preferred language in HPC. This feature request is to add Fortran support to the N-Ways tutorial. It should cover:

  1. OpenACC Fortran
  2. OpenMP Fortran
  3. CUDA Fortran
  4. Fortran ISO DO-Concurrent

[AI] [AI Knowledge Base] [Jarvis] Create Jarvis Bootcamp Materials

To create: exercise notebooks for SpeechToText, QuestionAnswering, TokenClassification, and the Bootcamp Challenge, plus an introduction notebook.

Material list:
Intro to Jarvis (Slides, introductory notebook)
Intro to ASR (slides)
Intro to NLP (Slides)
SpeechToText (introductory notebook, example notebook, exercise notebook)
TokenClassification (introductory notebook, example notebook, exercise notebook)
QuestionAnswering (introductory notebook, example notebook, exercise notebook)

[HPC] [NWays] Add port conflicts in General Troubleshooting Section

Suggestion to add a "General Troubleshooting" section to the top-level readme, for troubleshooting that applies to all the Docker images. Perhaps the 16GB GPU memory requirement could move there.

One additional troubleshooting item I'd like to see is for port conflicts when hosting multiple students running the workshop containers on the same cluster. They will clobber each other on port 8888, so anyone hosting this workshop should be prepared to assign ports to students. Perhaps there are other solutions you've encountered?
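
One possible approach to the port clash, sketched under assumptions: give every student a deterministic port derived from their position in a roster, and check whether a port can still be bound before handing it out. The base port 8888 comes from the issue above; the roster, the function names, and the one-port-per-student scheme are hypothetical.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if this process can bind host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def assign_ports(students, base_port=8888):
    """Map each student to base_port + index, in roster order."""
    return {name: base_port + i for i, name in enumerate(students)}


# Hypothetical roster.
ports = assign_ports(["alice", "bob", "carol"])
print(ports)

# Ports from the plan that are already taken on this host, if any.
busy = [port for port in ports.values() if not port_is_free(port)]
print(busy)
```

Each student would then start Jupyter on their own port, for example through the `--port` option of `jupyter lab`.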

[General] [All] Notebooks to work outside container - replace absolute paths

At the moment, notebooks/makefiles (particularly HPC) contain fixed paths that depend on the compiler versions and environment variables set inside the container. This causes issues if someone tries to run the material on bare metal.

The paths in cells should be independent of the container and ideally should work on any cluster by loading modules only. Many paths are absolute and assume a specific container configuration. This needs fixing.
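
A minimal sketch of what container-independent path handling could look like in a notebook cell: read the toolkit location from the environment (which a module load would typically set) and fall back to locating `nvcc` on `PATH`. The use of `CUDA_HOME` as the variable name and the helper `find_cuda_home` are assumptions for illustration, not what the labs currently do.

```python
import os
import shutil


def find_cuda_home(env=None):
    """Return a CUDA install directory without hard-coding a container path.

    Order: the CUDA_HOME variable (an assumed convention), then the parent
    of the directory holding nvcc on PATH. Returns None if neither is found.
    """
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if cuda_home:
        return cuda_home
    nvcc = shutil.which("nvcc", path=env.get("PATH"))
    if nvcc:
        # .../cuda/bin/nvcc -> .../cuda
        return os.path.dirname(os.path.dirname(nvcc))
    return None


cuda_home = find_cuda_home()
include_dir = os.path.join(cuda_home, "include") if cuda_home else None
print(include_dir)
```

A cell or makefile can then build include and library paths from the returned directory, so the same material runs inside the container and on a cluster where the toolkit comes from a module.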

[HPC][Nways] Update Nways-Python containers

The Nways-Python containers (both the challenge and the base, Singularity and Docker) are based on the cuda:11.2 container. There are also installations of nsight-systems-2020.5.1 and nsight-compute-2020.2.1 inside them. These are very old and should be removed.

Nways Fortran and C are up to date. CUDA 11.2 [1] does have Nsight systems (2021.3) and Compute (2021.2). I suggest testing and using those instead.

[1]. https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html

[HPC] [NWays] Stdpar code gives more performance just by adding -stdpar

The documentation in the Nways stdpar lab refers to changing the loop to use an STL algorithm together with an execution policy (std::execution::par). However, it is observed that, without converting the loop to use an execution policy, simply recompiling the code with -stdpar generates parallel GPU code.

[AI] [AI Knowledge Base] TAO Bootcamp Material

TAO bootcamp material in progress.
Content would include:

  • Transfer learning techniques
  • NGC pretrained models (e.g. resnet 10/18/50, vgg19, mobilenet, etc.)
  • Configuration spec files for pretrained models
  • GPU Architecture for transfer learning models

[HPC] [NWays] Optimization for N-Ways with Nsight Compute

Currently, the N-Ways lab covers only an introduction to OpenACC, OpenMP, CUDA, and ISO standards, using Nsight Systems.
This feature request is to create material on optimization techniques, with the help of Nsight Compute, for the methods above:

  • Optimization for OpenACC
  • Optimization for OpenMP
  • Optimization for CUDA

[HPC-AI][PINN] Physics informed neural network

The feature request is to integrate Physics Informed Neural Networks (PINN) into the AI for Science training material.
-- The existing structure should be modified to show two approaches: Data Driven and PINN
-- Add SimNet as the framework for PINN

[HPC][NWays] MD Mini Challenge CUDA C MakeFile Bug

The Makefile works with CUDA 11.0 but breaks with CUDA 11.2.

Sample error when using the Makefile to build the CUDA C MD mini-challenge:
nvcc nvlink fatal : Input file
'/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64/libcudadevrt.a:cuda_device_runtime.o' newer than toolkit (112 vs 110)

[HPC][OpenACC] Update DLI notebooks

  • The Fortran RDF code in the DLI assessment does not output the correct result
  • Add notes to the DLI assessment to guide students
  • Fix broken links caused by the JupyterLab installation

[HPC] [Nways] Adding Python language to HPC N-Ways tutorial

Python is gaining traction in the HPC community and is a highly sought-after programming language. The feature request is to add CUDA support through CuPy and Numba to the N-Ways tutorial.

  • Implementation of N-Ways MD with CuPy
  • Implementation of N-Ways MD with Numba

[HPC] [NWays-challenge] CUDA makefile issue

When attendees run make clean in the CUDA notebooks, it executes the below command:

rm -f arraymalloc.o boundary.o cfd.cu cfdio.o jacobi.cu cfd velocity.dat colourmap.dat cfd.plt core

which removes the .cu source files. The makefile needs to be updated so that make clean no longer deletes the .cu files.
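
The intended behaviour can be sketched as a clean step that deletes only generated artifacts and never touches sources. This Python sketch illustrates the rule and is not the lab's makefile: the helper name `clean_build_outputs` and the choice of `.o` as the only generated suffix are assumptions, while the file names are taken from the command above.

```python
import os
import tempfile

GENERATED_SUFFIXES = (".o",)   # assumed: object files are the only generated suffix
GENERATED_NAMES = {"cfd", "velocity.dat", "colourmap.dat", "cfd.plt", "core"}


def clean_build_outputs(directory):
    """Delete generated files in `directory`; return the names removed."""
    removed = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(GENERATED_SUFFIXES) or name in GENERATED_NAMES:
            os.remove(os.path.join(directory, name))
            removed.append(name)
    return removed


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    for name in ["cfd.cu", "jacobi.cu", "boundary.o", "cfdio.o", "cfd"]:
        open(os.path.join(workdir, name), "w").close()
    removed = clean_build_outputs(workdir)
    remaining = sorted(os.listdir(workdir))

print(removed)    # generated files that were deleted
print(remaining)  # the .cu sources are still present
```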

[General][All] Add Description for all notebooks

In every notebook, we need to give a brief description of our repository so that anyone going through a lab knows what the repository contains and can explore the materials. We also need to provide relevant references so that anyone interested becomes aware of our Hackathons.

For example, at the end of this Notebook

I found the following lines with relevant hyperlinks.

Examples and tutorials
Here are some examples for using distribution strategy with custom training loops:

  1. Tutorial to train MNIST using MirroredStrategy.
  2. Guide on training MNIST using TPUStrategy.
  3. TensorFlow Model Garden repository containing collections of state-of-the-art models implemented using various strategies.

Which made me curious to check out their repository and the collection of models they have.

So if we can add a couple of lines about what we do in our repository and hackathons, more people would be aware of our work and interested people could contact us.

[General][All] Add README description for all labs

Currently, the labs are described only on the main page, and the individual labs have no documentation of what each lab is about. Add documentation to the individual labs to remove any confusion for users of the bootcamp.

[HPC_AI][AI for Science Climate] Improvements

Suggestions related to this cell block in the ResNets and CNN notebooks:

# Reshape input data from (28, 28) to (28, 28, 1)
# w, h = 28, 28
#train_images = train_images.reshape(train_images.shape[0], w, h)
#test_images = test_images.reshape(test_images.shape[0], w, h)

Suggest deleting these lines entirely in the "Making predictions" block. They overwrite the train_images and test_images variables, which prevents the previous cells from being re-run, and train_images is not used in the following cells.
Is this back-conversion necessary?

[HPC] [NWays-challenge] stdpar implementation issue

Error reported for the stdpar implementation :

nvc++ -std=c++17 -stdpar=gpu -lm -I/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include -L/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/lib64 -lnvToolsExt -c jacobi.cpp
"/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include/thrust/system/detail/generic/for_each.h", line 48: error: static assertion failed with "unimplemented for this system"
    THRUST_STATIC_ASSERT_MSG(
    ^
          detected during:
            instantiation of "InputIterator thrust::system::detail::generic::for_each(thrust::execution_policy<DerivedPolicy> &, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy=thrust::detail::execute_with_allocator<thrust::mr::allocator<char, thrust::mr::disjoint_unsynchronized_pool_resource<thrust::device_memory_resource, thrust::mr::new_delete_resource>>, thrust::cuda_cub::execute_on_stream_base>, InputIterator=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, UnaryFunction=lambda [](unsigned int)->void]" at line 44 of "/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/cuda/11.2/include/thrust/detail/for_each.inl"
            instantiation of "InputIterator thrust::for_each(const thrust::detail::execution_policy_base<DerivedPolicy> &, InputIterator, InputIterator, UnaryFunction) [with DerivedPolicy=thrust::detail::execute_with_allocator<thrust::mr::allocator<char, thrust::mr::disjoint_unsynchronized_pool_resource<thrust::device_memory_resource, thrust::mr::new_delete_resource>>, thrust::cuda_cub::execute_on_stream_base>, InputIterator=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, UnaryFunction=lambda [](unsigned int)->void]" at line 1035 of "/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/compilers/include/nvhpc/algorithm_execution.hpp"
            instantiation of "void std::__pstl::__algorithm_wrapper_struct<true>::for_each(_FIt, _FIt, _UF) [with _FIt=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, _UF=lambda [](unsigned int)->void]" at line 2136 of "/opt/nvidia/hpc_sdk/Linux_x86_64/21.3/compilers/include/nvhpc/algorithm_execution.hpp"
            instantiation of "std::__pstl::__enable_if_EP<_EP, void> std::for_each(_EP &&, _FIt, _FIt, _UF) [with _EP=const std::execution::parallel_policy &, _FIt=thrust::counting_iterator<unsigned int, thrust::use_default, thrust::use_default, thrust::use_default>, _UF=lambda [](unsigned int)->void]" at line 11 of "jacobi.cpp"

1 error detected in the compilation of "jacobi.cpp".
make: *** [Makefile:41: jacobi.o] Error 2

The issue appears to come from the following part:

void jacobistep(double *psinew, double *psi, int m, int n)
{
  std::for_each(std::execution::par, thrust::counting_iterator<unsigned int>(1u),
                thrust::counting_iterator<unsigned int>(m),
                [psinew, psi, m, n](unsigned int i) {
                  for(int j=1;j<=n;j++)
                  {
                    psinew[i*(m+2)+j]=0.25*(psi[(i-1)*(m+2)+j]+psi[(i+1)*(m+2)+j]+psi[i*(m+2)+j-1]+psi[i*(m+2)+j+1]);
                  }
                });
}

This needs investigation to recreate the issue.

[AI][Triton] Deployment using Triton

The feature request is to train participants on deployment-related scenarios. Deploying models requires a different skill set from training them. This Bootcamp should cover aspects of deployment such as latency and scaling, with the help of the popular Triton framework.

[General][Fundamental] Basics of using container environment

The feature request is to help users of this repository become familiar with the following concepts:
-- Virtual Environments
-- Types of Virtual Env: Conda, Containers (Docker, Singularity), VM
-- Container usage: How to use containers like Singularity and Docker

The fundamentals here will help developers using this repository to get familiar with containers and how to use them in their own environment.

[General][All] Manual parsing of linking to files

Currently, all the labs state the path of the file to open, and the user needs to traverse the folder structure manually. This results in confusion. Instead, there should be explicit hyperlinks to files, making the material more readable.
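
As a rough sketch of how such hyperlinks could be generated rather than typed by hand, the helper below builds a Markdown link whose target is relative to the notebook's own folder. The function name `markdown_link` and the folder layout in the example are hypothetical.

```python
import os


def markdown_link(target, notebook_dir, label=None):
    """Return a Markdown link to `target`, relative to `notebook_dir`."""
    rel = os.path.relpath(target, start=notebook_dir).replace(os.sep, "/")
    return f"[{label or os.path.basename(target)}]({rel})"


# Hypothetical lab layout.
link = markdown_link("labs/nways/source_code/jacobi.cpp",
                     "labs/nways/jupyter_notebook")
print(link)  # [jacobi.cpp](../source_code/jacobi.cpp)
```

Relative targets keep the links valid wherever the repository is cloned, which also fits the request to avoid absolute paths.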

[HPC][Nways] explanation for the use of -DUSE_COUNTING_ITERATOR

In the stdpar notebook, it is not clear why we use -DUSE_COUNTING_ITERATOR when compiling the code for the GPU. Having fixed paths for the NVTX libraries and headers at compile time also confuses users, because it is unclear why they are needed when compiling for multicore but not when compiling for the GPU.

[General] [All] navigating inside the repo

It is difficult to navigate inside the repo. For example, to use nways, one needs to clone the whole repo and navigate to the HPC folder, then to nways, which can be confusing.

[General][Testing] Adding Automated integration testing

The feature request is to add support for automated continuous integration testing of the material. The testing should support features such as:
-- Successful creation of both Singularity and Docker containers
-- Basic tests such as compilation and running selected GPU benchmarks
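
As a rough illustration of the second item, a CI job could run a list of build-and-run commands and report a failure whenever one of them returns a non-zero exit code. The sketch below uses only the Python standard library; the function name `run_smoke_tests` and the placeholder commands are hypothetical, and real checks would invoke the lab's compilers and benchmarks instead.

```python
import subprocess
import sys


def run_smoke_tests(commands):
    """Run each command; return {label: True/False} based on exit status."""
    results = {}
    for label, argv in commands.items():
        completed = subprocess.run(argv, capture_output=True, text=True)
        results[label] = completed.returncode == 0
    return results


# Placeholder commands standing in for "compile" and "run benchmark".
commands = {
    "compile": [sys.executable, "-c", "print('build ok')"],
    "benchmark": [sys.executable, "-c", "raise SystemExit(1)"],
}
results = run_smoke_tests(commands)
print(results)  # {'compile': True, 'benchmark': False}
```

A CI wrapper would exit with a non-zero status if any value in `results` is False, which marks the pipeline run as failed.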

[HPC] [NWays] Stdpar code simplification

A developer trying to adopt stdpar should ideally start with a C++ version. The code currently shown is C-based, and users are then expected to convert it using templates and lambdas. Doing the following would make it simpler for a stdpar developer to understand:
-- Start with code which is already C++ based
-- Change from using a lambda to a functor, which is more easily understood

[HPC][OpenACC] Update OpenACC notebooks

  • Update the notebooks (OpenACC, MiniProfiler) to the HPC SDK (in text and cells).
  • Check the slides
  • Update the DLI OpenACC assignment and fix bugs/typos (leave a link to edit files)

[HPC_AI][AI for Science Climate]Improvements

A suggestion from an attendee for the file Countering_Data_Imbalance.ipynb:

    (h, w) = (232,232)
    center = (w / 2, h / 2)
    angle90 = 90
    angle180 = 180
    angle270 = 270
    scale = 1.0

The variables seem unused and can likely be deleted.

[AI][Multi-GPU] Multi GPU Training on GPU

The feature request will expose developers to different methods and tools for training an AI model on multiple GPUs.
It will cover concepts like:
-- Data Parallelism vs Model Parallelism
-- System Topology
-- Framework Support like Horovod
-- Tools for Multi GPU Profiling
