This works with CUDA 12.4 at the time of writing. Tested on Ubuntu 22.04.
Make sure the CUDA libraries are on your PATH:
export PATH=/usr/local/cuda-12.4/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
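The `${PATH:+:${PATH}}` expansion appends the old value only when it is non-empty, so an unset variable does not leave a stray `:` in the result. A minimal Python sketch of the same logic (the helper name and directories are just illustrations):

```python
def prepend_path(var: str, directory: str, env: dict) -> str:
    """Mimic `export VAR=dir${VAR:+:${VAR}}`: prepend `directory`,
    keeping the old value only if it was non-empty."""
    old = env.get(var, "")
    return directory + (":" + old if old else "")

# Variable already set: old value is kept after a colon
print(prepend_path("PATH", "/usr/local/cuda-12.4/bin", {"PATH": "/usr/bin:/bin"}))
# Variable unset: no trailing ":" is produced
print(prepend_path("LD_LIBRARY_PATH", "/usr/local/cuda-12.4/lib64", {}))
```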
Install Miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-py38_23.9.0-0-Linux-x86_64.sh
chmod +x Miniconda3-py38_23.9.0-0-Linux-x86_64.sh
bash Miniconda3-py38_23.9.0-0-Linux-x86_64.sh -b -p ~/miniconda/
source ~/miniconda/bin/activate
conda init
conda config --set auto_activate_base false
conda deactivate
conda update -y conda
conda install -n base conda-libmamba-solver
conda config --set solver libmamba
To create the TensorFlow DirectML environment, follow these steps:
- Create virtual env.
conda env create -v -f env-wsl2-tf-directml.yml
- Activate env
conda activate tensorflow-gpu
- Run AI Benchmark
python run_ai_bench.py
To create the TensorFlow NVIDIA environment, follow these steps:
- Create virtual env.
conda env create -v -f env-wsl2-tf-nvidia.yml
- Activate env
conda activate tensorflow-gpu-nvidia
# Make $CONDA_PREFIX/lib visible to the dynamic linker on every activation
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
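Note that the activate.d hook above re-appends `$CONDA_PREFIX/lib/` on every activation, so repeated activations can accumulate duplicates in `LD_LIBRARY_PATH`. A sketch of a duplicate-free variant of the same append logic (the helper and the example prefix are hypothetical):

```python
def append_once(var_value: str, directory: str) -> str:
    """Append `directory` to a colon-separated path list only if it is
    not already present, unlike a plain `VAR=$VAR:dir` append."""
    parts = [p for p in var_value.split(":") if p]
    if directory not in parts:
        parts.append(directory)
    return ":".join(parts)

lib = "/home/user/miniconda/envs/tensorflow-gpu-nvidia/lib"  # example prefix
v = append_once("/usr/local/cuda-12.4/lib64", lib)
print(v)
print(append_once(v, lib))  # second activation: no duplicate entry
```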
# Verify GPUs available
python3 -c "import os; os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'; import tensorflow as tf; print('Num GPUs Available: ', len(tf.config.list_physical_devices('GPU')))"
- Run AI Benchmark
python run_ai_bench.py
To remove an environment:
conda deactivate
conda env remove -y --name tensorflow-gpu
or
conda deactivate
conda env remove -y --name tensorflow-gpu-nvidia
To create the PyTorch NVIDIA environment, follow these steps:
- Create virtual env.
conda env create -v -f env-wsl2-pytorch-nvidia.yml
- Activate env
conda activate pytorch-gpu-nvidia
# Verify GPUs available
python3 -c "import torch; print(torch.cuda.is_available())"
- Remove env when done
conda env remove -y --name pytorch-gpu-nvidia
- Run training on one node
Based on https://lambdalabs.com/blog/multi-node-pytorch-distributed-training-guide. Tune the batch size and number of epochs for your hardware.
cd LambdaLabsML-examples/pytorch/distributed/resnet
torchrun --nproc_per_node=1 --nnodes=1 --node_rank=0 --master_addr=10.1.96.5 --master_port=1234 main.py --backend=nccl --batch_size=1024 --num_epochs=50 --arch=resnet50
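torchrun hands each worker its rendezvous information through environment variables (RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, MASTER_PORT), which is how a script like main.py knows which process it is. A sketch of reading them, with defaults (assumed here) that allow a plain single-process run:

```python
def dist_config(env: dict) -> dict:
    """Read the rendezvous variables torchrun exports for each worker.
    Defaults let the script also run as a plain `python main.py`."""
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
        "master_addr": env.get("MASTER_ADDR", "127.0.0.1"),
        "master_port": int(env.get("MASTER_PORT", 29500)),
    }

# Roughly what the torchrun command above exports to its single worker:
cfg = dist_config({"RANK": "0", "LOCAL_RANK": "0", "WORLD_SIZE": "1",
                   "MASTER_ADDR": "10.1.96.5", "MASTER_PORT": "1234"})
print(cfg)
```

In a real main.py these values would be passed to the process-group initialization; here the dict is only printed.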
- Run distributed training with OpenMPI
/opt/.openmpi/bin/mpirun -H 10.1.96.5:1,10.1.96.6:1 -x MASTER_ADDR=10.1.96.5 -x MASTER_PORT=1234 -x PATH -bind-to none -map-by slot -mca pml ob1 -mca btl ^openib python3 main.py --backend=nccl --batch_size=768 --num_epochs=10 --arch=resnet50
To profile a script:
python -m cProfile myscript.py
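cProfile can also be driven from code, which lets you profile only the hot section and sort the report; a minimal sketch (the function being profiled is just a stand-in):

```python
import cProfile
import io
import pstats

def hot_loop(n: int) -> int:
    """Stand-in for the expensive part of a benchmark run."""
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
profiler.enable()
hot_loop(100_000)
profiler.disable()

# Print the 5 most expensive entries, sorted by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```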