
HPC Dockerfiles

You can use these Dockerfiles to create environments for jobs on HPC (Runai). Images come with Python, PyTorch, CUDA libraries, and a few useful utilities (tree, tmux, git).

Requirements

  • Make sure you have Docker installed
  • If you want to build images from scratch, create a Docker Hub account
  • For use with Runai: university network access
  • For use with PyCharm: Professional edition (remote interpreters are not available in the Community edition)

Contents

  • Dockerfile - Base environment. You can run scripts in this using tmux. Comes with JupyterLab and SSH support (for use with PyCharm).
  • Dockerfile-jl - Built on the base image; adds a JupyterLab entrypoint
  • Dockerfile-ssh - Built on the base image; adds an SSH server entrypoint (for use with PyCharm)
  • Dockerfile-tb - Built on the base image; adds a TensorBoard entrypoint

For the derived images (Dockerfile-jl, Dockerfile-ssh, Dockerfile-tb), replace the image in the FROM line with your own if you want to build from scratch.

Building

  • docker build -t yourdockerhubname/base . - Builds base image (Dockerfile)
  • docker build -f Dockerfile-jl -t yourdockerhubname/jl . - Builds JupyterLab image
  • docker build -f Dockerfile-ssh -t yourdockerhubname/ssh . - Builds SSH server image
  • docker build -f Dockerfile-tb -t yourdockerhubname/tb . - Builds TensorBoard image

Once you have built an image, log in with docker login and push it to Docker Hub with docker push yourdockerhubname/imagename.
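A typical build-and-push sequence looks like the following (the Docker Hub account name yourdockerhubname is a placeholder; substitute your own):

```shell
# Log in once per session; you will be prompted for Docker Hub credentials.
docker login

# Build the base image from the Dockerfile in this repo, then push it
# so that Runai nodes can pull it by name.
docker build -t yourdockerhubname/base .
docker push yourdockerhubname/base
```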

Running job on Runai (base image)

  • runai submit --pvc=storage:/storage -i morrisalp/base --name myjobname

You can attach a shell to it with runai bash myjobname and run long-running scripts inside tmux sessions so they persist after you disconnect.
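For example, a long training run can be kept alive in a detached tmux session (the job, session, and script names here are illustrative):

```shell
# Attach a shell inside the running job
runai bash myjobname

# Inside the job: start a named tmux session and launch the script in it
tmux new -s train
python train.py

# Detach with Ctrl-b d; the script keeps running in the background.
# Reattach later from a new runai bash shell with:
tmux attach -t train
```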

Running job on Runai (JL)

Basic usage

  • runai submit --pvc=storage:/storage -i morrisalp/jl --name myjobname --interactive --service-type=portforward --port 8888:8888

Make sure to leave this command running (ideally inside tmux) so that port forwarding persists. If you are on some university server uid@serverip, you must also set up port forwarding locally with:

  • ssh uid@serverip -NL 8888:localhost:8888

Now you can access the service locally at localhost:8888. Enter this into your browser, and use the token from runai logs myjobname.
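For example, to pull the token line out of the job logs (the grep filter is just a convenience, not part of runai itself):

```shell
# JupyterLab prints its access URL, including ?token=..., to the job logs
runai logs myjobname | grep token
```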

In all of the above, replace 8888 with another local port if needed.

JL with persistent storage (files, environments)

If you would like JupyterLab to use a subdirectory of your storage directory as the working directory, add the --working-dir flag as shown:

  • runai submit --pvc=storage:/storage -i morrisalp/jl --name myjobname --interactive --service-type=portforward --port 8888:8888 --working-dir /storage/yourname/notebooks

(Replace yourname with your name and create the corresponding directory.)

The default kernel does not persist between jobs. To add kernels to Jupyter for existing virtual environments (venv/conda), add the KERNEL_ENVS_DIR environment variable flag as shown:

  • runai submit --pvc=storage:/storage -i morrisalp/jl --name myjobname --interactive --service-type=portforward --port 8888:8888 --working-dir /storage/yourname/notebooks -e KERNEL_ENVS_DIR=/storage/yourname/notebooks/envs

Replace /storage/yourname/notebooks/envs with the directory in which your environments are saved. Note that each environment must have ipykernel installed (pip install ipykernel). In your code, you can see which environment is being used with import sys; sys.executable.
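A sketch of preparing such an environment with venv (paths here are illustrative; on Runai you would use the directory your KERNEL_ENVS_DIR points at, e.g. /storage/yourname/notebooks/envs):

```shell
# Directory that KERNEL_ENVS_DIR will point at
ENVS_DIR=envs
mkdir -p "$ENVS_DIR"

# Create a new virtual environment inside it
python3 -m venv "$ENVS_DIR/myenv"

# Each environment must have ipykernel so Jupyter can expose it as a kernel
"$ENVS_DIR/myenv/bin/pip" install ipykernel
```

Conda environments created under the same directory work the same way, as long as ipykernel is installed in each.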

Note: To install libraries in these kernels while working in a notebook, use pip install without an exclamation point (not !pip install); the plain form installs into the active kernel's environment, whereas !pip install runs in a shell and may target a different environment.

Recommended on first run: To clone the base environment (contains CUDA-compatible torch and other scientific packages) into a new conda environment within KERNEL_ENVS_DIR, set the additional flag CLONE_BASE_TO as shown below:

  • runai submit --pvc=storage:/storage -i morrisalp/jl --name myjobname --interactive --service-type=portforward --port 8888:8888 --working-dir /storage/yourname/notebooks -e KERNEL_ENVS_DIR=/storage/yourname/notebooks/envs -e CLONE_BASE_TO=cloned_env

Running job on Runai (SSH / PyCharm)

  • runai submit --pvc=storage:/storage -i morrisalp/ssh --name myjobname --interactive --service-type=portforward --port 8888:22

Make sure to leave this command running (ideally inside tmux) so that port forwarding persists. If you are on some university server uid@serverip, you must also set up port forwarding locally with:

  • ssh uid@serverip -NL 8888:localhost:8888

Now you can access the service locally at localhost:8888. In PyCharm, create a new remote (SSH) interpreter for your project with host localhost, port 8888, username root, password root.

In all of the above, replace 8888 with another local port if needed.
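Before configuring PyCharm, you can sanity-check the forwarded SSH connection from your local machine (using local port 8888 as above):

```shell
# Should give a root shell inside the container; the password is "root"
ssh root@localhost -p 8888

# If a host-key mismatch warning appears after restarting the job,
# clear the stale key for the forwarded port and retry:
ssh-keygen -R "[localhost]:8888"
```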

Overriding entrypoint - TensorBoard example

The following command is an example of overriding the default entrypoint of the base image, in order to launch a TensorBoard instance with port forwarding:

  • runai submit --pvc=storage:/storage -i morrisalp/base --name your_tb_job --interactive --service-type=portforward --port 6006:6006 --working-dir /storage/yourname/repo --command -- tensorboard --logdir lightning_logs

Replace /storage/yourname/repo and lightning_logs with the relevant directories. If you are on some university server uid@serverip, you must also set up port forwarding locally with:

  • ssh uid@serverip -NL 6006:localhost:6006

You can then access TensorBoard locally at the URL localhost:6006.

Contributors

  • morrisalp
