
HPC Dockerfiles

You can use these Dockerfiles to create environments for jobs on HPC (Runai). Images come with Python, PyTorch, CUDA libraries, and a few useful utilities (tree, tmux, git).

Requirements

  • Make sure you have Docker installed
  • If you want to build images from scratch, create a Docker Hub account
  • For use with Runai: university network access
  • For use with PyCharm: Professional edition (remote interpreters are not available in the Community edition)

Contents

  • Dockerfile - Base environment. You can run scripts in this using tmux. Comes with JupyterLab and SSH support (for use with PyCharm).
  • Dockerfile-jl - Built on the base image; adds a JupyterLab entrypoint
  • Dockerfile-ssh - Built on the base image; adds an SSH server entrypoint (for use with PyCharm)
  • Dockerfile-tb - Built on the base image; adds a TensorBoard entrypoint

For the derived images (Dockerfile-jl, Dockerfile-ssh, Dockerfile-tb), replace the image in the FROM line with your own if you want to build from scratch.

Building

  • docker build -t yourdockerhubname/base . - Builds base image (Dockerfile)
  • docker build -f Dockerfile-jl -t yourdockerhubname/jl . - Builds JupyterLab image
  • docker build -f Dockerfile-ssh -t yourdockerhubname/ssh . - Builds SSH server image
  • docker build -f Dockerfile-tb -t yourdockerhubname/tb . - Builds TensorBoard image

Once you have built an image, log in with docker login and push it to Docker Hub with docker push yourdockerhubname/imagename.
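A typical build-and-push sequence looks like the following (the Docker Hub account name yourdockerhubname is a placeholder; substitute your own):

```shell
# Log in once per session; you will be prompted for Docker Hub credentials.
docker login

# Build the base image from the Dockerfile in this repo, then push it
# so that Runai nodes can pull it by name.
docker build -t yourdockerhubname/base .
docker push yourdockerhubname/base
```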

Running job on Runai (base image)

  • runai submit --pvc=storage:/storage -i morrisalp/base --name myjobname

You can attach a shell to it with runai bash myjobname and run long-running scripts inside tmux sessions so they persist after you disconnect.
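For example, a long training run can be kept alive in a detached tmux session (the job, session, and script names here are illustrative):

```shell
# Attach a shell inside the running job
runai bash myjobname

# Inside the job: start a named tmux session and launch the script in it
tmux new -s train
python train.py

# Detach with Ctrl-b d; the script keeps running in the background.
# Reattach later from a new runai bash shell with:
tmux attach -t train
```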

Running job on Runai (JL)

Basic usage

  • runai submit --pvc=storage:/storage -i morrisalp/jl --name myjobname --interactive --service-type=portforward --port 8888:8888

Make sure to leave this command running (ideally inside tmux) so that port forwarding persists. If you are on some university server uid@serverip, you must also set up port forwarding locally with:

  • ssh uid@serverip -NL 8888:localhost:8888

Now you can access the service locally at localhost:8888. Enter this into your browser, and use the token from runai logs myjobname.
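For example, to pull the token line out of the job logs (the grep filter is just a convenience, not part of runai itself):

```shell
# JupyterLab prints its access URL, including ?token=..., to the job logs
runai logs myjobname | grep token
```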

In all of the above, replace 8888 with another local port if needed.

JL with persistent storage (files, environments)

If you would like JupyterLab to use a subdirectory of your storage directory as the working directory, add the --working-dir flag as shown:

  • runai submit --pvc=storage:/storage -i morrisalp/jl --name myjobname --interactive --service-type=portforward --port 8888:8888 --working-dir /storage/yourname/notebooks

(Replace yourname with your name and create the corresponding directory.)

The default kernel does not persist between jobs. To add kernels to Jupyter for existing virtual environments (venv/conda), add the KERNEL_ENVS_DIR environment variable flag as shown:

  • runai submit --pvc=storage:/storage -i morrisalp/jl --name myjobname --interactive --service-type=portforward --port 8888:8888 --working-dir /storage/yourname/notebooks -e KERNEL_ENVS_DIR=/storage/yourname/notebooks/envs

Replace /storage/yourname/notebooks/envs with the directory in which your environments are saved. Note that each environment must have ipykernel installed (pip install ipykernel). In your code, you can see which environment is being used with import sys; sys.executable.
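A sketch of preparing such an environment with venv (paths here are illustrative; on Runai you would use the directory your KERNEL_ENVS_DIR points at, e.g. /storage/yourname/notebooks/envs):

```shell
# Directory that KERNEL_ENVS_DIR will point at
ENVS_DIR=envs
mkdir -p "$ENVS_DIR"

# Create a new virtual environment inside it
python3 -m venv "$ENVS_DIR/myenv"

# Each environment must have ipykernel so Jupyter can expose it as a kernel
"$ENVS_DIR/myenv/bin/pip" install ipykernel
```

Conda environments created under the same directory work the same way, as long as ipykernel is installed in each.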

Note: To install libraries in these kernels while working in a notebook, use pip install without an exclamation point (not !pip install); the plain form installs into the active kernel's environment, whereas !pip install runs in a shell and may target a different environment.

Recommended on first run: To clone the base environment (contains CUDA-compatible torch and other scientific packages) into a new conda environment within KERNEL_ENVS_DIR, set the additional flag CLONE_BASE_TO as shown below:

  • runai submit --pvc=storage:/storage -i morrisalp/jl --name myjobname --interactive --service-type=portforward --port 8888:8888 --working-dir /storage/yourname/notebooks -e KERNEL_ENVS_DIR=/storage/yourname/notebooks/envs -e CLONE_BASE_TO=cloned_env

Running job on Runai (SSH / PyCharm)

  • runai submit --pvc=storage:/storage -i morrisalp/ssh --name myjobname --interactive --service-type=portforward --port 8888:22

Make sure to leave this command running (ideally inside tmux) so that port forwarding persists. If you are on some university server uid@serverip, you must also set up port forwarding locally with:

  • ssh uid@serverip -NL 8888:localhost:8888

Now you can access the service locally at localhost:8888. In PyCharm, create a new remote (SSH) interpreter for your project with host localhost, port 8888, username root, password root.

In all of the above, replace 8888 with another local port if needed.
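Before configuring PyCharm, you can sanity-check the forwarded SSH connection from your local machine (using local port 8888 as above):

```shell
# Should give a root shell inside the container; the password is "root"
ssh root@localhost -p 8888

# If a host-key mismatch warning appears after restarting the job,
# clear the stale key for the forwarded port and retry:
ssh-keygen -R "[localhost]:8888"
```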

Overriding entrypoint - TensorBoard example

The following command is an example of overriding the default entrypoint of the base image, in order to launch a TensorBoard instance with port forwarding:

  • runai submit --pvc=storage:/storage -i morrisalp/base --name your_tb_job --interactive --service-type=portforward --port 6006:6006 --working-dir /storage/yourname/repo --command -- tensorboard --logdir lightning_logs

Replace /storage/yourname/repo and lightning_logs with the relevant directories. If you are on some university server uid@serverip, you must also set up port forwarding locally with:

  • ssh uid@serverip -NL 6006:localhost:6006

You can then access TensorBoard locally at the URL localhost:6006.

Contributors

  • morrisalp
