Fast.ai course v3 on AWS Spot Instances

This repository lets you run the Jupyter notebooks from the fast.ai course Practical Deep Learning for Coders, v3 in the cheapest way on AWS Spot Instances.

AWS Spot Instances allow you to cut the cost of your GPU instance by about 75%. They are spare, unused Amazon EC2 instances (for the course we are interested in P2 and P3 instances) that you can bid for. Once your bid exceeds the current spot price, the instance is launched. Because spot prices fluctuate in real time with supply and demand, the instance can be reclaimed whenever the spot price rises above your bid. The downside is that a reclaimed instance is removed entirely, and you will lose your work.
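As a rough illustration of the savings, the sketch below compares an on-demand price with a spot price. Both figures are assumptions chosen for the example (0.90 USD/hour as an on-demand rate for p2.xlarge, 0.27 USD/hour as a spot price); real prices differ by region and change over time.

```shell
#!/bin/sh
# Assumed prices in USD per hour (illustrative only; check current AWS pricing).
on_demand=0.90   # assumed on-demand rate for p2.xlarge
spot=0.27        # assumed spot price

# Percentage saved by paying the spot price instead of the on-demand price.
savings=$(awk -v od="$on_demand" -v sp="$spot" \
  'BEGIN { printf "%.0f", (1 - sp / od) * 100 }')
echo "Spot saves about ${savings}% per hour"
```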

Spotty manages all the necessary AWS resources, including AMIs, volumes and snapshots, in a way that preserves your work. It is described in detail in the article How to train Deep Learning models on AWS Spot Instances using Spotty? by Oleg Polosin, the author of Spotty.

Please read this article and the Spotty documentation to understand how it works.

Requirements

  1. Python 3 (https://www.python.org/downloads/)
  2. An AWS account (see the Sign Up page if you don't have one)
  3. The AWS CLI, installed and configured for your account (see Installing the AWS CLI)

Setup

  1. Clone this repository
$ git clone https://github.com/jagin/spotty-fastai-v3
$ cd spotty-fastai-v3
  1. Install Spotty
$ pip install -U spotty
  1. Find the cheapest region for your spot p2.xlarge instance
$ spotty spot-prices -i p2.xlarge

You should get something like:

Getting spot instance prices for "p2.xlarge"...

Price  Zone
0.2700 us-east-2c
0.2700 us-east-2b
0.2700 us-west-2a
0.2700 us-west-2b
0.2751 us-west-2c
0.2916 eu-west-1a
0.2924 eu-west-1c
0.2947 eu-west-1b
0.2949 us-east-1d
0.2960 us-east-1e
0.3015 us-east-1a
0.3037 us-east-1b
0.3140 us-east-2a
0.3648 us-east-1f
0.3690 us-east-1c
0.3978 eu-central-1b

We can clearly see that the us-east-2 and us-west-2 regions are currently the cheapest.
You can also select another instance type if you want, but p2.xlarge should suffice for the course.
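If you would rather not scan the list by eye, the cheapest zones can be picked out with standard tools. The sketch below works on a shortened copy of the output format shown above, stored in a variable so that it runs without AWS access; with real data you would pipe the output of spotty spot-prices into the same filter. It assumes the two-column "price zone" layout stays as shown.

```shell
#!/bin/sh
# Shortened sample in the format printed by "spotty spot-prices" (illustrative subset).
output='Getting spot instance prices for "p2.xlarge"...

Price  Zone
0.2916 eu-west-1a
0.2700 us-east-2c
0.2751 us-west-2c
0.2700 us-west-2a'

# Keep only the price rows, sort them numerically by price,
# then print every zone that ties for the lowest price.
cheapest=$(printf '%s\n' "$output" \
  | awk '$1 ~ /^[0-9]+\.[0-9]+$/' \
  | sort -n -k1,1 \
  | awk 'NR == 1 { min = $1 } $1 == min { print $2 }')
echo "$cheapest"
```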

  1. Configure the AWS CLI with the selected region

I strongly suggest creating a separate named profile (faceid in this guide) for your account with the selected region. Don't forget to set the AWS_PROFILE environment variable to the name of that profile.
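One possible way to do this from the command line is sketched below. The region us-east-2 is only an example value, and the snippet sets just the region; credentials for the profile still have to be added separately (for instance with aws configure --profile faceid).

```shell
#!/bin/sh
# "faceid" is the profile name used in this guide; us-east-2 is an example region.
profile=faceid
region=us-east-2

# Store the region in the named profile, if the AWS CLI is installed.
if command -v aws >/dev/null 2>&1; then
  aws configure set region "$region" --profile "$profile"
fi

# Select the profile for this shell session so tools that honor AWS_PROFILE use it.
export AWS_PROFILE="$profile"
echo "AWS_PROFILE=$AWS_PROFILE"
```

Because export only affects the current shell, the variable has to be set again in every new terminal, or added to your shell startup file.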

  1. Update spotty.yaml

Edit the spotty.yaml file and set region and instanceType in the instance section to your chosen values.
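Editing the file by hand is usually simplest, but the change can also be scripted. The sketch below operates on a hypothetical three-line stand-in for the instance section, written to a temporary directory, so it does not touch your real spotty.yaml; the real file contains more keys, and the region values are examples only.

```shell
#!/bin/sh
# Hypothetical stand-in for the instance section of spotty.yaml.
workdir=$(mktemp -d)
cat > "$workdir/spotty.yaml" <<'EOF'
instance:
  region: us-east-1
  instanceType: p2.xlarge
EOF

# Rewrite only the value of the "region" key, keeping its indentation.
new_region=us-east-2
sed "s/^\([[:space:]]*region:\).*/\1 $new_region/" \
  "$workdir/spotty.yaml" > "$workdir/spotty.yaml.new"
mv "$workdir/spotty.yaml.new" "$workdir/spotty.yaml"

# Show the two settings to confirm the change.
grep -E 'region:|instanceType:' "$workdir/spotty.yaml"
```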

  1. Create an AMI with NVIDIA Docker

Run the following command from the project directory (where the spotty.yaml file is located):

$ spotty create-ami

It will take some time (several minutes) to create an AMI that can be used for all your projects within the AWS region.

  1. Start an instance
$ spotty start

It will run a Spot Instance, create or mount your volumes, restore snapshots if any, synchronize the project with the running instance and start the Docker container with the environment. Note the IP address of your Spot Instance for later use.

  1. Set up the course notebooks
$ spotty run setup

The first time you run it, this command clones the fastai/course-v3 repository. Running spotty run setup again pulls the latest changes for the repo.

  1. Run the Jupyter Notebook
$ spotty run jupyter

Note the ?token=your_jupyter_notebook_token string in the output. Open a browser and go to the following URL: http://your_spot_instance_ip:8888/?token=your_jupyter_notebook_token
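The URL is simply the instance IP, port 8888 and the token combined. A small sketch of assembling it follows; both values are placeholders (the address comes from a range reserved for documentation), so substitute the IP of your instance and the token printed by Jupyter.

```shell
#!/bin/sh
# Placeholder values: replace with the IP of your Spot Instance
# and the token shown in the Jupyter output.
instance_ip=203.0.113.10          # documentation-range example, not a real instance
token=your_jupyter_notebook_token

url="http://${instance_ip}:8888/?token=${token}"
echo "$url"
```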

To query the state of your GPU device, open a separate terminal (make sure your faceid AWS profile is selected) and run:

$ spotty run nvsmi

To connect to the running container via SSH, use the following command:

$ spotty ssh

It runs a tmux session, so you can always detach from it by pressing Ctrl + b, then d. To reattach to that session later, just run the spotty ssh command again.

  1. Stop the instance

After finishing your work, don't forget to stop the instance by running:

$ spotty stop

The volume with your data will be unmounted from the instance. The next time you start an instance, it will mount the volume automatically. You can also instruct Spotty to create a snapshot of your volume (this is cheaper than persisting the volume, but recreating the instance with a new volume takes longer). See instance.volumes.deletionPolicy in the Spotty Configuration documentation. When snapshots are enabled this way, Spotty automatically creates snapshots of the volumes when you stop the instance and restores them the next time you start one.
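Putting the steps of this guide together, a working session can be sketched as a small wrapper script. The step and run_session helpers are hypothetical names introduced here, not part of Spotty. By default the script only prints the commands (DRY_RUN=1), so it can be tried without launching anything; whether spotty run jupyter returns immediately or keeps the terminal busy depends on your setup, so treat the ordering as an outline.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: the commands from this guide, in order.
run_session() {
  step spotty start &&        # launch the Spot Instance and the Docker container
  step spotty run setup &&    # clone or update the course notebooks
  step spotty run jupyter     # start Jupyter Notebook
  step spotty stop            # always stop the instance at the end
}

run_session
```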

Credits

I would like to thank Oleg Polosin for his great work and support for Spotty.

License

This project is licensed under the MIT License - see the LICENSE file for details.

spotty-fastai-v3's Issues

Timing out on "waiting for the Docker container to be ready"

When I run "spotty start", it sits on "waiting for the Docker container to be ready" for 30 minutes and then times out.

I'm using the default Dockerfile and spotty.yaml files that come with this repo. The instance type is set to p2.xlarge in us-west-2, with a 128 GB volume size.
