
praveen-oak / downpour


downpour

Implementation of a synchronous version of the Downpour distributed machine learning algorithm (the original Downpour SGD is asynchronous). The project takes a neural network inspired by the Inception architecture and uses PyTorch's distributed package (torch.distributed) together with Open MPI to implement data parallelism across multiple GPUs, achieving near-perfect linear scalability.
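Data parallelism means every process trains the same model on its own slice of the dataset. The sketch below shows one simple way to split sample indices across world_size ranks; shard_indices is a hypothetical helper, and downpour.py may partition its data differently (PyTorch also ships torch.utils.data.distributed.DistributedSampler for this job).

```python
def shard_indices(num_samples, rank, world_size):
    """Return the range of sample indices owned by `rank` (hypothetical helper).

    Contiguous split: the first `num_samples % world_size` ranks get one extra sample,
    so shard sizes differ by at most one and together cover every sample exactly once.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    base, extra = divmod(num_samples, world_size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


# Example: 10 samples split over 4 ranks.
for r in range(4):
    print(r, list(shard_indices(10, r, 4)))
```

Contiguous shards keep each rank's reads sequential; an interleaved split (rank, rank + world_size, ...) is an equally valid alternative.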

The Downpour algorithm is described in the research paper "Large Scale Distributed Deep Networks" (Dean et al., NIPS 2012): https://static.googleusercontent.com/media/research.google.com/en//archive/large_deep_networks_nips2012.pdf
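To make the idea from the paper concrete, here is a minimal single-process simulation of Downpour SGD's communication scheme: a central parameter server holds the parameters, and each model replica fetches fresh parameters only every n_fetch steps and pushes its accumulated gradients only every n_push steps. This is a sketch under stated assumptions, not the code in downpour.py: the server applies plain SGD (the paper also pairs Downpour with Adagrad), the model is a one-parameter toy regression, and randomly interleaving replicas stands in for real asynchronous workers. ParameterServer, Replica and downpour_sgd are hypothetical names.

```python
import random


class ParameterServer:
    """Holds the global parameter and applies pushed gradients with plain SGD (assumption)."""

    def __init__(self, w0, lr):
        self.w = w0
        self.lr = lr

    def fetch(self):
        return self.w

    def push(self, accrued_grad):
        self.w -= self.lr * accrued_grad


class Replica:
    """One model replica training on its own data shard with a possibly stale copy of w."""

    def __init__(self, shard, lr, n_fetch, n_push):
        self.shard = shard
        self.lr = lr
        self.n_fetch = n_fetch
        self.n_push = n_push
        self.w = 0.0        # local copy of the parameter
        self.accrued = 0.0  # gradients accumulated since the last push
        self.step = 0

    def train_step(self, server, rng):
        if self.step % self.n_fetch == 0:
            self.w = server.fetch()            # refresh the local copy every n_fetch steps
        x, y = rng.choice(self.shard)
        grad = 2.0 * (self.w * x - y) * x      # derivative of (w*x - y)**2 with respect to w
        self.accrued += grad
        self.w -= self.lr * grad               # keep training locally between fetches
        self.step += 1
        if self.step % self.n_push == 0:
            server.push(self.accrued)          # send accumulated gradients every n_push steps
            self.accrued = 0.0


def downpour_sgd(num_replicas=4, total_steps=3000, lr=0.02, n_fetch=2, n_push=3, seed=0):
    rng = random.Random(seed)
    # Toy 1-D regression problem y = 3x, so the optimum is w = 3.
    data = [(x, 3.0 * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(400))]
    server = ParameterServer(w0=0.0, lr=lr)
    # Each replica gets a disjoint, interleaved slice of the data.
    replicas = [Replica(data[i::num_replicas], lr, n_fetch, n_push)
                for i in range(num_replicas)]
    for _ in range(total_steps):
        # Advance a randomly chosen replica to mimic workers running at different speeds.
        rng.choice(replicas).train_step(server, rng)
    return server.w


print(f"learned w = {downpour_sgd():.4f}")
```

Larger n_fetch and n_push values cut communication at the cost of replicas working with staler parameters, which is the trade-off the paper's two knobs control.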

Requirements: the project depends on Python 3.6, cuda/9.2.88 and openmpi/intel/2.0.3 (the environment-module names used in launch_multinode.s).

In addition, PyTorch must be compiled from source against the CUDA and Open MPI libraries (the MPI backend of torch.distributed is only available in source builds). Please refer to online resources and guides on how to do this. Once you are done, run the following command at the shell prompt:

    python -c 'import torch; print(torch.__version__)'

If the compilation was successful, the command prints the version of your build, for example 1.0.0a0+4c11dee (the part after + identifies the source commit you built).
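Beyond printing the version, a small script can also report whether CUDA and the MPI backend are usable. The sketch below assumes a PyTorch release that provides torch.distributed.is_available() and torch.distributed.is_mpi_available() (both exist in recent releases). parse_torch_version and report_torch_build are hypothetical helpers, and a suffix after + is only a hint of a source build, since some prebuilt wheels carry one too (for example +cu118).

```python
import re


def parse_torch_version(version):
    """Split a version string such as '1.0.0a0+4c11dee' into (major, minor, patch, local).

    `local` is the text after '+' (for source builds, typically a short git hash) or None.
    """
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)[^+]*(?:\+(.+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, local = match.groups()
    return int(major), int(minor), int(patch), local


def report_torch_build():
    # Imported lazily so the parser above stays usable without PyTorch installed.
    import torch
    import torch.distributed as dist

    _, _, _, local = parse_torch_version(torch.__version__)
    print("version:", torch.__version__, "| local build tag:", local)
    print("CUDA available:", torch.cuda.is_available())
    print("MPI backend available:", dist.is_available() and dist.is_mpi_available())


if __name__ == "__main__":
    try:
        report_torch_build()
    except ImportError:
        print("PyTorch is not installed in this environment")
```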

To test whether MPI is set up, run the following:

    mpirun -np 4 python -c 'import torch.distributed as dist; dist.init_process_group(backend="mpi"); print("hello", dist.get_rank())'

It should print one line per process, for example:

    hello 1
    hello 3
    hello 2
    hello 0

The order of the lines varies from run to run, but ranks 0 to 3 should each appear exactly once.
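Because the processes print concurrently, the smoke test is easiest to judge by checking which ranks appeared rather than the line order. A minimal sketch, assuming the output was captured as text; check_hello_output is a hypothetical helper. A frequent sign that mpirun and PyTorch were built against different MPI installations is every process reporting rank 0, which shows up here as missing ranks plus a duplicated rank 0.

```python
from collections import Counter


def check_hello_output(output, world_size):
    """Return (missing_ranks, duplicated_ranks) for the 'hello <rank>' smoke test."""
    counts = Counter()
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "hello" and parts[1].isdigit():
            counts[int(parts[1])] += 1
    # Counter returns 0 for ranks that never printed, without inserting them.
    missing = [r for r in range(world_size) if counts[r] == 0]
    duplicated = sorted(r for r, c in counts.items() if c > 1)
    return missing, duplicated


print(check_hello_output("hello 1\nhello 3\nhello 2\nhello 0\n", 4))
```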

If you are running on a standalone node (or personal computer) with multiple GPUs, no additional software is required. If you are running on a cluster, use the cluster's batch workload manager. I used my university's HPC cluster, which runs the Slurm workload manager.

The project has two files:

  1. downpour.py: contains the implementation of the algorithm. It has a main function that accepts input parameters, and it must be launched from an MPI environment (for example with mpirun). Details of how to load the CUDA and MPI modules before running the project can be found in the launch_multinode.s shell script.

  2. launch_multinode.s: a shell script for launching a distributed job. It contains the command-line arguments passed to the Python file as well as the options for the Slurm workload manager, in case you are using an HPC cluster to run the project. If you are not using such a cluster, you can ignore all commands before the module load openmpi/intel/2.0.3 command.
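When a script is started with Open MPI's mpirun, each process receives environment variables describing its place in the job, including OMPI_COMM_WORLD_RANK, OMPI_COMM_WORLD_SIZE and OMPI_COMM_WORLD_LOCAL_RANK. The sketch below uses them to fail early when the script is run outside mpirun and to map each process on a node to one GPU. It is an assumption about how such a check could look, not how downpour.py selects devices; mpi_launch_info and pick_gpu are hypothetical helpers. In real use the result of pick_gpu would typically be passed to torch.cuda.set_device, with torch.cuda.device_count() as num_gpus.

```python
import os


def mpi_launch_info(env=None):
    """Return (rank, world_size, local_rank) as exported by Open MPI's mpirun.

    Raises RuntimeError when the variables are absent, i.e. the script was not
    started through mpirun.
    """
    env = os.environ if env is None else env
    try:
        rank = int(env["OMPI_COMM_WORLD_RANK"])
        world_size = int(env["OMPI_COMM_WORLD_SIZE"])
        local_rank = int(env["OMPI_COMM_WORLD_LOCAL_RANK"])
    except KeyError as missing:
        raise RuntimeError(f"{missing} is not set; launch this script with mpirun") from None
    return rank, world_size, local_rank


def pick_gpu(local_rank, num_gpus):
    """Map each process on a node to one GPU (round-robin if processes outnumber GPUs)."""
    if num_gpus < 1:
        raise ValueError("no GPUs visible on this node")
    return local_rank % num_gpus


# Example with a fake environment: process 5 of 8, second process on its node, 4 GPUs per node.
fake_env = {"OMPI_COMM_WORLD_RANK": "5", "OMPI_COMM_WORLD_SIZE": "8",
            "OMPI_COMM_WORLD_LOCAL_RANK": "1"}
rank, world_size, local_rank = mpi_launch_info(fake_env)
print(rank, world_size, pick_gpu(local_rank, num_gpus=4))
```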
