baidu-allreduce

baidu-allreduce is a small C++ library that demonstrates the ring allreduce and ring allgather techniques. Its goal is to provide a template that deep learning framework authors can follow when implementing these communication algorithms within their own frameworks.

A description of the ring allreduce with its application to deep learning is available on the Baidu SVAIL blog.

Installation

Prerequisites: Before compiling baidu-allreduce, make sure you have installed CUDA (7.5 or greater) and an MPI implementation.

baidu-allreduce has been tested with OpenMPI, but should work with any CUDA-aware MPI implementation, such as MVAPICH.

To compile baidu-allreduce, run

# Modify MPI_ROOT to point to your installation of MPI.
# You should see $MPI_ROOT/include/mpi.h and $MPI_ROOT/lib/libmpi.so.
# Modify CUDA_ROOT to point to your installation of CUDA.
make MPI_ROOT=/usr/lib/openmpi CUDA_ROOT=/path/to/cuda

You may need to add the library directories of your MPI implementation and of CUDA to your LD_LIBRARY_PATH environment variable.

To run the baidu-allreduce tests after compiling it, run

# On CPU.
mpirun --np 3 allreduce-test cpu

# On GPU. Requires a CUDA-aware MPI implementation.
mpirun --np 3 allreduce-test gpu

Interface

The baidu-allreduce library provides the following C++ functions:

// Initialize the library, including MPI and if necessary the CUDA device.
// If device == NO_DEVICE, no GPU is used; otherwise, the device specifies which CUDA
// device should be used. All data passed to other functions must be on that device.
#define NO_DEVICE -1
void InitCollectives(int device);

// The ring allreduce. The lengths of the data chunks passed to this function
// must be the same across all MPI processes. The output buffer is allocated by
// the library, and a pointer to it is written into `*output`.
void RingAllreduce(float* data, size_t length, float** output);

// The ring allgather. The lengths of the data chunks passed to this function
// may differ across MPI processes. The output buffer is allocated by the
// library, and a pointer to it is written into `*output`.
void RingAllgather(float* data, size_t length, float** output);

The interface is simple and inflexible and is meant as a demonstration. The code is fairly straightforward and the same technique can be integrated into existing codebases in a variety of ways.

Contributors

gibiansky, gdiamos