FlexFlow

A distributed deep learning framework that supports flexible parallelization strategies.

Prerequisites

After you have cloned FlexFlow, run the following commands to fetch the Legion and GASNet submodules.

git submodule init
git submodule update

Compilation

  • Download FlexFlow source code:
# Using git to download FlexFlow
git clone --recursive https://gitlab.com/fflow/flexflow
  • Build a DNN model (e.g., alexnet):
./ffcompile.sh examples/alexnet

Here, examples/alexnet.cc defines all operators in the DNN.

  • To build a distributed version of FlexFlow, add a -d flag:
./ffcompile.sh -d alexnet

Parallelization Strategy

FlexFlow accepts any parallelization strategy expressed in layer-wise parallelism (see the Publication section) to parallelize training. A parallelization strategy describes how to parallelize each operator in a DNN. An example parallelization strategy for AlexNet is as follows.

Layer     Type     Configuration      Devices
-------   ------   ----------------   -------
conv1     conv2d   n=4 c=1 h=1 w=1    0 1 2 3
pool1     pool2d   n=4 c=1 h=1 w=1    0 1 2 3
conv2     conv2d   n=1 c=1 h=2 w=2    0 2 1 3
pool2     pool2d   n=1 c=1 h=2 w=2    0 2 1 3
flat1     flat     n=2 c=1            0 2
linear1   linear   n=1 c=3            0 2 3
linear2   linear   n=1 c=3            0 1 2
linear3   linear   n=1 c=1            0

Some example parallelization strategies are available in the strategies subfolder.
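
One pattern visible in the example table is that, for every layer, the product of the values in the Configuration column equals the number of devices listed. The following Python sketch checks that pattern for the AlexNet rows. It assumes that n, c, h, and w are partition counts along each dimension, which the text above does not state explicitly. The list-of-tuples layout and the name check_strategy are hypothetical and used only for illustration. They are not FlexFlow's strategy file format or API.

```python
from math import prod

# Rows copied from the example AlexNet strategy table:
# (layer, operator type, partition degree per dimension, device IDs).
# This in-memory layout is an illustrative assumption, not FlexFlow's file format.
ALEXNET_STRATEGY = [
    ("conv1", "conv2d", {"n": 4, "c": 1, "h": 1, "w": 1}, [0, 1, 2, 3]),
    ("pool1", "pool2d", {"n": 4, "c": 1, "h": 1, "w": 1}, [0, 1, 2, 3]),
    ("conv2", "conv2d", {"n": 1, "c": 1, "h": 2, "w": 2}, [0, 2, 1, 3]),
    ("pool2", "pool2d", {"n": 1, "c": 1, "h": 2, "w": 2}, [0, 2, 1, 3]),
    ("flat1", "flat", {"n": 2, "c": 1}, [0, 2]),
    ("linear1", "linear", {"n": 1, "c": 3}, [0, 2, 3]),
    ("linear2", "linear", {"n": 1, "c": 3}, [0, 1, 2]),
    ("linear3", "linear", {"n": 1, "c": 1}, [0]),
]


def check_strategy(rows):
    """Return the layers whose degree product differs from their device count."""
    mismatched = []
    for layer, _op_type, degrees, devices in rows:
        # Each partition of the operator is assumed to need exactly one device.
        if prod(degrees.values()) != len(devices):
            mismatched.append(layer)
    return mismatched


if __name__ == "__main__":
    print(check_strategy(ALEXNET_STRATEGY))  # prints []
```

Under that reading, an empty result means every row assigns exactly one device to each partition.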

Training a DNN model

To train a DNN model, run the compiled application with the path to the training dataset, the path to the parallelization strategy, and some additional configuration flags. For example:

./alexnet -e 10 -b 256 --lr 0.1 --wd 1e-4 -p 10 -d path_to_dataset -s path_to_strategy -ll:gpu 4 -ll:fsize 90000 -ll:zsize 5000 -ll:cpu 4
  • -e or --epochs: number of total epochs to run (default: 90)
  • -b or --batch-size: global batch size in each iteration (default: 64)
  • --lr or --learning-rate: initial learning rate (default: 0.1)
  • --wd or --weight-decay: weight decay (default: 1e-4)
  • -p or --print-freq: print frequency (default: 10)
  • -d or --dataset: path to the training dataset. If not set, synthetic data is used to conduct training.
  • -s or --strategy: path to the strategy to parallelize training. If not set, data parallelism is used as the default strategy.
  • -ll:gpu: number of GPU processors to use on each node
  • -ll:fsize: size of device memory on each GPU (in MB)
  • -ll:zsize: size of zero-copy memory (pinned DRAM with direct GPU access) on each node (in MB). This is used for prefetching training images from disk.
  • -ll:cpu: number of data loading workers (default: 4)
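
When launching training from a script, the flags listed above can be assembled programmatically. The Python sketch below builds the argument list for the example command, using the documented defaults for any flag that has one. The helper name build_train_command and its parameter names are hypothetical and not part of FlexFlow. The -d and -s flags are omitted when no path is given, which per the descriptions above falls back to synthetic data and data parallelism.

```python
def build_train_command(binary, gpus, fsize_mb, zsize_mb, epochs=90,
                        batch_size=64, lr=0.1, wd=1e-4, print_freq=10,
                        dataset=None, strategy=None, cpus=4):
    """Assemble an argument list from the flags documented above.

    Hypothetical helper for illustration; defaults mirror the listed ones.
    """
    cmd = [
        binary,
        "-e", str(epochs),
        "-b", str(batch_size),
        "--lr", str(lr),
        "--wd", str(wd),
        "-p", str(print_freq),
    ]
    # Omitting -d means synthetic data; omitting -s means data parallelism.
    if dataset is not None:
        cmd += ["-d", dataset]
    if strategy is not None:
        cmd += ["-s", strategy]
    # Per-node resource flags (memory sizes are in MB).
    cmd += [
        "-ll:gpu", str(gpus),
        "-ll:fsize", str(fsize_mb),
        "-ll:zsize", str(zsize_mb),
        "-ll:cpu", str(cpus),
    ]
    return cmd


if __name__ == "__main__":
    command = build_train_command(
        "./alexnet", gpus=4, fsize_mb=90000, zsize_mb=5000,
        epochs=10, batch_size=256,
        dataset="path_to_dataset", strategy="path_to_strategy",
    )
    print(" ".join(command))
```

The returned list can be passed to subprocess.run(command, check=True) to start training. Note that the weight decay is rendered as 0.0001 rather than 1e-4, which is the same value.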

Publication

Zhihao Jia, Sina Lin, Charles R. Qi, and Alex Aiken. Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), Stockholm, Sweden, July 2018.
