
Harmony

This repository contains the source code implementation of the following papers:

  • Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers (VLDB'22)

  • Doing More with Less: Training Large DNN Models on Commodity Servers for the Masses (HotOS'21)

This work was done as part of Microsoft Research's Project Fiddle. This source code is available under the MIT License.

Directory Structure

  • harmony: the Harmony source code, with detailed instructions, various example scripts, and previous results.

  • model_lib: the model library containing model code not included in PyTorch, such as the transformer library from Hugging Face.

  • util_lib: the customized utility library.
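
Together with the four workflow stages described below, the layout is roughly as follows (stage and example subdirectory names are taken from the commands later in this README):

harmony/
├── 1_decomposer/   # decompose a model into per-layer code (e.g., bert_thomwolf/)
├── 2_profiler/     # profile each layer
├── 3_scheduler/    # search for the best schedule
└── 4_runtime/      # run the best schedule
model_lib/          # model code not included in PyTorch
util_lib/           # customized utilities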

Setup

The easiest way to run Harmony is to use NVIDIA's standard container (nvcr.io/nvidia/pytorch:20.03-py3), which satisfies most dependencies. It can be launched by:

./launch.sh

Once inside the container, the remaining dependencies can be satisfied by running:

./install.sh
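
For reference, launch.sh wraps an nvidia-docker invocation along the lines below (a minimal sketch: the volume mounts shown are assumptions, and the actual launch.sh in the repository is authoritative):

# Start the NVIDIA PyTorch container, mounting the repository and data directory:
nvidia-docker run -it --rm \
    -v "$(pwd)":/workspace \
    -v /data:/data \
    nvcr.io/nvidia/pytorch:20.03-py3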

Note:

  • Harmony was developed with Python 3.6.9, PyTorch 1.5.0a0, CUDA 10.1.243, cuDNN 7.6.3, NCCL 2.4.8, NVIDIA driver 418, and Ubuntu 18.04.3 LTS.

  • Harmony was developed with NVIDIA GPUs.

  • Harmony does not modify the PyTorch library and may remain portable across PyTorch versions.
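
Because Harmony leaves PyTorch itself unmodified, the container environment can be sanity-checked against the versions above (a convenience sketch, not part of the original scripts):

# Print the PyTorch, CUDA, and cuDNN versions visible inside the container:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.backends.cudnn.version())"
# Print the driver version and visible GPUs:
nvidia-smi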

Dataset

  • GLUE (including MRPC): can be downloaded by running this script and unpacked to the directory /data/glue/MRPC.

  • WikiText-2 and WikiText-103: can be downloaded from here and unpacked to the directories /data/wikitext-2-tokens and /data/wikitext-103-tokens.

  • ImageNet: ImageNet ILSVRC 2012 can be downloaded by running this script and unpacked to the directory /data/imagenet/.
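
After unpacking, the data directories should look as below; the file names in the comments are the datasets' conventional contents, not something Harmony prescribes:

ls /data/glue/MRPC              # e.g., train.tsv, dev.tsv, test.tsv
ls /data/wikitext-2-tokens      # e.g., wiki.train.tokens, wiki.valid.tokens, wiki.test.tokens
ls /data/wikitext-103-tokens
ls /data/imagenet               # e.g., train/, val/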

End-to-end Workflow

The end-to-end workflow of Harmony is illustrated in the figure below:

[figure: Harmony's end-to-end workflow, decomposer -> profiler -> scheduler -> runtime]

For example, to run BERT-Large with Harmony, go through the following steps:

Decompose the model into per-layer code

cd harmony/1_decomposer/bert_thomwolf && ./run_bert_large.sh

Profile each layer

cd ../../2_profiler/bert_thomwolf && ./run_bert_large.sh

Search for the best schedule

cd ../../3_scheduler && ./run_bert_large.sh

Run the best schedule

cd ../4_runtime/bert_thomwolf && ./run_bert_large.sh

More examples can be found under harmony/1_decomposer, harmony/2_profiler, harmony/3_scheduler, and harmony/4_runtime.
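
To see every bundled example at each stage without guessing script names, list the run scripts directly (a convenience command, not from the original instructions):

# All per-model run scripts across the four stages:
find harmony/1_decomposer harmony/2_profiler harmony/3_scheduler harmony/4_runtime -name 'run_*.sh'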

Experiments

To reproduce the experiments in the VLDB paper, use the scripts below:

  • Figure 8

    cd harmony/4_runtime/bert_thomwolf && ./run_bert_large__fig8.sh
  • Figure 10

    cd harmony/4_runtime/bert_thomwolf && ./run_bert96__fig10.sh
    cd harmony/4_runtime/gpt2_huggingface && ./run_gpt2_xl__fig10_fig12.sh
    cd harmony/4_runtime/vgg_resnet_torch && ./run_vgg416__fig10.sh
    cd harmony/4_runtime/vgg_resnet_torch && ./run_resnet1026__fig10.sh
  • Figure 12

    cd harmony/4_runtime/gpt2_huggingface && ./run_gpt2_xl__fig10_fig12.sh
  • Figure 13

    cd harmony/4_runtime/bert_thomwolf && ./run_bert_large__fig13.sh
  • Figure 17 and Figure 18

    cd harmony/4_runtime/gpt2_huggingface && ./run_gpt2_billions__fig17_fig18.sh
  • Figure 21

    cd harmony/4_runtime/gpt2_huggingface && ./run_gpt2_medium__fig21.sh
  • Table 1

    cd harmony/3_scheduler && ./run_four_models__tab1.sh

Note

For the experiments of Figure 17 and Figure 18, three prerequisites must be met to run the largest models, which saturate CPU memory capacity. (Tested on Ubuntu 18.04; a combined verification sketch follows this list.)

  • Raise the limit on pinned memory

    Step 1: Open /etc/security/limits.conf

    sudo vim /etc/security/limits.conf

    Step 2: Make memlock unlimited

    #<domain>      <type>  <item>         <value>
    #
    
    #*               soft    core            0
    #root            hard    core            100000
    #*               hard    rss             10000
    #@student        hard    nproc           20
    #@faculty        soft    nproc           20
    #@faculty        hard    nproc           50
    #ftp             hard    nproc           0
    #ftp             -       chroot          /ftp
    #@student        -       maxlogins       4
    
    *              -       memlock         unlimited
    root           -       memlock         unlimited
    
    # End of file
    

    Step 3: Verify

    ulimit -a
    
  • Max out shared memory

    Step 1: Open /etc/fstab

    sudo vim /etc/fstab 

    Step 2: Locate /dev/shm and use the tmpfs size option to specify the maximum size

    # /etc/fstab: static file system information.
    #
    # Use 'blkid' to print the universally unique identifier for a
    # device; this may be used with UUID= as a more robust way to name devices
    # that works even if disks are added and removed. See fstab(5).
    #
    # <file system> <mount point>   <type>  <options>       <dump>  <pass>
    # / was on /dev/sda1 during installation
    UUID=4e3b7d44-77c9-4cc8-be72-fa2ff836ac2f /               ext4    errors=remount-ro 0       1
    /swapfile                                 none            swap    sw              0       0
    # resize /dev/shm
    tmpfs /dev/shm tmpfs defaults,size=750g 0 0
    

    Step 3: To make the change effective immediately, remount the /dev/shm filesystem:

    mount -o remount /dev/shm

    Step 4: Verify

    df -h
  • Disable swapping to disk

    Step 1: Open sysctl.conf

    sudo vim /etc/sysctl.conf

    Step 2: Add the line vm.swappiness = 0

    ###################################################################
    # Protected links
    #
    # Protects against creating or following links under certain conditions
    # Debian kernels have both set to 1 (restricted) 
    # See https://www.kernel.org/doc/Documentation/sysctl/fs.txt
    #fs.protected_hardlinks=0
    #fs.protected_symlinks=0
    
    vm.swappiness = 0
    

    Step 3: Restart the machine

    sudo reboot now
    

    After all experiments, restore swapping to disk by commenting the line back out:

    # vm.swappiness = 0 # comment out
    
  • Set up the container

    Finally, we need to lift the container's resource limits by setting the options in launch.sh as below, assuming the machine has 750 GB of CPU memory and 8 GPUs.

    nvidia-docker run \
        ...
        --memory=750g \
        --memory-swap=750g \
        --memory-swappiness=0 \
        --memory-reservation=750g \
        --shm-size=750g \
        --ulimit memlock=750000000000:750000000000 \
        --gpus '"device=0,1,2,3,4,5,6,7"' \
        ...
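
With all three prerequisites in place, the settings can be checked from inside the container as below (a hedged sketch, not from the original instructions; exact values depend on your machine):

ulimit -l                     # pinned-memory (memlock) limit; expect "unlimited"
df -h /dev/shm                # expect the size set in /etc/fstab, e.g., 750G
cat /proc/sys/vm/swappiness   # expect 0 (also settable without reboot: sudo sysctl vm.swappiness=0)

# Spot-check that a large pinned allocation succeeds in PyTorch
# (1 GiB is illustrative; Harmony pins far more in practice):
python -c "import torch; t = torch.empty(1024**3, dtype=torch.uint8, pin_memory=True); print(t.is_pinned())"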

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

License

Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

Reference

If you find the code helpful, citing our papers would be appreciated : )

@article{VLDB22Harmony,
    title = {{Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers}}, 
    author = {Youjie Li and Amar Phanishayee and Derek Murray and Jakub Tarnawski and Nam Sung Kim},
    journal = {The 48th International Conference on Very Large Databases (VLDB'22)},
    year = {2022},
    address = {Sydney, Australia},
    month = sep
}

@inproceedings{HotOS21Harmony,
    title = {{Doing More with Less: Training Large DNN Models on Commodity Servers for the Masses}},
    author = {Youjie Li and Amar Phanishayee and Derek Murray and Nam Sung Kim},
    booktitle = {Workshop on Hot Topics in Operating Systems (HotOS'21)},
    year = {2021},
    address = {Ann Arbor, MI, USA},
    month = jun
}
