Quantization for Distributed Optimization

Explore the repository»
View Paper

tags: distributed optimization, large-scale machine learning, gradient compression, edge learning, federated learning, deep learning, pytorch

About The Project

Massive amounts of data have made the training of large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a solution to this problem. However, the performance of distributed systems does not scale linearly with the number of workers, due to the high network communication cost of synchronizing gradients and parameters. Researchers have proposed techniques such as quantization and sparsification to alleviate this problem by compressing the gradients. Most of these compression schemes produce compressed gradients that cannot be directly aggregated with efficient protocols such as all-reduce.

In this paper, we present a set of all-reduce compatible gradient compression algorithms - QSGDMaxNorm Quantization, QSGDMaxNormMultiScale Quantization, and their sparsified variants - which significantly reduce the communication overhead while maintaining the performance of vanilla SGD. We establish upper bounds on the variance introduced by the quantization schemes and prove their convergence for smooth convex functions. The proposed compression schemes can trade off between the communication cost and the rate of convergence.

We empirically evaluate the performance of the compression methods by training deep neural networks on the CIFAR10 dataset. We examine the performance of training the ResNet50 (computation-intensive) and VGG16 (communication-intensive) models with and without the compression methods, and compare how these methods scale with the number of workers. Our compression methods perform better than the built-in methods currently offered by the deep learning frameworks.
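To give a rough sense of the idea (the exact algorithms are defined in the paper; the function below is an illustrative sketch, not code from this repository), a max-norm style stochastic quantizer scales every worker's gradient by one globally agreed norm, so that the quantized integers from all workers share a common scale:

import torch

def maxnorm_quantize(grad: torch.Tensor, bits: int = 8) -> torch.Tensor:
    # Illustrative sketch: scale by a max-norm that, in the actual scheme, is
    # agreed globally across workers, so all quantized tensors share one scale.
    levels = 2 ** (bits - 1) - 1          # symmetric integer range [-levels, levels]
    scale = grad.abs().max()              # stand-in for the global max-norm
    if scale == 0:
        return torch.zeros_like(grad)
    normalized = grad / scale * levels
    lower = normalized.floor()
    # Stochastic rounding: round up with probability equal to the fractional
    # part, which keeps the quantizer unbiased.
    quantized = lower + torch.bernoulli(normalized - lower)
    return quantized * scale / levels     # dequantize back to the gradient scale

Because the scale is shared, the integer payloads can be aggregated with a single all-reduce and dequantized once afterwards, which is what makes such a scheme all-reduce compatible, unlike per-worker scaling.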

Built With

This project was built with

  • python v3.7.6
  • PyTorch v1.7.1
  • The environment used for developing this project is available at environment.yml.

Getting Started

Clone the repository to your local machine using:

git clone https://github.com/vineeths96/Gradient-Compression
cd Gradient-Compression/

Prerequisites

Create a new conda environment and install all the required libraries by running the following command:

conda env create -f environment.yml

The dataset used in this project (CIFAR-10) will be automatically downloaded and set up in the data directory during execution.
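For reference, the automatic download typically boils down to a torchvision call along the following lines (the root path and transform here are illustrative, not copied from trainer.py):

import torchvision
import torchvision.transforms as transforms

# Downloads CIFAR-10 into ./data on the first run; later runs reuse the cached files.
train_set = torchvision.datasets.CIFAR10(
    root="./data",
    train=True,
    download=True,
    transform=transforms.ToTensor(),
)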

Instructions to run

The training of the models can be performed on a distributed cluster with multiple machines and multiple worker GPUs. We use torch.distributed.launch to launch the distributed training. More information about the launch utility is available in the PyTorch documentation.

To launch distributed training on a single machine with multiple workers (GPUs),

python -m torch.distributed.launch --nproc_per_node=<num_gpus> trainer.py --local_world_size=<num_gpus> 

To launch distributed training on multiple machines with multiple workers (GPUs),

export NCCL_SOCKET_IFNAME=ens3

python -m torch.distributed.launch --nproc_per_node=<num_gpus> --nnodes=<num_machines> --node_rank=<node_rank> --master_addr=<master_address> --master_port=<master_port> trainer.py --local_world_size=<num_gpus>
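For context, a script launched this way typically initializes its process group from the environment variables set by the launcher. The snippet below is a minimal sketch of that setup (argument handling other than --local_world_size is illustrative; see trainer.py for the actual code):

import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)        # injected by torch.distributed.launch
parser.add_argument("--local_world_size", type=int, default=1)
args = parser.parse_args()

# The launcher sets MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE;
# NCCL is the usual backend for multi-GPU training.
dist.init_process_group(backend="nccl", init_method="env://")
torch.cuda.set_device(args.local_rank)
print(f"Rank {dist.get_rank()}/{dist.get_world_size()} using GPU {args.local_rank}")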

Model overview

We conducted experiments on the ResNet50 and VGG16 architectures. Refer to the original papers for more information about the models. We use publicly available implementations from GitHub for reproducing the models.
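As an illustration only (the repository uses publicly available CIFAR-10 implementations from GitHub rather than these ImageNet-oriented variants), the two architectures can be instantiated with ten output classes as follows:

import torchvision.models as models

# Illustrative instantiation for a 10-class problem such as CIFAR-10.
resnet50 = models.resnet50(num_classes=10)
vgg16 = models.vgg16(num_classes=10)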

Results

We highly recommend reading through the paper before proceeding to this section. The paper explains the different compression schemes we propose and contains many more analyses and results than what is presented here.

We begin with an explanation of the notation used for the plot legends in this section. AllReduce-SGD corresponds to the default gradient aggregation provided by PyTorch. QSGD-MN and GRandK-MN correspond to QSGDMaxNorm Quantization and GlobalRandKMaxNorm Compression, respectively; the precision (number of bits) used for the representation follows the name. QSGD-MN-TS and GRandK-MN-TS correspond to QSGDMaxNormMultiScale Quantization and GlobalRandKMaxNormMultiScale Compression, respectively, with two scales (TS) of compression; the precisions of the two scales follow the name. For the sparsified schemes, we choose K = 10000 for all the experiments. We compare our methods with PowerSGD, a recent all-reduce compatible gradient compression scheme, at Rank-1 and Rank-2 compression.
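To make the sparsified schemes concrete, the sketch below shows the core idea behind a GlobalRandK-style compressor: every worker draws the same K random indices (for example, from a seed shared per step), so the selected values form a dense K-vector that can be summed with a single all-reduce. The function name and details are illustrative, not code from this repository:

import torch

def global_randk_compress(grad: torch.Tensor, k: int = 10000, seed: int = 0):
    # All workers use the same seed for this step, so they pick identical indices.
    flat = grad.flatten()
    gen = torch.Generator().manual_seed(seed)
    indices = torch.randperm(flat.numel(), generator=gen)[:k].to(flat.device)
    values = flat[indices]               # dense k-vector, directly all-reduce friendly
    return indices, values

After the all-reduce, each worker scatters the averaged values back into a zero tensor at the shared indices to recover the aggregated sparse gradient.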

Figures (for ResNet50 and VGG16): loss curves, Top-1 accuracy curves, and scalability with the number of GPUs. See the repository for the plots.

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Vineeth S - [email protected]

Project Link: https://github.com/vineeths96/Gradient-Compression

