GBLM-Pruner: Gradient Based Language Model Pruner

Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models.

Rocktim Jyoti Das*, Mingjie Sun*, Liqun Ma, Zhiqiang Shen*

Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi.

Carnegie Mellon University, Pittsburgh.

Introduction

Prior methods for neural network pruning have largely overlooked gradient information. Even the original Optimal Brain Surgeon (OBS) work ignored the first-order term when deriving its pruning framework, on the assumption that gradients vanish at a minimum and therefore carry no useful information. In this work, we revisit and refine the OBS framework by incorporating the first-order term. Based on this analysis, we propose a gradient-based pruning metric.
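
For reference, the first-order term in question comes from the Taylor expansion of the loss around the trained weights. The block below is the standard textbook form, not the paper's exact derivation; the symbols $E$ (loss), $\mathbf{w}$ (weights), $\mathbf{g}$ (gradient) and $\mathbf{H}$ (Hessian) are notation chosen here for illustration.

```latex
\delta E = \mathbf{g}^{\top}\delta\mathbf{w}
         + \tfrac{1}{2}\,\delta\mathbf{w}^{\top}\mathbf{H}\,\delta\mathbf{w}
         + O\!\left(\lVert\delta\mathbf{w}\rVert^{3}\right),
\qquad
\mathbf{g} = \frac{\partial E}{\partial \mathbf{w}},
\quad
\mathbf{H} = \frac{\partial^{2} E}{\partial \mathbf{w}^{2}}
```

OBS sets $\mathbf{g} = 0$ and keeps only the quadratic term; retaining $\mathbf{g}^{\top}\delta\mathbf{w}$ is the refinement this work builds on. The resulting pruning metric is defined in the paper.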

Install

The installation instructions are provided here.

GBLM-Pruned Weights

Please check out our Model Zoo for all public GBLM-Pruner compressed model checkpoints and instructions on how to use the weights.

Usage

Our method requires the gradient magnitude to compute the pruning metric. The gradients for a model can be computed as follows:

bash run_grad_compute.sh

Overview of the arguments in the bash file:

  • --model: The identifier or the path for the LLaMA model.
  • --llama_version: Version of the LLaMA model in use (1 for LLaMA-1, 2 for LLaMA-2).
  • --nsamples: Number of calibration samples.
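
To make the idea of a gradient magnitude over calibration samples concrete, here is a minimal sketch in plain Python. It is not the code behind run_grad_compute.sh: the toy linear model, the squared-error loss, and the choice of an element-wise l2 norm across samples are all assumptions made for illustration, and the function names are hypothetical.

```python
import math


def per_sample_grad(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 with respect to w for one sample (toy loss)."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def gradient_l2_norm(w, samples):
    """Element-wise l2 norm of the per-sample gradients (assumed aggregation)."""
    sq_sum = [0.0] * len(w)
    for x, y in samples:
        g = per_sample_grad(w, x, y)
        for i, gi in enumerate(g):
            sq_sum[i] += gi * gi  # accumulate squared gradients per weight
    return [math.sqrt(s) for s in sq_sum]


# Two hypothetical calibration samples, given as (input, target) pairs.
weights = [0.5, -1.0, 2.0]
calibration = [([1.0, 0.0, 2.0], 1.0), ([0.0, 3.0, 1.0], -2.0)]
grad_norm = gradient_l2_norm(weights, calibration)
```

Each entry of grad_norm summarises how strongly the corresponding weight influenced the loss across the calibration set. In this sketch, the number of (input, target) pairs plays the role of --nsamples.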

Once the gradients have been computed, the pruned model can be obtained with the following command:

bash run_gblm_prune.sh

Overview of the arguments in the bash file:

  • --model: The identifier or the path for the LLaMA model.
  • --gradient_path: Path to the pre-computed gradient.
  • --prune_method: Pruning method to be used.
  • --nsamples: Number of calibration samples.
  • --seed: Random seed.
  • --sparsity_ratio: Percentage of the weights to be pruned.
  • --sparsity_type: Specify the sparsity type.
  • --save: Path to store results.
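
The sketch below shows how a sparsity ratio and a sparsity type can interact when turning per-weight importance scores into a pruning mask. It is an illustration only: this README does not list the accepted values of --sparsity_type, so the unstructured and N:M variants here are assumptions, the ratio is assumed to be a fraction between 0 and 1, ranking is done per row by assumption, and the function names are hypothetical.

```python
def unstructured_mask(scores, sparsity_ratio):
    """Mark the lowest-scoring fraction of each row for pruning (True = prune)."""
    mask = []
    for row in scores:
        n_prune = int(len(row) * sparsity_ratio)
        order = sorted(range(len(row)), key=lambda k: row[k])
        pruned = set(order[:n_prune])
        mask.append([j in pruned for j in range(len(row))])
    return mask


def n_m_mask(scores, n, m):
    """Mark the n lowest scores in every consecutive group of m (True = prune)."""
    mask = []
    for row in scores:
        row_mask = [False] * len(row)
        for start in range(0, len(row), m):
            group = range(start, min(start + m, len(row)))
            for j in sorted(group, key=lambda k: row[k])[:n]:
                row_mask[j] = True
        mask.append(row_mask)
    return mask


# Hypothetical importance scores for one row of eight weights.
scores = [[0.9, 0.8, 0.7, 0.6, 0.1, 0.2, 0.3, 0.4]]
dense_half = unstructured_mask(scores, 0.5)  # prunes the four lowest overall
two_of_four = n_m_mask(scores, 2, 4)         # prunes two in each group of four
```

Both masks remove half of the weights. The unstructured mask takes all four from the low-scoring second half of the row, while the 2:4 mask must take two from each group of four.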

Zero-Shot Harness Evaluation

We use the EleutherAI LM Harness implementation for zero-shot evaluation and follow the instructions provided here. The following command reproduces our results:

python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=/path/to/model \
    --tasks task_name \
    --device cuda:0 \
    --no_cache

Acknowledgement

This codebase is built upon SparseGPT and Wanda.

Issues

Please don't hesitate to contact us if you encounter any code-related issues or wish to discuss the paper. You can reach us via GitHub issues or by email at [email protected].

License

This project is released under the MIT license. Please see the LICENSE file for more information.

Citation

If you found this work useful, please consider citing:

@misc{das2023size,
      title={Beyond Size: How Gradients Shape Pruning Decisions in Large Language Models}, 
      author={Rocktim Jyoti Das and Liqun Ma and Zhiqiang Shen},
      year={2023},
      eprint={2311.04902},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
