⚡️ Nanotron

The objective of this library is to provide easy-to-use distributed primitives for training a variety of models efficiently using 3D parallelism. For more information about the internal design of the library or about 3D parallelism in general, please check out docs.md and 3d_parallelism.md.

Philosophy

  • Make it fast. At least as fast as other open-source implementations.
  • Make it minimal. We don't need to support every technique and every flavor of 3D parallelism. What matters is that we can efficiently use the "best" ones.
  • Make everything explicit instead of transparent. Transparent abstractions work well until they don't, at which point they make for a horrible debugging experience if one doesn't understand the implications of the techniques being used. To mitigate this, we choose to be explicit about what the library does.

Core Features

We support the following:

  • 3D parallelism (data, tensor and pipeline parallelism), including a one-forward-one-backward (1F1B) pipeline engine
  • ZeRO-1 optimizer
  • FP32 gradient accumulation
  • Parameter tying/sharding
  • Spectral µTransfer parametrization for scaling up neural networks

Installation

Requirements:

  • Python >= 3.10
  • PyTorch >= 2.0.0
  • Flash-Attention >= 2.5.0

To install (in a new env):

pip install torch
pip install packaging; pip install "flash-attn>=2.5.0"  --no-build-isolation
pip install nanotron

Also nice to have: pip install transformers datasets python-etcd tensorboardX

We also support a set of flavors that you can install using pip install -e ".[$FLAVOR]":

  • dev: Use this if you are developing nanotron. In particular, it installs our linting tooling; on top of that, you have to run pre-commit install afterwards.
  • test: We use pytest to run our test suite. This flavor installs pytest-xdist, which lets you run tests in parallel, e.g. pytest -n 12 tests (12 being the number of parallel workers); see the example after this list.
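
For example, a typical setup for working on nanotron combines both flavors (the worker count below is illustrative):

pip install -e ".[dev]"
pip install -e ".[test]"
pre-commit install
pytest -n 12 tests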

Quick examples

In the /examples directory, you can find a few example configuration files, as well as a script to run them.

You can run a sample training using:

torchrun --nproc_per_node=8 run_train.py --config-file examples/config_tiny_llama.yaml
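
Note that the number of processes you launch (8 here) must match the product of the data-, tensor- and pipeline-parallel sizes set in the configuration file.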

And run a sample generation using:

torchrun --nproc_per_node=8 run_generation.py --ckpt-path checkpoints/text/4

Development guidelines

If you plan on developing on nanotron, we suggest you install the dev flavor: pip install -e ".[dev]"

We use pre-commit to run a bunch of hooks on each commit, mostly code normalization so that the codebase stays consistent. Please do run pre-commit install.

For the linting:

pre-commit install
pre-commit run --config .pre-commit-config.yaml --all-files

As a part of making sure we aren't slowed down as the codebase grows, we will not merge a PR if the features it introduces do not have test coverage.

We have extensions built on top of Nanotron, with their tests located in the /examples folder. Since VSCode defaults to discovering tests only in the /tests folder, please run the tests from both /examples and /tests to ensure your PR does not break these extensions. Running make tests executes all of the nanotron tests plus the tests in the /examples directory that need to pass.
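
For instance, a minimal pre-PR checklist using the commands above:

pre-commit run --config .pre-commit-config.yaml --all-files
make tests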

Features we would like to add:

  • Support torch.compile
  • More optimized kernels
  • Support ZeRO-3
  • Other PP schedules (such as interleaved 1F1B, ...)
  • Ring attention / Sequence Parallelism
  • 3D Parallel MoEs
  • Supporting more architectures (Mamba, ...)
  • ...

Useful scripts

  • scripts/log_lighteval_to_wandb.py: logs the evaluation results of LightEval to wandb, including summary statistics.

Environment Variables

  • NANOTRON_BENCHMARK=1: set this if you want to log the throughput during training; see the example below.
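
For example, to log throughput for the sample training run from the quick examples above (assuming the same config file):

NANOTRON_BENCHMARK=1 torchrun --nproc_per_node=8 run_train.py --config-file examples/config_tiny_llama.yaml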

Credits

We would like to thank everyone working on LLMs, especially those sharing their work openly and from whom we took great inspiration: Nvidia for Megatron-LM/apex, Microsoft for DeepSpeed, and HazyResearch for flash-attn.
