
Calculon - Co-design for large-scale parallel applications

Running

Run Calculon like this:

$> PYTHONPATH=. ./bin/calculon <args>

Calculon provides a hierarchical command-line interface. To see the commands it accepts, use --help or -h:

$> PYTHONPATH=. ./bin/calculon -h

You can also get usage help for any specific command by passing --help or -h to it:

$> PYTHONPATH=. ./bin/calculon llm -h

LLM Example

Run a single calculation for LLM (~1 sec):

$> PYTHONPATH=. ./bin/calculon llm models/megatron-1T.json examples/3072_t4_p64_d12_mbs4_full.json systems/a100_80g.json -

Run a system execution optimizer for LLM (~1 min):

$> PYTHONPATH=. ./bin/calculon llm-optimal-execution models/turing-530B.json 5128 2520 float16 systems/a100_80g.json output.json -m

output.json will contain the optimal way to run Turing-530B across 5128 A100 GPUs.
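The optimizer's JSON output can be inspected with standard tooling. A minimal sketch using a hypothetical fragment of a result (the field names below are illustrative assumptions; the real schema is defined by Calculon):

```python
import json

# Hypothetical excerpt of an optimizer result, shown only to illustrate
# inspecting the JSON with standard tooling; the actual keys and values
# are defined by Calculon's llm-optimal-execution output, not here.
sample = json.loads("""
{
  "tensor_par": 4,
  "pipeline_par": 64,
  "data_par": 12,
  "microbatch_size": 4
}
""")

# List the configuration knobs present in this (assumed) result.
for key in sorted(sample):
    print(key, "=", sample[key])
```

Listing the top-level keys first is a cheap way to discover the schema before digging deeper.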

To store the results of all successful runs from the same experiment, run a special system optimizer (~1 min):

$> PYTHONPATH=. ./bin/calculon llm-all-executions models/turing-530B.json 5128 2520 float16 systems/a100_80g.json all_output.csv
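The resulting CSV lends itself to quick post-processing. A minimal sketch with hypothetical column names and values (the actual columns come from Calculon's llm-all-executions output):

```python
import csv
import io

# Hypothetical two-row excerpt of an all-executions CSV; the real
# column names and units are defined by Calculon, not by this sketch.
text = io.StringIO(
    "tensor_par,pipeline_par,data_par,time\n"
    "4,64,12,1.23\n"
    "8,32,12,1.05\n"
)

# Pick the fastest configuration among all successful runs.
rows = list(csv.DictReader(text))
best = min(rows, key=lambda r: float(r["time"]))
print(best["tensor_par"], best["pipeline_par"])
```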

Testing and validation (optional)

To make sure that the current build is working, use:

$> make test

To validate Calculon's performance modeling against Megatron runs on NVIDIA's Selene A100-based supercomputer, with results published in the "Sequence parallelism" paper, use:

$> PYTHONPATH=. ./bin/calculon llm-validation

Publications

  • Calculon: A Methodology and Tool for High-Level Co-Design of Systems and Large Language Models
    Mikhail Isaev, Nic McDonald, Larry Dennison, Richard Vuduc
    Paper

  • Scaling Infrastructure to Support Multi-Trillion Parameter LLM Training
    Mikhail Isaev, Nic McDonald, Richard Vuduc
    Paper

calculon's People

Contributors

nicmcd, michael-isaev


calculon's Issues

Ask for help -- how to modify printing parameters

I modified def display_stats(self), changing the parameters in the log line, e.g. f"Sample rate: {self.get_sample_rate():.2f};\n", but the printed output still does not change on CentOS Linux. I want to ask why.

Questions about the communication op factor and offset in the provided examples

for example: https://github.com/calculon-ai/calculon/blob/main/systems/a100_80g.json,

{
 "networks": [
    {
      "bandwidth": 300,
      "efficiency": 0.65,
      "size": 8,
      "latency": 0.00001,
      "ops": {
        "p2p": [1.0, null],
        "reduce_scatter": [1.5, -1],
        "all_gather": [1.5, -1],
        "all_reduce": [2.0, -1]
      },
      "must_be_filled": true,
      "processor_usage": 0.15
    },{
      "bandwidth": 25,
      "efficiency": 0.9,
      "size": 65536,
      "latency": 0.00002,
      "ops": {
        "p2p": [1.0, null],
        "reduce_scatter": [1.0, 0],
        "all_gather": [1.0, 0],
        "all_reduce": [1.0, 0]
      },
      "must_be_filled": false,
      "processor_usage": 0.02
    }
  ]
}

I am confused: I would expect the scalar values for 'all_reduce', 'all_gather', and 'reduce_scatter' in networks 0/1 to be (2, 1, 1), with offsets of -1. Why are other values present?
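One plausible reading of these (scalar, offset) pairs, offered here only as a hedged illustration rather than a description of Calculon's actual code: the message size is scaled by the scalar, and a non-null offset applies a group-size correction of (n + offset)/n for n participants. That reproduces the classic ring-collective costs, e.g. an all-reduce moves 2(n-1)/n of the data, matching [2.0, -1]. A sketch under that assumption:

```python
def comm_time(nbytes, scalar, offset, n, bandwidth_gbs, efficiency, latency):
    """Hypothetical cost model for one collective op. The message is
    scaled by the op's scalar and, when offset is not None, by the
    group-size term (n + offset)/n; the result is divided by effective
    bandwidth. This is an illustration of the assumed interpretation,
    not Calculon's implementation.
    """
    factor = scalar if offset is None else scalar * (n + offset) / n
    return latency + (nbytes * factor) / (bandwidth_gbs * 1e9 * efficiency)

# Ring all-reduce of 1 GiB over the 8-GPU NVLink network from the example:
t = comm_time(2**30, 2.0, -1, 8, 300, 0.65, 1e-5)
print(f"{t * 1000:.2f} ms")
```

Under this reading a null offset simply skips the group-size correction, as for p2p; why some scalars deviate from the textbook values (e.g. 1.5 instead of 1.0) is a question only the Calculon source can settle.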

Recomputation in block shouldn't be accumulated twice

In lines 1220-1224 of llm.py, the recomputation FLOPs and memory overhead are accumulated against the accumulated forward values (lines 1197-1200). This implies that recomputing each layer would re-run the computation from the start of the block. Is this a mistake, or do you really assume block recomputation works that way? Also, the computation methods in the first four lines (1220-1223) do not match line 1224.
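The distinction this issue draws can be made concrete with a toy example (hypothetical numbers, not Calculon's code): per-layer recomputation counts each layer's forward FLOPs once, whereas accumulating against a running forward total charges layer i for recomputing layers 0..i, double-counting the earlier layers.

```python
# Toy forward FLOP counts for a 4-layer block (hypothetical numbers).
layer_flops = [10, 20, 30, 40]

# Per-layer recomputation: each layer is recomputed exactly once.
per_layer = sum(layer_flops)

# Accumulating against a running forward total instead: layer i pays
# for recomputing layers 0..i, so early layers are counted repeatedly.
running, accumulated = 0, 0
for f in layer_flops:
    running += f
    accumulated += running

print(per_layer, accumulated)  # prints: 100 200
```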
