
Calculon - Co-design for large-scale parallel applications

Running

Run Calculon like this:

$> PYTHONPATH=. ./bin/calculon <args>

Calculon provides a hierarchical command-line interface. To see the commands it accepts, use --help or -h:

$> PYTHONPATH=. ./bin/calculon -h

You can also get usage help for any specific command by passing --help or -h to it:

$> PYTHONPATH=. ./bin/calculon llm -h

LLM Example

Run a single calculation for LLM (~1 sec):

$> PYTHONPATH=. ./bin/calculon llm models/megatron-1T.json examples/3072_t4_p64_d12_mbs4_full.json systems/a100_80g.json -

Run a system execution optimizer for LLM (~1 min):

$> PYTHONPATH=. ./bin/calculon llm-optimal-execution models/turing-530B.json 5128 2520 float16 systems/a100_80g.json output.json -m

output.json will contain the optimal way to run Turing-530B across 5128 A100 GPUs.
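The optimizer's JSON output can be inspected with standard tooling. A minimal sketch using a hypothetical fragment of a result (the field names below are illustrative assumptions; the real schema is defined by Calculon):

```python
import json

# Hypothetical excerpt of an optimizer result, shown only to illustrate
# inspecting the JSON with standard tooling; the actual keys and values
# are defined by Calculon's llm-optimal-execution output, not here.
sample = json.loads("""
{
  "tensor_par": 4,
  "pipeline_par": 64,
  "data_par": 12,
  "microbatch_size": 4
}
""")

# List the configuration knobs present in this (assumed) result.
for key in sorted(sample):
    print(key, "=", sample[key])
```

Listing the top-level keys first is a cheap way to discover the schema before digging deeper.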

To store the results of all successful runs from the same experiment, run a special system optimizer (~1 min):

$> PYTHONPATH=. ./bin/calculon llm-all-executions models/turing-530B.json 5128 2520 float16 systems/a100_80g.json all_output.csv
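The resulting CSV lends itself to quick post-processing. A minimal sketch with hypothetical column names and values (the actual columns come from Calculon's llm-all-executions output):

```python
import csv
import io

# Hypothetical two-row excerpt of an all-executions CSV; the real
# column names and units are defined by Calculon, not by this sketch.
text = io.StringIO(
    "tensor_par,pipeline_par,data_par,time\n"
    "4,64,12,1.23\n"
    "8,32,12,1.05\n"
)

# Pick the fastest configuration among all successful runs.
rows = list(csv.DictReader(text))
best = min(rows, key=lambda r: float(r["time"]))
print(best["tensor_par"], best["pipeline_par"])
```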

Testing and validation (optional)

To make sure that the current build is working, use:

$> make test

To validate Calculon's performance modeling against Megatron runs on NVIDIA's Selene A100-based supercomputer, with results published in the "Sequence parallelism" paper, use:

$> PYTHONPATH=. ./bin/calculon llm-validation

Publications

  • Calculon: A Methodology and Tool for High-Level Co-Design of Systems and Large Language Models
    Mikhail Isaev, Nic McDonald, Larry Dennison, Richard Vuduc
    Paper

  • Scaling Infrastructure to Support Multi-Trillion Parameter LLM Training
    Mikhail Isaev, Nic McDonald, Richard Vuduc
    Paper

calculon's People

Contributors

nicmcd, michael-isaev


calculon's Issues

Ask for help -- how to modify printing parameters

I modified def display_stats(self), changing the parameters in the log line, e.g. f"Sample rate: {self.get_sample_rate():.2f};\n", but the printed output still does not change on CentOS Linux. I want to ask why.

Questions about the communication op factor and offset in the provided examples

for example: https://github.com/calculon-ai/calculon/blob/main/systems/a100_80g.json,

{
 "networks": [
    {
      "bandwidth": 300,
      "efficiency": 0.65,
      "size": 8,
      "latency": 0.00001,
      "ops": {
        "p2p": [1.0, null],
        "reduce_scatter": [1.5, -1],
        "all_gather": [1.5, -1],
        "all_reduce": [2.0, -1]
      },
      "must_be_filled": true,
      "processor_usage": 0.15
    },{
      "bandwidth": 25,
      "efficiency": 0.9,
      "size": 65536,
      "latency": 0.00002,
      "ops": {
        "p2p": [1.0, null],
        "reduce_scatter": [1.0, 0],
        "all_gather": [1.0, 0],
        "all_reduce": [1.0, 0]
      },
      "must_be_filled": false,
      "processor_usage": 0.02
    }
  ]
}

I am confused: I would expect the scalar values for 'all_reduce', 'all_gather', and 'reduce_scatter' in networks 0/1 to be (2, 1, 1), with offsets of -1. Why are other values present?
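One plausible reading of these (scalar, offset) pairs, offered here only as a hedged illustration rather than a description of Calculon's actual code: the message size is scaled by the scalar, and a non-null offset applies a group-size correction of (n + offset)/n for n participants. That reproduces the classic ring-collective costs, e.g. an all-reduce moves 2(n-1)/n of the data, matching [2.0, -1]. A sketch under that assumption:

```python
def comm_time(nbytes, scalar, offset, n, bandwidth_gbs, efficiency, latency):
    """Hypothetical cost model for one collective op. The message is
    scaled by the op's scalar and, when offset is not None, by the
    group-size term (n + offset)/n; the result is divided by effective
    bandwidth. This is an illustration of the assumed interpretation,
    not Calculon's implementation.
    """
    factor = scalar if offset is None else scalar * (n + offset) / n
    return latency + (nbytes * factor) / (bandwidth_gbs * 1e9 * efficiency)

# Ring all-reduce of 1 GiB over the 8-GPU NVLink network from the example:
t = comm_time(2**30, 2.0, -1, 8, 300, 0.65, 1e-5)
print(f"{t * 1000:.2f} ms")
```

Under this reading a null offset simply skips the group-size correction, as for p2p; why some scalars deviate from the textbook values (e.g. 1.5 instead of 1.0) is a question only the Calculon source can settle.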

Recomputation in block shouldn't be accumulated twice

In lines 1220-1224 of llm.py, the recomputation FLOPs and memory overhead are accumulated against the accumulated forward values (lines 1197-1200). This implies that recomputing each layer would re-run the computation from the start of the block. Is this a mistake, or do you really assume block recomputation works that way? Also, the computation methods in the first four lines (1220-1223) do not match line 1224.
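The distinction this issue draws can be made concrete with a toy example (hypothetical numbers, not Calculon's code): per-layer recomputation counts each layer's forward FLOPs once, whereas accumulating against a running forward total charges layer i for recomputing layers 0..i, double-counting the earlier layers.

```python
# Toy forward FLOP counts for a 4-layer block (hypothetical numbers).
layer_flops = [10, 20, 30, 40]

# Per-layer recomputation: each layer is recomputed exactly once.
per_layer = sum(layer_flops)

# Accumulating against a running forward total instead: layer i pays
# for recomputing layers 0..i, so early layers are counted repeatedly.
running, accumulated = 0, 0
for f in layer_flops:
    running += f
    accumulated += running

print(per_layer, accumulated)  # prints: 100 200
```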
