
Calculon - Co-design for large scale parallel applications

Running

Run Calculon like this:

$> PYTHONPATH=. ./bin/calculon <args>

Calculon has a hierarchical command-line interface. To see the commands it accepts, use --help or -h:

$> PYTHONPATH=. ./bin/calculon -h

You can also get usage help for any specific command by passing --help or -h to it:

$> PYTHONPATH=. ./bin/calculon llm -h

LLM Example

Run a single calculation for LLM (~1 sec):

$> PYTHONPATH=. ./bin/calculon llm models/megatron-1T.json examples/3072_t4_p64_d12_mbs4_full.json systems/a100_80g.json -
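The execution config filename encodes the parallelism split (assumed convention here: t = tensor-parallel width, p = pipeline-parallel depth, d = data-parallel width, mbs = microbatch size); a quick sanity check that the split matches the leading GPU count:

```python
# Decode 3072_t4_p64_d12_mbs4 under the assumed naming convention above.
tensor, pipeline, data, microbatch = 4, 64, 12, 4

# The product of the three parallelism widths gives the total GPU count.
num_gpus = tensor * pipeline * data
print(num_gpus)  # 3072, matching the leading number in the filename
```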

Run a system execution optimizer for LLM (~1 min):

$> PYTHONPATH=. ./bin/calculon llm-optimal-execution models/turing-530B.json 5128 2520 float16 systems/a100_80g.json output.json -m

output.json will contain the optimal way to run Turing-530B across 5128 A100 GPUs.
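Conceptually, the optimizer walks a search space of candidate executions. A minimal sketch of one slice of that space, enumerating the ways a GPU count can be factored into tensor/pipeline/data parallelism (Calculon's real search also covers microbatch sizes, activation recomputation, offloading, and more):

```python
def parallelism_splits(num_procs):
    """Yield every ordered (tensor, pipeline, data) triple whose
    product equals num_procs. Toy illustration only; not Calculon's
    actual search code."""
    for t in range(1, num_procs + 1):
        if num_procs % t:
            continue
        rest = num_procs // t
        for p in range(1, rest + 1):
            if rest % p:
                continue
            yield t, p, rest // p

# Even before pruning, the space grows quickly with the GPU count.
print(len(list(parallelism_splits(3072))))  # 198 ordered splits
```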

To store the results of all successful runs from the same experiment, run the exhaustive system optimizer (~1 min):

$> PYTHONPATH=. ./bin/calculon llm-all-executions models/turing-530B.json 5128 2520 float16 systems/a100_80g.json all_output.csv
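The resulting CSV has one row per evaluated execution, so it can be post-processed with standard tools. A small sketch of picking the best row, assuming an execution-time column exists (the column names below are illustrative, not Calculon's actual schema):

```python
import csv
import io

# Stand-in for all_output.csv; real output has one row per successful run.
sample = io.StringIO("name,time\nrun_a,12.5\nrun_b,9.8\nrun_c,11.1\n")

# Pick the row with the lowest time (hypothetical "time" column).
rows = list(csv.DictReader(sample))
best = min(rows, key=lambda row: float(row["time"]))
print(best["name"])  # run_b
```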

Testing and validation (optional)

To make sure that the current build is working, use

$> make test

To validate Calculon's performance modeling against Megatron runs on NVIDIA's Selene A100-based supercomputer, with results published in the "Sequence parallelism" paper, use

$> PYTHONPATH=. ./bin/calculon llm-validation

Publications

  • Calculon: A Methodology and Tool for High-Level Co-Design of Systems and Large Language Models
    Mikhail Isaev, Nic McDonald, Larry Dennison, Richard Vuduc
    Paper

  • Scaling Infrastructure to Support Multi-Trillion Parameter LLM Training
    Mikhail Isaev, Nic McDonald, Richard Vuduc
    Paper