The implementation of pipeline-parallel training with the K-FAC optimizer (PipeFisher) in PyTorch, used in PipeFisher: Efficient Training of Large Language Models Using Pipelining and Fisher Information Matrices (to appear at MLSys 2023).
Prepare the English Wikipedia dataset by following https://github.com/microsoft/AzureML-BERT/blob/master/docs/dataprep.md, and store the resulting `wikipedia.segmented.nltk.txt` file under the `bert_data/` directory.
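As an illustration (assuming the preprocessed file sits in the current directory; the exact location on your machine may differ), placing the corpus could look like this:

```bash
# Create the expected data directory and move the preprocessed corpus into it.
# (Illustrative only -- any way of getting the file into bert_data/ works.)
mkdir -p bert_data
mv wikipedia.segmented.nltk.txt bert_data/
```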
Install the Python dependencies and the `asdfghjkl` package (the trailing slash installs it from the local `asdfghjkl/` directory):

pip install -r requirements.txt
pip install asdfghjkl/
For training, we use `apex.optimizers.FusedLAMB` from NVIDIA's Apex library. Please follow the installation instructions in the Apex repository (https://github.com/NVIDIA/apex) to install `apex`.
For profiling, we use NVIDIA Nsight Systems. Please make sure you can execute the `nsys` command.
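A quick sanity check of the environment described above (these one-liners only verify that the packages import and that `nsys` is on the PATH; they are not part of the provided scripts):

```bash
# Verify the local asdfghjkl install, the Apex FusedLAMB optimizer used for
# training, and the Nsight Systems CLI used for profiling.
python -c "import asdfghjkl; print('asdfghjkl OK')"
python -c "from apex.optimizers import FusedLAMB; print('apex FusedLAMB OK')"
nsys --version
```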
Our scripts are intended to run through the SLURM workload manager on a GPU cluster with 1 GPU per node.
Phase 1 pretraining of BERT-Base on the English Wikipedia with NVLAMB on 32 GPUs:

sbatch scripts/train.sh
Phase 1 pretraining of BERT-Base on the English Wikipedia with K-FAC on 32 GPUs:

sbatch scripts/train_kfac.sh
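The resource requests are presumably defined inside the sbatch scripts themselves. If your cluster needs different values, standard SLURM behavior lets `sbatch` command-line options override `#SBATCH` directives in a script; for example (the flag values below are assumptions, not settings taken from the scripts):

```bash
# Submit the K-FAC training script while overriding the node count and the
# per-node GPU request on the command line (command-line options take
# precedence over #SBATCH directives inside the script).
sbatch --nodes=32 --gres=gpu:1 scripts/train_kfac.sh
```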
Profile the pipeline-parallel training steps of BERT-Large without K-FAC (Chimera pipelining, 8 stages, 8 GPUs, micro-batch size 32, as reflected in the output filename) and plot the CUDA timeline:

sbatch scripts/prof_steps.sh
sh scripts/plot_cuda_timeline.sh

output: bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1.pdf
Profile the K-FAC training steps and plot the CUDA timeline:

sbatch scripts/prof_kfac_steps.sh
sh scripts/plot_cuda_timeline_kfac.sh

output: bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1_kfac.pdf
Automatically generate a PipeFisher work schedule based on the profiling results:

sh scripts/auto_schedule.sh

output: bert-large_chimera_8stages_8gpus_microbs32_acc1_kfac_schedule.pickle
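If you want to inspect the generated schedule, it is a standard Python pickle. A minimal sketch (its internal structure is not documented here, so this only prints a summary; run it from the repository root with the environment above activated so that any classes referenced by the pickle are importable):

```bash
# Load the generated schedule pickle and print a short summary of its contents.
python - <<'EOF'
import pickle

path = "bert-large_chimera_8stages_8gpus_microbs32_acc1_kfac_schedule.pickle"
with open(path, "rb") as f:
    schedule = pickle.load(f)

print(type(schedule))        # top-level Python type of the schedule object
print(repr(schedule)[:500])  # first part of its repr, as a rough preview
EOF
```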
Profile the PipeFisher training steps using the generated schedule and plot the CUDA timeline:

sbatch scripts/prof_pipefisher_steps.sh
sh scripts/plot_cuda_timeline_pipefisher.sh

output: bert_prof/bert-large_chimera_8stages_8gpus_microbs32_acc1_pipefisher.pdf
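If you want to profile a run manually outside the provided scripts, the basic Nsight Systems invocation looks like the following; `your_training_command` is a placeholder, not a file from this repository, and the provided prof_*.sh scripts presumably wrap something along these lines:

```bash
# Illustrative direct use of Nsight Systems: write a report named
# manual_profile (overwriting any previous one) for the wrapped command.
nsys profile -o bert_prof/manual_profile --force-overwrite=true \
    your_training_command
```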
By changing the settings in each script, you can run training/profiling with other BERT models, pipeline methods, numbers of pipeline stages, numbers of GPUs, etc.
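The adjustable settings live inside the scripts themselves. One quick way to see what a given script exposes, without assuming any particular variable names, is to list its top-level variable assignments:

```bash
# Print the line numbers and names of shell variables assigned at the top
# level of a script; these are typically the settings referred to above.
grep -nE '^[A-Za-z_][A-Za-z0-9_]*=' scripts/train_kfac.sh
```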