Add Benchmarking Compatibility to PaddingFree Plugin #66

Merged

fabianlim merged 6 commits into foundation-model-stack:main from achew010:orca-math-bench

Aug 19, 2024

Contributor

achew010 commented Aug 14, 2024 (edited by fabianlim)

Description

This PR adds support for running the benchmarks for the AADP (attention-and-distributed-packing) plugin.

Key items:

a data_processing stanza in scenarios.yaml to configure the benchmark dataset and define its formatting with a Jinja template
modifications to benchmark.py
additional AADP sample configurations

Benchmarks (Mistral 7B)

Benchmarks on Flan show that the improvement from PaddingFree (PF) is consistent with Transformers:

46% improvement in training runtime with PaddingFree for full FT
20% improvement in training runtime with PaddingFree for QPEFT

NOTE: some regressions affected the results below; see #70.

Subset of Flan

Run: bash scripts/run_benchmarks.sh "1 2" "benchmark_outputs/flan" "scenarios-flan.yaml" "none"

Framework Type	Num Device	Device Batch Size	Train Runtime (sec)	Throughput (toks/sec)	% Runtime Improvement
Full FT	2	4	1430	992	base
Full FT + PF	2	4	774	1875	+46
BNB	2	4	1612	880	base
BNB + PF	2	4	1271	1142	+21
BNB + FOAK	2	4	1068	1328	base
BNB + FOAK + PF	2	4	605	2400	+43
GPTQ	2	4	1569	904	base
GPTQ + PF	2	4	1255	1156	+20
GPTQ + FOAK	2	4	1034	1372	base
GPTQ + FOAK + PF	2	4	587	2472	+43

Subset of Orca-Math

Run: bash scripts/run_benchmarks.sh "1 2" "benchmark_outputs/orca" "scenarios-orca.yaml" "none"
Benchmarks on Orca-Math show that the improvement from PaddingFree matches Transformers:

23% improvement in training runtime with PaddingFree on full FT
PaddingFree, however, shows no improvement in train runtime with QPEFT on its own, although there is a significant improvement when FOAK is applied. The issue is documented in "Inconsistency in Padding-Free Benchmarks with Different Transformers Versions" #70.

Single Device

Framework Type	Num Device	Device Batch Size	Train Runtime (sec)	Throughput (toks/sec)	% Runtime Improvement
Full FT	1	4	362	1520	base
Full FT + PF	1	4	291	1874	+19

Two Devices

Framework Type	Num Device	Device Batch Size	Train Runtime (sec)	Throughput (toks/sec)	% Runtime Improvement
Full FT	2	4	231	1188	base
Full FT + PF	2	4	177	1584	+23
BNB	2	4	392	709	base
BNB + PF	2	4	398	704	+0
BNB + FOAK	2	4	190	1468	base
BNB + FOAK + PF	2	4	155	1802	+18
GPTQ	2	4	388	720	base
GPTQ + PF	2	4	386	725	+0
GPTQ + FOAK	2	4	186	1359	base
GPTQ + FOAK + PF	2	4	158	1771	+15

fabianlim reviewed

plugins/attention-and-distributed-packing/src/fms_acceleration_aadp/flash_attn.py

fabianlim reviewed

scripts/benchmarks/benchmark.py (outdated)

fabianlim reviewed

scripts/benchmarks/benchmark.py (outdated)

fabianlim reviewed

scripts/benchmarks/benchmark.py (outdated)

fabianlim reviewed

scripts/run_benchmarks.sh (outdated)

fabianlim requested changes

Contributor

fabianlim left a comment

have questions and concerns

tox.ini (outdated)

fabianlim reviewed

scripts/run_benchmarks.sh (outdated)

fabianlim reviewed

scripts/benchmarks/benchmark.py (outdated)

fabianlim mentioned this pull request

Add Acceleration Patcher and MultiPack Plugin #67

Merged

achew010 marked this pull request as ready for review

August 15, 2024 09:14

fabianlim reviewed

scripts/benchmarks/benchmark.py (outdated)

Contributor

fabianlim commented Aug 15, 2024

Please fix the DCO

fabianlim reviewed

scripts/benchmarks/scenarios-pretok.yaml (outdated)

fabianlim reviewed

scripts/run_benchmarks.sh (outdated)

fabianlim reviewed

scripts/benchmarks/scenarios-pretok.yaml (outdated)

fabianlim reviewed

scripts/benchmarks/scenarios-pretok.yaml (outdated)

Contributor

fabianlim commented Aug 16, 2024

@achew010 if we figure out what the issue is and it requires a significant change to the design of the accelerated-peft plugin, this needs to be documented.

achew010 force-pushed the orca-math-bench branch from 0fe0867 to 97529d2 Compare

August 19, 2024 01:47

achew010 and others added 4 commits

August 19, 2024 01:50

add benchmarking on orca-math

1c85802

Signed-off-by: 1000850000 user <aaron.chew1@ibm.com>

modifications to address PR changes

08fb97d

Signed-off-by: 1000850000 user <aaron.chew1@ibm.com>

additional fixes to scenarios template

4d36f0c

Signed-off-by: 1000850000 user <aaron.chew1@ibm.com>

Apply suggestions from code review

81f52ca

Co-authored-by: Yu Chin Fabian Lim <fabianlim@users.noreply.github.com>
Signed-off-by: 1000850000 user <aaron.chew1@ibm.com>

achew010 force-pushed the orca-math-bench branch from 02a11f9 to 81f52ca Compare

August 19, 2024 01:50

achew010 added 2 commits

August 19, 2024 03:33

renamed scenarios template to specify dataset

4921aff

Signed-off-by: 1000850000 user <aaron.chew1@ibm.com>

added orca benchmarks as ref

5d108a1

Signed-off-by: 1000850000 user <aaron.chew1@ibm.com>

fabianlim merged commit 48426a1 into foundation-model-stack:main

6 checks passed

fabianlim mentioned this pull request

feat: Add DataClass Arguments to Activate Padding-Free and MultiPack Plugin and FastKernels foundation-model-stack/fms-hf-tuning#280

Merged

2 tasks

achew010 mentioned this pull request

Allow Kernels for Full FT and Non-Quantized PEFT #79

Merged
