[feature] Add support for OffloadModel to enable training large models on 1 GPU. #432

Merged on Feb 26, 2021 (57 commits)
f166609
clean start
blefaudeux Dec 30, 2020
fc1310a
Merge remote-tracking branch 'upstream/master' into offload_experimental
blefaudeux Jan 4, 2021
26cfd92
removing per layer split strategy, probably not that useful indeed
blefaudeux Jan 5, 2021
0363630
Merge remote-tracking branch 'upstream/master' into offload_experimental
blefaudeux Jan 5, 2021
ff44ddd
initial transformer benchmark
blefaudeux Jan 7, 2021
f20e3f8
Merge remote-tracking branch 'upstream/master' into offload_experimental
blefaudeux Jan 11, 2021
3bcea0a
hack, enable testing ViT + offload, python3 benchmarks/oss.py --epoc…
blefaudeux Jan 12, 2021
62c15e4
proper cuda streams and device, something off in terms of mems consum…
blefaudeux Jan 13, 2021
43c56cd
minor, stashing
blefaudeux Jan 13, 2021
042daa3
Merge branch 'master' into offload_experimental
blefaudeux Jan 22, 2021
850c5bf
unit test fix
blefaudeux Jan 22, 2021
52e0be4
Merge branch 'master' into offload_experimental
blefaudeux Feb 1, 2021
1f6c018
removing all the distributed parts
blefaudeux Feb 4, 2021
8490dd8
simpler test, needs debugging
blefaudeux Feb 5, 2021
9ec3892
working OOP, running a model which does not fit on the gpu memory
blefaudeux Feb 5, 2021
d4e929d
Merge branch 'master' into offload_experimental
blefaudeux Feb 5, 2021
8e92a4c
spring cleaning
blefaudeux Feb 5, 2021
6bfeaed
removing the ill-advised optimizer bits, better keep that orthogonal
blefaudeux Feb 5, 2021
e1c0a7a
[offload] Add support for activation offloading + other changes (#367)
anj-s Feb 12, 2021
c2ac144
[offload] Add support for fp16 training (#374)
anj-s Feb 12, 2021
bff0cdb
[offload] Add support for activation checkpointing for all layers. (#…
anj-s Feb 12, 2021
0ca26a2
add support for microbatches
Feb 17, 2021
0b70ffa
revert benchmark config changes
Feb 17, 2021
9cdea8b
add parametrization
Feb 19, 2021
38c541d
fix lint errors and tests
Feb 19, 2021
8e32380
skip test for 1.5
Feb 19, 2021
0d7201f
fix lint errors
Feb 19, 2021
cbe8acc
skip test if there are no GPUs
Feb 19, 2021
2dff98e
fix lint errors
Feb 19, 2021
239713d
fix lint errors
Feb 19, 2021
7867f4f
move experimental to the fairscale repo
Feb 22, 2021
256d4b4
lint error fixes
Feb 22, 2021
7bc20fc
modify test imports
Feb 22, 2021
9ad7c12
lint error fixes
Feb 22, 2021
78f4906
Merge branch 'move_experimental_to_fairscale' into offload_experimental
Feb 22, 2021
822e38e
move offload files to the experimental directory
Feb 22, 2021
c9a02be
move tests and benchmarks to their folder
Feb 22, 2021
595399b
fix mypy errors
Feb 22, 2021
60cecaa
cp intermediate working benchmarks
Feb 23, 2021
cbfdb27
more changes
Feb 23, 2021
8870d17
Merge branch 'master' into offload_experimental
Feb 23, 2021
cea1426
Merge branch 'offload_experimental' into seq_benchmark
Feb 23, 2021
4306696
split benchmark configs
Feb 23, 2021
ce575df
Merge branch 'split-benchmark-configs' into seq_benchmark
Feb 23, 2021
697887c
remove print statements
Feb 23, 2021
e7336e9
fix lint errors
Feb 23, 2021
e73d04b
remove unused print
Feb 23, 2021
65f2f92
stress testing
Feb 24, 2021
2d0d7f5
remove unused file
Feb 24, 2021
b8c493f
change param name
Feb 24, 2021
f02b6e2
fix merge conflicts
Feb 24, 2021
3867297
lint fixes
Feb 24, 2021
1eb8082
Merge branch 'master' into offload_experimental
Feb 25, 2021
8e56a5b
move file to the right folder
Feb 25, 2021
59199b9
offload_experimental
Feb 25, 2021
ef50486
add doc string
Feb 25, 2021
13bb24c
add error message
Feb 25, 2021
2 changes: 1 addition & 1 deletion benchmarks/datasets/wikitext2_data.py
@@ -44,7 +44,7 @@ def data_process(raw_text_iter):
     test_dataset = data_process(iter(io.open(test_filepath, encoding="utf8")))

     def batchify(data):
-        batch_size = args.batch_size
+        batch_size = benchmark_config["batch_size"]
         return _batchify(data, batch_size)

     total_batch_size = _get_total_batch_size(benchmark_config, model_specs)
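
The change in this hunk makes `batchify` read the batch size from the `benchmark_config` dictionary rather than from the parsed command-line `args`. A minimal sketch of that pattern follows. It is an illustration under assumptions, not the repository's code: `_batchify` here is a hypothetical pure-Python stand-in for the real helper (which is not shown in the diff), and `make_batchify` and the config value are invented for the example.

```python
from typing import Callable, Dict, List


def _batchify(data: List[int], batch_size: int) -> List[List[int]]:
    """Hypothetical stand-in: drop the ragged tail, then split into batch_size rows."""
    row_len = len(data) // batch_size
    trimmed = data[: row_len * batch_size]
    return [trimmed[i * row_len : (i + 1) * row_len] for i in range(batch_size)]


def make_batchify(benchmark_config: Dict[str, int]) -> Callable[[List[int]], List[List[int]]]:
    """Hypothetical helper: build a batchify closure bound to a config dict."""

    def batchify(data: List[int]) -> List[List[int]]:
        # The batch size comes from the benchmark config, not from CLI args.
        batch_size = benchmark_config["batch_size"]
        return _batchify(data, batch_size)

    return batchify


benchmark_config = {"batch_size": 2}  # illustrative value only
batchify = make_batchify(benchmark_config)
batches = batchify(list(range(7)))  # the 7th token does not fit and is dropped
```

Reading the value from the config dictionary keeps it consistent with other helpers that already take `benchmark_config`, such as `_get_total_batch_size` in the context line above.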