
[TPU] XLA changes for finetuning #110

Merged: 17 commits merged on Jun 11, 2023
Conversation

@gkroiz (Contributor) commented on Jun 5, 2023

This PR adds changes specific to finetuning on TPUs.
I used a TPU v4-8 with

log_interval = 1
devices = 4
batch_size = 64 / devices
micro_batch_size = 4
gradient_accumulation_steps = 4
num_epochs = 5

for stablelm-base-alpha-3b; the resulting finetuning time was ~4500 s.
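For reference, a minimal sketch of how these settings relate (the variable names follow the list above; the derived values are illustrative arithmetic, not the exact code in this PR):

```python
# Illustrative only; mirrors the settings listed above.
log_interval = 1
devices = 4                        # TPU v4-8: 4 chips visible to the host
batch_size = 64 // devices         # 64 // 4 = 16
micro_batch_size = 4
gradient_accumulation_steps = batch_size // micro_batch_size  # 16 // 4 = 4
num_epochs = 5
```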

@gkroiz (Contributor, Author) commented on Jun 5, 2023

Would it be simpler to have a separate file for TPU finetuning (for example, adapter_tpu.py)?

@gkroiz (Contributor, Author) commented on Jun 5, 2023

cc @Liyang90 for review

@gkroiz (Contributor, Author) commented on Jun 5, 2023

TODO: if we decide to keep these changes in adapter.py, we also need to make the same adjustments in adapter_v2.py

@carmocca (Contributor) left a review:

Some early questions!

3 resolved inline comments on finetune/adapter.py (outdated)
@gkroiz (Contributor, Author) commented on Jun 5, 2023

The current code in adapter.py won't work on TPUs with more than 8 cores (4 chips).

For example, say we want to run finetuning on a v4-64 (64 cores, 32 chips). We would need two device counts: one for the local devices within each of the 8 workers, and one for the total device count. The reasoning is that the XLA Fabric strategy needs devices=4 (the local device count), but batch_size, max_iters, and warmup_iters should be defined based on the total device count, which would be 32. For my local tests I made this distinction by defining local_devices and total_devices, but I'm not sure whether this leads to unnecessary confusion when using non-XLA strategies. What are your thoughts @carmocca?
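A minimal sketch of the distinction, assuming Lightning Fabric's TPU/XLA support; the names local_devices and total_devices mirror the comment above and are hypothetical, not the exact code in this PR:

```python
from lightning.fabric import Fabric


def main(fabric: Fabric) -> None:
    # fabric.world_size counts processes across *all* workers, so on a
    # v4-64 pod (8 workers x 4 local devices each) it would be 32.
    total_devices = fabric.world_size
    batch_size = 64 // total_devices
    # max_iters and warmup_iters would likewise be derived from total_devices.


if __name__ == "__main__":
    local_devices = 4  # devices visible to a single TPU worker
    fabric = Fabric(accelerator="tpu", devices=local_devices)
    fabric.launch(main)
```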

1 resolved inline comment on finetune/adapter.py (outdated)
@carmocca (Contributor) left a review:

I think making these distinctions is fine, because they still hold in the general multi-node case, not just with XLA.

2 resolved inline comments on finetune/adapter.py (outdated)
gkroiz and others added 2 commits June 5, 2023 20:09
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
Co-authored-by: Carlos Mocholí <carlossmocholi@gmail.com>
@gkroiz (Contributor, Author) commented on Jun 7, 2023

@carmocca I noticed this PR is now closed. Do we no longer need the XLA changes?

@carmocca (Contributor) commented on Jun 7, 2023

Sorry, it was an accident!

@carmocca carmocca reopened this Jun 7, 2023
@gkroiz gkroiz marked this pull request as ready for review June 7, 2023 21:03
@gkroiz gkroiz requested a review from lantiga as a code owner June 7, 2023 21:03
@carmocca (Contributor) left a review:

Minor comment; the rest looks good.

1 resolved inline comment on finetune/adapter.py (outdated)
@carmocca carmocca merged commit 7236f51 into Lightning-AI:main Jun 11, 2023
@gkroiz gkroiz deleted the finetuning_tpu branch June 12, 2023 05:09
rasbt added a commit that referenced this pull request Jun 12, 2023