
Add finetuning strategies for DeepSpeed #1377

Conversation

@ar90n (Contributor) commented Jul 3, 2022

What does this PR do?

This PR provides workarounds for using DeepSpeed during finetuning. DeepSpeed does not fully work with pytorch-lightning's finetuning callbacks because their parameter loading and storing fail under DeepSpeed, so this PR adds finetuning strategies that omit parameter loading and storing.

Fixes #1249
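
To illustrate the approach, here is a minimal sketch (illustrative only, not the exact diff of this PR): a DeepSpeed variant of the base finetuning callback whose parameter-storing hook is a no-op. It assumes pytorch-lightning's BaseFinetuning exposes a _store hook and that Flash provides a FlashBaseFinetuning base class.

from flash.core.finetuning import FlashBaseFinetuning


class FlashDeepSpeedFinetuning(FlashBaseFinetuning):
    """Finetuning callback variant that works with DeepSpeed.

    DeepSpeed shards parameters across processes, so the base callback's
    parameter loading and storing fail; this variant simply skips them.
    """

    def _store(self, pl_module, epoch, optimizers, opt_idx) -> None:
        # Intentionally a no-op: do not record parameter metadata under DeepSpeed.
        pass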

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? [not needed for typos/docs]
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues, there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@codecov (bot) commented Jul 3, 2022

Codecov Report

Merging #1377 (acf3ae3) into master (0253d71) will decrease coverage by 0.02%.
The diff coverage is 77.77%.

@@            Coverage Diff             @@
##           master    #1377      +/-   ##
==========================================
- Coverage   92.90%   92.88%   -0.02%     
==========================================
  Files         286      286              
  Lines       12874    12891      +17     
==========================================
+ Hits        11960    11974      +14     
- Misses        914      917       +3     
Flag        Coverage Δ
unittests   92.88% <77.77%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                            Coverage Δ
flash/core/finetuning.py                  88.23% <76.47%> (-2.47%) ⬇️
flash/core/utilities/imports.py           91.47% <100.00%> (+0.04%) ⬆️
flash/text/question_answering/model.py    93.87% <0.00%> (-0.69%) ⬇️
flash/core/serve/dag/task.py              97.88% <0.00%> (+1.05%) ⬆️


@ar90n marked this pull request as ready for review July 3, 2022 11:14
@ar90n (Contributor, Author) commented Jul 3, 2022

I don't know how to fix the Lightning-AI.lightning-flash (Examples) jobs. Could you give me some help?

@ethanwharris added this to the 0.8.0 milestone Jul 11, 2022
@krshrimali self-requested a review as a code owner July 22, 2022 07:28
@krshrimali (Contributor) left a comment

Hi, @ar90n - Thank you so much for working on this PR. I really appreciate that you added the documentation and relevant tests. 🎉

LGTM (just added a couple of minor suggestions)!

Regarding the Example failures, please don't worry about them; the CI has been fixed now, so it should be all good.

Resolved (outdated) review suggestions on docs/source/general/finetuning.rst and flash/core/finetuning.py.
Comment on lines +250 to +266
# Imports added here for a self-contained excerpt; the registry lives in
# flash/core/finetuning.py (see the impacted files above).
import pytest

from flash.core.finetuning import _FINETUNING_STRATEGIES_REGISTRY


@pytest.mark.parametrize(
    "strategy_key, strategy_metadata",
    [
        ("no_freeze", None),
        ("freeze", None),
        ("freeze_unfreeze", 2),
        ("unfreeze_milestones", ((5, 10), 15)),
    ],
)
def test_deepspeed_finetuning_strategy_key(strategy_key, strategy_metadata):
    # Each DeepSpeed variant is registered under the "<strategy>_deepspeed" key.
    deepspeed_strategy_key = f"{strategy_key}_deepspeed"

    # Both keys should resolve to the same underlying strategy value.
    strategy = _FINETUNING_STRATEGIES_REGISTRY.get(key=strategy_key)(strategy_metadata=strategy_metadata).strategy
    deepspeed_strategy = _FINETUNING_STRATEGIES_REGISTRY.get(key=deepspeed_strategy_key)(
        strategy_metadata=strategy_metadata
    ).strategy
    assert strategy == deepspeed_strategy
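
For reference, a hedged usage sketch of how one of these "*_deepspeed" keys would be selected by a user (not part of this PR's diff; the image task and data paths are illustrative placeholders):

import flash
from flash.image import ImageClassificationData, ImageClassifier

datamodule = ImageClassificationData.from_folders(
    train_folder="data/train",  # placeholder path
    val_folder="data/val",  # placeholder path
    batch_size=4,
)
model = ImageClassifier(backbone="resnet18", num_classes=datamodule.num_classes)

# Run under DeepSpeed and pick the matching DeepSpeed-aware finetuning strategy.
trainer = flash.Trainer(max_epochs=3, accelerator="gpu", devices=2, strategy="deepspeed")
trainer.finetune(model, datamodule=datamodule, strategy="freeze_deepspeed")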
@krshrimali (Contributor) commented
Really nice tests! Thanks for adding them.

ar90n and others added 3 commits July 22, 2022 20:21
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
Co-authored-by: Kushashwa Ravi Shrimali <kushashwaravishrimali@gmail.com>
@ar90n (Contributor, Author) commented Jul 22, 2022

Hi @krshrimali,
Thanks for your review and suggestions! They were very helpful, since my English isn't strong. I'm glad this PR was approved.

@ar90n (Contributor, Author) commented Jul 22, 2022

I checked the reason for the Lightning-AI.lightning-flash (Examples) failures and found the following:

RuntimeError: CUDA error: the launch timed out and was terminated
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I'm not familiar with these errors. It seems the test process was terminated by a timeout.
The test passes in my local environment. Could someone help me solve this issue?
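
(A hedged debugging sketch, not from this PR: as the error message suggests, setting CUDA_LAUNCH_BLOCKING before CUDA initializes makes kernel launches synchronous, so the stack trace points at the real failing call.)

import os

os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # must be set before CUDA is initialized

import torch  # imported after setting the env var so it takes effect

x = torch.randn(8, device="cuda")
print(x.sum())  # a faulty kernel now raises at the actual launch site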

@krshrimali (Contributor) commented


Hi, @ar90n - Please don't worry about it. I'll have to check on my personal GPU; sometimes these failures can be flaky (because resources weren't available, or for other reasons). I'll merge it once I'm done testing, but this is good to go! 🎉

@krshrimali (Contributor) commented

The test passes locally on my GPU; let's merge this and monitor the CI. If an alarm is raised, I'll attempt to fix it. Thanks, @ar90n, for your hard work and patience with this PR. 🎉

@krshrimali enabled auto-merge (squash) July 27, 2022 04:02
@krshrimali (Contributor) commented

Just added the CHANGELOG entry; let's wait for the CI and push it ASAP. <3

@krshrimali (Contributor) commented

@ar90n - FYI, it took us some time to fix the CI; sorry for that. @ethanwharris is currently OOO this week, so when he is back, he'll help merge this. 🎉 Thank you for your contribution and patience.

@mergify (bot) removed the "has conflicts" label Aug 31, 2022
@krshrimali merged commit 0e9fdc0 into Lightning-Universe:master Aug 31, 2022
Successfully merging this pull request may close: Flash DeepSpeedPlugin error (#1249).