Add image for better explanation to FSDP tutorial #2644
Conversation
@@ -46,6 +46,15 @@ At a high level FSDP works as follow:
* Run reduce_scatter to sync gradients
* Discard parameters.

The key insight behind full parameter sharding is that we can decompose the all-reduce operations in DDP into separate reduce-scatter and all-gather operations.
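A minimal, runnable sketch (not part of the tutorial or this PR) of the decomposition that sentence relies on: an all-reduce produces the same result as a reduce-scatter followed by an all-gather. It assumes PyTorch >= 1.13, the NCCL backend with one GPU per rank (Gloo does not implement reduce-scatter), and a tensor whose length divides evenly by the world size.

```python
# Sketch: all-reduce == reduce-scatter followed by all-gather.
# Launch with: torchrun --nproc_per_node=2 decompose_all_reduce.py
# Assumes NCCL with one GPU per rank; Gloo does not support reduce-scatter.
import torch
import torch.distributed as dist

dist.init_process_group("nccl")
rank, world_size = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(rank)

# Stand-in for a flat gradient tensor; its length must divide evenly by world_size.
grad = torch.arange(8, dtype=torch.float32, device="cuda") + rank

# Path 1: DDP-style all-reduce -- every rank ends up with the fully reduced tensor.
all_reduced = grad.clone()
dist.all_reduce(all_reduced, op=dist.ReduceOp.SUM)

# Path 2a: reduce-scatter -- each rank keeps only its shard of the reduced tensor.
shard = torch.empty(grad.numel() // world_size, device="cuda")
dist.reduce_scatter_tensor(shard, grad, op=dist.ReduceOp.SUM)

# Path 2b: all-gather -- the reduced shards are reassembled on every rank.
gathered = torch.empty_like(grad)
dist.all_gather_into_tensor(gathered, shard)

assert torch.equal(all_reduced, gathered)  # both paths give the same result
dist.destroy_process_group()
```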
I am not sure that this is the correct statement.
Even though an all-reduce can be decomposed as a reduce-scatter and all-gather, the current phrasing might suggest that DDP's gradient all-reduce is being decomposed into a gradient reduce-scatter and gradient all-gather. However, FSDP actually all-gathers parameters.
Whether or not this decomposition of all-reduce into reduce-scatter and all-gather is the key insight is not obvious to me. If we show this decomposition, we probably want more exposition.
I agree. Adding this picture would demand a clear explanation.
I am unsure what to write. If you can suggest something or direct me to where I can read about this topic, that'd be very helpful.
Maybe something like the following:
One way to view FSDP's sharding is to decompose the DDP gradient all-reduce into reduce-scatter and all-gather. In particular, FSDP reduce-scatters gradients such that each rank has a shard of the gradients in backward, updates the corresponding shard of the parameters in the optimizer step, and all-gathers them in the next forward.
< Figure >
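To make the suggested wording concrete, here is a hypothetical, heavily simplified sketch (not FSDP's actual implementation) of one iteration for a single flat parameter shard, written with raw torch.distributed collectives. The function name `fsdp_step`, the `compute_loss` callback, and the plain-SGD update are illustrative assumptions; an initialized NCCL process group and evenly divisible shards are also assumed.

```python
# Hypothetical sketch of the flow in the suggested text, NOT FSDP's real internals:
# all-gather parameters in forward, reduce-scatter gradients in backward,
# update only the local shard in the optimizer step, then discard the full parameter.
import torch
import torch.distributed as dist

def fsdp_step(flat_param_shard: torch.Tensor, compute_loss, lr: float = 0.01):
    world_size = dist.get_world_size()

    # Forward: all-gather the shards so this rank temporarily holds the full parameter.
    full_param = torch.empty(flat_param_shard.numel() * world_size,
                             dtype=flat_param_shard.dtype,
                             device=flat_param_shard.device)
    dist.all_gather_into_tensor(full_param, flat_param_shard)
    full_param.requires_grad_(True)

    # Run the local forward and backward with the unsharded parameter.
    loss = compute_loss(full_param)
    loss.backward()

    # Backward: reduce-scatter gradients so each rank keeps only its gradient shard.
    grad_shard = torch.empty_like(flat_param_shard)
    dist.reduce_scatter_tensor(grad_shard, full_param.grad, op=dist.ReduceOp.SUM)
    grad_shard /= world_size  # average, matching DDP's gradient averaging

    # Optimizer step: each rank updates only its own parameter shard (plain SGD here).
    with torch.no_grad():
        flat_param_shard -= lr * grad_shard

    # Discard the full parameter; it is re-all-gathered in the next forward.
    del full_param
    return loss.detach()
```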
Sounds good. Thanks
Sounds good to me!
Co-authored-by: Andrew Gu <31054793+awgu@users.noreply.github.com>
Fixes #2613
Description
The tutorial lacked an explanation of what is going on behind parameter sharding.
cc @wconstab @osalpekar @H-Huang @kwen2501 @sekyondaMeta @svekars @carljparker @NicolasHug @kit1980 @subramen