
Remove assumption that padding only occurs on last rank #6974

Merged: 6 commits into deepspeedai:master on Jan 31, 2025

Conversation

xylian86
Contributor

As discussed in [PR-6918](#6918), padding can occur on multiple ranks with large DP degrees.

For example, with:

  • Flattened tensor size: 266240
  • DP degree: 768
  • Alignment: 1536
  • Required padding: 1024 (aligned size 1536 * 174 = 267264; 267264 - 266240 = 1024)
  • Per-rank partition size: 348 (267264 / 768)
  • Since 1024 > 2 * 348, the padding spills across the last three ranks (328 + 348 + 348).

This PR removes the single-rank padding assumption for more general cases.
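
To make the arithmetic concrete, here is a minimal sketch of how alignment padding distributes over the trailing ranks. This is illustrative only, not DeepSpeed's actual implementation; the function name `padding_per_rank` and its structure are hypothetical:

```python
import math

def padding_per_rank(tensor_size: int, dp_degree: int, alignment: int) -> list[int]:
    # Pad the flattened tensor up to the next multiple of `alignment`,
    # then split it evenly into one partition per rank.
    aligned_size = math.ceil(tensor_size / alignment) * alignment
    total_padding = aligned_size - tensor_size
    partition_size = aligned_size // dp_degree

    # The padding sits at the tail of the flattened tensor, so walk the
    # ranks backwards, assigning at most one full partition of padding each.
    padding = [0] * dp_degree
    remaining = total_padding
    for rank in reversed(range(dp_degree)):
        if remaining <= 0:
            break
        padding[rank] = min(partition_size, remaining)
        remaining -= padding[rank]
    return padding

# Values from the example above:
pads = padding_per_rank(266240, 768, 1536)
print([(rank, p) for rank, p in enumerate(pads) if p > 0])
# [(765, 328), (766, 348), (767, 348)] -> padding spans the last three ranks
```

With a small DP degree the padding fits entirely inside the last partition, which is the single-rank assumption this PR removes; once the padding exceeds one per-rank partition size, it necessarily spills onto earlier ranks.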

@tjruwase
Contributor

@xylian86, thanks for the quick solution!

@saforem2, can you please test this PR?

@saforem2
Collaborator

Yes, will work on testing this today, thanks!

@tjruwase tjruwase added this pull request to the merge queue Jan 31, 2025
Merged via the queue into deepspeedai:master with commit 4fea41f Jan 31, 2025
12 checks passed
tjruwase added a commit that referenced this pull request Feb 6, 2025

Co-authored-by: Sam Foreman <saforem2@gmail.com>
Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
Co-authored-by: Olatunji Ruwase <olruwase@microsoft.com>
Signed-off-by: Olatunji Ruwase <olruwase@microsoft.com>
siqi654321 pushed a commit to siqi654321/DeepSpeed that referenced this pull request Feb 7, 2025
@saforem2
Collaborator

saforem2 commented Feb 7, 2025

This appears to be fixed.

I've added a new comment with details in the original PR.

traincheck-team pushed a commit to traincheck-team/DeepSpeed that referenced this pull request Feb 9, 2025