Bucket small tensors and collective operations into larger ones #5

Open
xrsrke opened this issue Oct 25, 2023 · 0 comments
xrsrke commented Oct 25, 2023

Similar to fairscale's reduce-scatter bucketer (https://github.com/facebookresearch/fairscale/blob/164cc0f3170b4a3951dd84dda29c3e1504ac4d6e/fairscale/internal/reduce_scatter_bucketer.py#L74), but designed in a modular way.

  • Store tensors in a contiguous memory space
  • Support partitioning a bucket across a parallelism dimension
  • Wait for the bucket to fill up, then run a single distributed operation over the whole bucket
  • Move a tensor out of a bucket
  • Reuse the bucket after flushing it
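
A rough sketch of how these five requirements could fit together, not the pipegoose design. To stay dependency-free it uses a stdlib `array` plus `memoryview` as a stand-in for a flat `torch.Tensor` and its views. Every name here (`Bucket`, `add`, `flush`, `shard`, `detach`, `fake_all_reduce`) is hypothetical, and the collective is a plain callback rather than a real `torch.distributed` call.

```python
from array import array
from typing import Callable, Sequence


class Bucket:
    """Illustrative bucket: one contiguous float32 buffer shared by small tensors."""

    def __init__(self, capacity: int, collective: Callable[[memoryview], None]):
        self._storage = array("f", [0.0] * capacity)  # single contiguous allocation
        self._view = memoryview(self._storage)
        self._offset = 0               # number of elements currently in use
        self._collective = collective  # stand-in for all-reduce / reduce-scatter
        self.num_flushes = 0

    @property
    def capacity(self) -> int:
        return len(self._storage)

    @property
    def free_space(self) -> int:
        return self.capacity - self._offset

    def add(self, values: Sequence[float]) -> memoryview:
        """Copy `values` into the bucket and return a view into the shared buffer."""
        n = len(values)
        if n > self.capacity:
            raise ValueError("tensor does not fit in an empty bucket")
        if n > self.free_space:
            self.flush()  # not enough room: run the collective, then reuse the bucket
        start, end = self._offset, self._offset + n
        self._view[start:end] = array("f", values)
        self._offset = end
        return self._view[start:end]

    def flush(self) -> None:
        """Run one collective over the filled region, then mark the bucket empty."""
        if self._offset:
            self._collective(self._view[: self._offset])
            self.num_flushes += 1
        self._offset = 0  # the same memory is overwritten by later add() calls

    def shard(self, rank: int, world_size: int) -> memoryview:
        """Return the equal-sized slice of the buffer owned by `rank`."""
        if self.capacity % world_size:
            raise ValueError("capacity must be divisible by world_size")
        size = self.capacity // world_size
        return self._view[rank * size : (rank + 1) * size]

    @staticmethod
    def detach(view: memoryview) -> array:
        """Move a tensor out of the bucket by copying it into its own storage."""
        return array("f", view.tolist())


def fake_all_reduce(region: memoryview) -> None:
    # Pretend two ranks hold identical data, so the sum doubles each element in place.
    for i in range(len(region)):
        region[i] = region[i] * 2.0


bucket = Bucket(capacity=4, collective=fake_all_reduce)
a = bucket.add([1.0, 2.0])
b = bucket.add([3.0])
bucket.flush()               # one collective covers both small tensors
a_copy = Bucket.detach(a)    # copy out before the bucket memory is reused
c = bucket.add([5.0, 6.0])   # reuses offset 0, so the old view `a` now aliases `c`
```

Because `add` hands out views, results of the collective are visible through them without a copy; the flip side is that a view goes stale once the bucket is flushed and refilled, which is why `detach` copies.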
xrsrke converted this from a draft issue Oct 25, 2023
xrsrke changed the title from "Store tensor in continuous memory" to "Bucket small tensors and collective operations into larger ones" Oct 25, 2023
xrsrke added the "help wanted" label Oct 25, 2023
xrsrke moved this from Todo to In Progress in pipegoose v1 Oct 25, 2023
xrsrke self-assigned this Oct 25, 2023
xrsrke added this to the v1 milestone Oct 28, 2023
xrsrke removed the "help wanted" label Nov 14, 2023