Do not materialize entire randperm in RandomSampler #103339
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103339. Note: links to docs will display an error until the docs builds have been completed.

✅ No Failures as of commit 8f5e445 with merge base bc2caa7. This comment was automatically generated by Dr. CI and updates every 15 minutes.
torch/utils/data/sampler.py (Outdated)

```diff
-yield from torch.randperm(n, generator=generator).tolist()[:self.num_samples % n]
+indices = torch.randperm(n, generator=generator)
+for i in indices:
+    yield i.item()
```
One minor issue is that repeated `.item()` calls are slow (#29973), so maybe `tolist()` is actually not that bad? (Especially since `indices` itself is materialized as a tensor.)
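A minimal sketch of the trade-off being discussed (sizes and timings are illustrative, not from the PR):

```python
import timeit

import torch

n = 1_000_000
indices = torch.randperm(n)

# Per-element .item() crosses the Python/C++ boundary once per index.
t_item = timeit.timeit(lambda: [i.item() for i in indices], number=1)

# .tolist() is one bulk conversion, but it materializes n Python ints at once.
t_tolist = timeit.timeit(lambda: indices.tolist(), number=1)

print(f".item() loop: {t_item:.3f}s, .tolist(): {t_tolist:.3f}s")
```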
Thanks for the suggestion! I've updated the code to call `.numpy()` on the tensor before iterating over it, which should avoid the slow `.item()` calls.
Well, if you can afford an extra allocation, why not just `yield from indices.tolist()`? Because the indices Python list would take too much memory? I don't know what the current state of affairs is on whether numpy is an obligatory dependency.
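A back-of-the-envelope comparison of the two representations (a sketch; exact sizes vary by platform and Python version):

```python
import sys

import torch

n = 1_000_000
t = torch.randperm(n)

# A list of n Python ints: ~8 bytes per list slot plus ~28 bytes per int object.
lst = t.tolist()
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

# A numpy view: one Python object over a flat 8-bytes-per-element buffer.
arr = t.numpy()
arr_bytes = arr.nbytes

print(f"list: ~{list_bytes / 1e6:.0f} MB, numpy: ~{arr_bytes / 1e6:.0f} MB")
```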
Maybe worth filing a feature request for iterating a tensor as Python items? Or supporting memoryview on tensors (so that they can be iterated)...
I've added some comments in the PR description, but the main issue is that calling `.tolist()` on a torch tensor with 1 billion+ elements adds massive garbage-collection overhead, because we've just allocated billions of individual Python int objects that need to be managed separately. By using a numpy array instead, the garbage collector only needs to keep track of one object regardless of the dataset size, making garbage collection much faster during training.
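A rough illustration of that GC effect (scaled down from billions to millions of elements; absolute timings will vary by machine):

```python
import gc
import time

import torch

n = 10_000_000  # scaled down from the 3.5B items in the original workload

as_list = torch.randperm(n).tolist()  # n separate Python int objects
start = time.perf_counter()
gc.collect()
print(f"gc.collect with Python list alive: {time.perf_counter() - start:.3f}s")
del as_list

as_array = torch.randperm(n).numpy()  # a single object, regardless of n
start = time.perf_counter()
gc.collect()
print(f"gc.collect with numpy array alive: {time.perf_counter() - start:.3f}s")
```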
I also created #103352.
Could this just be `yield from torch.randperm(n, generator=generator).numpy()` (+ some indexing), keeping the existing terse syntax?
Done in 7a7a1e9. Note that there's an additional `map` call compared to the original suggestion, to ensure that the yielded type matches.
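Roughly the shape this takes (a sketch of the oversampling branch; see commit 7a7a1e9 for the actual change):

```python
# map(int, ...) converts each numpy int64 back to a Python int lazily,
# without materializing the whole permutation as a Python list.
yield from map(int, torch.randperm(n, generator=generator).numpy()[: self.num_samples % n])
```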
@aviverma01 Should we also change the other branch, `yield from torch.randint(high=n, size=(32,), dtype=torch.int64, generator=generator).tolist()`?
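Applying the same pattern to the replacement-sampling branch might look like this (a sketch, not necessarily the exact committed code):

```python
yield from map(int, torch.randint(high=n, size=(32,), dtype=torch.int64,
                                  generator=generator).numpy())
```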
Done in b450690
@kit1980 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
I've triggered more tests and also imported this internally to make sure nothing breaks.
Thanks @kit1980, would you be able to help fix the "Meta Internal-Only Changes Check"? Also, I think some of the tests may be flaky. Would it be possible to re-kick the failing tests?
@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.
Successfully rebased
@kit1980 Looks like there may be some remaining flaky tests, and I believe the "Meta Internal-Only Changes Check" is still failing. Any chance you could help, or show me how to fix it?
@kit1980 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@mergebot merge -i
@aviverma01 sorry, I misspelled the bot name.
@pytorchmergebot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Hi @aviverma01 @kit1980, the change this PR introduces seems to block manually specifying the DataLoader's generator with a non-CPU device:

```python
gen = torch.Generator(device=torch.device("mps:0"))
data_loader = data.DataLoader(dataset, batch_size=8, shuffle=True, generator=gen)
```

The error:

Was this intended? Or do you have any idea about this? Thanks :)
@pytorchbot revert -m "Cause issues on MPS, and also fails without numpy" -c nosignal
I'm reverting this. I've realized there is another issue with the PR: it fails without numpy, which is actually an optional dependency.
@pytorchbot successfully started a revert job. Check the current status here.
@aviverma01 your PR has been successfully reverted.
This reverts commit d80174e. Reverted #103339 on behalf of https://github.com/kit1980 due to Cause issues on MPS, and also fails without numpy ([comment](#103339 (comment)))
#112187) This reverts commit d80174e. Reverted #103339 on behalf of https://github.com/kit1980 due to Cause issues on MPS, and also fails without numpy ([comment](#103339 (comment))) Co-authored-by: PyTorch MergeBot <pytorchmergebot@users.noreply.github.com>
In our DDP training workloads, each rank was initializing a `RandomSampler` for a dataset with a length of 3.5 billion items. We noticed that when this sampler was in scope, `gc.collect` calls were taking on the order of seconds to run, which would slow down the entire training iteration. This is because when we call `torch.randperm(n).tolist()`, we create a Python list of 3.5 billion items, which massively slows down the periodic mark-and-sweep garbage collection.

This PR swaps out the `.tolist()` call with a `.numpy()` call and manually calls `.item()` on each element as it is being requested. This has two benefits:

- `RandomSampler::__next__` should be about twice as fast, since `.numpy` does not copy the contents of the original tensor
- The duration of `gc.collect()` calls no longer scales linearly with the size of the dataset passed to `RandomSampler`

I've attached some `timeit` samples to illustrate the speedups with this PR.
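A condensed before/after sketch of the `__iter__` change described above (a simplified stand-in class with assumed attribute names, not the real torch implementation; the actual method also handles the `replacement` branch):

```python
import torch

class RandomSamplerSketch:
    """Condensed sketch of the sampler change; hypothetical names."""

    def __init__(self, n, generator=None):
        self.n = n
        self.generator = generator

    # Before: materializes the full permutation as one huge Python list,
    # which the garbage collector must traverse on every collection.
    def iter_before(self):
        yield from torch.randperm(self.n, generator=self.generator).tolist()

    # After: iterates a numpy view of the same tensor, converting one
    # index at a time, so only a single container object is ever alive.
    def iter_after(self):
        yield from map(int, torch.randperm(self.n, generator=self.generator).numpy())
```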