Add --offload-to-disk support to minifier #100546
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/100546
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit c0b1355.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):
When minifying extremely large repros, the minifier can run out of memory. This is because, for delta debugging, the minifier keeps a copy of every intermediate output in the network. This can easily put you over the memory limit for your GPU. To make matters worse, we cannot easily delta debug in such a situation, as delta debugging involves replacing intermediates with inputs, but doing so can cause an intermediate to become live longer than its actual extent in the original model (since inputs all have to be allocated up front).
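To make the lifetime problem concrete, here is a minimal sketch (not the PR's code; all names are illustrative) of the conventional delta-debugging move of promoting an intermediate to a graph input. Because FX placeholders must sit at the top of the graph, the saved value is allocated up front and stays live for the whole run:

```python
# Illustrative sketch only: the conventional "replace intermediate with input"
# transformation whose memory behavior this PR works around.
import torch
import torch.fx as fx

def inputify_node(gm: fx.GraphModule, node: fx.Node) -> fx.GraphModule:
    """Replace `node` with a fresh placeholder (graph input).

    Placeholders live at the top of an FX graph, so the value that replaces
    `node` must be allocated when execution starts and stays live until its
    last use, even if the original intermediate only existed briefly in the
    middle of the model.
    """
    first = next(iter(gm.graph.nodes))
    with gm.graph.inserting_before(first):
        placeholder = gm.graph.placeholder(node.name + "_saved")
    node.replace_all_uses_with(placeholder)
    gm.graph.erase_node(node)
    # Everything that only existed to compute `node` is now dead.
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```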
The strategy in this PR is to use `load_tensor` from the previous PR to offer a low-memory mode for delta debugging. Instead of promoting intermediates to inputs, we load them in the middle of the graph in question. If, through DCE, the `load_tensor` call ends up floating to the top of the graph, we can input-ify it. We no longer keep all intermediates in memory; instead we save them to disk. I used this to successfully minify the repro that helped us solve #100332.

The testing is not very good. I can try to add more robust testing, but it would require a fairly involved refactor of the FX minifier. Let me know if that's what you want.
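For contrast, a minimal sketch of the offload-to-disk substitution, assuming intermediates were checkpointed to a `save_dir` during an earlier instrumented run. The `load_tensor` helper below is an illustrative stand-in for the one introduced in the previous PR, not the PR's actual implementation:

```python
# Minimal sketch of the low-memory delta-debugging substitution. `save_dir`
# and `load_tensor` are illustrative stand-ins, not the PR's real machinery.
import os
import torch
import torch.fx as fx

def load_tensor(save_dir: str, name: str) -> torch.Tensor:
    # Read a previously checkpointed intermediate back from disk.
    return torch.load(os.path.join(save_dir, f"{name}.pt"))

def offload_node_to_disk(gm: fx.GraphModule, node: fx.Node,
                         value: torch.Tensor, save_dir: str) -> fx.GraphModule:
    """Replace `node`'s computation with a disk load of its saved value.

    Unlike input-ification, the load happens mid-graph, right where the
    intermediate is first needed, so its live range matches the original
    model's. If DCE later leaves the load with nothing before it (it
    "floats to the top" of the graph), it can safely be input-ified.
    """
    torch.save(value, os.path.join(save_dir, f"{node.name}.pt"))
    with gm.graph.inserting_before(node):
        loaded = gm.graph.call_function(load_tensor, args=(save_dir, node.name))
    node.replace_all_uses_with(loaded)
    gm.graph.erase_node(node)
    # Drop the subgraph that only existed to compute `node`.
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```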
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
cc @soumith @voznesenskym @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire