[benchmarks] Run some models with smaller batch sizes. #6542

ysiraichi · 2024-02-15T14:40:25Z

This PR adapts code from the main PyTorch repo so that a few models are executed with smaller batch sizes. This is an effort towards bringing the behavior of the two benchmarking scripts closer together.

In summary, 3 new sets are introduced:

- USE_SMALL_BATCH_SIZE: batch sizes for training
- INFERENCE_SMALL_BATCH_SIZE: batch sizes for inference
- DONT_CHANGE_BATCH_SIZE: models whose batch size can't be changed on the command line
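
A minimal sketch of how these three collections could drive the batch-size choice. The three names come from this PR description; everything else is an assumption made for illustration: that the first two map model names to batch sizes, the placeholder model names and values, the helper `pick_batch_size`, and the rule that a small batch size takes priority over a requested one.

```python
# Hypothetical contents: the real entries live in PyTorch's
# benchmarks/dynamo/torchbench.py and are not reproduced here.
USE_SMALL_BATCH_SIZE = {"model_a": 4}        # training batch sizes
INFERENCE_SMALL_BATCH_SIZE = {"model_b": 8}  # inference batch sizes
DONT_CHANGE_BATCH_SIZE = {"model_c"}         # batch size must stay untouched


def pick_batch_size(model_name, is_training, requested=None):
    """Return the batch size to use, or None to keep the model's default.

    Hypothetical helper, not part of the benchmarking scripts.
    """
    # Models in this set ignore any batch size given on the command line.
    if model_name in DONT_CHANGE_BATCH_SIZE:
        return None
    # Assumption: a listed small batch size wins over the requested one.
    small = USE_SMALL_BATCH_SIZE if is_training else INFERENCE_SMALL_BATCH_SIZE
    if model_name in small:
        return small[model_name]
    return requested
```

With these placeholder values, `pick_batch_size("model_a", is_training=True)` selects the small training batch size, while the same model in inference mode falls back to whatever was requested.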

cc @miladm

ysiraichi · 2024-02-15T14:40:46Z

This PR should only be merged after: #6518

frgossen · 2024-02-15T14:54:31Z

benchmarks/torchbench_model.py

@@ -144,6 +144,48 @@
    "hf_T5_generate",
 }

+# This list was extracted from PyTorch's repository: benchmarks/dynamo/torchbench.py
+FORCE_AMP_FOR_FP16_BF16_MODELS = {


These lists feel prone to divergence. Is this how PyTorch does it, too?

Yukio: can you extract these lists from where they are so that we can import them? That will eliminate any maintenance burden on us (i.e., I don't want us to have to manually keep these lists in sync with the ones in PyTorch).
IIRC you did something similar with the deny list being extracted into a YAML file.

Yes. There is a YAML file for skipped models. These models aren't included in that. I guess I could make the change to include these in the PyTorch repo.

golechwierowicz · 2024-02-15T15:30:10Z

This is an effort towards making the benchmarking scripts behavior closer.

What is the point of this? Is this to just get numbers for Inductor as close to the HUD as possible?

ysiraichi · 2024-02-15T16:40:26Z

That too. It may also prevent the OOMs that we are getting even on Inductor.

ysiraichi · 2024-02-16T19:33:22Z

@frgossen @golechwierowicz do you think we can merge this PR?

frgossen · 2024-02-20T14:25:34Z

benchmarks/torchbench_model.py

 FORCE_FP16_FOR_BF16_MODELS = {"vision_maskrcnn"}

+# Some models have large dataset that doesn't fit in memory. Lower the batch
+# size to test the accuracy.
+# This list was extracted from PyTorch's repository: benchmarks/dynamo/torchbench.py


Do you think we can factor these lists out there and reuse them? That way we would be more robust against divergence.

frgossen · 2024-02-20T14:32:10Z

What worries me a little about PRs like this is that we will almost certainly diverge over time.
Until we can also operate on CUDA tensors, we likely have to work with our forked benchmarking script, but we should try to reuse as much as we can from upstream PyTorch.

Do you think we can import and reuse the lists from PyTorch?

ysiraichi · 2024-02-20T21:47:59Z

Right. I will try to move those lists to a YAML file and, then, update this PR.

ysiraichi · 2024-02-21T08:35:59Z

Waiting for pytorch/pytorch#120299

frgossen · 2024-02-22T16:45:17Z

Sounds good. Ty!

ysiraichi · 2024-02-25T12:56:48Z

@frgossen @golechwierowicz I think this PR is ready for review. Could you review it when you have some time?

Here's a summary of the changes:

- Deleted the introduced lists, since we already have the YAML file
- Read and parse the YAML file once in the torchbench_model.py file
  - Used by both TorchBenchModelLoader (skips) and TorchBenchModel (batch size)
- Moved find_near_file from torchbench_model.py to utils.py
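
A rough sketch of the "parse once, share the result" idea from the summary above. This is not the code in the PR: the dictionary stands in for the already-parsed YAML document, its keys and model names are placeholders, and `lists_to_sets` and `load_config` are hypothetical helper names.

```python
from functools import lru_cache

# Stand-in for the parsed YAML document. In the real script this would come
# from a YAML parser; the keys and model names here are made up.
_RAW_CONFIG = {
    "skip": {"all": ["model_x"], "device": {"cuda": ["model_y"]}},
    "batch_size": {"training": {"model_z": 4}},
}


def lists_to_sets(node):
    """Recursively turn every list of model names into a set."""
    if isinstance(node, dict):
        return {key: lists_to_sets(value) for key, value in node.items()}
    if isinstance(node, list):
        return set(node)
    return node


@lru_cache(maxsize=None)
def load_config():
    """Build the configuration once; later calls return the cached object."""
    return lists_to_sets(_RAW_CONFIG)
```

Because the result is cached, a loader that checks the skip sets and a model class that looks up batch sizes would both see the same object without re-reading the file.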

frgossen

One comment.
Thank you!

frgossen · 2024-02-26T15:32:36Z

benchmarks/torchbench_model.py

+  its lists of models into sets of models.
+  """
+
+  benchmarks_dynamo_dir = find_near_file(


I think we can make the assumption that the xla root is at pytorch/xla. Allowing this flexibility with find_near_file feels like it will be hard to debug eventually.

I think @zpcore had a setup where pytorch and xla were sibling folders.

Would it make sense to agree on one setup to keep things simpler?

Sure. I don't mind.
@zpcore Thoughts?

If we decide to make the assumption, we should specify it in https://github.com/pytorch/xla/blob/master/benchmarks/README.md

I will merge this PR, and open another one for this change.

Maybe the better solution is to add the file location in setup.py in the future. We can use import pkg_resources to find the file location.
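
To make the two checkout layouts discussed in this thread concrete, here is a small sketch that probes for PyTorch's benchmarks/dynamo directory under both. It is not the `find_near_file` implementation from the PR; the function name `find_benchmarks_dynamo` and the exact candidate paths are assumptions based only on the layouts mentioned above (xla nested at pytorch/xla, or pytorch and xla as sibling folders).

```python
from pathlib import Path


def find_benchmarks_dynamo(xla_root):
    """Locate PyTorch's benchmarks/dynamo directory relative to the xla root.

    Hypothetical helper: it only knows about the two layouts named below.
    """
    xla_root = Path(xla_root).resolve()
    candidates = [
        # Nested layout: <parent>/pytorch/xla, so PyTorch is xla's parent.
        xla_root.parent / "benchmarks" / "dynamo",
        # Sibling layout: <parent>/pytorch and <parent>/xla side by side.
        xla_root.parent / "pytorch" / "benchmarks" / "dynamo",
    ]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    # Listing every path that was tried keeps a failure easy to diagnose.
    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"benchmarks/dynamo not found; tried: {tried}")
```

Settling on a single layout, as suggested above, would reduce the candidate list to one entry and make the lookup fully deterministic.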

ysiraichi requested review from cota, vanbasten23, frgossen, golechwierowicz and zpcore February 15, 2024 14:40

frgossen reviewed Feb 15, 2024

ysiraichi force-pushed the ysiraichi/fix-batch-size branch from aa6dd72 to d6101fa Compare February 16, 2024 13:58

ysiraichi mentioned this pull request Feb 19, 2024

Failing Torchbench Models: tracking issue #5932

Open

frgossen reviewed Feb 20, 2024

ysiraichi mentioned this pull request Feb 23, 2024

[benchmarks] Fix YAML file name. #6597

Merged

ysiraichi added 2 commits February 23, 2024 16:07

Use smaller batch sizes for some models.

4873571

Using YAML configuration.

10419ed

ysiraichi force-pushed the ysiraichi/fix-batch-size branch from d6101fa to 10419ed Compare February 23, 2024 19:22

Fix lint issues.

d38cc3f

frgossen approved these changes Feb 26, 2024

ysiraichi merged commit cd47390 into master Feb 27, 2024
17 of 18 checks passed

ysiraichi mentioned this pull request Feb 28, 2024

[benchmarks] Small fixes for benchmarking script. #6632

Merged

amithrm pushed a commit to amithrm/xla that referenced this pull request Mar 1, 2024

[benchmarks] Run some models with smaller batch sizes. (pytorch#6542)

40bc884
