Make as_strided_copy materialize a new tensor with index. #6624

Merged: 7 commits merged into master on Mar 5, 2024

Conversation

ysiraichi (Collaborator)

Fix: #5835

This PR implements arbitrary as_strided by decomposing it into slicing + indexing. In summary, we slice the base tensor to comply with the given storage_offset, and then index a flattened version of the tensor, gathering the desired elements based on the given size and strides (more explanation in the code).
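
For readers skimming the thread, a minimal Python sketch of the idea follows (this is not the PR's C++ code, the helper name is made up, and it assumes a contiguous base tensor): compute the flat storage index each output element reads from, then gather those elements from a flattened copy of the base.

import torch

def as_strided_via_index(base, size, stride, storage_offset=0):
    # Slice the flattened base to honor storage_offset, as described above.
    flat = base.flatten()[storage_offset:]
    # Build the index tensor: output element (i0, i1, ...) reads
    # flat[i0*stride[0] + i1*stride[1] + ...].
    index = torch.zeros(size, dtype=torch.long)
    for d in range(len(size)):
        view_shape = [1] * len(size)
        view_shape[d] = size[d]
        index = index + (torch.arange(size[d]) * stride[d]).view(view_shape)
    # Gather the desired elements; this materializes a new tensor.
    return flat[index]

base = torch.arange(10.)
# Overlapping rows, which cannot be expressed with plain reshapes or slices.
assert torch.equal(
    as_strided_via_index(base, (3, 3), (2, 1), storage_offset=1),
    torch.as_strided(base, (3, 3), (2, 1), 1),
)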

cc @miladm @JackCaoG @lezcano

ysiraichi (comment marked as outdated)

lezcano (Collaborator) commented Feb 27, 2024

Haven't looked at the code in depth, but this sounds plausible. Will review tomorrow.

@bdhirsh we could use this to functionalize as_strided. With this same trick we can even write to the relevant view via index_put.
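
To make the write-side remark concrete, here is a hedged sketch (illustrative only; the helper name is invented and this is not code from the PR) of how the same index tensor would let index_put_ emulate a write through the strided view:

import torch

def write_strided_view(base, src, size, stride, storage_offset=0):
    # Same flat-index construction as on the read side, with the offset folded in.
    index = torch.full(size, storage_offset, dtype=torch.long)
    for d in range(len(size)):
        view_shape = [1] * len(size)
        view_shape[d] = size[d]
        index = index + (torch.arange(size[d]) * stride[d]).view(view_shape)
    # Write src into a flattened copy of base at those positions, then restore
    # the original shape. For overlapping views, the value that "wins" on
    # duplicated indices is unspecified.
    flat = base.flatten().clone()
    flat.index_put_((index,), src)
    return flat.view(base.shape)

out = write_strided_view(torch.zeros(10), torch.ones(3, 3), (3, 3), (2, 1), 1)
# Reading the same strided view back from the result shows the written values.
assert torch.equal(torch.as_strided(out, (3, 3), (2, 1), 1), torch.ones(3, 3))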

lezcano (Collaborator) left a comment

This is a great algorithm! Thank you @ysiraichi!

I think it is correct modulo potential corner-cases that may pop up.

torch_xla/csrc/aten_xla_type.cpp (resolved)
Comment on lines 733 to 736
if (storage_offset.has_value() && *storage_offset > 0) {
// If there's a storage_offset, slice this tensor, first.
tensor = slice_copy(tensor, 0, *storage_offset, c10::nullopt, 1);
}
lezcano (Collaborator):

You can do this, or simply add storage_offset to index_tensor at the end.

ysiraichi (Collaborator, Author):

You are right. I thought about it for a second and, for some reason, decided it wouldn't be correct. But on second thought, it does make sense.
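
A tiny sketch of the equivalence being discussed (values are illustrative only): adding storage_offset to the flat index tensor gives the same result as slicing the flattened base first.

import torch

base = torch.arange(10.)
offset = 1
# Flat indices for size=(3, 3), stride=(2, 1).
index = (torch.arange(3) * 2).view(3, 1) + torch.arange(3).view(1, 3)

sliced_then_indexed = base.flatten()[offset:][index]
offset_folded_into_index = base.flatten()[index + offset]
assert torch.equal(sliced_then_indexed, offset_folded_into_index)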

Comment on lines 730 to 731
// Flatten the tensor, so that it's easier to gather its elements.
tensor = view_copy_symint(tensor, {tensor.numel()});
lezcano (Collaborator):
Rather than this flattening + index, you can simply use torch.take.
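
For reference, a short sketch of the torch.take suggestion (example values are illustrative): torch.take treats its input as if it were flattened to 1-D, so the explicit view_copy to a flat shape becomes unnecessary.

import torch

base = torch.arange(10.).view(2, 5)
index = torch.tensor([[1, 2, 3], [3, 4, 5], [5, 6, 7]])

# torch.take implicitly indexes into the flattened input.
assert torch.equal(torch.take(base, index), base.flatten()[index])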

ysiraichi marked this pull request as ready for review on February 28, 2024 14:26
ysiraichi force-pushed the ysiraichi/fix-overlapped-asstride branch from bc6409c to 7afeb56 on February 28, 2024 14:32
lezcano (Collaborator) left a comment

Logic lgtm

ysiraichi (Collaborator, Author) commented Feb 29, 2024

@JackCaoG Could you take a look at this PR whenever you have some time?

ysiraichi (Collaborator, Author)

I believe these export tests are unrelated.

@JackCaoG @zpcore @frgossen @vanbasten23 @cota @golechwierowicz
Have you seen these before?

JackCaoG (Collaborator)

Not really. @lsy323, do you know what this unbounded export test is doing?

ysiraichi force-pushed the ysiraichi/fix-overlapped-asstride branch from 4b66320 to 8bb2aea on March 1, 2024 15:32
ysiraichi (Collaborator, Author)

@JackCaoG @alanwaketan Could you take a look at this PR when you have some time?

alanwaketan (Collaborator) left a comment

Generally LGTM; I only have one question.

// [[[0]]]
//
std::vector<int64_t> view_shape(dim, 1);
auto index_tensor =
alanwaketan (Collaborator):

I assume this is computed by CPU eager in the following code?

ysiraichi (Collaborator, Author):

Yes. Given the size, stride, and offset argument spec, we compute the correct indices for materializing the tensor ahead of time, so there is no need to compute them at runtime.
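
A rough Python rendering of that ahead-of-time step (a sketch under assumed names, not the PR's C++ code): the index tensor depends only on the static size, stride, and storage_offset arguments, never on device data, so it can be built in CPU eager mode and baked into the graph as a constant.

import torch

def precompute_indices(size, stride, storage_offset=0):
    # Pure arithmetic over the static argument spec; no device tensors involved.
    coords = torch.meshgrid(*[torch.arange(s) for s in size], indexing="ij")
    return sum(c * st for c, st in zip(coords, stride)) + storage_offset

# For size=(3, 3), stride=(2, 1), storage_offset=1 this yields
# [[1, 2, 3], [3, 4, 5], [5, 6, 7]].
print(precompute_indices((3, 3), (2, 1), 1))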

alanwaketan (Collaborator):

Thanks!

alanwaketan merged commit 3abc21d into master on Mar 5, 2024
18 checks passed
lsy323 (Collaborator) commented Mar 7, 2024

Hi @ysiraichi, I found that this PR causes a performance regression on TPU v4-8 (it can also be reproduced on v3-8). The regression can be reproduced by running the following command:

python test/test_train_mp_imagenet.py --model=resnet50 --log_steps=200 --ddp --pjrt_distributed --fake_data --batch_size=256

At commit b8864fc5a5ba91640904b075d69aee0c5f9ceff4, the speed is:

Epoch 1 train begin 21:58:52
| Training Device=xla:0/2 Epoch=1 Step=0 Loss=6.89059 Rate=0.00 GlobalRate=0.00 Time=22:00:16
| Training Device=xla:0/3 Epoch=1 Step=0 Loss=6.89059 Rate=0.00 GlobalRate=0.00 Time=22:00:16
| Training Device=xla:0/0 Epoch=1 Step=0 Loss=6.89059 Rate=0.00 GlobalRate=0.00 Time=22:00:16
| Training Device=xla:0/1 Epoch=1 Step=0 Loss=6.89059 Rate=0.00 GlobalRate=0.00 Time=22:00:16
| Training Device=xla:0/0 Epoch=1 Step=200 Loss=0.04890 Rate=0.00 GlobalRate=0.00 Time=22:02:37
| Training Device=xla:0/1 Epoch=1 Step=200 Loss=0.04890 Rate=0.00 GlobalRate=0.00 Time=22:02:37
| Training Device=xla:0/3 Epoch=1 Step=200 Loss=0.04890 Rate=0.00 GlobalRate=0.00 Time=22:02:37
| Training Device=xla:0/2 Epoch=1 Step=200 Loss=0.04890 Rate=0.00 GlobalRate=0.00 Time=22:02:37
| Training Device=xla:0/1 Epoch=1 Step=400 Loss=0.01260 Rate=0.00 GlobalRate=0.00 Time=22:03:09
| Training Device=xla:0/0 Epoch=1 Step=400 Loss=0.01260 Rate=0.00 GlobalRate=0.00 Time=22:03:09
| Training Device=xla:0/2 Epoch=1 Step=400 Loss=0.01260 Rate=0.00 GlobalRate=0.00 Time=22:03:09
| Training Device=xla:0/3 Epoch=1 Step=400 Loss=0.01260 Rate=0.00 GlobalRate=0.00 Time=22:03:09

At commit 3abc21df7aaa176804d3cbbc60f5078d579831b7, it's much slower.

Epoch 1 train begin 22:11:06
| Training Device=xla:0/3 Epoch=1 Step=0 Loss=6.89059 Rate=0.00 GlobalRate=0.00 Time=22:17:32
| Training Device=xla:0/2 Epoch=1 Step=0 Loss=6.89059 Rate=0.00 GlobalRate=0.00 Time=22:17:32
| Training Device=xla:0/0 Epoch=1 Step=0 Loss=6.89059 Rate=0.00 GlobalRate=0.00 Time=22:17:32
| Training Device=xla:0/1 Epoch=1 Step=0 Loss=6.89059 Rate=0.00 GlobalRate=0.00 Time=22:17:32
| Training Device=xla:0/3 Epoch=1 Step=200 Loss=0.04890 Rate=0.00 GlobalRate=0.00 Time=22:31:14
| Training Device=xla:0/2 Epoch=1 Step=200 Loss=0.04890 Rate=0.00 GlobalRate=0.00 Time=22:31:14
| Training Device=xla:0/0 Epoch=1 Step=200 Loss=0.04890 Rate=0.00 GlobalRate=0.00 Time=22:31:14
| Training Device=xla:0/1 Epoch=1 Step=200 Loss=0.04890 Rate=0.00 GlobalRate=0.00 Time=22:31:14
| Training Device=xla:0/1 Epoch=1 Step=400 Loss=0.01260 Rate=0.00 GlobalRate=0.00 Time=22:37:41
| Training Device=xla:0/2 Epoch=1 Step=400 Loss=0.01260 Rate=0.00 GlobalRate=0.00 Time=22:37:41
| Training Device=xla:0/0 Epoch=1 Step=400 Loss=0.01260 Rate=0.00 GlobalRate=0.00 Time=22:37:41
| Training Device=xla:0/3 Epoch=1 Step=400 Loss=0.01260 Rate=0.00 GlobalRate=0.00 Time=22:37:41

I'm reverting this PR for now, since we are close to the 2.3 branch cut date (March 11th).

Could you please re-land the PR after the perf regression is resolved? Thanks a lot!


Successfully merging this pull request may close these issues.

No support for overlapped tensors.
5 participants