XLA tensor metadata differs from eager. #7983

Open
ysiraichi opened this issue Sep 10, 2024 · 0 comments
Consider the following program:

import torch

def run(device):
    a = torch.arange(10).to(device)
    b = a[2::2]
    return (b.stride(), b.storage_offset(), b.is_contiguous(), b.as_strided((4,), (1,)))

>>> run("cpu")
((2,), 2, False, tensor([2, 3, 4, 5]))

>>> run("xla")
((1,), 0, True, tensor([0, 1, 2, 3], device='xla:0'))
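To see where eager's numbers come from, here is a minimal pure-Python sketch (not PyTorch internals, just the usual strided-view arithmetic) that computes the metadata of a basic 1-D slice `base[start::step]`:

```python
import math

def slice_1d(length, stride, offset, start, step):
    """Metadata (length, stride, storage_offset) of base[start::step]
    for a 1-D strided view, following eager semantics."""
    new_length = max(0, math.ceil((length - start) / step))
    return new_length, stride * step, offset + start * stride

# a = torch.arange(10): length 10, stride 1, storage offset 0
n, s, o = slice_1d(10, 1, 0, start=2, step=2)
print(n, s, o)   # 4 2 2  -> matches eager: stride (2,), storage_offset 2

# 1-D contiguity requires stride 1 (for length > 1)
print(s == 1)    # False -> matches eager's is_contiguous() == False
```

The XLA result above corresponds to never applying this arithmetic: the view keeps stride `(1,)` and offset `0`, as if it were a fresh contiguous tensor.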

Question: should XLA tensors' metadata reflect (i) the actual data storage, or (ii) the metadata eager mode would produce?

I would argue that we should do (ii) instead of (i) for the following reasons:

  • Since the introduction of functionalization to PyTorch core, we call set_sizes_strides_offset at the end of each kernel to propagate the computed reference metadata to functional tensors
    • There's a bug here, though (fix): it doesn't propagate the metadata to the inner XLA tensor, even though it forwards metadata function calls to that inner tensor
  • I would expect the 4th return value (in the program above) to be the same on both eager and XLA
    • Whenever as_strided is called without a storage offset argument, it preserves the tensor's current storage offset
    • So we need to know what the tensor's original storage offset was
  • Internally (AFAIU), we don't depend on either strides or storage offset, so we could just make them reflect whatever eager would report
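The as_strided point is the crux of the 4th return value. A small pure-Python sketch (the storage list stands in for torch.arange(10)'s buffer; this is an illustration, not the PyTorch implementation) shows how preserving vs. losing the storage offset produces exactly the two results seen above:

```python
def as_strided_1d(storage, size, stride, offset):
    """Read a 1-D strided view out of a flat storage buffer."""
    return [storage[offset + i * stride] for i in range(size)]

storage = list(range(10))   # stands in for torch.arange(10)'s storage

# Eager: as_strided((4,), (1,)) with no offset argument keeps the view's
# current storage_offset, which is 2 after a[2::2].
print(as_strided_1d(storage, 4, 1, offset=2))   # [2, 3, 4, 5]

# XLA today: the offset was never propagated and is still 0.
print(as_strided_1d(storage, 4, 1, offset=0))   # [0, 1, 2, 3]
```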

Currently, I have a stack of PRs that does just that: it propagates the computed reference metadata inside functional tensors. However, some tests are failing because of this assertion on the contiguity of the tensor. One way around that is to remove the assertion, leaving the metadata as in (ii).

cc @miladm @JackCaoG @alanwaketan @wconstab
