F.embedding_bag(..., mode='max') yields different results than PyTorch eager. #7588

Open
ysiraichi opened this issue Jun 28, 2024 · 2 comments

@ysiraichi
Collaborator

🐛 Bug

Running the script below raises an AssertionError. The issue occurs only when requires_grad=False, which triggers the execution of _embedding_bag_forward_only.

import torch
import torch_xla.core.xla_model as xm

EMB = 10
DIM = 5
N = 5

def fn(x, w, o):
    return torch.nn.functional.embedding_bag(x, w, o, mode="max")

x = torch.randint(0, EMB, (N,), dtype=torch.long)
w = torch.randn((EMB, DIM), requires_grad=False)
o = torch.tensor([0, 3], dtype=torch.long)

out = fn(x, w, o)
Xout = fn(x.to(xm.xla_device()), w.to(xm.xla_device()), o.to(xm.xla_device()))

assert torch.allclose(out, Xout.cpu()), f"{out=} not close to {Xout=}"

Traceback (most recent call last):
  File "examples/scratch.py", line 179, in <module>
    assert torch.allclose(out, Xout.cpu()), f"{out=} not close to {Xout=}"
AssertionError: out=tensor([[ 0.6277,  1.6069,  0.1294,  0.0666,  1.4192],
        [ 0.6289,  0.0599,  0.4328,  0.9031, -0.6462]]) not close to Xout=tensor([[0.6277, 1.6069, 0.1294, 0.0666, 1.4192],
        [0.6289, 0.0599, 0.4328, 0.9031, 0.0000]], device='xla:0')
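To pinpoint which entries differ in the output above, a small helper can list the mismatching positions. This is only a sketch: `report_mismatches` is a hypothetical name, not part of torch or torch_xla, and the two tensors are copied by hand from the traceback.

```python
import torch


def report_mismatches(ref, other, rtol=1e-5, atol=1e-8):
    """Return (index, ref_value, other_value) for every entry that is not close."""
    close = torch.isclose(ref, other, rtol=rtol, atol=atol)
    mismatches = []
    for pos in (~close).nonzero():
        idx = tuple(pos.tolist())
        mismatches.append((idx, ref[idx].item(), other[idx].item()))
    return mismatches


# Values copied from the AssertionError message above.
eager_out = torch.tensor([[0.6277, 1.6069, 0.1294, 0.0666, 1.4192],
                          [0.6289, 0.0599, 0.4328, 0.9031, -0.6462]])
xla_out = torch.tensor([[0.6277, 1.6069, 0.1294, 0.0666, 1.4192],
                        [0.6289, 0.0599, 0.4328, 0.9031, 0.0000]])

mismatches = report_mismatches(eager_out, xla_out)
for idx, ref_val, other_val in mismatches:
    print(f"index {idx}: eager={ref_val:.4f}, xla={other_val:.4f}")
```

For the tensors in this report, the only differing entry is the last element of the second bag, where the eager result is negative.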

Expected behavior

Results should be close.

Environment

  • Reproducible on XLA backend [CPU/TPU/CUDA]: CUDA
  • torch_xla version: 7d41035

Additional context

Upon further inspection, the issue appears to be related to negative numbers: wherever the output should contain a negative value, it is truncated to 0.
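One way to reproduce this symptom on CPU alone is a max reduction whose accumulator starts at 0 instead of negative infinity. The sketch below illustrates that hypothesis only. Whether torch_xla actually does this is not confirmed anywhere in this thread, and `bag_max` is a hypothetical helper written for this comparison.

```python
import torch
import torch.nn.functional as F


def bag_max(x, w, offsets, init):
    """Max-pool rows of `w` per bag, starting each accumulator at `init`."""
    bounds = offsets.tolist() + [x.numel()]
    rows = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        acc = torch.full((w.shape[1],), init, dtype=w.dtype)
        for idx in x[start:end].tolist():
            acc = torch.maximum(acc, w[idx])
        rows.append(acc)
    return torch.stack(rows)


torch.manual_seed(0)
x = torch.randint(0, 10, (5,), dtype=torch.long)
w = torch.randn((10, 5), requires_grad=False)
o = torch.tensor([0, 3], dtype=torch.long)

expected = F.embedding_bag(x, w, o, mode="max")   # eager CPU reference
correct = bag_max(x, w, o, init=float("-inf"))    # assumed-correct initial value
clamped = bag_max(x, w, o, init=0.0)              # assumed-buggy initial value

print("matches eager with -inf init:", torch.allclose(expected, correct))
print("0 init equals eager clamped at 0:", torch.equal(clamped, expected.clamp(min=0)))
```

If the XLA output matches `expected.clamp(min=0)` across several seeds, that would be consistent with this hypothesis, though it would not prove it.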

cc @miladm @JackCaoG @bhavya01

@bhavya01
Collaborator

I can take a look at this.

@bhavya01 bhavya01 self-assigned this Jun 28, 2024
@miladm
Collaborator

miladm commented Jun 28, 2024

Looks like two of the output elements are inconsistent, with one being zero.

Do we know how this would behave in torch_xla eager mode? @bhavya01
