Introduce fast path in the CPU equal op #100024
This pull request was exported from Phabricator. Differential Revision: D45282119
aten/src/ATen/native/ReduceOps.cpp
Outdated
// ensuring the storage and strides exactly the same.
if (self.sizes().equals(other.sizes())
    && self.strides().equals(other.strides())
    && self.storage().is_alias_of(other.storage())
Since storage is untyped, you should check that the dtype() matches here as well.
A good test for this would be:
import torch
a = torch.rand((2, 2), dtype=torch.float)
b = a.view(dtype=torch.int32)
print(torch.equal(a, b))
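To see why shared storage alone is not enough, here is a minimal sketch that uses only the standard library rather than PyTorch. It reinterprets the bytes of a float as a 32-bit integer, which is roughly what viewing a float tensor as int32 does to each element. The helper name `reinterpret_float_as_int32` is made up for this illustration.

```python
import struct


def reinterpret_float_as_int32(value: float) -> int:
    """Pack a float as 4 little-endian IEEE-754 bytes, then read the same bytes as a signed int32."""
    raw = struct.pack("<f", value)
    return struct.unpack("<i", raw)[0]


as_float = 1.0
as_int = reinterpret_float_as_int32(as_float)

# Same four bytes, two different values depending on the element type.
print(as_float, as_int)  # 1.0 1065353216
```

The bytes are identical but the values differ, so a fast path that only looks at storage, sizes, and strides would wrongly report the two views as equal.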
Good point. Added
    && self.storage_offset() == other.storage_offset()
    && self.layout() == other.layout()
    && self.is_neg() == other.is_neg()
    && self.is_conj() == other.is_conj()) {
You don't need to test these three; they will have been handled before getting here.
I kept them just to be safe. I am a bit concerned about cases where users call this cpu_equal function directly, although that should be rare. Keeping these checks shouldn't hurt, or are we concerned about the overhead of these calls?
aten/src/ATen/native/ReduceOps.cpp
Outdated
// TensorIterator, it should be safe to have the following fast path by
// ensuring the storage and strides exactly the same.
if (self.dtype() == other.dtype()
    && self.sizes().equals(other.sizes())
You don't need to test this; it's tested above.
aten/src/ATen/native/ReduceOps.cpp
Outdated
// ensuring the storage and strides exactly the same.
if (self.dtype() == other.dtype()
    && self.sizes().equals(other.sizes())
    && self.strides().equals(other.strides())
A fast path for this would be to first check that both tensors are contiguous, before comparing their strides.
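As a rough illustration of that suggestion, the sketch below computes the row-major strides a contiguous tensor of a given shape would have and compares them with the actual strides. Both helpers are hypothetical and simplified: PyTorch's own contiguity check is more lenient (for example, about strides of size-1 dimensions), so treat this as an assumption-laden model rather than the library's logic.

```python
def contiguous_strides(sizes):
    """Return the row-major (C-contiguous) strides, in elements, for the given sizes."""
    strides = []
    running = 1
    for size in reversed(sizes):
        strides.append(running)
        # Guard against zero-sized dimensions collapsing every later stride to 0.
        running *= max(size, 1)
    return tuple(reversed(strides))


def is_contiguous(sizes, strides):
    """Simplified check: strides must exactly match the row-major layout."""
    return tuple(strides) == contiguous_strides(sizes)


print(contiguous_strides((2, 3, 4)))  # (12, 4, 1)
print(is_contiguous((2, 2), (2, 1)))  # True
print(is_contiguous((2, 2), (1, 2)))  # False (a transposed view)
```

In this simplified model, two tensors with equal sizes that are both contiguous necessarily have equal strides, so the cheaper contiguity test can short-circuit the stride comparison.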
You might also want this to apply to cuda too.
Yeah, I will handle the CUDA one in a follow-up PR.
Summary:
Pull Request resolved: pytorch#100024

When two tensors share the same storage and strides, and no other flags (such as neg/conj) are in play, we should consider these tensors equal.

Another approach, in pytorch#99703, checks equality directly in the JIT loader. However, that approach may have to handle flags like neg/conj explicitly, and it is hard to cover all the cases. Per discussion with davidberard98, flags like neg/conj should already be handled by the dispatcher (the TensorIterator logic also confirms this), so adding the fast path to the CPU and CUDA ops should be a better and safer approach.

Test Plan: buck2 test @//mode/opt //caffe2/test:torch -- --exact 'caffe2/test:torch - test_equal (test_torch.TestTorch)'

Reviewed By: hyuen

Differential Revision: D45282119

fbshipit-source-id: 18e939d236a6d84a79013317db8b2f715f4a3cff
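The conditions discussed in the review can be modeled as a small predicate. This is only a sketch in Python, not the C++ code from the PR. `TensorMeta`, its fields, and `same_view_of_same_storage` are hypothetical names standing in for the storage, dtype, sizes, strides, and storage offset that the review comments mention.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TensorMeta:
    """Hypothetical stand-in for the tensor metadata the review discusses."""
    storage_id: int            # identifies the underlying (untyped) storage
    dtype: str                 # element type used to interpret the bytes
    sizes: Tuple[int, ...]
    strides: Tuple[int, ...]
    storage_offset: int = 0


def same_view_of_same_storage(a: TensorMeta, b: TensorMeta) -> bool:
    """True when both tensors describe the same bytes in exactly the same way."""
    return (
        a.storage_id == b.storage_id
        and a.dtype == b.dtype
        and a.sizes == b.sizes
        and a.strides == b.strides
        and a.storage_offset == b.storage_offset
    )


a = TensorMeta(1, "float32", (2, 2), (2, 1))
b = TensorMeta(1, "float32", (2, 2), (2, 1))
c = TensorMeta(1, "int32", (2, 2), (2, 1))    # same bytes, different dtype
d = TensorMeta(1, "float32", (2, 2), (1, 2))  # transposed view

print(same_view_of_same_storage(a, b))  # True
print(same_view_of_same_storage(a, c))  # False
print(same_view_of_same_storage(a, d))  # False
```

A True result only says that both tensors read the same memory the same way. Whether that is enough to report them as equal still depends on how the element type defines equality.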
`torch.equal(x, x)` should return false if `x` is a floating-point tensor that contains a NaN. This renders some of the optimization proposed in #100024 invalid, although as a result `torch.equal` will become much slower for identical floating-point tensors.
Add a regression test that calls `torch.equal` on a tensor containing NaN.
Fixes #111251
Pull Request resolved: #111699
Approved by: https://github.com/Skylion007, https://github.com/albanD
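The NaN problem can be reproduced without PyTorch. In the sketch below, plain Python lists of floats stand in for tensors, and both function names are invented for the illustration. An identity shortcut returns True for a sequence compared with itself, while a true element-wise comparison returns False because NaN is not equal to itself under IEEE-754.

```python
import math


def elementwise_equal(a, b):
    """Compare element by element, following IEEE-754 semantics (NaN != NaN)."""
    return len(a) == len(b) and all(x == y for x, y in zip(a, b))


def equal_with_identity_shortcut(a, b):
    """Same comparison, but with a fast path that skips the loop for identical objects."""
    if a is b:
        return True
    return elementwise_equal(a, b)


x = [1.0, math.nan]

print(elementwise_equal(x, x))             # False
print(equal_with_identity_shortcut(x, x))  # True
```

The two functions disagree only when a NaN is present, which is why the shortcut is safe for integer data but not for floating-point data.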