-
Notifications
You must be signed in to change notification settings - Fork 22.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TensorExpr eval: fix copying variables from pointers on big endian systems #96951
TensorExpr eval: fix copying variables from pointers on big endian systems #96951
Conversation
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96951
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 256ded0: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud |
torch/csrc/jit/tensorexpr/eval.cpp
Outdated
#define TYPE_CASE(Type, Name) \ | ||
case ScalarType::Name: { \ | ||
Type typed_data; \ | ||
size_t idx = (sizeof(Type) >= sizeof(void*)) ? 0 : (sizeof(void*) - sizeof(Type)); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose little-endian and big-endian have the same bit-width, and the only difference is how to store the data. Regarding this change, it seems like it serves not only big-endian but also a new platform. Right? Because I do not quite understand why you need to compare with sizeof(void*)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is intended for existing big endian platforms and tested only on s390x.
I guess it might be possible to simplify this expression if we assume that sizeof(Type) always <= sizeof(void*).
I'll update this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlekseiNikiforovIBM , the sizeof
a pointer could be 32bit on a 32bit system while its bit width should be less than sizeof(int64_t)
/sizeof(double)
.
2d628d7
to
f8cd7e9
Compare
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud |
need lintfix |
f8cd7e9
to
78e4e96
Compare
There's one more issue:
It looks like static assert fails on one of platforms for "double" type. As far as I'm aware, only "int" type is saved as pointer now, but I might be incorrect: Maybe, "double" type should be excluded here? Or only int types should remain? |
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 1 mandatory check(s) failed. The first few are: Dig deeper by viewing the failures on hud |
@EikanWang do you know? |
TE supports most torch datatypes including |
@AlekseiNikiforovIBM , This is a python interface that intends to be used from python directly. The initial idea was to enable the user could use Python to develop the TE program. Something like this - https://github.com/pytorch/pytorch/blob/master/test/test_tensorexpr_pybind.py. But the L833 code does not cover all the cases that have been supported by |
torch/csrc/jit/tensorexpr/eval.cpp
Outdated
case ScalarType::Name: { \ | ||
static_assert( \ | ||
sizeof(Type) <= sizeof(void*), \ | ||
"Can't read data bigger than sizeof(void*) from pointer"); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can not assume this statement - sizeof(Type) <= sizeof(void*)
It theoretically could support data types other than tensors and ints in "call" and "call_raw", but only code pieces and cases I see are actually only those 2 listed above. I don't see any implementation for basic types other than int, and current approach wouldn't work for types bigger than pointer anyway. Currently, this is only piece of code I could find which processes types other than tensors in related code. https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/eval.cpp#L1280-L1292 Here's the code reading previously stored value. Now, let's for example take "double" (int64_t would work same) as value instead of original "int", on 32bit systems. What I'm trying to say, is that for values longer than sizeof(void*) current approach won't work. |
78e4e96
to
51b18bf
Compare
51b18bf
to
95b7bcb
Compare
It is an issue of TE to support Let's separate the two things:
@AlekseiNikiforovIBM , How about let's only focus on the second thing for this PR to meet your requirement now? Regarding the first one, We need to sort out a good design to fix it but not this PR. |
I've tested how It goes through and I guess it always makes buffers, and check https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/eval.cpp#L1274-L1278 I see there are multiple checks failed. Any of them I should take a look in regards to this patch? I agree with separation of two topics, big endian support and If such places are in tests, they should fail for this pull request. And they'd have been failing on big endian systems right now, but they're passing. Please let me know if I missed new failing tests for this pull request too. So, this pull request fixes TE big endian support, and while it disables |
Then how about let's add something like follows to ensure the input is neither TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Double);
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Long); With the guard, your code does not need to add Again, I need to highlight that Python float and int are translated to double and int64_t respectively. |
A few lines below it's already caught in default case, and exception would be thrown: Should it still be added? |
It means that we still need to support double and int64_t because the Python float and int are translated to 64bit as I mentioned before. |
@AlekseiNikiforovIBM , I submitted a PR to fix the 32-bit issue - #97669. |
95b7bcb
to
4efba05
Compare
@AlekseiNikiforovIBM my PR #97669 has been merged, please rebase this PR. |
torch/csrc/jit/tensorexpr/eval.cpp
Outdated
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Double); | ||
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Long); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Double); | |
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Long); |
torch/csrc/jit/tensorexpr/eval.cpp
Outdated
static_assert( \ | ||
sizeof(Type) <= sizeof(void*), \ | ||
"Can't read data bigger than sizeof(void*) from pointer"); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static_assert( \ | |
sizeof(Type) <= sizeof(void*), \ | |
"Can't read data bigger than sizeof(void*) from pointer"); \ |
torch/csrc/jit/tensorexpr/eval.cpp
Outdated
static_assert( \ | ||
sizeof(Type) <= sizeof(void*), \ | ||
"Can't read data bigger than sizeof(void*) from pointer"); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static_assert( \ | |
sizeof(Type) <= sizeof(void*), \ | |
"Can't read data bigger than sizeof(void*) from pointer"); \ |
torch/csrc/jit/tensorexpr/eval.cpp
Outdated
/* only types not longer than pointer can be stored in pointer */ | ||
TYPE_CASE(uint8_t, Byte) | ||
TYPE_CASE(int8_t, Char) | ||
TYPE_CASE(int16_t, Short) | ||
TYPE_CASE(int, Int) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/* only types not longer than pointer can be stored in pointer */ | |
TYPE_CASE(uint8_t, Byte) | |
TYPE_CASE(int8_t, Char) | |
TYPE_CASE(int16_t, Short) | |
TYPE_CASE(int, Int) | |
AT_FORALL_SCALAR_TYPES_AND3(Bool, Half, BFloat16, TYPE_CASE); |
4efba05
to
10051fe
Compare
Thanks, with your change the fix can be simplified. |
@@ -830,7 +830,7 @@ void initTensorExprBindings(PyObject* module) { | |||
value_ptrs.reserve(py::len(values)); | |||
for (const auto& value : values) { | |||
if (py::isinstance<py::int_>(value)) { | |||
value_ptrs.emplace_back(value.cast<int64_t>()); | |||
value_ptrs.emplace_back(value.cast<int>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, you cannot assume py::int_
is int
. https://github.com/pybind/pybind11/blob/80dc998efced8ceb2be59756668a7e90e8bef917/include/pybind11/pytypes.h#L1722-L1734
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, didn't know about that. Will update it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be fine here to loop not only though values
, but through self.buffer_args()
at same time? In that case on big endian systems for py::int_
value can be casted to correct type, correctly stored in buffer, and later correctly read.
The problem with calculating correct offset in SimpleIREvaluator::bindArg
is that now void *data
is pointer to some buffer, but no size of buffer is passed. Either have to correctly fill this buffer from the start or add size of buffer as argument as well. Or just assume it's always 8 bytes, but that doesn't sound good to me.
…stems When copying data from pointers, only lowest bytes are copied. On little endian systems they are located at the beginning of pointer. On big endian systems they are located at the end of pointer. Place data correctly from the start in the buffer by casting to correct int type. This change fixes TestTensorExprPyBind::test_dynamic_shape and TestTensorExprPyBind::test_dynamic_shape_2d tests from test/test_tensorexpr_pybind.py on big endian systems.
10051fe
to
256ded0
Compare
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
…stems (pytorch#96951) When copying data from pointers, only lowest bytes are copied. On little endian systems they are located at the beginning of pointer. On big endian systems they are located at the end of pointer. This change fixes TestTensorExprPyBind::test_dynamic_shape and TestTensorExprPyBind::test_dynamic_shape_2d tests from test/test_tensorexpr_pybind.py on big endian systems. Pull Request resolved: pytorch#96951 Approved by: https://github.com/ezyang, https://github.com/EikanWang
When copying data from pointers, only lowest bytes are copied. On little endian systems they are located at the beginning of pointer. On big endian systems they are located at the end of pointer.
This change fixes TestTensorExprPyBind::test_dynamic_shape and TestTensorExprPyBind::test_dynamic_shape_2d tests from test/test_tensorexpr_pybind.py on big endian systems.
cc @EikanWang @jgong5