
TensorExpr eval: fix copying variables from pointers on big endian systems #96951

Conversation

@AlekseiNikiforovIBM (Collaborator) commented Mar 16, 2023

When copying data from pointers, only the lowest bytes are copied. On little endian systems these are located at the beginning of the pointer-sized value; on big endian systems they are located at the end.

This change fixes TestTensorExprPyBind::test_dynamic_shape and TestTensorExprPyBind::test_dynamic_shape_2d tests from test/test_tensorexpr_pybind.py on big endian systems.

cc @EikanWang @jgong5

@pytorch-bot (bot) commented Mar 16, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96951

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 256ded0:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: jit release notes category label Mar 16, 2023
@github-actions github-actions bot added the NNC label Mar 16, 2023
@ezyang (Contributor) commented Mar 16, 2023

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 16, 2023
@ezyang ezyang added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Mar 16, 2023
@ezyang ezyang requested review from EikanWang and jgong5 March 16, 2023 16:28
@pytorchmergebot (Collaborator)
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator)
Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

#define TYPE_CASE(Type, Name)                                    \
  case ScalarType::Name: {                                       \
    Type typed_data;                                             \
    size_t idx = (sizeof(Type) >= sizeof(void*)) ? 0 : (sizeof(void*) - sizeof(Type)); \
Collaborator:

I suppose little-endian and big-endian have the same bit width, and the only difference is how the data is stored. This change seems to serve not only big-endian but also some new platform, right? I do not quite understand why you need to compare with sizeof(void*).

Collaborator Author:

This change is intended for existing big endian platforms and has been tested only on s390x.
I guess this expression could be simplified if we assume that sizeof(Type) is always <= sizeof(void*).
I'll update the change.

@EikanWang (Collaborator) commented Mar 19, 2023:

@AlekseiNikiforovIBM, the size of a pointer can be 32 bits on a 32-bit system, which is smaller than sizeof(int64_t) or sizeof(double).

@ezyang (Contributor) commented Mar 17, 2023

@pytorchbot merge

@pytorchmergebot (Collaborator)
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot (Collaborator)
Merge failed

Reason: 1 mandatory check(s) failed. The first few are:


Failing merge rule: Core Maintainers

@ezyang (Contributor) commented Mar 17, 2023

need lintfix

@AlekseiNikiforovIBM (Collaborator, Author):

There's one more issue:

2023-03-17T14:46:37.7006716Z /opt/ndk/toolchains/llvm/prebuilt/linux-x86_64/bin/clang++ --target=armv7-none-linux-androideabi21 [...] -std=gnu++17 [...] -o caffe2/CMakeFiles/torch_cpu.dir/__/torch/csrc/jit/tensorexpr/eval.cpp.o -c ../../torch/csrc/jit/tensorexpr/eval.cpp
2023-03-17T14:46:37.7013943Z ../../torch/csrc/jit/tensorexpr/eval.cpp:1303:55: error: static_assert failed "Can't read data bigger than sizeof(void*) from pointer"
2023-03-17T14:46:37.7117196Z     AT_FORALL_SCALAR_TYPES_AND3(Bool, Half, BFloat16, TYPE_CASE);
2023-03-17T14:46:37.7118550Z     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
2023-03-17T14:46:37.7119211Z ../../c10/core/ScalarType.h:190:3: note: expanded from macro 'AT_FORALL_SCALAR_TYPES_AND3'
2023-03-17T14:46:37.7119677Z   _(int64_t, Long)                                                            \
2023-03-17T14:46:37.7120030Z   ^~~~~~~~~~~~~~~~
2023-03-17T14:46:37.7120585Z ../../torch/csrc/jit/tensorexpr/eval.cpp:1284:5: note: expanded from macro 'TYPE_CASE'
2023-03-17T14:46:37.7121076Z     static_assert(sizeof(Type) <= sizeof(void*),                  \
2023-03-17T14:46:37.7121454Z     ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2023-03-17T14:46:37.7122140Z ../../torch/csrc/jit/tensorexpr/eval.cpp:1303:55: error: static_assert failed "Can't read data bigger than sizeof(void*) from pointer"
2023-03-17T14:46:37.7123020Z     AT_FORALL_SCALAR_TYPES_AND3(Bool, Half, BFloat16, TYPE_CASE);
2023-03-17T14:46:37.7123430Z     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~
2023-03-17T14:46:37.7123990Z ../../c10/core/ScalarType.h:192:3: note: expanded from macro 'AT_FORALL_SCALAR_TYPES_AND3'
2023-03-17T14:46:37.7124461Z   _(double, Double)                                                           \
2023-03-17T14:46:37.7124788Z   ^~~~~~~~~~~~~~~~~
2023-03-17T14:46:37.7125338Z ../../torch/csrc/jit/tensorexpr/eval.cpp:1284:5: note: expanded from macro 'TYPE_CASE'
2023-03-17T14:46:37.7157284Z     static_assert(sizeof(Type) <= sizeof(void*),                  \
2023-03-17T14:46:37.7157715Z     ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

It looks like the static assert fails for the "double" type on one of the platforms. As far as I'm aware, only the "int" type is currently stored as a pointer, but I might be incorrect:

https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/tensorexpr_init.cpp#L833

Maybe the "double" type should be excluded here? Or should only int types remain?

@ezyang (Contributor) commented Mar 17, 2023

@pytorchbot merge

@pytorchmergebot (Collaborator)
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot (Collaborator)
Merge failed

Reason: 1 mandatory check(s) failed. The first few are:


Failing merge rule: Core Maintainers

@ezyang (Contributor) commented Mar 17, 2023

@EikanWang do you know?

@EikanWang (Collaborator) commented Mar 18, 2023

Maybe, "double" type should be excluded here? Or only int types should remain?

TE supports most torch data types, including double, and SimpleIREvaluator is a codegen backend of TE. Hence, we should not support only int types.

@EikanWang (Collaborator) commented Mar 18, 2023

https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/tensorexpr_init.cpp#L833

@AlekseiNikiforovIBM, this is a Python interface intended to be used from Python directly. The initial idea was to let users develop TE programs in Python, something like https://github.com/pytorch/pytorch/blob/master/test/test_tensorexpr_pybind.py. But the L833 code does not cover all the cases supported by SimpleIREvaluator, which handles most torch data types, including double.

Comment on lines 1283 to 1289
  case ScalarType::Name: {                                       \
    static_assert(                                               \
        sizeof(Type) <= sizeof(void*),                           \
        "Can't read data bigger than sizeof(void*) from pointer"); \
Collaborator:

We cannot assume this statement: sizeof(Type) <= sizeof(void*).

@AlekseiNikiforovIBM (Collaborator, Author):

It could theoretically support data types other than tensors and ints in "call" and "call_raw", but the only code paths and cases I see are the two listed above.

I don't see any implementation for basic types other than int, and the current approach wouldn't work for types bigger than a pointer anyway.

https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/tensorexpr_init.cpp#L833

Currently, this is the only piece of code I could find that processes types other than tensors in the related code.
Here, an "int" is converted to "int64_t" and stored into a pointer (void*). Let's call it pointer 1.

https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/eval.cpp#L1280-L1292

Here's the code reading the previously stored value.
"data" is a pointer (void*); let's call it pointer 2.
Pointer 2 points to pointer 1, which contains the "int64_t", our intended data, which was originally an "int".
On 64-bit systems pointer 2 is usually 8 bytes, pointer 1 is also 8 bytes, int64_t is 8 bytes, and int is usually 4 bytes. Everything is fine.
On 32-bit systems pointers are usually 4 bytes, int64_t is 8 bytes, and int is usually 4 bytes. While the int64_t is truncated to 4 bytes when copied into the pointer, since it was originally a 4-byte int, everything still ends well.

Now take, for example, a "double" (int64_t would behave the same) instead of the original "int", on a 32-bit system.
A "double" is usually 8 bytes, even on 32-bit systems.
So pointer 2 (4 bytes) points to pointer 1 (4 bytes), but pointer 1 was supposed to contain the "double", which is 8 bytes and therefore cannot fit.

What I'm trying to say is that the current approach won't work for values longer than sizeof(void*).
But currently I can only find code using "int" types, and luckily for those types it usually works.
I propose fixing the "int" and smaller integer type use cases, and reworking the current value-binding system when longer ints (int64_t) or floating-point types are actually used.

@EikanWang (Collaborator) commented Mar 21, 2023

I propose to fix "int" and smaller integer types use-cases, and to have current values binding system reworked when longer ints (int64_t) or floating-point types would actually be used.

Supporting double/int64_t scalars on a 32-bit system is a TE issue. But we still need to support other data types, not only integers.
https://github.com/pytorch/pytorch/blob/master/test/test_tensorexpr.py#L1184-L1199

Let's separate the two things:

  • Enable TE to support big-endian
  • Fix the issue that TE does not support double/int64_t on a 32-bit system

@AlekseiNikiforovIBM, how about we focus only on the second item in this PR, to meet your requirement now?

Regarding the first one, we need to sort out a good design to fix it, but not in this PR.

@AlekseiNikiforovIBM (Collaborator, Author):

I've tested how https://github.com/pytorch/pytorch/blob/master/test/test_tensorexpr.py#L1184-L1199 works. It looks like it uses a different mechanism and is not affected by this patch at all.

It goes through ScriptFunction.__call__:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/python/script_init.cpp#L1385-L1401

and I guess it always creates buffers, so the check !bufArg.isVar() is always true.

https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/eval.cpp#L1274-L1278

I see there are multiple failed checks. Should I look into any of them with regard to this patch?

I agree with separating the two topics: big endian support, and double/int64_t support on 32-bit systems. But from what I can see, there is no place where anything except int is passed to TE through this mechanism as a var rather than a buf.

If such places existed in tests, those tests should fail for this pull request. And they would have been failing on big endian systems already, but they're passing. Please let me know if I missed any newly failing tests for this pull request.

So this pull request fixes TE big endian support, and while it disables double/int64_t support for some use cases, those use cases are most likely not covered by tests at the moment.

@EikanWang (Collaborator) commented Mar 27, 2023

Then how about adding something like the following, to ensure the input is neither Double nor Long on big endian systems?

  TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Double);
  TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Long);

With this guard, your code does not need the static_assert.

Again, I need to highlight that Python float and int are translated to double and int64_t respectively.

@AlekseiNikiforovIBM (Collaborator, Author):

A few lines below, this is already caught in the default case, and an exception is thrown:

https://github.com/pytorch/pytorch/pull/96951/files#diff-33783d984927670883fec7121b94a5142e54bedf159d7b85af6800818e513d09R1311-R1312

Should it still be added?

@EikanWang (Collaborator):

A few lines below it's already caught in default case, and exception would be thrown:

https://github.com/pytorch/pytorch/pull/96951/files#diff-33783d984927670883fec7121b94a5142e54bedf159d7b85af6800818e513d09R1311-R1312

Should it still be added?

It means that we still need to support double and int64_t, because Python float and int are translated to 64-bit types, as I mentioned before.

@EikanWang (Collaborator):

@AlekseiNikiforovIBM , I submitted a PR to fix the 32-bit issue - #97669.

@EikanWang (Collaborator):

@AlekseiNikiforovIBM my PR #97669 has been merged, please rebase this PR.

Comment on lines 1280 to 1282
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Double);
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Long);

Collaborator:

Suggested change
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Double);
TORCH_CHECK(bufArg.dtype().scalar_type() != c10::ScalarType::Long);

Comment on lines 1287 to 1289
static_assert( \
sizeof(Type) <= sizeof(void*), \
"Can't read data bigger than sizeof(void*) from pointer"); \
Collaborator:

Suggested change
static_assert( \
sizeof(Type) <= sizeof(void*), \
"Can't read data bigger than sizeof(void*) from pointer"); \

Comment on lines 1298 to 1300
static_assert( \
sizeof(Type) <= sizeof(void*), \
"Can't read data bigger than sizeof(void*) from pointer"); \
Collaborator:

Suggested change
static_assert( \
sizeof(Type) <= sizeof(void*), \
"Can't read data bigger than sizeof(void*) from pointer"); \

Comment on lines 1308 to 1312
/* only types not longer than pointer can be stored in pointer */
TYPE_CASE(uint8_t, Byte)
TYPE_CASE(int8_t, Char)
TYPE_CASE(int16_t, Short)
TYPE_CASE(int, Int)
Collaborator:

Suggested change
/* only types not longer than pointer can be stored in pointer */
TYPE_CASE(uint8_t, Byte)
TYPE_CASE(int8_t, Char)
TYPE_CASE(int16_t, Short)
TYPE_CASE(int, Int)
AT_FORALL_SCALAR_TYPES_AND3(Bool, Half, BFloat16, TYPE_CASE);

@AlekseiNikiforovIBM (Collaborator, Author):

Thanks, with your change the fix can be simplified.

@@ -830,7 +830,7 @@ void initTensorExprBindings(PyObject* module) {
       value_ptrs.reserve(py::len(values));
       for (const auto& value : values) {
         if (py::isinstance<py::int_>(value)) {
-          value_ptrs.emplace_back(value.cast<int64_t>());
+          value_ptrs.emplace_back(value.cast<int>());
Collaborator:

Collaborator Author:

Thanks, I didn't know about that. I'll update it.

Collaborator Author:

Would it be fine here to loop not only through values, but also through self.buffer_args() at the same time? In that case, on big endian systems a py::int_ value can be cast to the correct type, stored correctly in the buffer, and later read correctly.

The problem with calculating the correct offset in SimpleIREvaluator::bindArg is that void *data is a pointer to some buffer, but the buffer's size is not passed. We would either have to fill the buffer correctly from the start, or also pass the buffer size as an argument. Or just assume it's always 8 bytes, but that doesn't sound good to me.

…stems

When copying data from pointers, only lowest bytes are copied.
On little endian systems they are located at the beginning of pointer.
On big endian systems they are located at the end of pointer.

Place data correctly from the start in the buffer by casting
to correct int type.

This change fixes TestTensorExprPyBind::test_dynamic_shape
and TestTensorExprPyBind::test_dynamic_shape_2d tests
from test/test_tensorexpr_pybind.py on big endian systems.
@ezyang ezyang requested review from ezyang and removed request for ezyang March 30, 2023 21:30
@ezyang (Contributor) commented Apr 2, 2023

@pytorchbot merge

@pytorchmergebot (Collaborator)
Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


AlekseiNikiforovIBM added a commit to AlekseiNikiforovIBM/pytorch that referenced this pull request May 23, 2023
…stems (pytorch#96951)

When copying data from pointers, only lowest bytes are copied. On little endian systems they are located at the beginning of pointer. On big endian systems they are located at the end of pointer.

This change fixes TestTensorExprPyBind::test_dynamic_shape and TestTensorExprPyBind::test_dynamic_shape_2d tests from test/test_tensorexpr_pybind.py on big endian systems.

Pull Request resolved: pytorch#96951
Approved by: https://github.com/ezyang, https://github.com/EikanWang
Labels
ciflow/trunk Trigger trunk jobs on your pull request · Merged · NNC · open source · release notes: jit release notes category · triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
5 participants