test_to_with_tensor causes a segmentation fault on MTL in non_blocking mode #861

Closed
daisyden opened this issue Sep 3, 2024 · 1 comment
daisyden commented Sep 3, 2024

🐛 Describe the bug

On 22.04, this case causes a segmentation fault when run as "PYTORCH_ENABLE_XPU_FALLBACK=1 PYTORCH_TEST_WITH_SLOW=1 gdb --args python -m pytest -v test_torch_xpu.py -k test_to_with_tensor".

test_torch_xpu.py::TestTorch::test_to_with_tensor
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007ffff513f1d0 in c10::intrusive_ptr<c10::VariableVersion::VersionCounter, c10::detail::intrusive_target_default_null_type<c10::VariableVersion::VersionCounter> >::reset_() ()
from /home/gta/miniforge3/envs/build_check/lib/python3.10/site-packages/torch/lib/libc10.so
(gdb) bt
#0 0x00007ffff513f1d0 in c10::intrusive_ptr<c10::VariableVersion::VersionCounter, c10::detail::intrusive_target_default_null_type<c10::VariableVersion::VersionCounter> >::reset_() ()
from /home/gta/miniforge3/envs/build_check/lib/python3.10/site-packages/torch/lib/libc10.so
#1 0x00007ffff513773e in c10::TensorImpl::~TensorImpl() () from /home/gta/miniforge3/envs/build_check/lib/python3.10/site-packages/torch/lib/libc10.so
#2 0x00007ffff5137a7d in c10::TensorImpl::~TensorImpl() () from /home/gta/miniforge3/envs/build_check/lib/python3.10/site-packages/torch/lib/libc10.so
#3 0x00007fffe9f097dd in torch::utils::(anonymous namespace)::internal_new_from_data(c10::TensorOptions, c10::ScalarType, std::optional<c10::Device>, _object*, bool, bool, bool, bool) ()
from /home/gta/miniforge3/envs/build_check/lib/python3.10/site-packages/torch/lib/libtorch_python.so
#4 0x00007fffe9f0eb31 in torch::utils::tensor_ctor(c10::DispatchKey, c10::ScalarType, torch::PythonArgs&) () from /home/gta/miniforge3/envs/build_check/lib/python3.10/site-packages/torch/lib/libtorch_python.so
#5 0x00007fffe9a53efa in torch::autograd::THPVariable_tensor(_object*, _object*, _object*) () from /home/gta/miniforge3/envs/build_check/lib/python3.10/site-packages/torch/lib/libtorch_python.so
#6 0x00005555556985a6 in cfunction_call (func=0x7ffff520a2a0, args=<optimized out>, kwargs=<optimized out>) at /usr/local/src/conda/python-3.10.14/Objects/methodobject.c:543
#7 0x0000555555691a6b in _PyObject_MakeTpCall (tstate=0x555555909140, callable=0x7ffff520a2a0, args=<optimized out>, nargs=1, keywords=0x7ffff5c850f0) at /usr/local/src/conda/python-3.10.14/Objects/call.c:215
#8 0x000055555568dc53 in _PyObject_VectorcallTstate (kwnames=0x7ffff5c850f0, nargsf=<optimized out>, args=<optimized out>, callable=0x7ffff520a2a0, tstate=<optimized out>)
at /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:112
#9 _PyObject_VectorcallTstate (kwnames=0x7ffff5c850f0, nargsf=<optimized out>, args=<optimized out>, callable=0x7ffff520a2a0, tstate=<optimized out>) at /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:99
#10 PyObject_Vectorcall (kwnames=0x7ffff5c850f0, nargsf=<optimized out>, args=<optimized out>, callable=0x7ffff520a2a0) at /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:123
#11 call_function (kwnames=0x7ffff5c850f0, oparg=<optimized out>, pp_stack=<optimized out>, trace_info=0x7fffffff8c50, tstate=<optimized out>) at /usr/local/src/conda/python-3.10.14/Python/ceval.c:5893
#12 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7ffe5cd4eae0, throwflag=<optimized out>) at /usr/local/src/conda/python-3.10.14/Python/ceval.c:4231
#13 0x00005555556a4804 in _PyEval_EvalFrame (throwflag=0, f=0x7ffe5cd4eae0, tstate=0x555555909140) at /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
#14 _PyEval_Vector (kwnames=0x0, argcount=1, args=0x7fffffff8d38, locals=0x0, con=0x7ffe5c120170, tstate=0x555555909140) at /usr/local/src/conda/python-3.10.14/Python/ceval.c:5067
#15 _PyFunction_Vectorcall (kwnames=0x0, nargsf=1, stack=0x7fffffff8d38, func=0x7ffe5c120160) at /usr/local/src/conda/python-3.10.14/Objects/call.c:342
#16 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x7fffffff8d38, callable=0x7ffe5c120160, tstate=0x555555909140) at /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
#17 method_vectorcall (method=<optimized out>, args=0x7ffff7b04088, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.10.14/Objects/classobject.c:61
#18 0x000055555568b2b3 in do_call_core (kwdict=0x7ffe5be24080, callargs=0x7ffff7b04070, func=0x7ffe5be25940, trace_info=0x7fffffff8e60, tstate=<optimized out>) at /usr/local/src/conda/python-3.10.14/Python/ceval.c:5945
#19 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7ffe5be34220, throwflag=<optimized out>) at /usr/local/src/conda/python-3.10.14/Python/ceval.c:4277
#20 0x00005555556a4582 in _PyEval_EvalFrame (throwflag=0, f=0x7ffe5be34220, tstate=0x555555909140) at /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
#21 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=0x7ffe5c1368c0, locals=0x0, con=0x7ffe5c68b890, tstate=0x555555909140) at /usr/local/src/conda/python-3.10.14/Python/ceval.c:5067
#22 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=0x7ffe5c1368c0, func=0x7ffe5c68b880) at /usr/local/src/conda/python-3.10.14/Objects/call.c:342
#23 _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=0x7ffe5c1368c0, callable=0x7ffe5c68b880, tstate=0x555555909140) at /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
#24 method_vectorcall (method=<optimized out>, args=0x7ffe5c1368c8, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.10.14/Objects/classobject.c:53
#25 0x0000555555688850 in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x7ffe5c1368c8, callable=0x7ffe5be25a40, tstate=0x555555909140)
at /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
#26 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x7ffe5c1368c8, callable=0x7ffe5be25a40) at /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:123
#27 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<optimized out>, trace_info=0x7fffffff9070, tstate=<optimized out>) at /usr/local/src/conda/python-3.10.14/Python/ceval.c:5893
#28 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x7ffe5c136750, throwflag=<optimized out>) at /usr/local/src/conda/python-3.10.14/Python/ceval.c:4213
#29 0x0000555555698a2c in _PyEval_EvalFrame (throwflag=0, f=0x7ffe5c136750, tstate=0x555555909140) at /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
#30 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=<optimized out>, locals=0x0, con=0x7ffff6dec320, tstate=0x555555909140) at /usr/local/src/conda/python-3.10.14/Python/ceval.c:5067
#31 _PyFunction_Vectorcall (func=0x7ffff6dec310, stack=<optimized out>, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.10.14/Objects/call.c:342
#32 0x0000555555688c5c in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=<optimized out>, args=0x555555e452e0, callable=0x7ffff6dec310, tstate=0x555555909140)
at /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
#33 PyObject_Vectorcall (kwnames=0x0, nargsf=<optimized out>, args=0x555555e452e0, callable=0x7ffff6dec310) at /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:123
#34 call_function (kwnames=0x0, oparg=<optimized out>, pp_stack=<optimized out>, trace_info=0x7fffffff9220, tstate=<optimized out>) at /usr/local/src/conda/python-3.10.14/Python/ceval.c:5893
#35 _PyEval_EvalFrameDefault (tstate=<optimized out>, f=0x555555e45120, throwflag=<optimized out>) at /usr/local/src/conda/python-3.10.14/Python/ceval.c:4198
#36 0x00005555556a4582 in _PyEval_EvalFrame (throwflag=0, f=0x555555e45120, tstate=0x555555909140) at /usr/local/src/conda/python-3.10.14/Include/internal/pycore_ceval.h:46
#37 _PyEval_Vector (kwnames=<optimized out>, argcount=<optimized out>, args=0x55555f3afbb8, locals=0x0, con=0x7ffff6dec4d0, tstate=0x555555909140) at /usr/local/src/conda/python-3.10.14/Python/ceval.c:5067
#38 _PyFunction_Vectorcall (kwnames=<optimized out>, nargsf=<optimized out>, stack=0x55555f3afbb8, func=0x7ffff6dec4c0) at /usr/local/src/conda/python-3.10.14/Objects/call.c:342
#39 _PyObject_VectorcallTstate (kwnames=<optimized out>, nargsf=<optimized out>, args=0x55555f3afbb8, callable=0x7ffff6dec4c0, tstate=0x555555909140) at /usr/local/src/conda/python-3.10.14/Include/cpython/abstract.h:114
#40 method_vectorcall (method=<optimized out>, args=0x55555f3afbc0, nargsf=<optimized out>, kwnames=<optimized out>) at /usr/local/src/conda/python-3.10.14/Objects/classobject.c:53
#41 0x00005555556898fa in _PyObject_VectorcallTstate (kwnames=0x7ffedaa80be0, nargsf=<optimized out>, args=<optimized out>, callable=0x7ffe5be259c0, tstate=0x555555909140)

A small case can reproduce this issue:

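import torch
b = torch.tensor(5., device='xpu')
a = torch.tensor(5)  # CPU tensor; supplies the target dtype and device
# Non-blocking copy of b to a's dtype/device; the result is not kept.
b.to(a, non_blocking=True)
b = torch.tensor(5., device='xpu')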
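
For readers unfamiliar with the `Tensor.to(other)` overload used in the reproducer: it returns a tensor with the same dtype and device as `other`. The sketch below runs on the CPU only, so it illustrates the overload's semantics and does not reproduce the crash. The names `src`, `ref`, `via_other`, and `explicit` are illustrative and do not come from the report.

```python
import torch

# CPU-only stand-ins for `b` (the source) and `a` (the reference tensor).
src = torch.tensor(5.0)  # float32
ref = torch.tensor(5)    # int64 on the CPU

# Tensor.to(other) takes both dtype and device from `other`.
via_other = src.to(ref)

# Spelling out the same conversion explicitly.
explicit = src.to(device=ref.device, dtype=ref.dtype)

print(via_other.dtype, via_other.device)
```

In the reproducer this means the call is a device-to-host copy from XPU to a CPU destination, issued with `non_blocking=True`.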

Versions

latest version

@daisyden daisyden added this to the PT2.5 milestone Sep 6, 2024
@chuanqi129 chuanqi129 modified the milestones: PT2.5, PT2.6 Oct 14, 2024
@fengyuan14

This is a defect in the user's case. With non_blocking=True, the caller needs to be aware of asynchronous execution and of the CPU tensor's life cycle.
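
One possible reading of this comment, sketched under assumptions: the caller keeps a reference to the CPU result of the non-blocking copy and synchronizes the device before reading or releasing it. This is not a confirmed fix from the maintainers. The variable `out`, the `has_xpu` check, and the CPU fallback are additions for illustration, so that the snippet also runs on machines without an XPU.

```python
import torch

# Fall back to the CPU so the sketch also runs without an XPU build.
has_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
device = "xpu" if has_xpu else "cpu"

b = torch.tensor(5.0, device=device)
a = torch.tensor(5)  # CPU int64 tensor that supplies the target dtype/device

# Keep a reference so the destination buffer outlives the pending copy.
out = b.to(a, non_blocking=True)

# Wait for outstanding device work before reading or releasing `out`.
if has_xpu:
    torch.xpu.synchronize()

print(out.item())
```

Compared with the reproducer, the only changes are that the result is bound to a name and that the device is synchronized before the value is used.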
