Python bindings: segfault enumerating result of Traverse() from Usd.Stage #2160

Closed
jakedouglas opened this issue Jan 13, 2023 · 10 comments

Comments

@jakedouglas

Description of Issue

Enumerating a Usd.PrimRange obtained from calling Traverse() on a loaded stage segfaults when a reference to the stage is not explicitly maintained. It seems as though it may be getting garbage collected prematurely.

Steps to Reproduce

Works fine:

from pxr import Usd

stage = Usd.Stage.Open("path/to/stage.usdz")
list(stage.Traverse())

Segfaults:

from pxr import Usd

list(Usd.Stage.Open("path/to/stage.usdz").Traverse())

Package Versions

Python 3.9

Seen on multiple systems:

  • v22.11 built from source on macOS 13.1 / Apple M1 Pro arm64
  • v22.8 from pypi on Ubuntu 22.04.1 LTS / x86_64

Result:

Process:               Python [5017]
Path:                  /Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/Resources/Python.app/Contents/MacOS/Python
Identifier:            com.apple.python3
Version:               3.9.6 (3.9.6)
Build Info:            python3-124000000000000~2119
Code Type:             ARM-64 (Native)
Parent Process:        zsh [711]
Responsible:           iTerm2 [588]
User ID:               501

Date/Time:             2023-01-13 08:44:01.2467 -0800
OS Version:            macOS 13.1 (22C65)
Report Version:        12
Anonymous UUID:        D42BDF3A-3F14-9408-3FA1-E792AEFF831A

Sleep/Wake UUID:       C9BC618B-978F-4CE5-A6DF-34A22A2E4568

Time Awake Since Boot: 320 seconds
Time Since Wake:       302 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x0000000000000000
Exception Codes:       0x0000000000000001, 0x0000000000000000

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [5017]

VM Region Info: 0 is not in any region.  Bytes before following region: 4309024768
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      100d68000-100d6c000    [   16K] r-x/r-x SM=COW  .../MacOS/Python

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   ???                           	               0x0 ???
1   libboost_python39.dylib       	       0x10126e5f4 boost::python::objects::instance_dealloc(_object*) + 44
2   Python3                       	       0x10146bdb8 0x1013d0000 + 638392
3   Python3                       	       0x10140fabc _Py_CheckFunctionResult + 136
4   libusd_ms.dylib               	       0x103270020 pxrInternal_v0_22__pxrReserved__::Tf_ModuleProcessor::_InvokeWithErrorHandling::operator()(_object*, _object*) const + 100
5   libboost_python39.dylib       	       0x10126f5e0 boost::python::objects::function::call(_object*, _object*) const + 696
6   libboost_python39.dylib       	       0x101271fa0 boost::detail::function::void_function_ref_invoker0<boost::python::objects::(anonymous namespace)::bind_return, void>::invoke(boost::detail::function::function_buffer&) + 28
7   libboost_python39.dylib       	       0x1012774b0 boost::python::detail::exception_handler::operator()(boost::function0<void> const&) const + 88
8   _tf.so                        	       0x101939f34 boost::detail::function::function_obj_invoker2<boost::_bi::bind_t<bool, boost::python::detail::translate_exception<pxrInternal_v0_22__pxrReserved__::TfBaseException, void (*)(pxrInternal_v0_22__pxrReserved__::TfBaseException const&)>, boost::_bi::list3<boost::arg<1>, boost::arg<2>, boost::_bi::value<void (*)(pxrInternal_v0_22__pxrReserved__::TfBaseException const&)> > >, bool, boost::python::detail::exception_handler const&, boost::function0<void> const&>::invoke(boost::detail::function::function_buffer&, boost::python::detail::exception_handler const&, boost::function0<void> const&) + 28
9   libboost_python39.dylib       	       0x101277204 boost::python::handle_exception_impl(boost::function0<void>) + 56
10  libboost_python39.dylib       	       0x101271a88 boost::python::objects::function_call(_object*, _object*, _object*) + 72
11  Python3                       	       0x10140fee8 _PyObject_MakeTpCall + 360
12  Python3                       	       0x1014129b0 0x1013d0000 + 272816
13  Python3                       	       0x10147701c 0x1013d0000 + 684060
14  Python3                       	       0x1013fa744 PyObject_GetIter + 32
15  Python3                       	       0x101430a10 0x1013d0000 + 395792
16  Python3                       	       0x1014355c8 0x1013d0000 + 415176
17  Python3                       	       0x101432414 0x1013d0000 + 402452
18  Python3                       	       0x1014f15e8 0x1013d0000 + 1185256
19  Python3                       	       0x1014eedd4 _PyEval_EvalFrameDefault + 23960
20  Python3                       	       0x1014f23d4 0x1013d0000 + 1188820
21  Python3                       	       0x1014e8f74 PyEval_EvalCode + 80
22  Python3                       	       0x101532b1c 0x1013d0000 + 1452828
23  Python3                       	       0x101530aec PyRun_SimpleFileExFlags + 820
24  Python3                       	       0x10154edf0 Py_RunMain + 1640
25  Python3                       	       0x10154f280 0x1013d0000 + 1569408
26  Python3                       	       0x10154f320 Py_BytesMain + 40
27  dyld                          	       0x190d57e50 start + 2544

cc @gavinmh

@spiffmon
Member

spiffmon commented Jan 13, 2023

Hi @jakedouglas, given that you have assigned the stage to a Python local, it seems unlikely that Python would garbage collect the stage. Does this happen with a "known good" usdz asset, like any of those from the Apple QuickLook gallery? If it turns out to be specific to a particular usdz asset, and it is not too big, could you share it for further debugging? If it seems like a more systemic issue, I am not sure we can reproduce it, given that we test the pypi and other builds prior to publishing. In that case, are there any scenarios in which you can successfully open and traverse a UsdStage? Can you build a simple one programmatically?

@jakedouglas
Author

Two examples were provided for comparison:

  • One that assigns the stage to a python local and works fine
  • One that does not assign the stage to anything and segfaults

Let me know if I can further clarify.

@spiffmon
Member

Ah, sorry, @jakedouglas - somehow I missed the second example! So

list(Usd.Stage.Open("path/to/stage.usdz").Traverse())

is not expected to work, because Traverse() returns a Usd.PrimRange, which holds Usd.Prim objects that have only weak (non-owning) pointers back to their originating Usd.Stage. A Stage owns its prims, and therefore the prims cannot have "strong" pointers back to the Stage. And since a Usd.PrimRange is a lightweight computed result of a Stage, it would be problematic for it to directly express an ownership stake in the Stage as well.
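The ownership relationship described above can be sketched in plain Python with the standard weakref module. This is a toy model under stated assumptions, not USD's implementation: ToyStage and ToyPrim are hypothetical names, and the immediate expiry relies on CPython's reference counting, which releases the unnamed stage as soon as traverse() returns and before the caller starts consuming the result.

```python
import weakref


class ToyPrim:
    """Hypothetical prim handle; holds only a weak reference to its stage."""

    def __init__(self, stage, name):
        self._stage = weakref.ref(stage)  # non-owning back-pointer
        self.name = name

    def is_valid(self):
        # The handle is usable only while the owning stage is alive.
        return self._stage() is not None


class ToyStage:
    """Hypothetical stage; owns its prims, and no prim keeps it alive."""

    def __init__(self, names):
        self._prims = [ToyPrim(self, n) for n in names]

    def traverse(self):
        # The result holds prim handles but no strong reference to the stage.
        return list(self._prims)


# Stage bound to a name: it outlives the traversal result.
stage = ToyStage(["a", "b"])
held = stage.traverse()

# Temporary stage: CPython releases it as soon as traverse() returns.
orphaned = ToyStage(["a", "b"]).traverse()

print([p.is_valid() for p in held])      # [True, True]
print([p.is_valid() for p in orphaned])  # [False, False] on CPython
```

In the toy model the orphaned handles merely report themselves invalid; what happens when a real expired prim is used depends on the bindings, as discussed in the rest of this thread.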

Bottom line: to do anything with a Stage other than use its Export() or Flatten() methods, you need to hold the stage in a variable in Python. (One slight exception: if you open the Stage into a Usd.StageCache, the cache maintains an ownership stake in its cached Stages and must be explicitly flushed to release them.)
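As a minimal sketch of that advice, a helper can bind the stage to a local for the full duration of the traversal. The function name list_prim_paths is hypothetical; the pxr calls are the same ones used in the working example from the issue description, plus UsdPrim.GetPath().

```python
def list_prim_paths(stage_path):
    """Return the paths of all prims on the stage stored at stage_path."""
    # Imported lazily so this definition loads even where usd-core is absent.
    from pxr import Usd

    # Bind the stage to a local so it stays alive until traversal finishes.
    stage = Usd.Stage.Open(stage_path)

    # Convert to plain strings before the stage goes out of scope, so the
    # caller never receives prim handles that could expire.
    return [str(prim.GetPath()) for prim in stage.Traverse()]
```

Calling list_prim_paths("path/to/stage.usdz") returns ordinary strings, so nothing the caller holds depends on the stage staying alive.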

I am, however, concerned that you got a segfault, rather than a python exception telling you about an attempt to dereference an expired UsdPrim. That's the behavior we observe, and expect.
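The expected failure mode can also be illustrated in plain Python: a range that validates its non-owning stage pointer in `__iter__` and raises an ordinary exception rather than touching freed state. ToyStage, ToyRange, and make_range are hypothetical names, the error message is invented for the sketch, and the expiry again assumes CPython's reference counting.

```python
import weakref


class ToyStage:
    """Hypothetical stand-in for a stage."""


class ToyRange:
    """Hypothetical prim range holding a non-owning stage pointer."""

    def __init__(self, stage, names):
        self._stage = weakref.ref(stage)
        self._names = list(names)

    def __iter__(self):
        # Validate before touching anything the stage owns, and report
        # failure as an ordinary Python exception.
        if self._stage() is None:
            raise RuntimeError("invalid range: owning stage has expired")
        return iter(self._names)


def make_range():
    stage = ToyStage()  # the only strong reference is this local
    return ToyRange(stage, ["a", "b"])  # stage is released on return


try:
    list(make_range())
    outcome = "iterated"
except RuntimeError as exc:
    outcome = f"RuntimeError: {exc}"

print(outcome)
```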

@spitzak

spitzak commented Jan 17, 2023

In C++ the result of Usd.Stage.Open() would not be destroyed until after the function list() returns. Is this not true of Python?

@jakedouglas
Author

Thank you for explaining. I think it is probably unintuitive for Python users to have a Usd.PrimRange/Usd.Prim be invalid without having assigned the Usd.Stage to something explicitly, but I suppose that is a separate issue.

I do get the exception under Python 3.10, but segfault under Python 3.8 and 3.9.

What additional information can I provide to be helpful?

@jakedouglas
Author

jakedouglas commented Jan 18, 2023

Notably this segfault also occurs using the universal2 wheel produced from the dev branch, obtained from the link in this comment. This would seem to indicate that it is unrelated to the way(s) in which I built the package myself.

~ % echo $PYTHONPATH

~ % pip install ~/Downloads/dist-mac-cp39/usd_core-23.2-cp39-none-macosx_10_9_universal2.whl
Processing ./Downloads/dist-mac-cp39/usd_core-23.2-cp39-none-macosx_10_9_universal2.whl
Installing collected packages: usd-core
Successfully installed usd-core-23.2
~ % python
Python 3.9.16 (main, Jan 17 2023, 08:33:06)
[Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pxr import Usd
>>> list(Usd.Stage.Open("/Users/jakedouglas/Downloads/Office.usdz").Traverse())
zsh: segmentation fault  python
~ %

@spiffmon
Member

We're taking a look to see if we can repro and diagnose, @jakedouglas ... in our internal builds, your python code produces a python exception as we expect(!)

@sunyab
Contributor

sunyab commented Jan 18, 2023

Filed as internal issue #USD-7872

@sunyab
Contributor

sunyab commented Jan 20, 2023

I did some tests using the manylinux2014 Docker image using usd-core 22.11 and got exceptions on Python 3.6 and 3.7 but segfaults on 3.8, 3.9, and 3.10. The exceptions looked like:

RuntimeError: Invalid range starting with expired 'Xform' prim </baked_mesh> 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: <Boost.Python.function object at 0x120aa50> returned a result with an error set

I'm wondering if there was some C API change in 3.8 that's affecting the way we (or boost::python) are translating exceptions now.

@sunyab
Contributor

sunyab commented Jan 20, 2023

Oh, never mind: I've found the issue and am testing a fix that will avoid the segfault and cause an exception to be thrown consistently instead. This appears to be a long-standing issue, so the fix will likely not make the cut for the 23.02 release, but it will land in the dev branch shortly after.

pixar-oss pushed a commit that referenced this issue Feb 16, 2023

These functions are all intended for use in C++ code called
from Python bindings. In these cases, the GIL has already
been acquired, so the TfPyLocks in these functions were
redundant.

Fixes #2160

(Internal change: 2262124)

pixar-oss pushed a commit that referenced this issue Feb 16, 2023

__iter__ was missing a call to throw_error_already_set after
using the Python C API to set an error indicator, resulting
in a SystemError exception on Python 3.6 and 3.7 and segfaults
on 3.8+.

This fix replaces direct calls to the Python API with Tf
utility functions which ensure that throw_error_already_set
will be called in error cases.

Fixes #2160

(Internal change: 2262128)