Segfault during julia.Julia() if PyTorch is already imported #125

Closed
jekbradbury opened this issue Sep 23, 2017 · 7 comments

Comments

@jekbradbury

jekbradbury commented Sep 23, 2017

PyTorch occasionally has import-order issues with other large binary packages (it throws the static TLS error shown below for one import order but works with the other). This time, however, calling julia.Julia() after torch has been imported causes a segfault, which prevents me from using both packages in the same program.
Otherwise pyjulia and PyCall work fine for me.

james.bradbury@jamesbradbu-wsl:~$ python
Python 3.6.1 |Continuum Analytics, Inc.| (default, Mar 22 2017, 19:54:23) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import julia
>>> import torch
>>> j = julia.Julia()
Segmentation fault
[...]
>>> import torch
>>> import julia
>>> j = julia.Julia()
Segmentation fault
[...]
>>> import julia
>>> j = julia.Julia()
WARNING: Method definition macroexpand(Module, Any) in module Compat at /home/james.bradbury/.julia/v0.6/Compat/src/Compat.jl:463 overwritten in module MacroTools at /home/james.bradbury/.julia/v0.6/MacroTools/src/utils.jl:64.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/james.bradbury/miniconda3/lib/python3.6/site-packages/torch/__init__.py", line 53, in <module>
    from torch._C import *
ImportError: dlopen: cannot load any more object with static TLS
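
Because a segfault takes the whole interpreter down, one way to compare the import orders above is to run each one in a fresh subprocess and inspect the exit status. The following is only a sketch: `run_isolated`, `check_orders`, and `ORDERS` are hypothetical names, and it assumes a POSIX system where a negative return code means the child was killed by a signal.

```python
import signal
import subprocess
import sys

# The three orders tried in the transcript above (illustrative snippets).
ORDERS = {
    "julia, torch, Julia()": "import julia; import torch; julia.Julia()",
    "torch, julia, Julia()": "import torch; import julia; julia.Julia()",
    "julia, Julia(), torch": "import julia; julia.Julia(); import torch",
}


def run_isolated(snippet, timeout=120):
    """Run `snippet` in a fresh interpreter and describe how it exited."""
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        return "killed by " + signal.Signals(-proc.returncode).name
    if proc.returncode != 0:
        # Report the last stderr line, usually the exception message.
        lines = proc.stderr.strip().splitlines()
        return "error: " + (lines[-1] if lines else f"exit code {proc.returncode}")
    return "ok"


def check_orders(orders=ORDERS):
    """Return a mapping from each order's label to its outcome."""
    return {label: run_isolated(snippet) for label, snippet in orders.items()}
```

Calling `check_orders()` in an environment with both packages installed would report each order as `ok`, as killed by a signal, or as failing with an error, without crashing the calling process.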

Other possibly relevant versions:
PyCall: v1.15.0
pyjulia: master

>>> torch.__version__
'0.2.0_4'
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.6.0 (2017-06-19 13:05 UTC)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |  x86_64-unknown-linux-gnu

@jekbradbury
Author

And here's the top of the GDB traceback. libTHC is the dylib that contains PyTorch's CUDA kernels:

#0  0x00007fffd1966577 in void std::__once_call_impl<std::_Bind_simple<void (*())()> >() ()
   from /home/james.bradbury/miniconda3/lib/python3.6/site-packages/torch/lib/libTHC.so.1
#1  0x00007ffff76c2a80 in pthread_once ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_once.S:103
#2  0x00007fffbcb585b9 in llvm::ManagedStaticBase::RegisterManagedStatic(void* (*)(), void (*)(void*)) const () from /home/james.bradbury/julia/usr/bin/../lib/libLLVM-3.9.so
#3  0x00007fffbcb33ebe in llvm::cl::OptionCategory::registerCategory() ()
   from /home/james.bradbury/julia/usr/bin/../lib/libLLVM-3.9.so
#4  0x00007fffbcb3ca42 in __static_initialization_and_destruction_0(int, int) [clone .constprop.319]
    () from /home/james.bradbury/julia/usr/bin/../lib/libLLVM-3.9.so
#5  0x00007ffff7dea2da in call_init (l=<optimized out>, argc=argc@entry=1, 
    argv=argv@entry=0x7fffffffd618, env=env@entry=0x740b50) at dl-init.c:78
#6  0x00007ffff7dea3c3 in call_init (env=<optimized out>, argv=<optimized out>, 
    argc=<optimized out>, l=<optimized out>) at dl-init.c:36
#7  _dl_init (main_map=main_map@entry=0x1228140, argc=1, argv=0x7fffffffd618, env=0x740b50)
    at dl-init.c:126
#8  0x00007ffff7deee00 in dl_open_worker (a=a@entry=0x7fffffffc1f8) at dl-open.c:577
#9  0x00007ffff7dea194 in _dl_catch_error (objname=objname@entry=0x7fffffffc1e8, 
    errstring=errstring@entry=0x7fffffffc1f0, mallocedp=mallocedp@entry=0x7fffffffc1e0, 
    operate=operate@entry=0x7ffff7deeb30 <dl_open_worker>, args=args@entry=0x7fffffffc1f8)
    at dl-error.c:187
#10 0x00007ffff7dee54b in _dl_open (
    file=0x7fffbecc1d10 "/home/james.bradbury/julia/usr/bin/../lib/libjulia.so.0.6", 
    mode=-2147483390, caller_dlopen=<optimized out>, nsid=-2, argc=1, argv=0x7fffffffd618, 
    env=0x740b50) at dl-open.c:661
#11 0x00007ffff74b202b in dlopen_doit (a=a@entry=0x7fffffffc410) at dlopen.c:66
#12 0x00007ffff7dea194 in _dl_catch_error (objname=0x622560, errstring=0x622568, 
    mallocedp=0x622558, operate=0x7ffff74b1fd0 <dlopen_doit>, args=0x7fffffffc410) at dl-error.c:187
#13 0x00007ffff74b262d in _dlerror_run (operate=operate@entry=0x7ffff74b1fd0 <dlopen_doit>, 
    args=args@entry=0x7fffffffc410) at dlerror.c:163
#14 0x00007ffff74b20c1 in __dlopen (file=<optimized out>, mode=<optimized out>) at dlopen.c:87
#15 0x00007ffff5b94354 in py_dl_open (self=<optimized out>, args=<optimized out>)
    at /home/ilan/minonda/conda-bld/python_1490230359189/work/Python-3.6.1/Modules/_ctypes/callproc.c:1328
#16 0x00007ffff7995ef9 in _PyCFunction_FastCallDict (func_obj=0x7ffff646b900, args=0x7ffff644ec10, 
    nargs=<optimized out>, kwargs=0x0) at Objects/methodobject.c:234
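
As a side note on reading frame #10, the `mode=-2147483390` argument to `_dl_open` can be decoded into dlopen flags. The sketch below assumes the Linux glibc flag values (hard-coded here, since they differ on other platforms); `GLIBC_RTLD_FLAGS` and `decode_dlopen_mode` are hypothetical names, and any bit left over is presumably glibc-internal.

```python
# Public dlopen flag values as defined by glibc on Linux (assumed platform).
GLIBC_RTLD_FLAGS = {
    "RTLD_LAZY": 0x00001,
    "RTLD_NOW": 0x00002,
    "RTLD_NOLOAD": 0x00004,
    "RTLD_DEEPBIND": 0x00008,
    "RTLD_GLOBAL": 0x00100,
    "RTLD_NODELETE": 0x01000,
}


def decode_dlopen_mode(mode):
    """Split a (possibly negative) 32-bit dlopen mode into flag names.

    Returns (names, leftover), where leftover holds any bits that do not
    match one of the public flags listed above.
    """
    bits = mode & 0xFFFFFFFF  # reinterpret the signed int GDB printed
    names = [name for name, value in GLIBC_RTLD_FLAGS.items() if bits & value]
    known = 0
    for value in GLIBC_RTLD_FLAGS.values():
        known |= value
    return names, bits & ~known


names, leftover = decode_dlopen_mode(-2147483390)
print(names, hex(leftover))
```

Under these assumptions the mode decodes to RTLD_NOW | RTLD_GLOBAL plus one extra high bit, meaning libjulia was opened with its symbols made globally visible. Whether that plays a part in the crash is not established by the traceback alone.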

@jekbradbury
Author

Update: this doesn't happen on a slightly older PyTorch binary (0.2.0_2). Maybe the change mentioned here as fixing a different dylib incompatibility broke something with libjulia?

@ihnorton
Member

ihnorton commented Dec 4, 2017

Probably JuliaLang/julia#19606

@jekbradbury
Author

It's been fixed in PyTorch version 0.3 (out this week), I think by reducing TLS usage. Regardless, it's definitely not pyjulia's fault, so I think I'll close this.

@ViralBShah
Contributor

Nice!

@zhangpiu

zhangpiu commented Mar 2, 2018

I see a similar issue when I import torch in Julia using PyCall.
image

Importing again gives the following errors:
`julia> @pyimport torch

signal (11): Segmentation fault
while loading no file, in expression starting on line 467
unknown function (ip: 0x7f822c12ac52)
libshm_init at /usr/local/lib/python2.7/dist-packages/torch/lib/libshm.so (unknown line)
THPModule_initExtension at /pytorch/torch/csrc/Module.cpp:120
PyEval_EvalFrameEx at /usr/lib/x86_64-linux-gnu/libpython2.7.so (unknown line)
PyEval_EvalCodeEx at /usr/lib/x86_64-linux-gnu/libpython2.7.so (unknown line)
PyEval_EvalCode at /usr/lib/x86_64-linux-gnu/libpython2.7.so (unknown line)
PyImport_ExecCodeModuleEx at /usr/lib/x86_64-linux-gnu/libpython2.7.so (unknown line)
unknown function (ip: 0x7f82443c5f45)
unknown function (ip: 0x7f82443f79f8)
unknown function (ip: 0x7f82443c6658)
PyImport_ImportModuleLevel at /usr/lib/x86_64-linux-gnu/libpython2.7.so (unknown line)
unknown function (ip: 0x7f82443938ae)
PyObject_Call at /usr/lib/x86_64-linux-gnu/libpython2.7.so (unknown line)
PyObject_CallFunction at /usr/lib/x86_64-linux-gnu/libpython2.7.so (unknown line)
PyImport_Import at /usr/lib/x86_64-linux-gnu/libpython2.7.so (unknown line)
PyImport_ImportModule at /usr/lib/x86_64-linux-gnu/libpython2.7.so (unknown line)
macro expansion at /home/qunheadmin/.julia/v0.6/PyCall/src/exception.jl:78 [inlined]
pyimport at /home/qunheadmin/.julia/v0.6/PyCall/src/PyCall.jl:374
unknown function (ip: 0x7f8248b8fba2)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:75
eval at /buildworker/worker/package_linux64/build/src/interpreter.c:242
do_call at /buildworker/worker/package_linux64/build/src/interpreter.c:74
eval at /buildworker/worker/package_linux64/build/src/interpreter.c:242
eval_body at /buildworker/worker/package_linux64/build/src/interpreter.c:543
jl_interpret_toplevel_thunk at /buildworker/worker/package_linux64/build/src/interpreter.c:692
jl_toplevel_eval_flex at /buildworker/worker/package_linux64/build/src/toplevel.c:592
jl_toplevel_eval_in at /buildworker/worker/package_linux64/build/src/builtins.c:496
eval at ./boot.jl:235
unknown function (ip: 0x7f826553cd2f)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
eval_user_input at ./REPL.jl:66
unknown function (ip: 0x7f82655be18f)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
macro expansion at ./REPL.jl:97 [inlined]
#1 at ./event.jl:73
unknown function (ip: 0x7f8248b8a2af)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1424 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:267
unknown function (ip: 0xffffffffffffffff)
Allocations: 4513611 (Pool: 4512241; Big: 1370); GC: 8`

@MilesCranmer
Collaborator

I'm still seeing this issue on macOS. Any tips for debugging @jekbradbury?

Here is the PyTorch thread with more info: pytorch/pytorch#78829
