[BUG]: Encounter segmentation faults in running the toy example on RHEL machines #561

Closed
ganfisher opened this issue Mar 6, 2024 · 14 comments · Fixed by #565
Labels: bug (Something isn't working), has workaround (Has a workaround)

Comments

@ganfisher
ganfisher commented Mar 6, 2024

segmentation faults

Hi, I installed PySR with "conda create -n sr -c conda-forge pysr" and then tried to run a script containing the following toy example:
import numpy as np
from pysr import PySRRegressor
# Toy data: 100 samples, 5 features; the target depends on features 0 and 3 only
X = 2 * np.random.randn(100, 5)
y = 2 * np.cos(X[:, 3]) + X[:, 0] ** 2 - 2
model = PySRRegressor(binary_operators=["+", "-", "*", "/"])
model.fit(X, y)
print(model)
The first time I ran the script, juliapkg also installed the Julia dependencies. However, every run since then has ended in a segmentation fault. I repeated the installation and the run on other RHEL machines, and it failed there too.

Version

0.17.1

Operating System

Linux

Package Manager

Conda

Interface

Script (i.e., python my_script.py)

Relevant log output

[110388] signal (11.1): Segmentation fault
in expression starting at none:2
initialize_env at /workspace/srcdir/gcc-13.2.0/libgomp/env.c:2062
_dl_init at /lib64/ld-linux-x86-64.so.2 (unknown line)
dl_open_worker at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_catch_error at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dl_open at /lib64/ld-linux-x86-64.so.2 (unknown line)
dlopen_doit at /lib64/libdl.so.2 (unknown line)
_dl_catch_error at /lib64/ld-linux-x86-64.so.2 (unknown line)
_dlerror_run at /lib64/libdl.so.2 (unknown line)
dlopen at /lib64/libdl.so.2 (unknown line)
ijl_load_dynamic_library at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/dlload.c:365
#dlopen#3 at ./libdl.jl:117
dlopen at ./libdl.jl:116 [inlined]
dlopen at ./libdl.jl:116
jfptr_dlopen_51541.1 at /home/jin2/bin/anaconda3/envs/sr/julia_env/pyjuliapkg/install/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
__init__ at /home/jin2/bin/anaconda3/envs/sr/julia_env/pyjuliapkg/install/share/julia/stdlib/v1.10/CompilerSupportLibraries_jll/src/CompilerSupportLibraries_jll.jl:62
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_module_run_initializer at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:76
run_module_init at ./loading.jl:1134
register_restored_modules at ./loading.jl:1122
_include_from_serialized at ./loading.jl:1067
_require_search_from_serialized at ./loading.jl:1581
_require at ./loading.jl:1938
__require_prelocked at ./loading.jl:1812
jfptr___require_prelocked_80777.1 at /home/jin2/bin/anaconda3/envs/sr/julia_env/pyjuliapkg/install/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_in_world at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/builtins.c:831
#invoke_in_world#3 at ./essentials.jl:926 [inlined]
invoke_in_world at ./essentials.jl:923 [inlined]
_require_prelocked at ./loading.jl:1803
macro expansion at ./loading.jl:1790 [inlined]
macro expansion at ./lock.jl:267 [inlined]
__require at ./loading.jl:1753
jfptr___require_80742.1 at /home/jin2/bin/anaconda3/envs/sr/julia_env/pyjuliapkg/install/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
jl_f__call_in_world at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/builtins.c:831
#invoke_in_world#3 at ./essentials.jl:926 [inlined]
invoke_in_world at ./essentials.jl:923 [inlined]
require at ./loading.jl:1746
jfptr_require_80739.1 at /home/jin2/bin/anaconda3/envs/sr/julia_env/pyjuliapkg/install/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
do_call at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:126
eval_value at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:223
eval_stmt_value at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:174 [inlined]
eval_body at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:617
eval_body at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:544
jl_interpret_toplevel_thunk at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/interpreter.c:775
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:934
jl_toplevel_eval_flex at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:877
ijl_toplevel_eval_in at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/toplevel.c:985
ijl_eval_string at /cache/build/builder-amdci5-1/julialang/julia-release-1-dot-10/src/jlapi.c:117
ffi_call_unix64 at /home/jin2/bin/anaconda3/envs/sr/lib/python3.10/lib-dynload/../../libffi.so.8 (unknown line)
ffi_call_int at /home/jin2/bin/anaconda3/envs/sr/lib/python3.10/lib-dynload/../../libffi.so.8 (unknown line)
_call_function_pointer at /usr/local/src/conda/python-3.10.13/Modules/_ctypes/callproc.c:916 [inlined]
_ctypes_callproc at /usr/local/src/conda/python-3.10.13/Modules/_ctypes/callproc.c:1262
PyCFuncPtr_call at /usr/local/src/conda/python-3.10.13/Modules/_ctypes/_ctypes.c:4221
_PyObject_MakeTpCall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
unknown function (ip: 0x56069e53986f)
PyEval_EvalCode at python (unknown line)
unknown function (ip: 0x56069e5409b9)
unknown function (ip: 0x56069e4a6a92)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x56069e4a60a7)
_PyObject_CallMethodIdObjArgs at python (unknown line)
unknown function (ip: 0x56069e3d7260)
_PyEval_EvalFrameDefault at python (unknown line)
unknown function (ip: 0x56069e53986f)
PyEval_EvalCode at python (unknown line)
unknown function (ip: 0x56069e5409b9)
unknown function (ip: 0x56069e4a6a92)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x56069e4a60a7)
_PyObject_CallMethodIdObjArgs at python (unknown line)
unknown function (ip: 0x56069e3d7260)
_PyEval_EvalFrameDefault at python (unknown line)
unknown function (ip: 0x56069e53986f)
PyEval_EvalCode at python (unknown line)
unknown function (ip: 0x56069e5409b9)
unknown function (ip: 0x56069e4a6a92)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
_PyEval_EvalFrameDefault at python (unknown line)
_PyFunction_Vectorcall at python (unknown line)
unknown function (ip: 0x56069e4a60a7)
_PyObject_CallMethodIdObjArgs at python (unknown line)
unknown function (ip: 0x56069e3d7260)
_PyEval_EvalFrameDefault at python (unknown line)
unknown function (ip: 0x56069e53986f)
PyEval_EvalCode at python (unknown line)
unknown function (ip: 0x56069e569d19)
unknown function (ip: 0x56069e565122)
unknown function (ip: 0x56069e3fc4d0)
_PyRun_SimpleFileObject at python (unknown line)
_PyRun_AnyFileObject at python (unknown line)
Py_RunMain at python (unknown line)
Py_BytesMain at python (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x56069e52cd10)
Allocations: 2907 (Pool: 2899; Big: 8); GC: 0

Extra Info

No response

@ganfisher ganfisher added the bug Something isn't working label Mar 6, 2024
@MilesCranmer
Owner

MilesCranmer commented Mar 6, 2024

This is very likely due to loading an incompatible shared library on your system. Usually there is some /lib64/libc.so that is very old and getting loaded by Python first. (I also see this on my institute RHEL)

The workaround is to force a modern libc to get loaded first.

@MilesCranmer
Owner

MilesCranmer commented Mar 6, 2024

Can you try the solutions posted in #347? In particular you can try to do:

LD_PRELOAD=/path/to/modern/libstdc++.so.6 python my_script.py

where /path/to/modern/libstdc++.so.6 is some version of libstdc++.so.6 that you can find with

for lib in $(find $CONDA_PREFIX -iname "*.so"); do echo $lib; ldd $lib | grep libstdc++; done;
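As a complementary check, here is a minimal Python sketch that looks for `libstdc++.so.6*` files under a prefix such as `$CONDA_PREFIX` and reports the highest `GLIBCXX_x.y.z` tag embedded in each one. It is an illustration only: the function names are made up for this example, and scanning the raw bytes is a heuristic that roughly mimics running `strings` on the library and grepping for `GLIBCXX`.

```python
import re
from pathlib import Path

# Version tags such as GLIBCXX_3.4.30 appear as plain ASCII inside the library file.
GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def max_glibcxx(lib_path):
    """Return the highest GLIBCXX tag found in the file as a tuple of ints, or None."""
    data = Path(lib_path).read_bytes()
    versions = {
        tuple(int(part) for part in match.group(1).decode().split("."))
        for match in GLIBCXX_TAG.finditer(data)
    }
    return max(versions) if versions else None


def find_libstdcxx(prefix):
    """Return every libstdc++.so.6* file found under prefix, sorted by path."""
    return [p for p in sorted(Path(prefix).rglob("libstdc++.so.6*")) if p.is_file()]


def report(prefix):
    """Return (path, highest GLIBCXX tuple) pairs for each library under prefix."""
    return [(path, max_glibcxx(path)) for path in find_libstdcxx(prefix)]
```

Calling `report(os.environ["CONDA_PREFIX"])` (after `import os`) lists the candidates; a path from that list is the kind of value that could be passed to `LD_PRELOAD`.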

Also, if possible, is there any way you can create something I can reproduce your error in? Otherwise I am in the dark about the cause of this.

Maybe like a docker container with RHEL that you are able to reproduce the error inside, and therefore I can run?

@MilesCranmer
Owner

Here's a working Dockerfile based on RHEL 8. So it's at least not a universal RHEL issue; it is probably some old library somewhere that Python is picking up. The following seems to work fine:

FROM redhat/ubi8-minimal:8.9-1108.1706795067

RUN microdnf install -y python3.11 python3.11-pip

RUN python3.11 -m pip install pysr==0.17.1

# Install Julia pre-requisites:
RUN python3.11 -c 'import pysr'

# PySR run:
CMD ["python3.11", "-c", "import pysr; pysr.PySRRegressor().fit([[1]], [1])"]

Build with

docker build -t pysr -f Dockerfile .

and test with

docker run --rm -it pysr

@ganfisher
Author

> Can you try the solutions posted in #347? In particular you can try to do:
>
> LD_PRELOAD=/path/to/modern/libstdc++.so.6 python my_script.py
>
> where /path/to/modern/libstdc++.so.6 is some version of libstdc++.so.6 that you can find with
>
> for lib in $(find $CONDA_PREFIX -iname "*.so"); do echo $lib; ldd $lib | grep libstdc++; done;
>
> Also, if possible, is there any way you can create something I can reproduce your error in? Otherwise I am in the dark about the cause of this.
>
> Maybe like a docker container with RHEL that you are able to reproduce the error inside, and therefore I can run?

Thank you very much, Miles. I checked the GLIBCXX version in the conda environment, and it appears to be 3.4.30, which I think is new enough to support most applications. As for the Docker approach, I will try it if possible. Since I will be using PySR on our clusters, a Docker-based deployment may be troublesome.

@ZehaoJin
Contributor

ZehaoJin commented Mar 8, 2024

I am facing the same error. PySR used to work perfectly fine, but since I upgraded it to the newest version 2 days ago, import pysr triggers this error. I don't know which PySR release introduced the issue, but it seems that one of the updates breaks PySR on certain machines.

@MilesCranmer
Owner

MilesCranmer commented Mar 8, 2024

Hm, very strange. It might be worth cross-posting this issue to https://github.com/JuliaPy/PythonCall.jl/issues.

@cjdoris do you know why this might be happening with JuliaCall but not PyJulia (previous package used by PySR)?

@MilesCranmer
Owner

It seems like this might be the issue to track: JuliaPy/PythonCall.jl#464

@ZehaoJin
Contributor

ZehaoJin commented Mar 8, 2024

"172407 segmentation fault (core dumped) python" (#470): this one sounds relevant too. "segmentation fault (core dumped)" is the exact error I get after import pysr.

@MilesCranmer
Owner

It sounds like it might be a Julia bug: JuliaLang/julia#53643 which is being patched as we speak 🎉

@ganfisher
Author

> It sounds like it might be a Julia bug: JuliaLang/julia#53643 which is being patched as we speak 🎉

I agree. That may be the most likely cause, since the error message contains no information directly related to PySR.

@ZehaoJin
Contributor

So are we just waiting for Julia's next release, which will hopefully resolve the issue?

@MilesCranmer
Owner

MilesCranmer commented Mar 11, 2024

A temporary workaround would be to downgrade Julia to 1.10.0. You can do this by modifying pysr/juliapkg.json to have:

{
    "julia": "=1.10.0",

and then reinstalling. Can you check whether that solves it? If it does, I can make a special branch that people can install with pip install git+...
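For anyone who prefers to script the edit, here is a minimal sketch under stated assumptions: it assumes `juliapkg.json` sits next to PySR's `__init__.py` and that the Julia constraint lives under the top-level `"julia"` key, as in the snippet above. The helper names are hypothetical. `importlib.util.find_spec` is used so that the file can be located without importing `pysr`, since the import itself is what crashes.

```python
import importlib.util
import json
from pathlib import Path


def locate_pysr_juliapkg():
    """Locate pysr/juliapkg.json without importing pysr (assumed file location)."""
    spec = importlib.util.find_spec("pysr")
    if spec is None or spec.origin is None:
        raise FileNotFoundError("pysr does not appear to be installed here")
    return Path(spec.origin).parent / "juliapkg.json"


def pin_julia(juliapkg_path, version_spec="=1.10.0"):
    """Set the top-level "julia" entry and return whatever value it had before."""
    path = Path(juliapkg_path)
    config = json.loads(path.read_text())
    previous = config.get("julia")
    config["julia"] = version_spec
    # All other keys are written back unchanged.
    path.write_text(json.dumps(config, indent=4) + "\n")
    return previous
```

A call such as `pin_julia(locate_pysr_juliapkg())` would apply the pin; keeping the returned previous value makes it easy to restore the original constraint later.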

@ZehaoJin
Contributor

This trick seems to work! I got no errors now.

@MilesCranmer
Owner

MilesCranmer commented Mar 11, 2024

I skip those broken Julia versions in #565, which fixes this issue. This is not actually a bug in Julia but in libgomp: 1.10.1 and 1.10.2 are compiled against the broken libgomp.

The issue should now be fixed (please verify that it installs any Julia version other than 1.10.1 or 1.10.2). If not, please let me know.
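One way to do that verification is sketched below. It assumes you can obtain a version string, for example the output of `julia --version` from the Julia binary that juliapkg installed, and it only encodes the two releases named in this thread; the function names are illustrative.

```python
import re

# Julia releases named in this thread as built against the broken libgomp.
AFFECTED_VERSIONS = {(1, 10, 1), (1, 10, 2)}


def parse_julia_version(text):
    """Extract (major, minor, patch) from text such as 'julia version 1.10.2'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no x.y.z version found in {text!r}")
    return tuple(int(group) for group in match.groups())


def is_affected(text):
    """Return True if the version string names one of the affected releases."""
    return parse_julia_version(text) in AFFECTED_VERSIONS
```

For instance, `is_affected("julia version 1.10.2")` is true, while the same call with `1.10.0` is false.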
