[Python] debug: access symbols for libarrow_python_flight.dylib #38519
Could you rebuild your PyArrow with |
### Rationale for this change It's not very clear that PyArrow will get built in release mode even if the linked Arrow C++ is built in debug mode. I added some text to the docs to make it more clear. An example of a user running into this is #38519. ### Are these changes tested? No as they're just minor docs changes. ### Are there any user-facing changes? This adds text to the Python docs. Authored-by: Bryce Mecum <petridish@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Hi @kou, thanks for the quick reply. Wouldn't this be achieved by the --build-type parameter in this step, as shown in the documentation, rather than by the environment variable?
python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \
    --bundle-arrow-cpp bdist_wheel
If the answer is yes, then that is how I have been building pyarrow as well; if not, I will try in a couple of hours and report back. |
|
I see, that is what I have been using, any other ideas? |
How about trying |
Okay, I have tried rebuilding with export PYARROW_BUILD_TYPE=debug, but unfortunately it didn't make any difference: I still see only assembly. Is there a command I can run on the dylib to confirm it was built in debug mode? |
Could you show the command line you tried as-is and full log of the command line? |
Yep, okay, so I'm posting only the pyarrow build part here; let me know if you need to see the arrow one as well:
(pyarrow-dev) XDWRL412MF:Projects dbarone$ pushd arrow/python
~/Projects/arrow/python ~/Projects ~/Projects ~/Projects
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_WITH_PARQUET=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_WITH_DATASET=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_PARALLEL=4
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_WITH_ARROW_FLIGHT=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_WITH_FLIGHT=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_BUNDLE_CYTHON_CPP=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_BUILD_FLIGHT=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_BUNDLE_ARROW_CPP=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_BUILD_TYPE=debug
(pyarrow-dev) XDWRL412MF:python dbarone$ python setup.py build_ext --build-type=debug --bundle-arrow-cpp bdist_wheel &> output.log
This is the output.log file generated from the python command above. |
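For anyone reproducing this, the export-then-build sequence above could be captured in a small script so the settings are recorded in one place. This is only a sketch: the helper name `debug_build_invocation` is made up for illustration, it uses a subset of the environment variables and `setup.py` flags that appear in the log above, and it assumes the command is run from the `arrow/python` directory.

```python
import os
import sys


def debug_build_invocation(base_env=None):
    """Return (env, cmd) for a debug PyArrow wheel build.

    Hypothetical helper: it only mirrors settings shown in the log above.
    """
    # Start from the given environment (or the current one) and add the
    # build settings on top, as the `export` lines in the log do.
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        {
            "PYARROW_WITH_PARQUET": "1",
            "PYARROW_WITH_DATASET": "1",
            "PYARROW_WITH_FLIGHT": "1",
            "PYARROW_PARALLEL": "4",
            "PYARROW_BUNDLE_ARROW_CPP": "1",
            "PYARROW_BUILD_TYPE": "debug",
        }
    )
    # Same command line as in the log, using the current interpreter.
    cmd = [
        sys.executable,
        "setup.py",
        "build_ext",
        "--build-type=debug",
        "--bundle-arrow-cpp",
        "bdist_wheel",
    ]
    return env, cmd
```

Passing the result to `subprocess.run(cmd, env=env, check=True)` would then start the build; keeping the environment and the command together makes it easier to share exactly what was run.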
Your build was a debug build. It seems that LLDB can't find source files for |
To be honest, I don't fully understand which directory is supposed to be the build directory. Looking at the output.log file, I do see this one mentioned:
but I do see the actual source files in /Users/dbarone/Projects/arrow/python/build/lib.macosx-13-arm64-cpython-311/pyarrow/src/arrow/python/. I tried to set those using the following command:
settings set target.source-map /Users/dbarone/Projects/arrow/python/build/temp.macosx-13-arm64-cpython-311/debug /Users/dbarone/Projects/arrow/python/build/lib.macosx-13-arm64-cpython-311/pyarrow/src/arrow/python/
but it didn't make any difference. |
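For context on the `target.source-map` attempt above: the setting takes pairs of path prefixes, where the first element is a prefix as recorded in the debug info and the second is its replacement on the local disk. A mapping can therefore only help when its first path really is a prefix of the recorded source path. The following is a rough Python model of that substitution, not LLDB's actual implementation; the `remap` helper and the sample paths are made up for illustration.

```python
def remap(recorded_path, source_map):
    """Apply the first matching (old_prefix, new_prefix) pair.

    Returns the path unchanged when no prefix matches.
    """
    for old_prefix, new_prefix in source_map:
        old = old_prefix.rstrip("/")
        # Match whole path components only, not partial directory names.
        if recorded_path == old or recorded_path.startswith(old + "/"):
            return new_prefix.rstrip("/") + recorded_path[len(old):]
    return recorded_path


# Hypothetical paths, for illustration only.
source_map = [("/build/tmp/debug", "/home/user/arrow/python/pyarrow/src")]
print(remap("/build/tmp/debug/arrow/python/flight.cc", source_map))
print(remap("/elsewhere/arrow/python/flight.cc", source_map))
```

The first call is rewritten to the replacement directory, while the second comes back unchanged because its path does not start with the mapped prefix. Under this model, a mapping that "didn't make any difference" may simply not match the paths recorded in the binary.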
Could you try |
I found this issue while trying to figure out how to debug C++ code inside PyArrow, and I think I'm seeing something similar to the above. With a debug build of both Arrow C++ and PyArrow, I can hit a breakpoint but I don't see source code mapping:
(venv) bryce@debian ~/s/a/a/python (main)> lldb -- $(which python)
Traceback (most recent call last):
File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'lldb'
(lldb) target create "/home/bryce/src/apache/arrow/python/venv/bin/python"
Current executable set to '/home/bryce/src/apache/arrow/python/venv/bin/python' (x86_64).
(lldb) b ConvertPySequence
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb) run
Process 66372 launched: '/home/bryce/src/apache/arrow/python/venv/bin/python' (x86_64)
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
1 location added to breakpoint 1
warning: (x86_64) /home/bryce/src/apache/arrow/python/venv/lib/python3.11/site-packages/numpy.libs/libgfortran-040039e1-0352e75f.so.5.0.0 No LZMA support found for reading .gnu_debugdata section
>>> pa.array([1])
Process 66372 stopped
* thread #1, name = 'python', stop reason = breakpoint 1.1
frame #0: 0x00007ffff6d6f510 libarrow_python.so`arrow::py::ConvertPySequence(_object*, _object*, arrow::py::PyConversionOptions, arrow::MemoryPool*)
libarrow_python.so`arrow::py::ConvertPySequence:
-> 0x7ffff6d6f510 <+0>: pushq %rbp
0x7ffff6d6f511 <+1>: movq %rsp, %rbp
0x7ffff6d6f514 <+4>: pushq %r15
0x7ffff6d6f516 <+6>: movq %rsi, %r15
One thing I notice is that libarrow_python.so doesn't have debug info the way libarrow.so does:
(venv) bryce@debian ~/s/a/a/python (main)> file pyarrow/libarrow_python.so
pyarrow/libarrow_python.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=be0d7e9dcfba902687a81fec19e5565c4c1d3626, not stripped
(venv) bryce@debian ~/s/a/a/python (main)> file $ARROW_HOME/lib/libarrow.so.1800.0.0
/home/bryce/builds/arrow-x86_64/lib/libarrow.so.1800.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=282a70ef10aec0f43eb1f10ed176eaa4f993fab2, with debug_info, not stripped
(venv) bryce@debian ~/s/a/a/python (main)>
(In the output above, libarrow_python.so is missing the string "debug_info".) Source code mapping is working with symbols in Arrow C++:
Any ideas @kou? |
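The `file` comparison in the comment above can be scripted to check several libraries at once. This is a minimal sketch, assuming a Linux system where `file` prints "with debug_info" for ELF objects that carry debug info, as in the output above. The helper names are made up for illustration, and the check may not carry over to macOS, where `file` output for a dylib may not include this string.

```python
import subprocess


def reports_debug_info(file_output):
    """True if `file` output advertises debug info for an ELF object."""
    return "with debug_info" in file_output


def check_library(path):
    """Run `file` on path and report whether it mentions debug info."""
    result = subprocess.run(
        ["file", path], capture_output=True, text=True, check=True
    )
    return reports_debug_info(result.stdout)
```

For example, `check_library("pyarrow/libarrow_python.so")` would be expected to return `False` for the build shown above and `True` once the library has been rebuilt with debug info.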
Could you try Lines 170 to 171 in 0bdb5be
|
I did (and generally do). Let me record a full log of a clean build and share it here. |
Here's a full log of my install process: https://gist.github.com/amoeba/010c82a7818f8608aef84ca27864c679. |
Hrm. Of course it's now working as expected. That is, I have source code mapping to the PyArrow C++ source:
Edit: And the pyarrow_flight SO seems to have debug info so I'm going to see if the OP's issue works now.
|
Oh... It's strange... But it's an expected result... |
Maybe it was just caused by not having a totally clean environment. Thanks for the help @kou. I still wanted to check that I had the right debug info for the source code mapping the OP was having trouble with, and I think I do:
Specifically,
So I think that confirms my build is working in a way that would work for the OP's situation. @donatobarone I know it's been a while since you were working on this, but I think we should be able to help get you set up to debug the issue you were having. I'm going to close this issue for the moment, but please comment and we can re-open it. |
Describe the usage question you have. Please include as many useful details as possible.
Hi,
I have been trying to get access to the symbols of this dylib for a while now and failed miserably. I followed the developer guide at https://arrow.apache.org/docs/dev/developers/python.html and successfully built a pyarrow package locally that I can use in my pyenv. I was able to print symbols and debug the source files for some of the other dylibs (e.g. libarrow_flight.1400.dylib), but when I get to libarrow_python_flight.dylib, which is where I am getting the segmentation fault that I am trying to debug, I just see assembly.
I have built the lib with the following options:
cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
      -DCMAKE_INSTALL_LIBDIR=lib \
      -DCMAKE_BUILD_TYPE=Debug \
      -DARROW_BUILD_TESTS=ON \
      -DARROW_COMPUTE=ON \
      -DARROW_CSV=ON \
      -DARROW_DATASET=ON \
      -DARROW_FILESYSTEM=ON \
      -DARROW_HDFS=ON \
      -DARROW_JSON=ON \
      -DARROW_PARQUET=ON \
      -DARROW_WITH_BROTLI=ON \
      -DARROW_WITH_BZ2=ON \
      -DARROW_WITH_LZ4=ON \
      -DARROW_WITH_SNAPPY=ON \
      -DARROW_WITH_ZLIB=ON \
      -DARROW_WITH_ZSTD=ON \
      -DPARQUET_REQUIRE_ENCRYPTION=ON \
      -DARROW_FLIGHT=ON \
      ..
And with the following env variables when building pyarrow:
As visible from the screenshot, I am able to see the source code and access the variables in that frame for some dylibs, but not for the one I want:
Only assembly seems to be available.
I am working on a:
I appreciate any help you could give.
Thanks
Component(s)
C++, FlightRPC, Python