[Python] debug: access symbols for libarrow_python_flight.dylib #38519

Closed
donatobarone opened this issue Oct 30, 2023 · 18 comments

Comments

@donatobarone

Describe the usage question you have. Please include as many useful details as possible.

Hi,
I have been trying to get access to the symbols of this dylib for a while now, without success. I followed the developer guide at https://arrow.apache.org/docs/dev/developers/python.html and successfully built a pyarrow package locally that I can use in my pyenv. I was able to print symbols and debug the source files for some of the other dylibs (e.g. libarrow_flight.1400.dylib), but when I get to libarrow_python_flight.dylib, which is where the segmentation fault I am trying to debug occurs, I just see assembly.

I have built the lib with the following options:

cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
        -DCMAKE_INSTALL_LIBDIR=lib \
        -DCMAKE_BUILD_TYPE=Debug \
        -DARROW_BUILD_TESTS=ON \
        -DARROW_COMPUTE=ON \
        -DARROW_CSV=ON \
        -DARROW_DATASET=ON \
        -DARROW_FILESYSTEM=ON \
        -DARROW_HDFS=ON \
        -DARROW_JSON=ON \
        -DARROW_PARQUET=ON \
        -DARROW_WITH_BROTLI=ON \
        -DARROW_WITH_BZ2=ON \
        -DARROW_WITH_LZ4=ON \
        -DARROW_WITH_SNAPPY=ON \
        -DARROW_WITH_ZLIB=ON \
        -DARROW_WITH_ZSTD=ON \
        -DPARQUET_REQUIRE_ENCRYPTION=ON \
        -DARROW_FLIGHT=ON \
        ..

And with the following env variables when building pyarrow:

export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_DATASET=1
export PYARROW_PARALLEL=4
export PYARROW_WITH_ARROW_FLIGHT=1
export PYARROW_WITH_FLIGHT=1
export PYARROW_BUNDLE_CYTHON_CPP=1
export PYARROW_BUILD_FLIGHT=1
export PYARROW_BUNDLE_ARROW_CPP=1

As the screenshot shows, I am able to see the source code and access the variables in that frame for some dylibs, but not for the one I want:
image

Only assembly seems to be available.

I am working on a:

  • Mac M1 with Ventura 13.6.
  • Python 3.11

I appreciate any help you could give.
Thanks

Component(s)

C++, FlightRPC, Python

@kou
Member

kou commented Oct 31, 2023

Could you rebuild your PyArrow with export PYARROW_BUILD_TYPE=debug?

kou pushed a commit that referenced this issue Oct 31, 2023
### Rationale for this change

It's not very clear that PyArrow will get built in release mode even if the linked Arrow C++ is built in debug mode. I added some text to the docs to make it more clear. An example of a user running into this is #38519.

### Are these changes tested?

No as they're just minor docs changes.

### Are there any user-facing changes?

This adds text to the Python docs.

Authored-by: Bryce Mecum <petridish@gmail.com>
Signed-off-by: Sutou Kouhei <kou@clear-code.com>
@donatobarone
Author

Hi @kou, thanks for the quick reply. Wouldn't this be achieved by this step, as shown in the documentation, via the --build-type parameter rather than the environment variable?

python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \
         --bundle-arrow-cpp bdist_wheel

If the answer is yes, then that is how I have been building pyarrow as well; if not, I will try in a couple of hours and report back.

@kou
Member

kou commented Oct 31, 2023

--build-type should work too.

@donatobarone
Author

> --build-type should work too.

I see, that is what I have been using, any other ideas?

@kou
Member

kou commented Oct 31, 2023

How about trying PYARROW_BUILD_TYPE?

@donatobarone
Author

Okay, I have tried rebuilding with export PYARROW_BUILD_TYPE=debug, but unfortunately it didn't make any difference: I still see only assembly. Is there a command I can run on the dylib to confirm that it has been built correctly in debug mode?

@kou
Member

kou commented Nov 1, 2023

Could you show the exact command line you ran, as-is, and the full log of its output?

@donatobarone
Author

Yep. I am posting only the pyarrow build part here; let me know if you need to see the Arrow one as well:

(pyarrow-dev) XDWRL412MF:Projects dbarone$ pushd arrow/python
~/Projects/arrow/python ~/Projects ~/Projects ~/Projects
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_WITH_PARQUET=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_WITH_DATASET=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_PARALLEL=4
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_WITH_ARROW_FLIGHT=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_WITH_FLIGHT=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_BUNDLE_CYTHON_CPP=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_BUILD_FLIGHT=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_BUNDLE_ARROW_CPP=1
(pyarrow-dev) XDWRL412MF:python dbarone$ export PYARROW_BUILD_TYPE=debug
(pyarrow-dev) XDWRL412MF:python dbarone$ python setup.py build_ext --build-type=debug --bundle-arrow-cpp bdist_wheel &> output.log

This is the output.log file generated from the python command above.

output.log

@kou
Member

kou commented Nov 2, 2023

-- Build Type: DEBUG

Your build did use the debug build type.

It seems that LLDB can't find source files for libarrow_python_flight.dylib. You may be able to use settings set target.source-map for it: https://lldb.llvm.org/use/map.html#remap-source-file-pathnames-for-the-debug-session

@donatobarone
Author

To be honest, I don't fully understand which directory is supposed to be the build directory. Looking at the output.log file, I do see this one mentioned:

Build output directory: /Users/dbarone/Projects/arrow/python/build/temp.macosx-13-arm64-cpython-311/debug

but I do see the actual source files in /Users/dbarone/Projects/arrow/python/build/lib.macosx-13-arm64-cpython-311/pyarrow/src/arrow/python/. I tried to map one to the other using the following command:

settings set target.source-map /Users/dbarone/Projects/arrow/python/build/temp.macosx-13-arm64-cpython-311/debug /Users/dbarone/Projects/arrow/python/build/lib.macosx-13-arm64-cpython-311/pyarrow/src/arrow/python/

but it didn't make any difference.
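
For context on why a mapping like this can silently do nothing: target.source-map rewrites the leading part of the source paths that the compiler recorded in the debug info, so the first argument has to be a prefix of those recorded paths, not simply a directory that appears in the build log. The Python sketch below is a simplified model of that prefix substitution, not LLDB's implementation; `remap_source_path` and the `/hypothetical/...` paths are made up for illustration.

```python
from typing import Iterable, Tuple


def remap_source_path(recorded: str, mappings: Iterable[Tuple[str, str]]) -> str:
    """Rewrite a path recorded in debug info using (old_prefix, new_prefix) pairs.

    The first pair whose old_prefix matches the start of the path wins.
    A path that matches no pair is returned unchanged.
    """
    for old_prefix, new_prefix in mappings:
        old = old_prefix.rstrip("/")
        # Only match whole path components, so "/a/build" does not match "/a/builder".
        if recorded == old or recorded.startswith(old + "/"):
            return new_prefix.rstrip("/") + recorded[len(old):]
    return recorded


# Hypothetical paths: the recorded path starts with the old prefix, so it is rewritten.
mappings = [("/hypothetical/build", "/hypothetical/checkout/python/pyarrow")]
print(remap_source_path("/hypothetical/build/src/arrow/python/flight.cc", mappings))

# A recorded path outside the old prefix is left alone, so the mapping has no effect.
print(remap_source_path("/somewhere/else/flight.cc", mappings))
```

If the library carries no debug info at all, there are no recorded paths to remap, and no mapping will bring the source back.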

@kou
Member

kou commented Nov 7, 2023

Could you try cd python && rm -rf build && PYARROW_BUILD_TYPE=debug PYARROW_...=... pip install . instead of python setup.py build_ext ...?
It works for me.
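
As a rough Python rendering of the clean rebuild suggested above: delete the stale build directory, force PYARROW_BUILD_TYPE=debug, then run pip install in the python directory. The other PYARROW_... variables are elided in the comment, so this sketch takes them as a parameter rather than guessing at them. Both function names are hypothetical, and `sys.executable -m pip` is used here only to make sure pip belongs to the active interpreter.

```python
import os
import shutil
import subprocess
import sys
from pathlib import Path
from typing import Dict, List, Mapping, Tuple


def plan_clean_debug_install(
    python_dir: str, extra_env: Mapping[str, str]
) -> Tuple[Path, List[str], Dict[str, str]]:
    """Work out what a clean debug install would do, without side effects."""
    build_dir = Path(python_dir) / "build"
    env = dict(os.environ)
    env.update(extra_env)                # caller-supplied PYARROW_* settings
    env["PYARROW_BUILD_TYPE"] = "debug"  # always overrides anything in extra_env
    cmd = [sys.executable, "-m", "pip", "install", "."]
    return build_dir, cmd, env


def run_clean_debug_install(python_dir: str, extra_env: Mapping[str, str]) -> None:
    """Remove the old build directory, then run the planned pip install."""
    build_dir, cmd, env = plan_clean_debug_install(python_dir, extra_env)
    shutil.rmtree(build_dir, ignore_errors=True)  # equivalent of `rm -rf build`
    subprocess.run(cmd, cwd=python_dir, env=env, check=True)
```

Calling `run_clean_debug_install("arrow/python", {"PYARROW_WITH_FLIGHT": "1"})` would then perform the rebuild, with the dictionary holding whichever PYARROW_* settings the build needs.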

@amoeba
Member

amoeba commented Sep 13, 2024

I found this issue while trying to figure out how to debug C++ code inside PyArrow and I think I'm seeing something similar to the above. With a debug Arrow C++ and PyArrow build, I can hit a breakpoint but I don't see source code mapping:

(venv) bryce@debian ~/s/a/a/python (main)> lldb -- $(which python)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ModuleNotFoundError: No module named 'lldb'
(lldb) target create "/home/bryce/src/apache/arrow/python/venv/bin/python"
Current executable set to '/home/bryce/src/apache/arrow/python/venv/bin/python' (x86_64).
(lldb) b ConvertPySequence
Breakpoint 1: no locations (pending).
WARNING:  Unable to resolve breakpoint to any actual locations.
(lldb) run
Process 66372 launched: '/home/bryce/src/apache/arrow/python/venv/bin/python' (x86_64)
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
1 location added to breakpoint 1
warning: (x86_64) /home/bryce/src/apache/arrow/python/venv/lib/python3.11/site-packages/numpy.libs/libgfortran-040039e1-0352e75f.so.5.0.0 No LZMA support found for reading .gnu_debugdata section
>>> pa.array([1])
Process 66372 stopped
* thread #1, name = 'python', stop reason = breakpoint 1.1
    frame #0: 0x00007ffff6d6f510 libarrow_python.so`arrow::py::ConvertPySequence(_object*, _object*, arrow::py::PyConversionOptions, arrow::MemoryPool*)
libarrow_python.so`arrow::py::ConvertPySequence:
->  0x7ffff6d6f510 <+0>: pushq  %rbp
    0x7ffff6d6f511 <+1>: movq   %rsp, %rbp
    0x7ffff6d6f514 <+4>: pushq  %r15
    0x7ffff6d6f516 <+6>: movq   %rsi, %r15

One thing I notice is that libarrow_python.so doesn't have debug info like libarrow.so:

(venv) bryce@debian ~/s/a/a/python (main)> file pyarrow/libarrow_python.so
pyarrow/libarrow_python.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=be0d7e9dcfba902687a81fec19e5565c4c1d3626, not stripped
(venv) bryce@debian ~/s/a/a/python (main)> file $ARROW_HOME/lib/libarrow.so.1800.0.0
/home/bryce/builds/arrow-x86_64/lib/libarrow.so.1800.0.0: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=282a70ef10aec0f43eb1f10ed176eaa4f993fab2, with debug_info, not stripped
(venv) bryce@debian ~/s/a/a/python (main)> 

(In the output above, libarrow_python.so is missing the string "debug_info" in its output).
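
The comparison above can be turned into a small check. This Python sketch only looks for the `with debug_info` marker that `file` printed for libarrow.so in this comment. `reports_debug_info` and `describe` are hypothetical helper names, and the check is only known to apply to ELF output like that shown here, so it should not be assumed to work for a macOS .dylib.

```python
import subprocess


def reports_debug_info(file_output: str) -> bool:
    """True if `file` output carries the "with debug_info" marker seen above."""
    return "with debug_info" in file_output


def describe(path: str) -> str:
    """Run the `file` utility on a path and return its one-line description."""
    result = subprocess.run(["file", path], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, `reports_debug_info(describe("pyarrow/libarrow_python.so"))` would be expected to return False for the build shown above, since its `file` output lacks the marker.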

Source code mapping is working with symbols in Arrow C++:

(lldb) b Result
Breakpoint 2: 1093 locations.
(lldb) run
There is a running process, kill it and restart?: [Y/n] Y
Process 67013 exited with status = 9 (0x00000009) 
Process 67276 launched: '/home/bryce/src/apache/arrow/python/venv/bin/python' (x86_64)
Python 3.11.2 (main, Aug 26 2024, 07:20:54) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
Process 67276 stopped
* thread #1, name = 'python', stop reason = breakpoint 2.384
    frame #0: 0x00007ffff3d31e12 libarrow.so.1800`arrow::Result<arrow::fs::FileSystemFactoryRegistry::Registered>::Result(this=0x0000000000c7c5e8, value=0x00007fffffffbee0) at result.h:178:29
   175    // NOTE `Result(U&& value)` above should be sufficient, but some compilers
   176    // fail matching it.
   177    Result(T&& value) noexcept {  // NOLINT(runtime/explicit)
-> 178      ConstructValue(std::move(value));
   179    }
   180 
   181    /// Copy constructor.

Any ideas @kou?

@kou
Member

kou commented Sep 13, 2024

Could you try PYARROW_BUILD_TYPE=debug environment variable when you build PyArrow?

arrow/python/setup.py

Lines 170 to 171 in 0bdb5be

self.build_type = os.environ.get('PYARROW_BUILD_TYPE',
                                 'release').lower()
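
Read on their own, the two quoted lines mean that the build type falls back to release whenever PYARROW_BUILD_TYPE is unset, regardless of how Arrow C++ was built. A minimal standalone sketch of just that lookup follows. `resolve_build_type` is a hypothetical name, and the --build-type command-line option mentioned earlier in the thread is not modeled.

```python
import os
from typing import Mapping, Optional


def resolve_build_type(environ: Optional[Mapping[str, str]] = None) -> str:
    """Mirror the quoted lookup: PYARROW_BUILD_TYPE, lowercased, default 'release'."""
    env = os.environ if environ is None else environ
    return env.get("PYARROW_BUILD_TYPE", "release").lower()


# Nothing set: the default applies.
print(resolve_build_type({}))
# The value is lowercased, so "Debug" and "debug" behave the same.
print(resolve_build_type({"PYARROW_BUILD_TYPE": "Debug"}))
```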

@amoeba
Member

amoeba commented Sep 13, 2024

I did (and generally do). Let me record a full log of a clean build and share it here.

@amoeba
Member

amoeba commented Sep 13, 2024

Here's a full log of my install process: https://gist.github.com/amoeba/010c82a7818f8608aef84ca27864c679.

@amoeba
Member

amoeba commented Sep 13, 2024

Hrm. Of course it's now working as expected, i.e., I have source code mapping to the PyArrow C++ source:

* thread #1, name = 'python', stop reason = breakpoint 1.1
    frame #0: 0x00007ffff6ad8b3b libarrow_python.so`arrow::py::ConvertPySequence(obj=0x00007fffec16c6c0, mask=0x000000000095bcc0, options=PyConversionOptions @ 0x00007fffffffcf80, pool=0x00007ffff631e280) at python_to_arrow.cc:1233:16
   1230 Result<std::shared_ptr<ChunkedArray>> ConvertPySequence(PyObject* obj, PyObject* mask,
   1231                                                         PyConversionOptions options,
   1232                                                         MemoryPool* pool) {
-> 1233   PyAcquireGIL lock;
   1234
   1235   PyObject* seq = nullptr;
   1236   OwnedRef tmp_seq_nanny;

Edit: And the pyarrow_flight SO seems to have debug info so I'm going to see if the OP's issue works now.

venv ❯ file pyarrow/libarrow_python_flight.so 
pyarrow/libarrow_python_flight.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=2bd5047145654279c0c69b71559c84e07d4b54b3, with debug_info, not stripped

@kou
Member

kou commented Sep 13, 2024

Oh... It's strange... But it's an expected result...

@amoeba
Member

amoeba commented Sep 13, 2024

Maybe just caused by not having a totally clean environment. Thanks for the help @kou.

I still wanted to check that I had the right debug info for the source code mapping the OP was having trouble with, and I think I do:

(lldb) image lookup -v -n Authenticate
warning: (x86_64) /home/bryce/src/apache/arrow/python/pyarrow/libarrow_python_flight.so 0x0004c3a5: DW_AT_specification(0x000352af) has no decl
3 matches found in /home/bryce/src/apache/arrow/python/pyarrow/libarrow_python_flight.so:
        Address: libarrow_python_flight.so[0x0000000000027526] (libarrow_python_flight.so.PT_LOAD[1]..text + 1750)
        Summary: libarrow_python_flight.so`arrow::py::flight::PyClientAuthHandler::Authenticate(arrow::flight::ClientAuthSender*, arrow::flight::ClientAuthReader*) at flight.cc:66:85
         Module: file = "/home/bryce/src/apache/arrow/python/pyarrow/libarrow_python_flight.so", arch = "x86_64"
    CompileUnit: id = {0x00000000}, file = "/home/bryce/src/apache/arrow/python/pyarrow/src/arrow/python/flight.cc", language = "c++14"
       Function: id = {0x0004b6c3}, name = "arrow::py::flight::PyClientAuthHandler::Authenticate(arrow::flight::ClientAuthSender*, arrow::flight::ClientAuthReader*)", mangled = "_ZN5arrow2py6flight19PyClientAuthHandler12AuthenticateEPNS_6flight16ClientAuthSenderEPNS3_16ClientAuthReaderE", range = [0x00007fffe7fe6526-0x00007fffe7fe656f)
       FuncType: id = {0x0004b6c3}, byte-size = 0, decl = flight.cc:65:8, compiler_type = "class arrow::Status (class arrow::flight::ClientAuthSender *, class arrow::flight::ClientAuthReader *) const"
         Blocks: id = {0x0004b6c3}, range = [0x7fffe7fe6526-0x7fffe7fe656f)
      **LineEntry: [0x00007fffe7fe6526-0x00007fffe7fe653e): /home/bryce/src/apache/arrow/python/pyarrow/src/arrow/python/flight.cc:66:85**
         Symbol: id = {0x000004ce}, range = [0x00007fffe7fe6526-0x00007fffe7fe656f), name="arrow::py::flight::PyClientAuthHandler::Authenticate(arrow::flight::ClientAuthSender*, arrow::flight::ClientAuthReader*)", mangled="_ZN5arrow2py6flight19PyClientAuthHandler12AuthenticateEPNS_6flight16ClientAuthSenderEPNS3_16ClientAuthReaderE"
       Variable: id = {0x0004b6e2}, name = "this", type = "arrow::py::flight::PyClientAuthHandler *const", location = DW_OP_fbreg -64, decl = 
       Variable: id = {0x0004b6ee}, name = "outgoing", type = "arrow::flight::ClientAuthSender *", location = DW_OP_fbreg -72, decl = flight.cc:65:75
       Variable: id = {0x0004b6fe}, name = "incoming", type = "arrow::flight::ClientAuthReader *", location = DW_OP_fbreg -80, decl = flight.cc:66:75
        Address: libarrow_python_flight.so[0x000000000002710e] (libarrow_python_flight.so.PT_LOAD[1]..text + 702)
        Summary: libarrow_python_flight.so`arrow::py::flight::PyServerAuthHandler::Authenticate(arrow::flight::ServerAuthSender*, arrow::flight::ServerAuthReader*) at flight.cc:41:85
         Module: file = "/home/bryce/src/apache/arrow/python/pyarrow/libarrow_python_flight.so", arch = "x86_64"
    CompileUnit: id = {0x00000000}, file = "/home/bryce/src/apache/arrow/python/pyarrow/src/arrow/python/flight.cc", language = "c++14"
       Function: id = {0x0004bbe7}, name = "arrow::py::flight::PyServerAuthHandler::Authenticate(arrow::flight::ServerAuthSender*, arrow::flight::ServerAuthReader*)", mangled = "_ZN5arrow2py6flight19PyServerAuthHandler12AuthenticateEPNS_6flight16ServerAuthSenderEPNS3_16ServerAuthReaderE", range = [0x00007fffe7fe610e-0x00007fffe7fe6157)
       FuncType: id = {0x0004bbe7}, byte-size = 0, decl = flight.cc:40:8, compiler_type = "class arrow::Status (class arrow::flight::ServerAuthSender *, class arrow::flight::ServerAuthReader *) const"
         Blocks: id = {0x0004bbe7}, range = [0x7fffe7fe610e-0x7fffe7fe6157)
      LineEntry: [0x00007fffe7fe610e-0x00007fffe7fe6126): /home/bryce/src/apache/arrow/python/pyarrow/src/arrow/python/flight.cc:41:85
         Symbol: id = {0x0000047e}, range = [0x00007fffe7fe610e-0x00007fffe7fe6157), name="arrow::py::flight::PyServerAuthHandler::Authenticate(arrow::flight::ServerAuthSender*, arrow::flight::ServerAuthReader*)", mangled="_ZN5arrow2py6flight19PyServerAuthHandler12AuthenticateEPNS_6flight16ServerAuthSenderEPNS3_16ServerAuthReaderE"
       Variable: id = {0x0004bc06}, name = "this", type = "arrow::py::flight::PyServerAuthHandler *const", location = DW_OP_fbreg -64, decl = 
       Variable: id = {0x0004bc12}, name = "outgoing", type = "arrow::flight::ServerAuthSender *", location = DW_OP_fbreg -72, decl = flight.cc:40:75
       Variable: id = {0x0004bc22}, name = "incoming", type = "arrow::flight::ServerAuthReader *", location = DW_OP_fbreg -80, decl = flight.cc:41:75
        Address: libarrow_python_flight.so[0x000000000002c612] (libarrow_python_flight.so.PT_LOAD[1]..text + 22466)
        Summary: libarrow_python_flight.so`arrow::flight::ServerAuthHandler::Authenticate(arrow::flight::ServerCallContext const&, arrow::flight::ServerAuthSender*, arrow::flight::ServerAuthReader*) at server_auth.h:63:18
         Module: file = "/home/bryce/src/apache/arrow/python/pyarrow/libarrow_python_flight.so", arch = "x86_64"
    CompileUnit: id = {0x00000000}, file = "/home/bryce/src/apache/arrow/python/pyarrow/src/arrow/python/flight.cc", language = "c++14"
       Function: id = {0x0004c3a5}, name = "arrow::flight::ServerAuthHandler::Authenticate(arrow::flight::ServerCallContext const&, arrow::flight::ServerAuthSender*, arrow::flight::ServerAuthReader*)", mangled = "_ZN5arrow6flight17ServerAuthHandler12AuthenticateERKNS0_17ServerCallContextEPNS0_16ServerAuthSenderEPNS0_16ServerAuthReaderE", range = [0x00007fffe7feb612-0x00007fffe7feb658)
       FuncType: id = {0x0004c3a5}, byte-size = 0, decl = server_auth.h:63:18, compiler_type = "class arrow::Status (const class arrow::flight::ServerCallContext &, class arrow::flight::ServerAuthSender *, class arrow::flight::ServerAuthReader *) const"
         Blocks: id = {0x0004c3a5}, range = [0x7fffe7feb612-0x7fffe7feb658)
      LineEntry: [0x00007fffe7feb612-0x00007fffe7feb62e): /home/bryce/builds/arrow-x86_64/include/arrow/flight/server_auth.h:63:18
         Symbol: id = {0x0000026a}, range = [0x00007fffe7feb612-0x00007fffe7feb658), name="arrow::flight::ServerAuthHandler::Authenticate(arrow::flight::ServerCallContext const&, arrow::flight::ServerAuthSender*, arrow::flight::ServerAuthReader*)", mangled="_ZN5arrow6flight17ServerAuthHandler12AuthenticateERKNS0_17ServerCallContextEPNS0_16ServerAuthSenderEPNS0_16ServerAuthReaderE"
       Variable: id = {0x0004c3c4}, name = "this", type = "arrow::flight::ServerAuthHandler *const", location = DW_OP_fbreg -32, decl = 
       Variable: id = {0x0004c3d0}, name = "context", type = "const arrow::flight::ServerCallContext &", location = DW_OP_fbreg -40, decl = server_auth.h:63:56
       Variable: id = {0x0004c3df}, name = "outgoing", type = "arrow::flight::ServerAuthSender *", location = DW_OP_fbreg -48, decl = server_auth.h:64:49
       Variable: id = {0x0004c3ee}, name = "incoming", type = "arrow::flight::ServerAuthReader *", location = DW_OP_fbreg -56, decl = server_auth.h:64:77
<<< truncated >>>

Specifically,

LineEntry: [0x00007fffe7fe6526-0x00007fffe7fe653e): /home/bryce/src/apache/arrow/python/pyarrow/src/arrow/python/flight.cc:66:85

So I think that confirms my build is working in a way that would work for the OP's situation.
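
To pull these entries out of a longer `image lookup -v` dump, something like the following Python sketch could be used. It assumes the `LineEntry: [start-end): path:line:column` layout seen in the output above. `line_entries` and `missing_sources` are hypothetical helpers, not part of LLDB.

```python
import os
import re
from typing import List, Tuple

# Matches e.g. "LineEntry: [0x...-0x...): /path/to/flight.cc:66:85"
_LINE_ENTRY = re.compile(
    r"LineEntry: \[[^)]*\): (?P<path>.+):(?P<line>\d+):(?P<col>\d+)"
)


def line_entries(lookup_output: str) -> List[Tuple[str, int, int]]:
    """Extract (source path, line, column) from each LineEntry row."""
    return [
        (m.group("path"), int(m.group("line")), int(m.group("col")))
        for m in _LINE_ENTRY.finditer(lookup_output)
    ]


def missing_sources(entries: List[Tuple[str, int, int]]) -> List[str]:
    """Return the referenced source paths that do not exist on this machine."""
    return [path for path, _, _ in entries if not os.path.isfile(path)]
```

A path reported by `missing_sources` is one the debugger would not be able to open as recorded, which is the situation where a source-map setting becomes relevant.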

@donatobarone I know it's been a while since you were working on this but I think we should be able to help get you set up to debug the issue you were having. I'm going to close this issue for the moment but please comment and we can re-open it.

@amoeba amoeba closed this as completed Sep 13, 2024
@amoeba amoeba changed the title debug: access symbols for libarrow_python_flight.dylib [Python] debug: access symbols for libarrow_python_flight.dylib Sep 17, 2024