Debugging tips

Isaiah Norton edited this page Apr 29, 2024 · 14 revisions

Building with debug symbols

  • python setup.py develop --debug or
  • pip install --use-pep517 --config-setting="--build-option=--debug"

Cython debugging for TileDB-Py

Debugging on Linux

  • Cython provides a Cython-aware gdb frontend, cygdb.

  • However, gdb/cygdb are only practically useful on Linux, because gdb does not work well on newer versions of macOS.

  • To check which version of libtiledb is loaded in the running Python process:

    • in Python, run import os; os.getpid() to get the pid
    • in a shell, run pmap <pid> | grep libtiledb
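
The same check can be done from inside the Python process, without a second shell. The following is a minimal sketch, assuming a Linux system where /proc/self/maps is available; the helper names mapped_libraries and loaded_in_this_process are hypothetical and not part of TileDB-Py.

```python
import os


def mapped_libraries(maps_text, needle="libtiledb"):
    """Return sorted, de-duplicated mapped file paths whose basename contains needle.

    maps_text is the content of a /proc/<pid>/maps file, where each line is:
    address perms offset dev inode [pathname]
    """
    paths = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if needle in os.path.basename(path):
            paths.add(path)
    return sorted(paths)


def loaded_in_this_process(needle="libtiledb"):
    """Scan the current process; returns [] where /proc is unavailable (e.g. macOS)."""
    try:
        with open("/proc/self/maps") as f:
            return mapped_libraries(f.read(), needle)
    except OSError:
        return []
```

Calling loaded_in_this_process() after import tiledb should list the path of the shared library that was actually loaded, which is the same information pmap reports.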

Debugging on macOS

  • It is reasonably practical to single-step debug small sections of the Cython-generated C++ code. Some familiarity with the CPython object model is very helpful here.

  • The Cython option Cython.Compiler.Options.emit_code_comments controls whether Cython emits a copy of the source code into the output C++ file; this is on by default and should stay enabled for debugging. Each line of C++ code will be preceded by a commented-out version of the source Cython code.

  • Each context block in the generated C++ will have the corresponding line number in the original Cython code. So, start from a Cython line number, find that block, and set a breakpoint at the line below the context comment in the generated libtiledb.cpp.

  • In order to see all of the python code corresponding to C++ code while single-stepping, it is recommended to increase the lldb code-listing verbosity:

    (lldb) settings set stop-line-count-before 8
    
  • Start the python interpreter under lldb and run a command which will invoke the targeted section of Cython/C++ code.

    • or run a script (potentially with args). Assuming LINENO is the line in libtiledb.cpp identified above:

    $ lldb -- python -i MYSCRIPT.py
    (lldb) b libtiledb.cpp:LINENO
    (lldb) run
    >>> import tiledb
    >>> [run command to trigger breakpoint, then step, view values, etc.]
    
    • To print Cython PyObject* variables in the debugger, install the following LLDB script: https://github.com/malor/cpython-lldb

    • Then, within a libtiledb.cpp frame:

      • individual PyObject* variables should pretty-print with p, for example: p __pyx_v_uri
      • the LLDB command frame variable will show known variables in the frame

  • Ideally, the Cython code will have primitive types which can be printed with the usual lldb p(rint) command. However, to print the contents of a PyObject* inside the debugger, see the following discussion, whose commands may be called in the debugger: https://stackoverflow.com/questions/5356773/python-get-string-representation-of-pyobject

  • To check which version of libtiledb is loaded in the running Python process:
    • in Python, run import os; os.getpid() to get the pid
    • in a shell, run vmmap -p <pid> | grep libtiledb
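
Finding the breakpoint line in the generated libtiledb.cpp, as described earlier in this section, can be scripted. This is a rough sketch, assuming the context comments open with a line of the form /* "path/to/file.pyx":LINENO and close with */ (the layout Cython typically emits, but check it against your generated file). The function name breakpoint_lines is hypothetical.

```python
import re


def breakpoint_lines(cpp_text, pyx_name, pyx_lineno):
    """Return 1-based line numbers in cpp_text that directly follow a
    context comment for pyx_name:pyx_lineno."""
    # Assumed comment opener, e.g.:  /* "tiledb/libtiledb.pyx":123
    opener = re.compile(r'/\*\s*"([^"]+)":(\d+)\s*$')
    lines = cpp_text.splitlines()
    hits = []
    i = 0
    while i < len(lines):
        m = opener.search(lines[i])
        if m and m.group(1).endswith(pyx_name) and int(m.group(2)) == pyx_lineno:
            # Walk forward to the line that closes the context comment.
            j = i
            while j < len(lines) and "*/" not in lines[j]:
                j += 1
            if j + 1 < len(lines):
                hits.append(j + 2)  # line after the closing "*/", 1-based
            i = j
        i += 1
    return hits
```

Each returned number can be used as LINENO in (lldb) b libtiledb.cpp:LINENO. One Cython line may map to several blocks, so the result is a list.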

Misc debugging

  • Given a memory address, ADDR, ctypes may be used to read value(s) from that address:

    >>> import ctypes
    >>> p = ctypes.cast(ADDR, ctypes.POINTER(ctypes.c_uint64))
    >>> p[0], p[1]
        ^ equivalent to *p, *(p+1), etc. in C
    
  • Defining the following function will allow most tests to be copy-pasted into the REPL from test_libtiledb.py, and run directly:

    >>> import os, tiledb, numpy as np
    >>> self = lambda: None; self.path = lambda x: os.path.join("/tmp", x)
    >>> [paste non-indented test block, and run]
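
The ctypes read from the first item above can be tried safely on memory you own before pointing it at an address taken from a debugger. The sketch below is illustrative only: the buffer and its values are made up, and ADDR comes from ctypes.addressof rather than from a real TileDB object.

```python
import ctypes
import sys

# A small buffer we own, so reading from its address is safe.
buf = (ctypes.c_uint64 * 3)(10, 20, 30)
ADDR = ctypes.addressof(buf)  # plain integer address, as a debugger would print

# Interpret the address as a pointer to uint64 and index it like a C array.
p = ctypes.cast(ADDR, ctypes.POINTER(ctypes.c_uint64))
first, second = p[0], p[1]  # like *p and *(p+1) in C

# Alternatively, copy raw bytes from the address and decode them by hand.
raw = ctypes.string_at(ADDR, 8)
decoded = int.from_bytes(raw, sys.byteorder)
```

Reading from an address that is not mapped, or past the end of the allocation, will crash the interpreter, so the element type and count need to match what actually lives at ADDR.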
    

Debugging on macOS with gdb (note: does not currently work):

Analyzing reference count problems

Given a function (in pure python) which creates a DenseArray:

def foo():
  arr = tiledb.DenseArray(...)
  import pdb; pdb.set_trace()

Entering pdb at this point, we can print out the array:

(Pdb) p arr
<tiledb.libtiledb.DenseArray object at 0x000000123456789>

Copy the address!

Now, set a breakpoint (or repeat pdb.set_trace()) in a location where we expect the refcount of arr to be zero -- for example, some location after the function return. At that point we can check the refcount and referrers as follows:

(Pdb) import ctypes, gc, sys
(Pdb) o = ctypes.cast(0x000000123456789, ctypes.py_object)
(Pdb) o
py_object(<tiledb.libtiledb.DenseArray object at 0x000000123456789>)
(Pdb) sys.getrefcount(o.value)
?
(Pdb) gc.get_referrers(o.value)
[...]

(Note that ctypes.cast(<addr>, ctypes.py_object) does not increase the refcount of the target object, which can be verified by assigning a second variable to the identical ctypes.cast call.)
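
The same technique can be rehearsed without TileDB. In this minimal sketch, Resource is a hypothetical stand-in for DenseArray and registry simulates a stray reference that keeps the object alive after the function returns. It relies on the CPython detail that id() returns the object's address.

```python
import ctypes
import gc
import sys


class Resource:
    """Hypothetical stand-in for tiledb.DenseArray."""


def make_and_leak(registry):
    res = Resource()
    registry.append(res)  # simulated stray reference
    return id(res)        # in CPython, id() is the object's address


registry = []
addr = make_and_leak(registry)

# Recover the object from its address. This is only safe while the object is
# still alive; casting the address of a freed object can crash the interpreter.
obj = ctypes.cast(addr, ctypes.py_object).value

refcount = sys.getrefcount(obj)     # exact value varies by Python version
referrers = gc.get_referrers(obj)   # containers that hold a reference to obj
held_by_registry = any(r is registry for r in referrers)
```

Here held_by_registry identifies the list as the owner of the unexpected reference. In a real session, inspecting each entry of gc.get_referrers in the same way shows what is keeping the array alive.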

Running against libtiledb with address sanitizer

TileDB-Py can be run with libtiledb compiled against address sanitizer, by using the --enable-sanitizer=address TileDB bootstrap option, and then preloading the ASAN library before running TileDB-Py:

export LD_PRELOAD=/usr/lib64/libasan.so.4.0.0

(path above is for CentOS 7 / AL2; paths will vary based on Linux distribution)

Building with address sanitizer

TileDB-Py may be built with address sanitizer support using the following exports before running setup.py:

export LFLAGS="-fsanitize=address"
export CXXFLAGS="-fsanitize=address -g -fno-omit-frame-pointer"