Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Need to call PyEval_InitThreads() on Python < 3.7 #37

Open
gri6507 opened this issue Oct 23, 2020 · 12 comments
Open

Need to call PyEval_InitThreads() on Python < 3.7 #37

gri6507 opened this issue Oct 23, 2020 · 12 comments

Comments

@gri6507
Copy link

gri6507 commented Oct 23, 2020

I have a C++ library with one function that takes a function pointer (let's call it register(my_callback_t cb)) as a callback. I want to wrap this with PyBindGen so that I can call the register function from Python, passing in a Python function object as the function to call in the callback.

Based on examples, I created the ForwardWrapperBase derived class for the my_callback_t function type, where I map the Python caller's PyObject * for the callback to the internal C++ callback. Then, when the internal C++ callback is called by my library, I manually call the Python object again, using the following pattern:

PyGILState_STATE gstate;
gstate = (PyEval_ThreadsInitialized() ? PyGILState_Ensure() : (PyGILState_STATE) 0);

// Convert C++ arguments to Python objects
PyObject *arglist = Py_BuildValue("(sssO)", id.c_str(), name.c_str(), value.c_str(), valid ? Py_True: Py_False);

// Using the ID, find the PyObject * which corresponds to the registered callback
PyObject *result = PyObject_CallObject(cb_map.at(id), arglist);
Py_DECREF(arglist);

if (NULL == result)
{
    if (NULL == PyErr_Occurred())
    {
        printf("@@@ PythonBinging: Unknown error occurred in subscription callback\n");
    }
    else
    {
        PyErr_Print();
    }
}
Py_XDECREF(result);

if (PyEval_ThreadsInitialized())
{
  PyGILState_Release(gstate);
}

The above works well for the most part, but occasionally causes SEGFLT in the main python thread after the entirety of both the Python and C++ callbacks complete.

The two questions I have are:

  1. Do you see something wrong with the above?
  2. I noticed that there is a thing called ReverseWrapperBase in PyBindGen, which seems to be related to C code calling back into Python. However, I cannot figure out how to use it and there do not appear to be any examples using it. Can you please provide an example?
@gjcarneiro
Copy link
Owner

I don't have a good example of ReverseWrapperBase, the closest is pybinden's own sources:

class CppVirtualMethodProxy(ReverseWrapperBase):
    """
    Class that generates a proxy virtual method that calls a similarly named python method.
    """
    ...

I think it would probably be overkill to use ReverseWrapperBase if it's just one or two methods in your entire codebase. All pybindgen does is automate writing the C source code for an extension, it's not magic.

As for the code you posted, it looks alright to me! You can try running with valgrind, maybe it will spot something.

Could it be something specific to shutdown sequence of events? Are you absolutely sure that object cb_map is not still alive and trying to trigger callbacks from threads after the python interpreter state is already finalised or being finalised?

Or else, are you sure cb_map.at(id) always contains a strong reference to a python object? Could there be an invalid memory access or race condition there?

@gri6507
Copy link
Author

gri6507 commented Oct 26, 2020

Thank you for the feedback!

are you sure cb_map.at(id) always contains a strong reference to a python object? Could there be an invalid memory access or race condition there

I get the feeling that you're pretty close to the problem here, but I am struggling to find the solution. If you don't mind, can you please consider any possible errors in the following:

My python program is

import sys
from time import sleep
import mymodule

def cov_cb(s_id, s_name, s_value, b_valid):
    pass
    # print("In cov_cb: '" + str(s_id) + "' '" + str(s_name) + "' '" + str(s_value) + "' '" + str(b_valid) + "'")

for id in mymodule.get_ids():
    status = mymodule.register(cov_cb,  id)
    if status != mymodule.NO_ERRORS:
        print("Unable to register")
        sys.exit()
        
print("@@@ Sleeping now")
print("@@@ At end refcount=%d" % sys.getrefcount(cov_cb))
while True:
    sleep(1)

The relevant C extension module code (stripped down and simplified) is

#include <map>
#include <mutex>

static std::mutex register_cb_map_mtx;
static std::map<std::string, std::map<std::string, PyObject *>> register_cb_map;

PyObject * _wrap_PySoitsTransceiver_register__0(PySoitsTransceiver *self, PyObject *args, PyObject *kwargs, PyObject **return_exception)
{
    PyObject *py_retval;
    momodule::status retval;
    PyThreadState *py_thread_state = NULL;
    PyObject *register_cb;
    const char *id = NULL;
    Py_ssize_t id_len;
    std::string id_std;
    const char *keywords[] = {"register_cb", "id", NULL};

    if (!PyArg_ParseTupleAndKeywords(args, kwargs, (char *) "Os", (char **) keywords, &register_cb, &id, &id_len)) {
        {
            PyObject *exc_type, *traceback;
            PyErr_Fetch(&exc_type, return_exception, &traceback);
            Py_XDECREF(exc_type);
            Py_XDECREF(traceback);
        }
        return NULL;
    }    
    if (!PyCallable_Check(register_cb)) {
        PyErr_SetString(PyExc_TypeError, "register_cb_t parameter must be callable");
        {
            PyObject *exc_type, *traceback;
            PyErr_Fetch(&exc_type, return_exception, &traceback);
            Py_XDECREF(exc_type);
            Py_XDECREF(traceback);
        }
        return NULL;
    }
    
    // Increase the ref count here, but DO NOT decrease it here. That will be done in the `unregister` function call
    Py_INCREF(register_cb);

    const std::string id_s(std::string(id));
    std::scoped_lock<std::mutex> lk(register_cb_map_mtx);
    if (0 != register_cb_map.count(id_s))
    {    
      fprintf(stderr, "ERROR: do not allow multiple registrations with same id\n");
      abort();
    }
    register_cb_map.insert(std::make_pair(id_s, register_cb));

    if (PyEval_ThreadsInitialized ())
         py_thread_state = PyEval_SaveThread();

    retval = self->obj->reg(_wrap_register_cb, id_s);

    if (py_thread_state)
         PyEval_RestoreThread(py_thread_state);

    py_retval = Py_BuildValue((char *) "i", retval);
    return py_retval;
}

void _wrap_register_cb(std::string id, std::string name, std::string value, bool valid)
{
    std::scoped_lock<std::mutex> lk(register_cb_map_mtx);
    if (0 != register_cb_map.count(id))
    {
        PyGILState_STATE gstate;
        gstate = (PyEval_ThreadsInitialized() ? PyGILState_Ensure() : (PyGILState_STATE) 0);

        PyObject *arglist = Py_BuildValue("(sssO)", id.c_str(), name.c_str(), value.c_str(), valid ? Py_True: Py_False);
        PyObject *result = PyObject_CallObject(register_cb_map.at(name).at(id), arglist);
        Py_DECREF(arglist);

        if (NULL == result)
        {
            if (NULL == PyErr_Occurred())
            {
                printf("Unknown error occurred in callback\n");
            }
            else
            {
                PyErr_Print();
            }
        }
        else
        {
            Py_DECREF(result);
        }

        if (PyEval_ThreadsInitialized())
        {
            PyGILState_Release(gstate);
        }
    }
    else
    {
        printf("Unexpected callback for id='%s'\n", id.c_str());
    }
}

The callbacks are done from the linked library which implements the reg method, starts a new thread and generates the callbacks as needed. What I noticed is that the callbacks come in just fine while the Python for loop is busy registering thousands of entries. However, as soon as it's done (I added the print statements to ensure), I get a segfault shortly thereafter.

@gjcarneiro
Copy link
Owner

Well, I see the Py_INCREF() of the object inside the map, which is what is correct.

OTOH, this is a map with another map inside, but the code you pasted does register_cb_map.insert(std::make_pair(id_s, register_cb));, which I think doesn't account for the second inner map? My C++ is a little rusty, I could be missing something, or you over-simplified the code for pasting here...

Otherwise, I suggest you try debugging tools -- valgrind and gdb -- they may help understand why and where is the segfault.

@gri6507
Copy link
Author

gri6507 commented Oct 26, 2020

OTOH, this is a map with another map inside

I apologize. You are correct. Like I said above, I was trying to simplify the code to convey only the relevant pieces. I do indeed have a map of maps in the real implementation. It is my error that I forgot to cleanup that remaining reference to that. However, I do not believe this is related to my issue at hand.

@gri6507
Copy link
Author

gri6507 commented Oct 27, 2020

In case you're curious, I have exhausted all of my tools in troubleshooting this. I've distilled the problem down to as simple a case as possible, not even using PyBindGen. You can see my question about that very simple test case at https://stackoverflow.com/questions/64559322/python-extension-module-with-callbacks-into-python

@gjcarneiro
Copy link
Owner

I tested your code, and the only problem I found was extreme contention for the deq_mtx mutex. This is caused by sleeping while holding the lock.

If you're going to sleep, you should release the lock, sleep, then re-acquire it.

After that, it runs on my box (python 3.8.5) without any issues. Even valgrind detects no memory problems.

The modified worker code that works for me is:

void static worker()
{
  size_t x = 0;
  bool do_sleep = false;
  while(true)
  {
  	if (do_sleep) {
  		usleep(1000);
  		do_sleep = false;
  	}
    std::scoped_lock<std::mutex> lk(deq_mtx);
    if (deq.size() == 0)
    {
      do_sleep = true;
      continue;
    }

    auto info = deq.front();
    deq.pop_front();
    for (unsigned long i=0; i<info.num_cb; i++)
    {
      internal_cb(info.id, std::to_string(x++));
    }
  }
}

@gri6507
Copy link
Author

gri6507 commented Oct 27, 2020

Thank you for taking a look. I appreciate your recommendation to not keep the C++ mutex while sleeping. It simply doesn't make sense to hold that resource while sleeping. However, I do not understand how that could cause a deadlock.

Let's say the worker pthread hold deq_mtx and goes to sleep. This allows another pthread to wake up, including the possibility of it being the main Python thread. If that main python thread happens to make a call to the extension module doit() method, that would also try to take the same mutex, but will be held off. That should not be a deadlock. The only way I can see this becoming a deadlock is if the PyGILState_Ensure() method from inside the callback requires the main python thread to run for any length of time. But I did not think that was the case. Am I wrong on that?

Also, when you said it "worked just fine" for you, for how long did you run it? Did you try running multiple times? How many CPUs do you have on the test machine?

At least when I run it on my machine (Ubuntu 18.04 with 3.6.9 Python), I get various types of failures including segfaults, and PyEval_SaveThread: NULL tstate . Were you able to see any of those in any of your runs?

@gjcarneiro
Copy link
Owner

I did not experience deadlock exactly, simply observed it taking a long time Saving thread and Restoring thread. Running in gdb and interrupted confirmed that it was taking a long time acquiring a lock.

I forgot, also changed the test program, to sleep(0.001) inside the for loop. That also makes a difference. The program seems to stall without it.

Otherwise, I have 4 cores, I ran it a few times, with and without valgrind, never seen any trace of a segfault.

Note: didn't use your script to compile, as I have a different python version. What I did was create a simple setup.py:

from distutils.core import setup, Extension

module1 = Extension('mymod',
                    sources = ['test_python_cmodule.cpp'])

setup (name = 'PackageName',
       version = '1.0',
       description = 'This is a demo package',
       ext_modules = [module1])

Then build/install the module (inside virtualenv) with CFLAGS="-O0 -g -std=c++17" python setup.py install.

Could it be one of your compiler/linker options causing issues? 🤷‍♂️

@gri6507
Copy link
Author

gri6507 commented Oct 27, 2020

I believe I just found the problem. When I ran this in Python 3.8.5 like you, everything worked. Going back to 3.6.9 I got the errors (I used pyenv to easily toggle). I believe that if you were to repeat this test under 3.6.9, you will see the same problem.

@gri6507
Copy link
Author

gri6507 commented Oct 28, 2020

The observation that the problem only happens with Python 3.6 but not newer versions led to the solution (posted on comp.lang.python) which is that in these older versions of Python, PyEval_InitThreads() must be called explicitly at least once in the main thread. In Python3.7 it is automatically called by Python, with the actual function deprecated in Python 3.9, slated for complete removal from public API in Python 3.11.

So, to tie this information back to PyBindGen, it would be very nice if the auto-generated code did this automatically, perhaps in the SharedLibrary initialiazation, like so

static void __attribute__((constructor)) my_module_init()
{
  #if PY_VERSION_HEX < 0x03070000
    PyEval_InitThreads();
  #endif
}

This way, callbacks work even for older Python versions. Of course, if you add such functionality, you should probably also add some additional public API to allow developers to supplement the default code with whatever else they wanted

@gjcarneiro
Copy link
Owner

That's interesting, thanks for the tip.

@gjcarneiro gjcarneiro changed the title How to use ReverseWrapperBase Need to call PyEval_InitThreads() on Python < 3.7 Oct 28, 2020
@gjcarneiro
Copy link
Owner

I suspect that simply "import threading" in Python 3.6 does the necessary call to PyEval_InitThreads().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants