Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Default member function argument causes leak #69

Closed
qnzhou opened this issue Sep 23, 2022 · 10 comments
Closed

[BUG]: Default member function argument causes leak #69

qnzhou opened this issue Sep 23, 2022 · 10 comments

Comments

@qnzhou
Copy link
Contributor

qnzhou commented Sep 23, 2022

Problem description

I noticed that when a class member function has a default argument value that is of a enum class type, nanobind will warn that the enum type is leaked. For example, here is the output from the attached code:

% python -c "import nanobind_example"
nanobind: leaked 3 instances!
nanobind: leaked 1 types!
 - leaked type "Color"
nanobind: this is likely caused by a reference counting issue in the binding code.

In contrast, default value for function arguments does not seem to cause this warning.

Reproducible example code

#include <nanobind/nanobind.h>

#include <iostream>

namespace nb = nanobind;
using namespace nb::literals;

enum class Color { Red, Green, Blue };

class ColorMap {
   public:
    ColorMap() = default;
};

NB_MODULE(nanobind_example_ext, m) {
    nb::enum_<Color>(m, "Color")
        .value("Red", Color::Red)
        .value("Green", Color::Green)
        .value("Blue", Color::Blue);

    nb::class_<ColorMap>(m, "ColorMap")
        .def(nb::init<>())
        .def(
            "print_color",
            [](ColorMap& self, Color c) {
                switch (c) {
                    case Color::Red:
                        std::cout << "red" << std::endl;
                        break;
                    case Color::Green:
                        std::cout << "green" << std::endl;
                        break;
                    case Color::Blue:
                        std::cout << "blue" << std::endl;
                        break;
                }
            },
            "color"_a = Color::Red); // <-- This causes leak.
}
@MatthiasKohl
Copy link
Contributor

I tracked down another case which causes a leak, and I believe that it is very similar to this bug:

If you define any type (class/enum/etc.) in an extension module, then import it along with PyTorch and add a function signature like this : foo(arg: typing.Optional[extension.CustomType] = None) then I'm getting a leak.

Here is a reproducer in python, the extension module is simply called leak_ext and placed in a package leak_ext and only defines a simple type named Enum (I've tested with both nb::class_ and nb::enum_):

from leak_ext import leak_ext

import torch
import typing

def test(t: typing.Optional[leak_ext.Enum] = None):
    print(t)


def main():
    test(leak_ext.Enum.A)
    test(None)
    assert False, "oops"


if __name__ == '__main__':
    main()

What's interesting is that in my environment, I don't get any leak if I remove the torch import or even if I just remove the typing.Optional part.

This is with a standard PyTorch container (version 1.13 alpha), and then pip install wheel setuptools scikit-build>=0.13.1 cmake>=3.20.1,!=3.23.0 nanobind>=0.0.3 ninja

@MatthiasKohl
Copy link
Contributor

@wjakob do you have any idea how to track/debug such leaks ? Since we have fairly simple reproducers here, I wouldn't mind looking into this, but I tried stepping through the code with gdb and didn't really get anything from that for now.

@MatthiasKohl
Copy link
Contributor

After looking into a few other bugs like #19 , it looks like these leaks originate from other libraries in most cases.
Do you think it makes sense to be able to turn off the leak report with a build flag ?
For users who aren't familiar with nanobind it would be quite confusing to get a large leak report, even though for the developer of an extension module, it can be very useful.

@wjakob
Copy link
Owner

wjakob commented Oct 13, 2022

@qnzhou : I think I know what is happening. But the weird thing is that I couldnot reproduce the bug with the code snippet from the first commit. I had to change it to introduce a true cyclic dependency as follows:

#include <nanobind/nanobind.h>

namespace nb = nanobind;
using namespace nb::literals;

enum class Color { Red };

NB_MODULE(nanobind_example_ext, m) {
    nb::enum_<Color>(m, "Color")
        .value("Red", Color::Red)
        .def("dummy", [](Color, Color) { }, "arg"_a = Color::Red);
}

The issue here is that the garbage collector was not aware of a reference cycle involving function default arguments. I made a tentative patch in PR #86 that tries to fix this issue. A request for you @qnzhou : could you please verify that you still have the issue without PR #86 on the latest master branch, and that it appears after applying the changes from PR #86? Many thanks!

@MatthiasKohl I am not sure what is going on with typing in combination with big libraries (pytorch, pandas). It seems to me that there is really a bug somewhere in there, which is not an issue with pybind11. The issue you point out is not related to the one mentioned by @qnzhou

@wjakob
Copy link
Owner

wjakob commented Oct 13, 2022

@MatthiasKohl : Found the issue. It's a LRU cache in typing that has already come up as problematic before (https://bugs.python.org/issue28649). The leak can be avoided by calling

for cleanup in typing._cleanups:
    cleanup()

before interpreter shutdown.

Here is a simplified example adapted from your snippet:

import sys
sys.path.append('tests')
import test_classes_ext as t

import torch
import typing

def test(t: typing.Optional[t.Dog] = None):
    print(t)

# Uncomment to stop leak warnings
# for cleanup in typing._cleanups:
#     cleanup()

qnzhou added a commit to qnzhou/nanobind_example that referenced this issue Oct 13, 2022
@qnzhou
Copy link
Contributor Author

qnzhou commented Oct 13, 2022

A request for you @qnzhou : could you please verify that you still have the issue without PR #86 on the latest master branch,

Yes, I have created a repo to reproduce the problem:

pip install git+ssh://git@github.com/qnzhou/nanobind_example.git

and once installed, importing the package causes the error:

% python -c "import nanobind_example"
nanobind: leaked 3 instances!
nanobind: leaked 1 types!
 - leaked type "nanobind_example.nanobind_example_ext.Color"
nanobind: this is likely caused by a reference counting issue in the binding code.

This is tested on MacOS 10.15.7 and python 3.9.12.

and that it appears after applying the changes from PR #86? Many thanks!

With #86 applied, the error is gone. Many thnaks!

@wjakob
Copy link
Owner

wjakob commented Oct 13, 2022

Great, thanks -- I will merge #86 then.

@MatthiasKohl , here is a snippet you can include in your codebase to fix the leak issue:

import atexit

def cleanup():
    import typing
    for cleanup in typing._cleanups:
        cleanup()

atexit.register(cleanup)

@wjakob wjakob closed this as completed Oct 13, 2022
@MatthiasKohl
Copy link
Contributor

@wjakob great, I'll test this tomorrow. Thank you for helping out !

@wjakob
Copy link
Owner

wjakob commented Oct 13, 2022

I filed an issue with cpython: python/cpython#98253

@MatthiasKohl
Copy link
Contributor

This was indeed the issue for my case, so thanks for filing that issue with cpython ! For now, the atexit cleanup works as intended

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants