Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak on pyo3 object created within Rust #2853

Closed
haixuanTao opened this issue Jan 3, 2023 · 9 comments
Closed

Memory leak on pyo3 object created within Rust #2853

haixuanTao opened this issue Jan 3, 2023 · 9 comments
Labels

Comments

@haixuanTao
Copy link
Contributor

Bug Description

I think there is a memory leak when some args are used by a call_method1 in a Rust function that is being packaged and called from Python. Those args does not get removed until the very end of the function.

In my case, the args in question were PyBytes and PyDict. I do not know if it is specifically linked to this type.

To fix it, I have manually removed the reference with:

                unsafe {
                    ffi::Py_DECREF(bytes.as_ptr());
                    ffi::Py_DECREF(input_dict.as_ptr());
                }

See: dora-rs/dora#168
This fix was inspired by: #1801

Steps to Reproduce

You'll find a minimal reproducible example here: https://github.com/haixuanTao/pyo3_memory_leak

The expected behaviour is that the PyBytes get unallocated at each loop step.

Backtrace

No response

Your operating system and version

Linux Ubuntu 20.04

Your Python version (python --version)

Python 3.7.9

Your Rust version (rustc --version)

rustc 1.66.0 (69f9c33d7 2022-12-12)

Your PyO3 version

0.16

How did you install python? Did you use a virtualenv?

conda

Additional Info

No response

@haixuanTao haixuanTao added the bug label Jan 3, 2023
@mejrs
Copy link
Member

mejrs commented Jan 3, 2023

It doesn't leak memory for me (windows 11, python 3.10, pyo3 main). Can you try with a more recent pyo3 version?

Also, merging dora-rs/dora#168 is a bad idea, this will blow up really badly if it either doesn't always happen or if you switch to a pyo3 version that doesn't leak.

@mejrs
Copy link
Member

mejrs commented Jan 3, 2023

Also, can you reproduce this without using numpy? We can't do anything if that has the memory leak.

@haixuanTao
Copy link
Contributor Author

I'm pretty sure, Numpy does not leak on numpy.array. I just use it as a placeholder, but, you can try with another module that can use a large bytearray.

For info It also leak on python 3.10, pyo3 0.17.3 for me.

2023-01-03.16-29-36.mp4

I would love another solution that does not use unsafe and therefore does not blow up, but it has been the only solution i have found.

@adamreichold
Copy link
Member

I can reproduce this running CPython 3.10 on Linux as well. But I also do not think this is a memory leak, at least not in the reproducer: Since the loop is running inside a #[pyfunction], the GIL is held for the whole duration that this function is being called (from Python) and the calls to Python::with_gil are no-ops and hence they do not flush the associated GILPool.

Manually using a nested GILPool for each iteration of the loop as described in the guide at https://pyo3.rs/main/memory.html#gil-bound-memory, e.g.

diff --git a/src/lib.rs b/src/lib.rs
index fa1d470..bfbe167 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -14,6 +14,9 @@ fn leak() {
     for _ in 0..10 {
         let a: Vec<u8> = vec![0; 40_000_000];
         Python::with_gil(|py| -> PyResult<()> {
+            let pool = unsafe { py.new_pool() };
+            let py = pool.python();
+
             let bytes = PyBytes::new(py, &a);
             let input_dict = PyDict::new(py);
             input_dict.set_item("data", bytes).unwrap();

does resolve this for me.

This can also be understood in a more roundabout way by explicitly releasing the GIL outside the loop, e.g.

#[pyfunction]
fn leak() {
    Python::with_gil(|py| {
        let object: Py<PyAny> = Py::from(py.import("numpy").unwrap());

        py.allow_threads(|| {
            for _ in 0..30 {
                let a: Vec<u8> = vec![0; 40_000_000];
                Python::with_gil(|py| -> PyResult<()> {
                    let bytes = PyBytes::new(py, &a);
                    let input_dict = PyDict::new(py);
                    input_dict.set_item("data", bytes).unwrap();

                    let _status_enum = object.call_method1(py, "array", (bytes,)).unwrap();

                    std::thread::sleep(Duration::from_secs(1));
                    Ok(())
                })
                .unwrap();
            }
        });
    });
}

which also removes the leakage.

So I guess the more pressing question is whether the issue in the reproducer is the same as in dora-rs and whether it needs to use a nested GILPool or explicitly release the GIL.

@mejrs
Copy link
Member

mejrs commented Jan 3, 2023

Ah, that explains why it didn't leak for me, because that's not quite what I did:

use pyo3::prelude::*;
use pyo3::types::*;


fn leak() {
    let object = Python::with_gil(|py| {
        let object: Py<PyAny> = Py::from(py.import("numpy").unwrap());
        object
    });

        let a: Vec<u8> = vec![0; 40_000_000];
        Python::with_gil(|py| -> PyResult<()> {
            let bytes = PyBytes::new(py, &a);
            let input_dict = PyDict::new(py);
            input_dict.set_item("data", bytes).unwrap();

            let _status_enum = object.call_method1(py, "array", (bytes,)).unwrap();

            py.check_signals().unwrap();
            Ok(())
        })
        .unwrap();
    
}

fn main(){
    for i in 0..100000{
        leak();
        println!("{i}");
    }
}

@haixuanTao
Copy link
Contributor Author

Oh, I did not know that Python::with_gil is a no-ops inside a #[pyfunction]. I feel like this is not obvious within the documentation.

I think it is even less obvious that because the gil is hold throughout the function, all pyobject will live throughout the whole function beyond their lifetime, making memory management and therefore leak really easy.

We chose to use pyfunction as it allows us to distribute dora more easily as we don't have to link libpython.

I'm going to investigate what is the best fit for us.

Thanks for the quick response.

@davidhewitt
Copy link
Member

Please note that the gil-scoped memory growth is a known "feature" of the current PyO3 API design, which will eventually be resolved #1056 - I hope to begin iterating on a new design soon.

For now, I will close this issue as a duplicate.

@davidhewitt davidhewitt closed this as not planned Won't fix, can't repro, duplicate, stale Jan 4, 2023
@haixuanTao
Copy link
Contributor Author

I think that mentioning #1056 would be great in the https://pyo3.rs/main/memory.html#gil-bound-memory documentation with a special remark for python extension. I can write the PR if you guys think it's appropriate.

@davidhewitt
Copy link
Member

Sure thing, any time you are willing to spend to help document this for future users is welcome! I very much hope to resolve this for good this year 😁

haixuanTao added a commit to haixuanTao/pyo3 that referenced this issue Jan 7, 2023
Adding a special case of memory management when writing an extension.

This is a documentation of: PyO3#1056 and PyO3#2853
davidhewitt pushed a commit to haixuanTao/pyo3 that referenced this issue Jan 10, 2023
Adding a special case of memory management when writing an extension.

This is a documentation of: PyO3#1056 and PyO3#2853
bors bot added a commit that referenced this issue Jan 10, 2023
2864: Add a section on memory management for `extension` r=davidhewitt a=haixuanTao

Adding a special case of memory management when writing an extension.

This is a documentation of: #1056 and #2853


Co-authored-by: Haixuan Xavier Tao <tao.xavier@outlook.com>
Co-authored-by: David Hewitt <1939362+davidhewitt@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants