[QUESTION] Numpy Array to C++ - take ownership of data #3126

Open
MartinPerry opened this issue Jul 16, 2021 · 10 comments

Comments

@MartinPerry

Is it possible to pass data from NumPy to C++ and take ownership of the memory, so that it is no longer managed by Python? I have a large NumPy matrix and I don't want to copy the memory. I can use py::buffer_info to get a pointer to the data, but the pointer is no longer valid once Python is shut down. Another reason is that I want to release the data from the C++ side once I no longer need it.

@jiwaszki
Contributor

If you don't want to copy use PYBIND11_MAKE_OPAQUE(). You can read about it here: https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html#making-opaque-types

@petrochemical

Is it possible to allocate the memory on the C++ side and pass it back to Python?

You can return a buffer from C++ to Python like this:

return pybind11::buffer_info(...)

On the Python side, this return value can be used directly as a numpy array.

@MartinPerry
Author

If you don't want to copy use PYBIND11_MAKE_OPAQUE(). You can read about it here: https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html#making-opaque-types

I am not quite sure how to use this with NumPy. Do you have an example?

In Python I have a very simple example:

import numpy as np
arr = np.array([1, 2, 3, 4, 5])

and I want to pass arr to C++ so that C++ will "own" the data and the data won't be freed when the Python interpreter is finalized. Also, I don't want to create a copy of the data.


Is it possible to allocate the memory on the C++ side and pass it back to Python?

You can return a buffer from C++ to python like this:

return pybind11::buffer_info(...)

On the Python side, this return value can be used directly as a numpy array.

I cannot allocate the memory in C++ via buffer_info, because the NumPy array I have is the output of another library.

@PierreMarchand20

@MartinPerry I have the exact same question. Did you manage to solve your issue?

@MartinPerry
Author

@PierreMarchand20 Unfortunately, no

@ptbxzrt

ptbxzrt commented Mar 17, 2022

@MartinPerry @PierreMarchand20 I have the exact same question. Did you manage to solve your issue?

@skovaka

skovaka commented May 9, 2022

I may have figured out a solution to this! It looks like the array pointer remains valid as long as the py::buffer_info object returned by buffer.request() exists. I've written a simple wrapper for transferring a 1D NumPy array into C++ (without copying) by storing the buffer_info object in an instance variable:

template<typename T>
struct PyArray {

    py::buffer_info info;
    T *data;
    size_t size;

    PyArray(py::array_t<T> arr) :
        info { arr.request() },
        data { static_cast<T*>(info.ptr) },
        size { static_cast<size_t>(info.shape[0]) } {}

    PyArray(const PyArray &) = delete;
    PyArray &operator=(const PyArray &) = delete;
    PyArray(PyArray&&) = default;

    //...

Note that py::buffer_info is not copyable, so I had to delete the copy constructor and copy assignment operator and declare the move constructor in PyArray. This does limit how PyArray can be used, but it should work as long as you always pass it by reference.

I've tested this by creating a NumPy array in Python, using it to initialize a PyArray, deleting the original NumPy array, and then confirming that the PyArray still works. The same test fails if info is local to the constructor. I'm no expert in Python memory management, so I'm not 100% sure if this will work in all circumstances (e.g. when Python is "shut down"), but I hope it helps!
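
The move-only constraint described above can be sketched without pybind11. This is only an illustration under stated assumptions, not the wrapper from this comment: `KeepAlive` is a hypothetical stand-in for whatever keeps the buffer valid (the stored py::buffer_info in the code above), and `ArrayView` is an invented name. The one thing it does differently is clear the moved-from object, because a defaulted move constructor leaves the source's raw `data` pointer unchanged.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for the object that keeps the buffer valid
// (py::buffer_info in the comment above). Here it simply owns a vector.
template <typename T>
using KeepAlive = std::unique_ptr<std::vector<T>>;

template <typename T>
class ArrayView {
public:
    explicit ArrayView(KeepAlive<T> owner)
        : owner_(std::move(owner)),
          data_(owner_ ? owner_->data() : nullptr),
          size_(owner_ ? owner_->size() : 0) {}

    // Copying is disabled because the keep-alive handle is unique.
    ArrayView(const ArrayView&) = delete;
    ArrayView& operator=(const ArrayView&) = delete;

    // Moving transfers the handle and resets the source to an empty state.
    ArrayView(ArrayView&& other) noexcept
        : owner_(std::move(other.owner_)),
          data_(std::exchange(other.data_, nullptr)),
          size_(std::exchange(other.size_, 0)) {}

    ArrayView& operator=(ArrayView&& other) noexcept {
        if (this != &other) {
            owner_ = std::move(other.owner_);
            data_ = std::exchange(other.data_, nullptr);
            size_ = std::exchange(other.size_, 0);
        }
        return *this;
    }

    T* data() const { return data_; }
    std::size_t size() const { return size_; }

private:
    KeepAlive<T> owner_;  // declared first so it is initialized before data_
    T* data_;
    std::size_t size_;
};
```

With both move operations defined, such a wrapper can be returned from functions or stored in containers like std::vector, not only passed by reference.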

@bogdan-lab
Contributor

bogdan-lab commented Sep 2, 2022

I had a similar problem.
We typically use Python in our C++ code base in the following way:

py::scoped_interpreter guard{};
py::dict locals;

py::exec(R"(python code here)", py::globals(), locals);

In order to move data from a NumPy ndarray computed in the Python snippet to C++, I wrote the following function:

#include <cassert>
#include <armadillo>
#include <pybind11/numpy.h>

template <typename T>
arma::Col<T> MoveFromNumpyArray(pybind11::object obj) {
  // dynamic_cast cannot be used here because the pybind11 wrapper types have
  // no virtual functions.
  auto np_array = static_cast<pybind11::array>(obj);
  // The array's dtype must match T for the data to be read correctly.
  assert(np_array.dtype() == pybind11::dtype::of<T>());
  auto* data_ptr = static_cast<T*>(np_array.mutable_data());
  assert(np_array.size() >= 0);
  auto size = static_cast<arma::uword>(np_array.size());
  // Give up the handle without decrementing the reference count.
  np_array.release();
  return {data_ptr, size, /*copy_aux_memory=*/false,
          /*strict=*/false};
}

In my case, typical use of the function looks like this:

py::scoped_interpreter guard{};
py::dict locals;

py::exec(R"(
import numpy as np
x = np.array((1,2,3,4), dtype=np.int32)
)", py::globals(), locals);

auto data = MoveFromNumpyArray<int>(locals["x"]);

Therefore I need to cast from pybind11::object to pybind11::array and hope that the user passes a NumPy array as the argument.
It is also very important that the user instantiates the template with the type that corresponds to the numpy.ndarray.dtype.

arma::Col<T> in this example plays the role of a std::vector<T>, except that it has a constructor for building itself on top of a given pointer without any copy.

The solution is not ideal, because it relies on the user for two crucial things, but it works. At least in my tests.
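
One caveat worth checking, as far as I understand both libraries: release() gives up the pybind11 handle without decrementing the reference count, and a vector built with copy_aux_memory=false does not free the memory it wraps, so the array is never deallocated. One way to regain control over the lifetime is to pair the raw pointer with a callback that drops the reference when C++ is done with the data. The sketch below shows only that pairing in plain C++. `ExternalBuffer`, `adopt`, and `release_owner` are hypothetical names, and in a real binding the callback would presumably have to decrement the Python reference while holding the GIL and before the interpreter is finalized.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical: a pointer into memory owned elsewhere, plus a deleter that
// notifies the real owner when this buffer goes away.
template <typename T>
struct ExternalBuffer {
    std::unique_ptr<T[], std::function<void(T*)>> data;
    std::size_t size;
};

// Wraps ptr without copying. release_owner is called exactly once, when the
// returned buffer (or whatever it was moved into) is destroyed.
template <typename T>
ExternalBuffer<T> adopt(T* ptr, std::size_t size,
                        std::function<void()> release_owner) {
    std::function<void(T*)> deleter =
        [release_owner = std::move(release_owner)](T*) {
            // The memory itself is not freed here; the owner decides that.
            if (release_owner) release_owner();
        };
    return {std::unique_ptr<T[], std::function<void(T*)>>(ptr, std::move(deleter)),
            size};
}
```

The buffer is move-only, so the callback cannot fire twice by accident, and the data can be released from the C++ side simply by letting the buffer go out of scope.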

@songh11

songh11 commented Sep 8, 2022

Hello @skovaka, thanks for the method. I tried it, but when I create a new NumPy array after deleting the old one, the data behind PyArray::data is replaced by the new array's data. Have you tried this?

@songh11

songh11 commented Sep 8, 2022

I have the exact same question. How can I allocate a buffer in C++ and use it on the Python side?
