
Add NumPy Scalars #3544

Open · wants to merge 303 commits into master
Conversation

sun1638650145

Description

Compared with using py::buffer for this kind of processing, py::numpy_scalar is more concise and easier to use. I think PR #2060 is a very good solution, but it seems to have been abandoned, so I made a few small modifications and resubmitted it as a new PR.
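
For example, a binding written with the proposed type looks roughly like this (a sketch only: the numpy_scalar<T>::value member and py::make_scalar helper follow PR #2060 and may differ slightly from the final diff here):

#include <cstdint>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

PYBIND11_MODULE(example, m) {
    // Accepts a genuine NumPy int32 scalar (not a plain Python int or a
    // one-element array) and returns one; the generated signature shows
    // the concrete dtype instead of an opaque buffer.
    m.def("add_one", [](py::numpy_scalar<std::int32_t> v) {
        return py::make_scalar(static_cast<std::int32_t>(v.value + 1));
    });
}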

@rwgk
Collaborator

rwgk commented Dec 10, 2021

Logging my thoughts after looking for a bit.

That's a lot of code (cost).
What's the benefit?

It makes sense to distinguish between ints of different sizes etc. for large arrays, but how often does it matter for scalars in real life? Passing a one-element array could do the trick, too (see the sketch below). Probably a minor nuisance that's worth spending time and code on only if it becomes widespread clutter.
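
A minimal sketch of that workaround, using only the stable py::array_t API (the helper name is just for illustration):

#include <pybind11/numpy.h>

namespace py = pybind11;

// Wrap a C++ value in a shape-(1,) NumPy array; the Python caller
// unwraps it with arr[0] or arr.item().
template <typename T>
py::array_t<T> as_one_element_array(T value) {
    py::array_t<T> arr(1);  // 1-D array with a single element
    arr.mutable_at(0) = value;
    return arr;
}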

I will wait for other maintainers to chime in.

@sun1638650145
Author

@rwgk I understand your thoughts, thanks for your suggestions.

@sun1638650145
Author

It is not certain whether this PR will be merged, so if you want to use this feature, you can apply this patch directly.

@bjodah
Contributor

bjodah commented Feb 1, 2022

I won't speak to the cost/benefit ratio. But I thought I could mention that I have found myself writing a small wrapper function like this:

template<typename T>
py::object create_numpy_scalar(T val) {
    // Requires an initialized NumPy C-API: call _import_array()
    // (declared in <numpy/arrayobject.h>) before first use.
    py::object dt = py::dtype::of<T>();
    // PyArray_Scalar copies the raw bytes of `val` into a new NumPy scalar
    // of the given dtype; the third argument is only consulted for
    // flexible (string-like) dtypes.
    PyObject *scal = PyArray_Scalar(&val, (PyArray_Descr *) dt.ptr(), py::int_(sizeof(T)).ptr());
    return py::reinterpret_steal<py::object>(scal);
}

to wrap e.g. a long double scalar to be passed to a Python function, e.g. def my_jacobian(some_scalar, some_vector, an_out_matrix). Of course, passing an ndarray with shape == (1,) as some_scalar and writing an extra line of code in my_jacobian to convert it to a NumPy scalar (if needed for the semantics in that function) should work equally well, I suppose?
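
A hypothetical call site for that helper (assumes the NumPy C-API was initialized, and that my_jacobian, some_vector, and an_out_matrix are py::objects already in scope):

// Hypothetical usage of create_numpy_scalar from above:
long double step = 0.1L;
py::object scalar = create_numpy_scalar(step);    // -> numpy.longdouble
my_jacobian(scalar, some_vector, an_out_matrix);  // passed as a true scalar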

@sun1638650145
Author

  1. The cost of 300 lines looks very high, but in fact most of it is repetitive code to support ints, floats, and complex numbers of different precisions (measured: these macro definitions really can't be replaced with functions to reduce the code).

  2. I understand the situation you mentioned. I used py::buffer to achieve the same functionality before, but a function implemented that way has three obvious shortcomings (see the sketch after this list):

    • When reading __doc__ on the Python side, the data type is lost: buffer is always displayed instead of e.g. np.int32.
    • Representing a scalar with an array like [1, 1] also makes the Python side less readable.
    • The readability of the C++ side becomes extremely low; it takes a lot of comments to explain why it is done this way.
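
For contrast, here is roughly what the buffer-based version looks like with today's stable API (the function name is illustrative); it shows all three problems at once:

#include <cstdint>
#include <pybind11/pybind11.h>

namespace py = pybind11;

PYBIND11_MODULE(buffer_example, m) {
    // The generated signature shows only `buffer`; dtype and shape must
    // be validated by hand, and the intent is invisible without comments.
    m.def("add_one_buf", [](const py::buffer &b) {
        py::buffer_info info = b.request();
        if (info.format != py::format_descriptor<std::int32_t>::format()
            || info.size != 1)
            throw py::value_error("expected a one-element int32 buffer");
        return *static_cast<std::int32_t *>(info.ptr) + 1;
    });
}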

@bjodah, thank you very much. My original idea was to transfer this part of the cost directly from the user to us; whether it is pybind11 or Python, I hope it is simple and easy to use. English is not my native language; please excuse typing errors.

@ax3l
Collaborator

ax3l commented Feb 18, 2022

> That's a lot of code (cost).
> What's the benefit?

I have been bitten by this in the past as well, and the current inconsistency for 0-dim data (scalars) makes generic N-D data handling at API interfaces pretty wild/complicated.

As long as this goes in the separate numpy.h header, I think better numpy scalar support is reasonable to add.

@henryiii
Collaborator

I'm personally fine with following numpy more closely (that is, adding features already supported by numpy, within reason).

@rwgk
Collaborator

rwgk commented Feb 18, 2022

> As long as this goes in the separate numpy.h header, I think better numpy scalar support is reasonable to add.

Thanks for pointing that out, @ax3l; I hadn't given it enough weight previously.

I see the CI is currently very unhappy (42 failing). @sun1638650145, once I see you've gotten it back to green, I'll run the Google global testing with this PR. Could you please tag me with a CI All Green message or similar when it's ready?

@sun1638650145
Author

> As long as this goes in the separate numpy.h header, I think better numpy scalar support is reasonable to add.
>
> Thanks for pointing that out, @ax3l; I hadn't given it enough weight previously.
>
> I see the CI is currently very unhappy (42 failing). @sun1638650145, once I see you've gotten it back to green, I'll run the Google global testing with this PR. Could you please tag me with a CI All Green message or similar when it's ready?

OK, I'll try to fix the bug as soon as I can.

@henryiii
Collaborator

FYI, we are still okay with this if you can fix the bug(s?). (Invalid buffer descriptor)

@sun1638650145
Author

Hello, I'm not very familiar with MSVC or with resolving errors on the Windows platform. For the reasons mentioned above, I can only update the branch manually each time to ensure it at least stays usable in my own project. I hope someone can help me.

@rwgk
Collaborator

rwgk commented Sep 12, 2024

> Hello, I'm not very familiar with MSVC or with resolving errors on the Windows platform. For the reasons mentioned above, I can only update the branch manually each time to ensure it at least stays usable in my own project. I hope someone can help me.

I'll unsubscribe myself here for the time being. Taking this as an opportunity to explain how I approach looking at pybind11 issues and PRs in general:

In the back of my mind: I only have so many minutes per day. How do I decide what's important, and what's not?

  • I find it completely impossible to monitor issues. It'd be almost a full-time job and distract me from the work I'm getting paid for.
  • I make an effort to look at all PRs, because the people creating a PR have at least made a significant effort to contribute back. It's important enough to them.
  • If I feel a PR helps the community at large, and it's within easy reach for me, I might step in spontaneously to help see it through.
  • But in general, if a PR author is unable or unwilling to see it through, to me that means: It's not actually important enough, even to them.
