Did you know that py::buffer_info::format has a different meaning on Windows? #1908
Comments
The same is true on a 32-bit Docker image (
whereas on 64-bit Linux:
On Windows, regardless of whether it's 32-bit or 64-bit, the format characters take the same meanings as on 32-bit Linux. |
Idioms like the following are successful on 32-bit Linux, 64-bit Linux, 32-bit Windows, and 64-bit Windows (and 64-bit MacOS; haven't tested 32-bit MacOS). It's possible that the
if (format_.compare("d") == 0) {
out = std::make_shared<PrimitiveType>(parameters_, PrimitiveType::float64);
}
else if (format_.compare("f") == 0) {
out = std::make_shared<PrimitiveType>(parameters_, PrimitiveType::float32);
}
#if defined _MSC_VER || defined __i386__
else if (format_.compare("q") == 0) {
#else
else if (format_.compare("l") == 0) {
#endif
out = std::make_shared<PrimitiveType>(parameters_, PrimitiveType::int64);
}
#if defined _MSC_VER || defined __i386__
else if (format_.compare("Q") == 0) {
#else
else if (format_.compare("L") == 0) {
#endif
out = std::make_shared<PrimitiveType>(parameters_, PrimitiveType::uint64);
}
#if defined _MSC_VER || defined __i386__
else if (format_.compare("l") == 0) {
#else
else if (format_.compare("i") == 0) {
#endif
out = std::make_shared<PrimitiveType>(parameters_, PrimitiveType::int32);
}
#if defined _MSC_VER || defined __i386__
else if (format_.compare("L") == 0) {
#else
else if (format_.compare("I") == 0) {
#endif
out = std::make_shared<PrimitiveType>(parameters_, PrimitiveType::uint32);
}
else if (format_.compare("h") == 0) {
out = std::make_shared<PrimitiveType>(parameters_, PrimitiveType::int16);
}
else if (format_.compare("H") == 0) {
out = std::make_shared<PrimitiveType>(parameters_, PrimitiveType::uint16);
}
else if (format_.compare("b") == 0) {
out = std::make_shared<PrimitiveType>(parameters_, PrimitiveType::int8);
}
else if (format_.compare("B") == 0 || format_.compare("c") == 0) {
out = std::make_shared<PrimitiveType>(parameters_, PrimitiveType::uint8);
}
else if (format_.compare("?") == 0) {
out = std::make_shared<PrimitiveType>(parameters_, PrimitiveType::boolean);
} |
I believe this is a combination of C/C++ not defining the sizes of integer types and the way numpy handles this. Is there any particular issue when using this from pybind11, that doesn't match C/C++/numpy? Or do you suggest we add something to the docs somewhere, or ...? |
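To make the point about C integer sizes concrete, here is a minimal sketch that asks the running Python for the native size of each integer format character. It only uses the standard `struct` module; the helper name `native_integer_sizes` is made up for this illustration and is not part of pybind11 or NumPy.

```python
import struct


def native_integer_sizes():
    """Return the native size in bytes of each signed integer format character."""
    # "@" selects native size and alignment, which is what a buffer format
    # without a byte-order prefix uses.
    return {code: struct.calcsize("@" + code) for code in "bhilq"}


if __name__ == "__main__":
    for code, size in native_integer_sizes().items():
        print(f"{code!r}: {size} bytes")
```

On typical 64-bit Linux and macOS builds, `l` should come out as 8 bytes, while on Windows it should come out as 4. That would be consistent with the platform split described in this thread, but treat the numbers as something to check on your own machine.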
Ultimately, all I need is a work-around, so if I'm given a If this does get figured out, documentation is probably all that's needed, maybe a small section at the end of this page, since that's where we learn about the |
This does seem to mainly be a numpy feature/issue, though, where the meaning of Testing out with 64-bit
32-bit:
See e.g., here: https://numpy.org/doc/stable/reference/arrays.scalars.html#built-in-scalar-types I guess we could refer to these numpy docs, in some warning, when talking about |
I could. In the meantime, I'll use this space to try figuring things out, out in the open. To try to get a handle on all the possible numeric types NumPy can handle, I scanned all objects attached to the
>>> [x for x in dir(numpy)
...  if isinstance(getattr(numpy, x), type) and issubclass(getattr(numpy, x), numpy.generic)]
then excluded any types that are not a leaf in the class hierarchy with:
[x for x in dir(numpy)
...  if isinstance(getattr(numpy, x), type) and issubclass(getattr(numpy, x), numpy.generic)]
Accounting for multiple names for the same type objects, nothing on this list has been left out of the above. Creating an array of each of these types, passing it into
On all systems where I could test it (Linux 64-bit, MacOS 64-bit, Windows 32-bit, and Windows 64-bit), Python 2.7 differed from Python 3.6+ only in that
A bug that we recently encountered involves a distinction between
>>> numpy.int32, numpy.int64, numpy.intp, numpy.longlong
(<class 'numpy.int32'>, <class 'numpy.int64'>, <class 'numpy.int64'>, <class 'numpy.longlong'>) |
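The same kind of distinction can be observed without NumPy. The sketch below is an illustration using only the standard library, not the reproduction from this thread: it exposes a C `long` array and a C `long long` array through the buffer protocol and reports their format characters and item sizes. The helper name `describe` is hypothetical.

```python
import array


def describe(typecode):
    """Return (format, itemsize) as reported through the buffer protocol."""
    view = memoryview(array.array(typecode, [0]))
    return view.format, view.itemsize


long_format, long_size = describe("l")          # C long
longlong_format, longlong_size = describe("q")  # C long long

# The format characters differ even on platforms where both types are 8 bytes.
print(long_format, long_size, longlong_format, longlong_size)
```

A consumer that matches on the format character alone would therefore treat the two as different types even where their memory layout is identical.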
I guess because |
Thanks! And yes, please do; no problem at all to aggregate everything together in one place!
Nice overview! So the main thing to note - I believe - is that Note that you can compare
By some weird coincidence, I once made a PR on numpy about these
If I'm not mistaken, there are some issues/PRs open on this, that we haven't gotten to yet. |
I think your conclusion is wrong here. https://en.cppreference.com/w/cpp/language/types Unix-like OS's always have |
Huh, interesting. But I don't think it changes anything. The thing is just that |
@jpivarski I believe your table effectively arises from the (rather complex) set of aliasing rules from this header file: In #1329, I thought I had aligned this with how it was defined in the latest version of NumPy (at least in time of writing); however, it does seem like that assumption may have been rather brittle, as noted by this issue. @YannickJadoul pointed it out in this review comment: I'm just wondering if there's a robust way to test this out in CI... |
For workarounds, I've pivoted from trying to interpret each For instance,
else if (fmt == std::string("b") ||
         fmt == std::string("h") ||
         fmt == std::string("i") ||
         fmt == std::string("l") ||
         fmt == std::string("q")) {
  if (itemsize == 1) {
    return dtype::int8;
  }
  else if (itemsize == 2) {
    return dtype::int16;
  }
  else if (itemsize == 4) {
    return dtype::int32;
  }
  else if (itemsize == 8) {
    return dtype::int64;
  }
(Sometimes, there's also an endianness character, As you can see, I've mapped it to a platform-independent enum and I just use the enum subsequently (but keep the That's what #1329 is about, right? If pybind provides a platform-independent enum, then there's nothing to document; we should just use that enum, right? |
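The same idea can be sketched on the Python side: ignore which integer letter was used and decide the width from the item size. This is a rough illustration under assumptions, not the code from the comment above. The function name `fixed_width_name` and the returned strings are invented here, and the byte-order prefix is simply discarded rather than kept.

```python
SIGNED_CODES = set("bhilq")
UNSIGNED_CODES = set("BHILQ")


def fixed_width_name(fmt, itemsize):
    """Map a buffer format string and item size to a fixed-width type name."""
    # Drop an optional byte-order/alignment prefix ("@", "=", "<", ">", "!").
    code = fmt.lstrip("@=<>!")
    if code in SIGNED_CODES:
        prefix = "int"
    elif code in UNSIGNED_CODES:
        prefix = "uint"
    else:
        raise ValueError(f"not a single integer format character: {fmt!r}")
    if itemsize not in (1, 2, 4, 8):
        raise ValueError(f"unsupported item size: {itemsize}")
    # The letter only decides signedness; the width comes from itemsize.
    return f"{prefix}{8 * itemsize}"


print(fixed_width_name("l", 4))   # "l" as reported for a 4-byte item
print(fixed_width_name("l", 8))   # "l" as reported for an 8-byte item
print(fixed_width_name("<q", 8))  # with a byte-order prefix
```

Because the result depends only on signedness and item size, the same call gives the same answer regardless of which platform produced the format string.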
I believe this makes sense, indeed. Maybe there ought to be a better way to handle/expose this in pybind11, though. I suppose this mapping should be known at compile time?
pybind11 doesn't use numpy's headers though. (I suppose since we don't want to depend on them? I don't know, actually.) Instead we access some hidden Python capsule in numpy that contains function pointers to the API. So this might be easier said than done. |
From what I can tell, according to https://numpy.org/doc/stable/reference/arrays.scalars.html#built-in-scalar-types, there are a few things pybind could improve.
On my distro, numpy headers are in
It's the |
The replication of NumPy header contents was a conscious decision to break a problematic dependency. Otherwise every project that supports a function call with a |
Agree with all of these.
Only for compilation, right; and if you're using |
By the way, the decision to avoid a NumPy dependency does make it easier to build - that's a difference between Cython wheel builds and PyBind11. With Cython, you have to have min NumPy dependencies, something like this:
When you build a wheel, you have to have the oldest version of NumPy supported, since a wheel built with a newer NumPy can't be used by an older NumPy, and older NumPy versions don't have wheels/support for newer Pythons. Now modern tooling, like PEP 518's |
The Python integer format character is ambiguous and platform-dependent. PyBind11 `format_descriptor<...>::format()` always returns "q" and "Q" for 64-bit integers, independent of the platform. Compatible passed-in Python buffers, on the other hand, might also have the equivalent format "l" or "L" set. See pybind/pybind11#1806 and pybind/pybind11#1908 for details. This fix introduces a special case for integer format comparisons that checks only size and signedness.
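A comparison that looks only at size and signedness could look roughly like the following. This is a Python sketch of the idea rather than the code from the referenced fix; `same_integer_layout` is a hypothetical name, and it assumes both formats are single native integer characters with no byte-order prefix.

```python
import struct

SIGNED = set("bhilq")
UNSIGNED = set("BHILQ")


def same_integer_layout(fmt_a, fmt_b):
    """Return True if two native integer formats agree in size and signedness."""
    for fmt in (fmt_a, fmt_b):
        if fmt not in SIGNED | UNSIGNED:
            raise ValueError(f"not a native integer format character: {fmt!r}")
    same_sign = (fmt_a in SIGNED) == (fmt_b in SIGNED)
    same_size = struct.calcsize(fmt_a) == struct.calcsize(fmt_b)
    return same_sign and same_size


# Whether "l" matches "q" depends on the platform's size of C long.
print(same_integer_layout("l", "q"))
print(same_integer_layout("q", "Q"))
```

The first call is expected to differ between platforms, which is exactly the situation in which an exact string comparison of "l" against "q" would reject a compatible buffer.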
FWIW, I had a run-in with this issue, too, but then found that there is already a solution, which is made more visible & accessible under #4674 ( |
In Python, implementing type resolution across Windows and Linux/macOS is tricky, as the same `l` specifier can map to both `int32` and `int64`. Moreover, once 8-byte integers are supported, the `u8` specifier can mean either an 8-bit or an 8-byte integer, causing confusion. The same issue is discussed in detail in the PyBind11 repo: pybind/pybind11#1908
On Windows (both 32-bit and 64-bit Python),
returns "l" for a numpy.int32 array and "q" for a numpy.int64 array, whereas MacOS and Linux return "i" and "l", respectively. (That is, "l" is ambiguous.)
To be safe against misinterpretations, I'm using
to convert whatever I have into int64_t (and similarly for int32_t), if necessary. (Then, choosing int32_t vs int64_t based on platform is just an optimization, not needed for correctness.)