-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-101291: Low level opt-in API for pylong #101685
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A minor typo.
Include/cpython/longintrepr.h
Outdated
/* Return 1 if the argument fits into a (N-1) bit integer, | ||
* where N is the number of bits in a intptr_t, and the value | ||
* can be extracted efficiently by _PyLong_GetSingleDigitValue | ||
* NOTE: Some values that could into (N-1) bit integer will return 0. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* NOTE: Some values that could into (N-1) bit integer will return 0. | |
* NOTE: Some values that could fit into (N-1) bit integer will return 0. |
The What does |
I worked on some Python/C calling benchmarks last year. We could test those against a Cython hacked to use this API. |
So, what we currently do in Cython is that we generate a switch statement for The Note that we generate independent unpacking functions for different C integer/float sizes, so we'd need an inline function for int, long, ssize_t and the respective unsigned types as well, in order to cut down the switch statement at compile time. I certainly agree that the single digit case is an important special case. However, we currently do a lot more in the cases where the target C integer (or float) size is substantially larger than the PyLong digit size. Switching to the proposed API would get us half the way, and we'd end up with using the official public API, this new inofficial API, and the bare struct access, depending on the compile time environment/config and runtime value. The advantage then seems to be that users could still benefit from fast single-digit access if they chose to opt out of the direct struct access. If CPython doesn't want to go all the way of providing different inline conversion functions for different C target type sizes (as Cython does), then separating out the single-digit case still sounds like a reasonable option to provide to users, I think. |
The concept of a "digit" is something we want to get away from in the API. What matters is whether the Python int small enough to fit into a machine int ( Given the highly skewed distribution of ints, false negatives to should be OK for larger numbers. |
In the API or in the implementation? I'm asking because the digit's were never really part of the API but an implementation detail that CPython and Cython were both making good use of. As long as they remain a part of the implementation, I don't see why Cython shouldn't make use of them. |
Both. But it needs to be in the API first, or we won't be able to do so in the implementation.
Because there is no advantage in doing so. It doesn't improve performance, it just makes maintenance harder. |
I'm now inclined to make this part of the full API. With some way to opt in to the faster version. Something like this: /* Probably need a better name for this macro :) */
#ifdef Py_GIVE_ME_THE_FAST_VERSION_I_DONT_MIND_RECOMPILING_EVERY_RELEASE
static inline int
PyLong_IsSingleDigit(PyLongObject* op) {
assert(PyLong_Check(op));
Py_ssize_t signed_size = op->long_value.ob_size;
return -1 <= signed_size && signed_size <= 1;
}
#else
PyAPI_FUNC(int) PyLong_IsSingleDigit(PyLongObject* op);
#endif @encukou does this sound like a reasonable approach? |
What do you mean by "every release"?
|
Exactly. And that's the level of cross-version compatibility that CPython currently provides to its users and what Cython currently provides to its users (with the default C configuration). In theory, you could deprecate any non-stable API function tomorrow and remove it in CPython 3.13 or 3.14. Depending on which function you chose, that could end up being a blow to the existing ecosystem, but it wouldn't really be different from removing any other publicly exposed C function that is used outside of CPython itself. Personally, I don't see much of an advantage of an additional private-but-exposed API. If it's exposed, and fills a purpose (which it probably does, because it exists), why can't others use it? As long as CPython sticks to adding and removing functions only in minor (3.x.0) releases, that's just like any other kind of functionality that a specific CPython 3.x release series supports. And keeping the C-API stable for the lifetime of a minor release series seems to be widely agreed on. For a broader compatibility range, the current CPython offer is the limited API and the stable ABI, but that's a different beast and we're not discussing that here. |
Every minor release. |
It does need to be guarded by a macro, though. Otherwise you can't use it in a stable ABI build. |
You'll want something like: #if !defined(Py_LIMITED_API) || Py_LIMITED_API+0 >= 0x030C0000
PyAPI_FUNC(int) PyLong_IsSingleDigit(PyLongObject* op);
#endif
#ifndef Py_LIMITED_API
// (Instead of the #ifndef guard, you can put this in a Include/cpython/*.h file that's guarded as a whole)
static inline int
_PyLong_IsSingleDigit(PyLongObject* op) {
assert(PyLong_Check(op));
Py_ssize_t signed_size = op->long_value.ob_size;
return -1 <= signed_size && signed_size <= 1;
}
#define PyLong_IsSingleDigit _PyLong_IsSingleDigit
#endif and then in the #undef PyLong_IsSingleDigit
...
PyLong_IsSingleDigit(PyLongObject* op) {
return _PyLong_IsSingleDigit(PyLongObject* op);
} |
Note that |
Sure. However, if we provide the function regardless of whether or not |
There's a single flag that both limits the API you can use, and gives you a stable ABI build. (you = an extension author)
You're right that it's not the clearest naming. But for users, one without the other doesn't really make much sense.
On February 20, 2023 6:18:50 PM GMT+01:00, Mark Shannon ***@***.***> wrote:
Sure.
However, if we provide the function regardless of whether or not `Py_LIMITED_API` is defined, then `Py_LIMITED_API` doesn't limit the API.
Rather, it seems that defining `Py_LIMITED_API` changes the ABI not the API. This seems confusing.
--
Reply to this email directly or view it on GitHub:
#101685 (comment)
You are receiving this because you were mentioned.
Message ID: ***@***.***>
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
|
An unstable ABI with a stable API makes a lot of sense, to me at least. The The reason the API counts of stable, even though the ABI might not be, is that the unstable form can be chosen at build time without any change to the code, much as we do for |
Stable API with unstable ABI is the default. The C API is covered by Python's general backwards compatibility policy (min. 2 years of deprecation warnings for breaking changes).
Py_LIMITED_API will expose only the API that compiles to stable ABI. Nowadays it does more than simply provide a subset of the API -- it selects slow/stable implementations, as in my suggestion above -- but that's not really reason enough to go renaming the flag.
On February 20, 2023 6:38:03 PM GMT+01:00, Mark Shannon ***@***.***> wrote:
> But for users, one without the other doesn't really make much sense.
An unstable ABI with a stable API makes a lot of sense, to me at least.
The `inline` form of `PyLong_IsSingleDigit` depends on the layout of the `int` object, which will change in 3.12, so its ABI is unstable. The non-inline form is a published symbol and is stable.
The reason the API counts of stable, even though the ABI might not be, is that the unstable form can be chosen at build time without any change to the code, much as we do for `Py_DEBUG`.
--
Reply to this email directly or view it on GitHub:
#101685 (comment)
You are receiving this because you were mentioned.
Message ID: ***@***.***>
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
|
Now, I don't think the functions with undefined behavior should be in the stable ABI. These also hardcode the fact that an efficient “digit” fits in |
I don't see what this PR offers other than confusion for someone trying to read our API. For starters, I don't think anyone who isn't used to CPython internals will assume the correct meaning for "digit" (and even most of us will need to double check whether it means a single element from our internal storage or... a digit). For small numbers, If we want to skip the type check and just crash on a non-longobject, then we could add that API. But the size check isn't optional, so it may as well be encapsulated with extracting the value, as it currently is. I don't see why it should get into the limited API. The point of that API set isn't to grow over time - it's to remain stable, so that extensions can build once and be compatible with many versions. Every time we add a function to it, we hurt that. In some cases, it's worth it, but I don't see why I'm sympathetic to extension authors who want to extract values larger than I have very little sympathy for extension authors who need to check whether the value fits into a |
Almost all the API has undefined behavior if you don't fulfill the pre-conditions.
A large integer has to be made up of digits. So the existence of "digits" isn't really an implementation detail. Feel free free to suggest better names, but remember that "small" int is already in use for a very restricted set of values around 0. If we don't provide functions like these, then Cython, NumPy, MyPyC, etc will just use the C struct.
The functions take |
Yes, but I believe that new additions to the limited API should be “nicer” than the status quo.
How would you use this API?
Those don't use the limited API. And it seems everyone who can afford to call a DLL function would be fine with |
Which is probably true, and why I proposed I thought you were saying that these functions should be part of the limited API. If we are to have CPython perform well and have Cython, etc be available during the alpha and beta phases, we need a low-level API. |
It is considerably faster because With the internals of longs hidden behind these functions, we can improve the layout and will only need a single memory read. faster-cpython/ideas#548 |
At least the inlining should be possible for
Sorry, that was a misunderstanding :( |
"Perfect" is not the term I would use 🙂 What we need is an API that includes low-level API functions, but not structs and internal functions. |
That's comparing the inlined version to the non-inlined version. Not the same thing as comparing a single expensive call to two cheap ones. Two successive functions calls don't need any more register spilling than a single call. |
Because that's the change they want: for the fast path of |
@zooba I would still prefer if those are two separate functions. Suppose I am trying to fetch a 64 bit unsigned integer from Python on a 64 bit machine. Then the full-blown Of course most values are likely to still be compact. So what I want to be able to do is to first try the fast path for compact values. If the number is compact and positive, the cast succeeds, and it fails if it is negative. In the rare case that the number is not compact, I can fall back to If you bake together those functions in the CPython side, it removes that flexibility. |
It just means you're in the same place as you would be for numbers that are bigger than Besides, I gave up on my preferred API for this because nobody seemed to like it, and have simplified my discussion point down to an existing API that other people mentioned. If I get over the argument-burnout to the point where I would actually code something up, you'd see. But everyone just wants to argue past everything I say, and I can't tell if it's me (in which case I'll bow out... again) or if it's something else (in which case I change my messaging... again... and things still don't go the way I'd like to see them go). Best of luck! |
Why is
The point is that the ints in range Given all that, we should choose the |
Put yourself into the shoes of the consumer of this API (binding project tools like pybind11, Cython, etc.). They must expose a C++ function taking a standard integer type ( So what these tools will ideally do is to first try the fast path (if My example was simply to point out why @zooba's suggestion of gluing All of this is completely unrelated to the question of small/compact integers, which are naturally much smaller than this threshold. |
Would my proposal from #101685 (comment) be sufficiently general? (Slightly tweaked from the first time I posted it)
Bearing in mind that this is all in a header file and is always inlined, when Usage looks like:
It only depends on |
Slightly tweaked from the first time I posted it
That's more or less what Cython does. Reads ok to me. I posted a link to the implementation a bit further up.
Obviously, it would be great to have the same feature available in the stable API. It's currently very difficult and slow to extract a 128 bit C integer from a PyLong there.
|
@zooba |
For unstable API, let's expose what CPython uses. |
It doesn't have to be a perfect calculation, though. We're only guaranteeing a fast path when it's fast, so if |
Ah, I missed that you used |
See faster-cpython#16 for what I have in mind. |
I've added some tests to faster-cpython#16. Feel free to cherry-pick/adapt anything from there if it helps. |
Thanks @encukou for doing this properly |
@zooba Any objections to me merging this? |
Looks good to me, FWIW. |
OK, it looks like we have some sort of consensus that this is worth having. |
This PR breaks building pybind11. See below. Is this expected? What do you recommend we do? https://github.com/pybind/pybind11/tree/d72ffb448c58b4ffb08b5ad629bc788646e2d59e
|
Ugh. Looks like 3.12 adds a bunch of functions that try to be const-correct, but fail to compile if you tell the compiler to actually enforce const-correctness ( |
Since it's probably difficult to change the existing cast/check macros to respect Also note that it's only asserts that fails here, so they could be removed. But it seems sensible to have them, especially since these are user facing functions. |
This is a stop-gap solution, until we decide on future directions for the C-API.
Adds some inline functions to allow fast access to the value of
PyLongObject
objects, without explicitly probing the internals.These new functions will be guarded by
PY_UNSTABLE_PYLONG_ABI
.In the future we may:
PY_UNSTABLE_PYLONG_ABI
with a more general ABI selector flag.@encukou Is this OK as a stopgap?
@scoder Is this usable by Cython?