Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add pyo3::experimental namespace with new owned PyAny<'py> #3205

Closed
wants to merge 6 commits into from

Conversation

davidhewitt
Copy link
Member

This is my leading proposal for how we might solve #1056 and remove the idea of the "pool" from the Python marker.

From my benchmarking here, this can net up to 30% wins in performance for PyO3 interactions with Python objects as well as avoid the memory growth related to the object pool. I rewrote the argument extraction logic using this new API to demonstrate the potential. Will post benchmarks below when I have a moment.

The fundamental idea is that the gil-bound reference types like &'py PyAny which frequently reside in the pool are replaced by PyAny<'py> owned types with a lifetime attached. They provide an almost identical API with the difference being ownership semantics which can essentially be thought of as Rc.

The new type is added as pyo3::experimental::PyAny, along with accompanying implementations of PyDict and PyList. We also need to have replacements of the existing conversion traits e.g. pyo3::experimental::FromPyObject.

What I'd like to do if we like this direction is to make pyo3::experimental a public API which we state is a possible future direction of PyO3 which we are using for research, so may change fast. Ideally users can import pyo3::experimental::prelude::* and just use the "new PyO3" without much challenge. Eventually we could make this pyo3::v2::prelude::* and then go through a deprecation / removal cycle of the old stuff.

We can implement the bulk of PyO3 using the new API to learn about it and get the maximum performance speedup internally, though we have to be aware this does create quite a lot of maintenance overhead.

There are downsides of adopting this new API (this list is incomplete and we can fill this out in this PR and eventually in the docs):

  • I think this might be harder for users to learn, because of the PyAny<'py> lifetime being attached everywhere.
  • By losing the pool there are a few edge cases where conversions are no longer possible. E.g. I think &PyAny -> Vec<&str> is unsupported because the &str needs to be owned by a Rust type. At the moment the intermediate &PyString references go in the pool.
    • Maybe we solve this in time.
  • I don't have a perfect answer what to do with Py<PyAny> and friends for storage without lifetimes. Py<PyAny<'static>> is nonsensical because 'static GIL lifetime is kinda meaningless.
    • I am considering PyDetachedAny, PyDetachedDict etc as the types to replace Py<PyAny>, Py<PyDict> etc. I don't love this but maybe it's the best choice. Also with PEP 630 as a possible thing to aim for we probably want to be making static storage less ergonomic anyway.

Also the branch is currently out-of-date and unfinished, this is posted here to invite discussion. Sorry if you are reading the code 😆

With apologies I have to dash now, will do my best to add and explain any thoughts here as questions come in.

@adamreichold
Copy link
Member

I was likely not part of the previous discussions which happened on this topic, so please just tell if something was discussed and decided before.

One thing that feels off in the proposal is the need to have separate types PyAny/List/Dict and PyDetachedAny/List/Dict. Couldn't we leave PyAny/List/Dict as they are and just introduce a new smart pointer like Gil<'a, T> to complement the detacted Py<T> version we already have?

@davidhewitt davidhewitt changed the title add pyo3::experiemental namespace with new owned PyAny<'py> add pyo3::experimental namespace with new owned PyAny<'py> Jun 6, 2023
@davidhewitt
Copy link
Member Author

That's a great question, and I've got reasons why that didn't work with the current state of things. I'll try to find time to write that out here tomorrow or on Friday.

@davidhewitt
Copy link
Member Author

As promised...

TLDR; I see three main API options, all with drawbacks, the proposal here seemed like the best compromise for now. There might be other designs I have not considered and I would be extremely excited for any proposal which is better than these three.


I played around with this kind of GIL<'py, T> idea in the past in attempts such as #1300; I ended up creating a total mess and needing to rework all the types anyway.

The issue I keep running into is related to Deref. Consider the following snippet.

let py: Python<'py> = /* ... */;

// possible return type of `PyDict::new` in a world without the pool could be `GIL<'py, PyDict>`
let dict: GIL<'py, PyDict> = PyDict::new(py);
dict.set_item("foo", "bar")?;

// return type of `PyDict::get_item` would similarly be `GIL<'py, PyAny>`.
let bar: GIL<'py, PyAny> = dict.get_item("foo").expect("foo is in dict");

I think we want this snippet to compile. However, the naive Deref implementation for GIL<'py, T> would be:

impl<T> Deref for GIL<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        self.0
    }
}

and the problem with that is that the '_py lifetime doesn't feature in the deref, so in the snippet above the lifetime which bar ends up with is just the lifetime of the deref, and will therefore fail to compile.

Replacing 'py with '_ for bar, i.e. let bar: GIL<'_, PyAny> = dict.get_item("foo").expect("foo is in dict");, would compile, however that lifetime '_ would be tied to the lifetime of dict, it's not the GIL lifetime. This is not what we would want it to be and leads to "loss" of the GIL lifetime, especially once dict is dropped then bar is also unusable.

I also explored something like this:

impl<'py, T> Deref for GIL<'py, T> {
    type Target = &'py T;
    fn deref(&self) -> &&'py T {
        // where self.0 is `*mut ffi::PyObject`
        unsafe { std::mem::transmute(&(self.0)) }
    }
}

That may seem better, because it has the 'py lifetime, but actually is unsound because &'py T is Copy and outlives the deref, leading to trivial use after free errors, such as this:

Python::with_gil(|py| {
    let bare_mapping = PyDict::new(py).as_mapping();
    assert_eq!(bare_mapping.get_refcnt(), 1);  // Blows up with get_refcnt() as 0

This is because the &'py PyDict will happily outlive the GIL<'py, T> temporary after the Deref expression, because the lifetime 'py is the full length of the closure.


The path forward which could work for a GIL<'py, T> type would be to use arbitrary self types and then the signature of .as_mapping() would have to be:

pub fn as_mapping<'py>(self: &GIL<'py, Self>) -> &GIL<'py, PyMapping> {
    /* tbc */
}

This is unfortunately not in stable Rust, though if we really believed this was the best option (it might be) we could consider lobbying upstream to see what work would be needed to stabilise. Even once stabilised that will be a future PyO3 design, rather than something we can use now.

We can almost get the same effect as arbitrary self types if we implement all methods on GIL<'py, T> instead. For example, we could have

impl<'py> GIL<'py, PyDict> {
    pub fn as_mapping(&self) -> &GIL<'py, PyMapping> {
        /* tbc */
    }
}

which compiles today and has the correct lifetime characteristics. The big downsides I see of this are:

  • Documentation - all methods will end up on the GIL type, which will be a big downgrade compared to our current API where all the PyAny, PyDict etc have their own doc pages.
  • Downstream crates - numpy won't be allowed to implement Gil<'py, PyArray> without making it a trait PyGILArrayExt which then users will have to import to interact with PyArray methods.

I'm not completely against going this route, but it simple doesn't feel that nice. Maybe if we got a signal from upstream that arbitrary self types (or at least GIL<'py, T> as a self type) was on route to stabilise, we could use this approach to get something which compiles on today's Rust and evolves cleanly into a future API.


The conclusion I drew was that whichever API we go for, GIL-bound references no longer make sense once the pool is removed. I think I have three options which create workable APIs:

  1. PyAny<'py> as proposed in this PR, with the drawback being we may want new design ideas for what to do with Py<PyAny> (I agree PyDetachedAny etc is quite ugly).
  2. impl<'py> GIL<'py, PyDict>, which may have a solid future design path with arbitrary self types, although I think is quite painful at present.
  3. PyAny without a lifetime at all, remove the Py type, and take py: Python in all methods (i.e. the rust-cpython). This is simple but I think Py<T> is useful for #[pyclass]. Also not having a GIL lifetime attached means we cannot implement Display without needing to call with_gil internally, so I don't like this option much.

One solution I'm open to is that for now we implement option 1 as pyo3::experimental. We use this implementation for as much of PyO3's internals as possible. We get an efficient, pool-free internal design and we can also offer it for users who want the same speedup. We'll try to keep pyo3::experimental usable with the understanding there may significant churn there if we find a better alternative design. We keep the GIL-bound references and the pool as PyO3's primary API for now, and take our time to experiment further with option 1 (maybe we find a design for Py<T> we like) while also probing upstream to understand when option 2 might become viable.

This obviously has increased maintenance burden, however personally I'd be willing to take that pain if it means PyO3 is more performant. I do not expect other maintainers may agree with so much additional maintenance overhead.

Maybe we eventually find another API design outside these three which we like even better, and rework pyo3::experimental to become that.

@davidhewitt
Copy link
Member Author

As an example of the kind of performance differences I see on this branch, here's the benchmarks from the pytests crate, which are a good way to exercise the reworked argument extraction code:

-------------------------------------------------------------------------------------------- benchmark: 7 tests ---------------------------------------------------------------------------------------------
Name (time in ns)                   Min                    Max                Mean              StdDev              Median                IQR             Outliers  OPS (Mops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_none_rs                    29.5800 (1.0)         345.8300 (1.0)       30.2550 (1.0)        3.1125 (1.0)       30.0000 (1.0)       0.0000 (inf)      632;45442       33.0524 (1.0)      158932         200
test_empty_class_init           64.1650 (2.17)      1,963.5450 (5.68)      66.1302 (2.19)      11.0378 (3.55)      65.0000 (2.17)      0.2100 (inf)      802;16780       15.1217 (0.46)      73395         200
test_args_kwargs_rs             68.7500 (inf)         865.8300 (1.93)      71.9540 (2.36)      12.9770 (2.28)      70.8400 (2.35)      0.8300 (inf)      888;12638       13.8978 (0.42)     126985         100
test_simple_rs                 119.0476 (inf)       2,188.4762 (4.88)     124.5252 (4.09)      20.1116 (3.54)     123.0238 (4.08)      1.0000 (inf)      846;21984        8.0305 (0.24)     195122          42
test_simple_args_rs            125.0000 (4.23)     14,082.9998 (40.72)    214.1713 (7.08)      95.6950 (30.75)    208.0001 (6.93)      1.0000 (inf)      158;16996        4.6692 (0.14)      87268           1
test_simple_args_kwargs_rs     165.9996 (5.61)     16,083.0000 (46.51)    258.5954 (8.55)     131.6085 (42.28)    250.0001 (8.33)      0.0000 (1.0)       87;38996        3.8670 (0.12)      88238           1
test_simple_kwargs_rs          165.9996 (5.61)     15,625.0003 (45.18)    265.0055 (8.76)     123.9712 (39.83)    250.0001 (8.33)     41.0000 (inf)        149;538        3.7735 (0.11)     193536           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

experimental-api current HEAD

-------------------------------------------------------------------------------------------- benchmark: 7 tests --------------------------------------------------------------------------------------------
Name (time in ns)                   Min                    Max                Mean              StdDev              Median                IQR            Outliers  OPS (Mops/s)            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_none_rs                    29.2722 (1.0)         385.2848 (1.0)       30.8346 (1.0)        5.5422 (1.14)      30.5886 (1.0)       0.7911 (1.88)     825;3205       32.4311 (1.0)      196735         158
test_empty_class_init           66.4550 (2.27)      1,508.7500 (3.92)      68.5619 (2.22)       9.0494 (1.86)      68.1250 (2.23)      1.2550 (2.99)     447;2676       14.5854 (0.45)      64001         200
test_args_kwargs_rs             72.0850 (2.46)        512.7100 (1.33)      73.7274 (2.39)       4.8563 (1.0)       73.3350 (2.40)      0.4200 (1.0)      401;3874       13.5635 (0.42)      60607         200
test_simple_rs                  82.9996 (2.84)     20,792.0002 (53.97)    163.8921 (5.32)     161.3914 (33.23)    167.0001 (5.46)      1.0000 (2.38)     52;24215        6.1016 (0.19)     124348           1
test_simple_args_rs            135.4167 (4.63)      2,644.6667 (6.86)     141.7904 (4.60)      31.7837 (6.54)     138.8889 (4.54)      2.3056 (5.49)   1006;15032        7.0527 (0.22)     198374          36
test_simple_args_kwargs_rs     165.9996 (5.67)     15,624.9998 (40.55)    229.1406 (7.43)     123.2132 (25.37)    209.0001 (6.83)     42.0000 (100.00)    122;298        4.3641 (0.13)      87597           1
test_simple_kwargs_rs          175.0000 (5.98)      4,964.6000 (12.89)    186.9728 (6.06)      40.6735 (8.38)     183.3500 (5.99)      6.2500 (14.88)    537;4917        5.3484 (0.16)     136352          20
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

This branch is noticeably better at handling the *args and **kwargs cases, and I'd hope that with further refactoring we'd be able to make the other cases better too. I haven't investigated what causes the slowdown on the test_simple case, due to the change in ownership rules there may be some additional reference count changes which weren't there before (and might be removable again in future with careful optimization).

@adamreichold
Copy link
Member

I think we want this snippet to compile. However, the naive Deref implementation for GIL<'py, T> would be:

I agree that the Deref issue is insurmountable without arbitrary self types. What I would to question again is

I don't have a perfect answer what to do with Py and friends for storage without lifetimes. Py<PyAny<'static>> is nonsensical because 'static GIL lifetime is kinda meaningless.

though. Is there any downside to just using the 'static lifetime that is not aesthetical?

@davidhewitt
Copy link
Member Author

Excitingly, I see there is recent movement on rust-lang/rust#44874 - so maybe the arbitrary self types are not completely out of reach!

Is there any downside to just using the 'static lifetime that is not aesthetical?

Good question. I suppose my objection is that there is no such thing as a 'static GIL lifetime, and this might confuse users to see PyAny<'static> in this way. I'd also have to check that Py<PyAny<'static>>::as_ref(&self, py: Python<'py>) -> &PyAny<'py> is achievable, hopefully with traits this works.

@adamreichold
Copy link
Member

One silly knot I still have in my head: The reference counting semantics of PyAny<'py> are basically the same as for Py, i.e. cloning increments the Python reference count.

Couldn't e.g. PyDict::new or PyDict::get_item just return Py<PyDict> and Py<PyAny> respectively which the user can turn into references using Py::as_ref as usual? The one thing does not work is Py::into_ref which "dissolves" the Py in the pool but AFAIU this is exactly the thing that PyAny<'py> cannot do either?

Of course, this would probably imply a lot of repetitive code snippets like

let dict = PyDict::new(py);
let dict = dict.as_ref(py);
...

but maybe this could be handled similarly to std's pin! macro, e.g.

let dict = bind!(py, PyDict::new(py));
...

@adamreichold
Copy link
Member

Couldn't e.g. PyDict::new or PyDict::get_item just return Py and Py respectively which the user can turn into references using Py::as_ref as usual?

Of course, Py::as_ref does not really produce a GIL-bound reference, because while we write the signature as

fn as_ref<'py>(&'py self, _py: Python<'py>) -> &'py T::AsRefTarget

this will usually be some lifetime 'a shorter than 'py via covariance of Python<'py> and Python<'a>.

@adamreichold
Copy link
Member

Ok, one other probably dead-end idea: Keep the pool, but make the Python<'py> not just a token, but a pointer to the pool to avoid any global state and/or thread-local storage involved. I suspect that whether this is viable depends on how tedious "careful optimizations" have to be here to avoid unnecessary reference count changes versus the overhead of actually passing a pointer instead of a zero-sized type?

@adamreichold
Copy link
Member

Ok, one other probably dead-end idea:

So to answer myself, this fails because we do not always thread the token through but rely on being able to produce it "out of thin air", especially via GIL-bound references, i.e. PyNativeType::py.

@adamreichold
Copy link
Member

adamreichold commented Jun 18, 2023

So, after thinking about this some, I think PyAny<'py> and Py<PyAny<'static>> are the best course of action which fits without MSRV and is immediately actionable. In this approach, I also don't see a pressing need for a type like GIL<'py, T> and hence arbitrary self types.

Taking all of this into consideration, I think I would prefer to avoid the duplication of the experimental namespaces. Personally, in light of the amount of thinking and experimenting already invested into this topic, I think that would be too cautious. Of course, there is also the additional amount of required work compared to the limited amount of available time. Hence, I would suggest that we do switch over PyO3 proper for 0.20 but add a more lengthy pre-release cycle, e.g. making 0.20-alpha.1 pre-releases and actively call for testing by the ecosystem.

Hopefully, a lot of code will still compile as PyAny<'py> has the same methods &'py PyAny had, or be simple to fix by adding <'py>. We could even keep the PyObject alias as Py<PyAny<'static>>. Not sure if this is a benefit though... 🤪

EDIT: I forgot "to avoid" in "prefer to avoid the duplication"... 🤦🏽

@adamreichold
Copy link
Member

adamreichold commented Jun 18, 2023

I'd also have to check that Py<PyAny<'static>>::as_ref(&self, py: Python<'py>) -> &PyAny<'py> is achievable, hopefully with traits this works.

I suspect this would require GAT to work safely, but I also think it could be worked around using a HKTB and helper trait like e.g. here

@davidhewitt
Copy link
Member Author

Taking all of this into consideration, I think I would prefer the duplication of the experimental namespaces. Personally, in light of the amount of thinking and experimenting already invested into this topic, I think that would be too cautious. Of course, there is also the additional amount of required work compared to the limited amount of available time. Hence, I would suggest that we do switch over PyO3 proper for 0.20 but add a more lengthy pre-release cycle, e.g. making 0.20-alpha.1 pre-releases and actively call for testing by the ecosystem.

This is an interesting proposal, we could certainly allow plenty of time for downstream code to test prereleases. It would be a delight to get rid of the pool.

I guess my main concern would be - should arbitrary_self_types become available in the future, would we still want to switch to them? I think probably yes, because we could replace weirdness like Py<PyAny<'static>> with (perhaps) PyDetached<PyAny> and PyAny<'py> with Py<'py, PyAny>. Would that be a difficult migration for users to undergo again? Probably, yes. They may not appreciate the double-churn.

I suppose we could make the argument that with MSRV of 1.52 we're about 2 years behind stable Rust, so even if arbitrary self types landed in the next year it may not be for another three years until our MSRV would support them. So, if 0.20 was released without the pool, the next migration would be a few years away.

@adamreichold
Copy link
Member

I guess my main concern would be - should arbitrary_self_types become available in the future, would we still want to switch to them?

Personally, I would actually say no. Or at least only if there is some tangible benefit for ergonomics that goes beyond getting rid of the hard to explain Py<PyAny<'static>> which we will probably have explained a lot at that point.

If arbitrary self types were available today, I think we would design our smart pointers around them. But path dependencies are a fact of life and I see no strong reason to hide that our design originates in today's Rust with all the rough edges it still has, and not the language that is probably two to three MSRV bumps away still.

@davidhewitt
Copy link
Member Author

Personally, I would actually say no. Or at least only if there is some tangible benefit for ergonomics that goes beyond getting rid of the hard to explain Py<PyAny<'static>> which we will probably have explained a lot at that point.

Agreed, we can revisit that (far) in the future and see what makes most sense for the API and users.

I think after some reflection it's probably worth getting 0.20 out before merging these changes, and then make this the focus for 0.21 (with a long release cycle as proposed). I think if we're committing to a slow release we will probably also want to have a stable branch which will be easy to backport bugfixes onto, and I'm heavily demotivated from doing any 0.19.x patch releases what with the removal of bors and MSRV bump 😅

@adamreichold
Copy link
Member

I think after some reflection it's probably worth getting 0.20 out before merging these changes, and then make this the focus for 0.21 (with a long release cycle as proposed). I think if we're committing to a slow release we will probably also want to have a stable branch which will be easy to backport bugfixes onto, and I'm heavily demotivated from doing any 0.19.x patch releases what with the removal of bors and MSRV bump sweat_smile

Agreed, the feature list for 0.20 seems sufficient to warrant a release and the maintenance effort for 0.19 would indeed be high.

This does re-open the question of whether we want a unsafe/nightly-safe Python::with_pool for 0.20 only to remove it completely for 0.21 though?

@davidhewitt
Copy link
Member Author

This does re-open the question of whether we want a unsafe/nightly-safe Python::with_pool for 0.20 only to remove it completely for 0.21 though?

Given that 0.21 will probably be some effort for users to migrate to, I see no harm in adding new APIs which may only exist on 0.20 if it helps them in the period before they can invest resources to migrate.

@SuperJappie08
Copy link
Contributor

I was wondering how does this effect PyClass implementors, since they are not allowed to have lifetimes currently.

if let Some(lt) = class.generics.lifetimes().next() {
bail_spanned!(
lt.span() =>
"#[pyclass] cannot have lifetime parameters. \
For an explanation, see https://pyo3.rs/latest/class.html#no-lifetime-parameters"
);
}

And updated implementations of PyCell and Py (or PyObject) probably need this annotation (if I have understood this discussion correctly).
However, they are often used inside PyClasses to store references to Python objects.

Is there already a plan about how this would work in the future?

@adamreichold
Copy link
Member

Is there already a plan about how this would work in the future?

This would continue to work as it does now, with the only difference of using PyAny<'static> instead of PyAny. So for example the definition PyObject would just change from Py<PyAny> to Py<PyAny<'static>> which still does not require a lifetime parameter to be stored in a class struct. (While the 'static is a lifetime, it is not a lifetime parameter of the enclosing class type. Just like you can store a &'static str inside a class right now.)

@davidhewitt
Copy link
Member Author

A quick thought - should I write up and pin an issue (and maybe a discussion which links to the issue, or vice-versa) inviting users to give feedback about the intention of going through this API change in 0.21?

I'd hope users will generally respond positively or neutrally (or mostly not at all I suppose), however if we get a lot of early pushback then that could save us a lot of pain in trying to work through this.

@davidhewitt
Copy link
Member Author

Closing in favour of #3361

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants