Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#[pyo3(get)] much slower than python attribute access #4247

Closed
gerrymanoim opened this issue Jun 14, 2024 · 7 comments
Closed

#[pyo3(get)] much slower than python attribute access #4247

gerrymanoim opened this issue Jun 14, 2024 · 7 comments

Comments

@gerrymanoim
Copy link

I have the following setup:

// lib.rs
use pyo3::prelude::*;

#[pyclass]
struct RustClass {
    #[pyo3(get)]
    value: i32
}

#[pymethods]
impl RustClass {
    #[new]
    fn new() -> Self {
        RustClass{value: 0}
    }
}

/// A Python module implemented in Rust.
#[pymodule]
fn _lowlevel(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_class::<RustClass>()?;
    Ok(())
}

and

import timeit

from pyo3_bench._lowlevel import RustClass


class PyClass:
    def __init__(self):
        self.value = 0

def bench():
    py = PyClass()
    rust = RustClass()
    print("Python:")
    print(timeit.timeit(lambda: py.value, number=1000000))
    print("Rust:")
    print(timeit.timeit(lambda: rust.value, number=1000000))

When I run bench I find that the rust attribute access is much slower than the python direct access.

>>> bench()
Python:
0.06078508700011298
Rust:
0.6096129899960943
>>> bench()
Python:
0.05267535999882966
Rust:
0.6028135990200099

Its unclear whether this is intended or not. Is there a good way for me to replicate the performance of python here?

If its useful:

  • macOS
  • Python 3.12
  • pyo3 0.21.2
@alex
Copy link
Contributor

alex commented Jun 14, 2024

When using pyo3(get) on a primitive, it will need to be turned into a Python object at each attribute access -- that is to say, there's a heap allocation for a new PyLong object each time.

A strictly apples-to-apples comparison would be a Py<SomeType> attribute.

@gerrymanoim
Copy link
Author

Hmm I appreciate the suggestion but can't seem to get that to behave any differently.

I now have

#[pyclass]
struct RustClass {
    #[pyo3(get)]
    value: i32,

    #[pyo3(get)]
    pyvalue: PyObject,
}

and get pretty much the same result:

>>> bench()
Python:
0.057306197995785624
Rust value:
0.6657147670048289
Rust pyvalue:
0.6800711160176434

Perhaps I'm not understanding your suggestion correctly?

@davidhewitt
Copy link
Member

I attempted to reproduce this locally, and I find that the difference is much smaller than you're finding. Did you compile with --release?

$ python test.py
Python:
0.02138412299996162
Rust:
0.04930254800001421

PyO3 0.22 is expected to be faster still, I get numbers more like this:

$ python test.py
Python:
0.021781480999948144
Rust:
0.040761033000080715

Hmm I appreciate the suggestion but can't seem to get that to behave any differently.

I find the same as you; I think this is because constructing a Python zero from a Rust zero is also extremely cheap. But I still find the slowdown to be only 2x rather than 12x.

I'm going to do a quick bit of profiling and see if I can understand and close the gap...

@davidhewitt
Copy link
Member

Ok, so I did some experimentation and if I teach Python to read directly from a Py<PyAny> value using an ffi::PyMemberDef definition, then I get (unsurprisingly) performance which matches the Python lookup:

$ python test.py
Python:
0.02066981300004045
Rust:
0.03565813099976367
Rust (Py<PyAny>):
0.02015118600002097

I did the same trick for the Rust i32, but it's still slower than Python because it needs to create the int object. That's not a huge surprise and a natural cost of crossing the boundary.

I also think that this provides us a nice solution to the lack of Py::clone without the py-clone feature, so I'm going to see if I can turn this into a real implementation immediately. Between perf optimization and a usability improvement, seems like a clear win to get into 0.22 quickly.

I think we cannot apply the same optimization for #[pyo3(set)] because we have the borrow-checking to guard against write aliasing.

@gerrymanoim
Copy link
Author

Ah - you're totally right - I wasn't passing --release.

>>> bench()
Python:
0.06525319800130092
Python property:
0.08756525401258841
Python fn:
0.08051640499616042
Rust value:
0.1070495129970368
Rust pyvalue:
0.11770369400619529

I'm much closer now.

Can you show me how you're using ffi::PyMemberDef - that's exactly what I'm looking for I think.

Appreciate all the help!

@davidhewitt
Copy link
Member

So PyMemberDef only works for a very specific set of types and only for read-only fields due to Rust's guarantees around mutable aliasing.

We don't really have a point in the #[pyclass] machinery where you can hook one in. Given the safety risks, I'd rather not make one either. Instead I'm working on a PR which would automatically choose it as the implementation when possible. I'm going to push this today.

@davidhewitt
Copy link
Member

I think with #4254 released in 0.22 this is essentially solved as much as it can be. There's maybe a short follow-up to add more types beyond Py<T>, e.g. i32, but that's a small task and I doubt will net much performance improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants