Performance issue when calling rust function in python #3787
Thanks @richecr for the report. TLDR:
First, I'll assume that when you say "is slower than running in pure Python" you're testing against this Python implementation:

def py_sleep():
    start = time.time_ns()
    num = 1 + 1
    duration = time.time_ns() - start
    print(duration)
    return num

There are several different factors coming into play. Let's try to break these down:
In the end, let's end up with this code:

use pyo3::prelude::*;
use std::time::Instant;

#[pyfunction]
fn rust_sleep() -> i32 {
    let start = Instant::now();
    let num = 1 + 1;
    let duration = start.elapsed();
    println!("{}", duration.as_nanos());
    num
}

#[pymodule]
fn pyo3_scratch(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(rust_sleep, m)?)?;
    Ok(())
}

import time
from timeit import timeit
import pyo3_scratch
def py_sleep():
    s = time.time_ns()
    x = 1 + 1
    e = time.time_ns()
    print(e - s)
    return x
# run some warmups
pyo3_scratch.rust_sleep()
py_sleep()
# measure average duration of 1 million calls
N = 1_000_000
py = timeit("py_sleep()", setup="from __main__ import py_sleep", number=N) / N
rust = timeit("rust_sleep()", setup="from pyo3_scratch import rust_sleep", number=N) / N
# report final timings
print("py", py)
print("rust", rust)

Now, running this, I get the following output:
There is still volatility in these numbers; sometimes Rust is a little slower than Python, sometimes Rust is a little faster. Overall, both report around 4.5us on my machine, which suggests to me that the work in both languages is dominated by the system-level operations: timing measurements and writing to stdout. Let's try to measure the call overhead more precisely by making both of these functions into noops:

use pyo3::prelude::*;

#[pyfunction]
fn rust_sleep() {}

#[pymodule]
fn pyo3_scratch(_py: Python, m: &PyModule) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(rust_sleep, m)?)?;
    Ok(())
}

from timeit import timeit
import pyo3_scratch
def py_sleep():
    return
# run some warmups
pyo3_scratch.rust_sleep()
py_sleep()
# measure average duration of 1 million calls
N = 1_000_000
py = timeit("py_sleep()", setup="from __main__ import py_sleep", number=N) / N
rust = timeit("rust_sleep()", setup="from pyo3_scratch import rust_sleep", number=N) / N
print("py", py)
print("rust", rust)

Now on my machine I get less volatility, and pure-Python shows an edge:
What we see is that calling the noop Python function measures at 18ns, while calling into Rust measures at 25ns. This 7ns difference is a fairer estimate of the slowdown which PyO3 currently exhibits over pure-Python function calls. Finally, let's go a step further and estimate what PyO3 could look like without the overheads which we're working to remove in PyO3 0.21. I'll apply the following diff to current PyO3 main, which is a crude way to disable the framework-level overheads we're working to remove in #3382:

diff --git a/src/impl_/trampoline.rs b/src/impl_/trampoline.rs
index 4b4eac17a..2664d7598 100644
--- a/src/impl_/trampoline.rs
+++ b/src/impl_/trampoline.rs
@@ -174,8 +174,9 @@ where
     R: PyCallbackOutput,
 {
     let trap = PanicTrap::new("uncaught panic at ffi boundary");
-    let pool = unsafe { GILPool::new() };
-    let py = pool.python();
+    // let pool = unsafe { GILPool::new() };
+    // let py = pool.python();
+    let py = unsafe { Python::assume_gil_acquired() };
     let out = panic_result_into_callback_output(
         py,
         panic::catch_unwind(move || -> PyResult<_> { body(py) }),

(NB: do not attempt to apply the above diff and run it with real-world code in production. Until PyO3 is fully transitioned to the new API, the GILPool is a fundamental part of the correct operation of PyO3.)

This reverses the situation. For the noop function calls, once we've sorted out this framework-level overhead, calling a noop PyO3 function will be faster than calling a noop pure-Python one, by about 4.5ns on my machine:
|
Keep in mind that the |
@Paulo-21 further to the above, you may want to consider using |
Yes, I understand; it's roughly 2.5 million elements.
Thank you, I will give it a try! |
With 0.22 (without the GIL Refs feature) and also on the upcoming 0.23, the changes I mentioned above are now complete, and I consistently measure calling a noop PyO3 function as faster than calling a noop Python one. I will close this issue. I'm sure we will find more cases to optimise in future; those can be new issues. |
I have created this simple function:
In pure Rust it takes ~60ns, and when called from Python ~22350ns. Can you help me understand why this happens? (is slower than running in pure Python)
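
The thread's explanation (that the clock reads and the write to stdout dominate the measurement, not the function call itself) can be checked without building the extension. The following is a minimal pure-Python sketch, not code from the thread: `noop`, `timed_and_printed`, and `per_call_ns` are hypothetical names, and stdout is redirected to an in-memory buffer, which is an assumption that probably makes `print` cheaper than it would be on a real terminal.

```python
import contextlib
import io
import time
from timeit import timeit


def noop():
    # Baseline: measures only Python's function-call overhead.
    return


def timed_and_printed():
    # Mirrors py_sleep from the thread: two clock reads plus a print.
    start = time.time_ns()
    num = 1 + 1
    duration = time.time_ns() - start
    print(duration)
    return num


def per_call_ns(func, number=100_000):
    # Average wall-clock time per call, in nanoseconds.
    return timeit(func, number=number) / number * 1e9


if __name__ == "__main__":
    # Discard the printed durations so the terminal is not flooded.
    with contextlib.redirect_stdout(io.StringIO()):
        baseline = per_call_ns(noop)
        instrumented = per_call_ns(timed_and_printed)
    print(f"noop:              {baseline:.0f} ns per call")
    print(f"timed and printed: {instrumented:.0f} ns per call")
```

If the second figure comes out much larger than the first, most of the per-call time is going to the instrumentation rather than to the call, which is why the thread switches to noop functions to isolate call overhead.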