-
Notifications
You must be signed in to change notification settings - Fork 433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
test(python): add read / write benchmarks #933
Conversation
# Description This PR builds in top of the changes to handling the runtime in #933. In my local tests this fixed #915. Additionally, I added the runtime as a property on the fs handler to avoid re-creating it on every call. In some non-representative tests with a large number of very small partitions it cut the runtime in about half. cc @wjones127 # Related Issue(s) <!--- For example: - closes #106 ---> # Documentation <!--- Share links to useful documentation --->
Performance improvements in filesystems. Slight improvement in read, and 3x improvement for writing. Before
After
|
@@ -531,10 +531,10 @@ impl ObjectOutputStream { | |||
Err(PyNotImplementedError::new_err("'read' not implemented")) | |||
} | |||
|
|||
fn write(&mut self, data: Vec<u8>) -> PyResult<i64> { | |||
fn write(&mut self, data: &PyBytes) -> PyResult<i64> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Vec<u8>
meant PyO3 was cloning all the input data. By using PyBytes
we avoid the copy.
@@ -229,7 +229,7 @@ impl DeltaFileSystemHandler { | |||
let file = self | |||
.rt | |||
.block_on(ObjectInputFile::try_new( | |||
self.rt.clone(), | |||
Arc::clone(&self.rt), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, I would have thought the statements are equivalent, since rt is inside an Arc ... ?
@wjones127 - our users are very much looking forward to a new python release, and I think it would be an even greater release if it contains this PR, due to the performance improvements it contains :). Is there still something you want to do under this PR, or can we merge it? |
I think we can merge. |
# Description Considering adding continuous benchmarks to Python reader / writer. # Related Issue(s) <!--- For example: - closes delta-io#106 ---> # Documentation <!--- Share links to useful documentation --->
Description
Considering adding continuous benchmarks to Python reader / writer.
Related Issue(s)
Documentation