Reduce allocations & accept bytes and bytearray inputs #22
Conversation
It may be worth asking … Failing that, a bytearray would be totally fine; it's a drop-in in most places in Python where you would use a bytes object (or memoryview). Note that for parquet, we do always know the uncompressed output size. If we make use of this, I expect everything can be made to work fine with the current code; in fact better, since we never over-allocate and we don't have to ask the decompressor how long the output might need to be. Obviously this is the case for decompression only.
Another nice feature might be a …
I'm uncertain about the buffer protocol, as from my understanding it would mean exposing a Rust type to Python which would own the de/compressed bytes, and that would significantly alter the API. But I can dedicate more time to seeing what details lie beneath. Anyway, the points I'm taking from this to complete the desired speed-up are:

- [ ] All compression variants should accept `bytes` or `bytearray`s.
- [ ] Add an optional parameter where, from Python, one can specify the exact length of the de/compressed buffer, allowing both `PyBytes` and `PyByteArray` to do a single allocation for the output.

Would this work for you?

As a side note, I don't think `decompress_into` would be very difficult: we'd just need to pass the pointer and length of the buffer to Rust, and from there we can simply hand the resulting slice to the underlying de/compression variant (but I guess that's a separate feature/issue from this). I'm assuming it's not really for performance reasons? Whether it's Python/numpy doing the allocation or Rust is really splitting hairs, unless one is wanting to re-use a buffer?

Interesting point about parquet output size. Is this only for snappy, or is that regardless of compression type in parquet?
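For concreteness, here is a rough PyO3-side sketch of what those two points could look like; the function name `decompress`, the `output_len` parameter, and the size estimate are illustrative assumptions, not cramjam's actual API.

```rust
use pyo3::prelude::*;
use pyo3::types::{PyAny, PyByteArray, PyBytes};

/// Illustrative sketch: accept `bytes` or `bytearray`, plus an optional exact
/// output length so the output buffer can be allocated in a single shot.
#[pyfunction]
fn decompress(py: Python<'_>, data: &PyAny, output_len: Option<usize>) -> PyResult<PyObject> {
    // View the input as a byte slice regardless of which Python type was passed.
    let input: &[u8] = if let Ok(b) = data.downcast::<PyBytes>() {
        b.as_bytes()
    } else {
        let ba = data.downcast::<PyByteArray>()?;
        unsafe { ba.as_bytes() } // we don't call back into Python while holding this
    };

    // With an exact `output_len` (e.g. read from parquet metadata) this is a
    // single exact allocation; otherwise fall back to a rough estimate.
    let len = output_len.unwrap_or_else(|| input.len() * 4);
    let output = PyBytes::new_with(py, len, |_out: &mut [u8]| {
        // A real implementation would decompress `input` into `_out` here.
        Ok(())
    })?;
    Ok(output.to_object(py))
}
```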
Parquet always gives the uncompressed size for any codec.
Preallocated numpy is important, because this is how fastparquet gets around creating dataframes by concatenation; it would avoid a copy. Rather specific to the use case, but I think still a great feature.
@martindurant You are welcome to review if you'd like, but as it is now:
I've made a separate issue for de/compressing into a preallocated buffer: #23
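A minimal sketch of what such a `decompress_into` might look like on the Rust side, purely illustrative since the actual design belongs to #23: the caller owns the output buffer, and Rust just receives its pointer/length as a slice and hands it to the codec. A plain copy stands in for real decompression here, and a real version might accept any buffer-protocol object (e.g. a numpy array) rather than only `bytearray`.

```rust
use pyo3::prelude::*;
use pyo3::types::PyByteArray;

/// Illustrative `decompress_into` sketch: the caller allocates the output
/// buffer; we hand its mutable slice to the de/compression codec and report
/// how many bytes were written.
#[pyfunction]
fn decompress_into(data: &[u8], output: &PyByteArray) -> PyResult<usize> {
    // Safety: we don't call back into Python or resize `output` while this
    // slice is alive.
    let out: &mut [u8] = unsafe { output.as_bytes_mut() };

    // Stand-in for a real codec writing into `out`.
    let n = data.len().min(out.len());
    out[..n].copy_from_slice(&data[..n]);

    Ok(n) // number of bytes written
}
```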
That's excellent news! Obviously, the inline comment docs will also need to be updated, including all of the variants that now exist. Also, there is repeated code between the algorithms that could be simplified. So do I understand correctly that if you pass bytes, you get bytes back, and if you pass bytearray, you get bytearray back? I'm not certain there's a good reason for ever having bytes anymore. I notice you redid the gzip benchmarks with the size parameter (although I'm not sure what it means for the non-cramjam version). Do you have numbers for all the algorithms?
Right-o, that's me being lazy. :-) Will fix.
I agree; there is a TODO in there for this. I was using a macro, as it's basically the same code, but there are subtle differences in some places that made it annoying enough for me to put it in a different PR/issue; it'll be functionally the same.
I would think that since most (all?) of the existing Python variants at least take …
Just saving time/space; with snappy there is hardly a difference, and I gave gzip as an example of using the …
Agreed for the input, but we might choose to consistently return bytearrays. Just a thought.
Wouldn't that be somewhat unexpected behavior when things like …
That's a fair argument.
@martindurant
Here's the start of it.

Problem:

The timings I showed earlier used the pre-allocated `PyBytes::new_with`, which allowed writing the de/compressed bytes into a single allocated buffer handed back to Python. However, one cannot know the exact length of the de/compression output, so we were stuck with dangling null values at the end of the byte string output. And it seemed quite impossible at the moment to pass the `Vec` allocated on the Rust side directly to Python without creating a custom Python-exported Rust type which would govern the ownership of those bytes; resizing the resulting byte string also seemed ill-advised.

Solution?:

We can use `bytearray`/`PyByteArray` for the Python/Rust types respectively, as well as `bytes`/`PyBytes`. If passed `bytes`, you'd see the current benchmark timings; if passed `bytearray`, you'd see the benchmarks below. This is because a `bytearray` can be resized from Rust, so we only need to over-allocate to the estimated size of the de/compressed bytes and then resize down to the actual size, keeping a single allocation for the output. Is that acceptable, do you think? If so, I'll update the remaining variants, which should further improve their performance as well. And if, in the future, a way is worked out to avoid double-allocating buffers when using `bytes` objects, that improvement would require no external API changes.
Benchmarks