Minimize copying in `maybe_compress` & `byte_sample` #6273
2981f08 to 705399f
Otherwise `memoryview` will raise a `TypeError`.
No need to pay the cost for copying here. Just use an empty `bytes` object for the `memoryview`. Should be faster in this case and saves us a check in the `cast` case.
This is converted to `int` here, but is unused below. So go ahead and drop it as it doesn't seem to be needed.
Also rename `nbytes` variable to `payload_nbytes` for clarity.
To allow more efficient access to the `payload` (like when selecting portions in `byte_sample`), take a `memoryview` of the data. Ensure that it is 1-D contiguous `uint8` data. This makes it very similar to `bytes`, which will work well in `byte_sample` and in compressors that handle only a narrow form of the Python Buffer Protocol. This allows us to drop various `ensure_bytes` calls in compression that would otherwise copy the data. Should reduce memory usage when serializing as part of transmission or spilling.
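For readers following along, here is a rough sketch of what "a 1-D contiguous `uint8` `memoryview`" can look like in plain Python. The helper name `as_uint8_view` is hypothetical and this is not the code from the PR. It only illustrates the idea, including the empty-`bytes` shortcut for zero-size data and the fact that a copy is only needed when the input is not C-contiguous.

```python
from array import array


def as_uint8_view(data):
    """Return a 1-D, C-contiguous, uint8 memoryview of ``data``.

    Hypothetical helper for illustration only (not the PR's code).
    """
    mv = memoryview(data)
    if mv.nbytes == 0:
        # Nothing to view; an empty bytes object is cheap and avoids
        # the restrictions `cast` places on zero-sized shapes.
        return memoryview(b"")
    if not mv.c_contiguous:
        # `cast` only works on C-contiguous views, so this one case copies.
        mv = memoryview(mv.tobytes())
    if mv.ndim != 1 or mv.format != "B":
        # Reinterpret the same memory as flat unsigned bytes (no copy).
        mv = mv.cast("B")
    return mv


buf = bytearray(b"abcdef")
view = as_uint8_view(buf)
buf[0] = ord("z")
print(view[0] == ord("z"))  # the view shares memory with `buf`

floats = as_uint8_view(array("d", [1.0, 2.0]))
print(floats.format, floats.ndim, floats.nbytes)
```

A view like this can be sliced or handed to anything that accepts a bytes-like object without an intermediate `bytes` copy.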
705399f to c4299c1
cc @dask/maintenance (in case anyone has thoughts on this) Also cc-ing @madsbk given the
Unit Test Results: 16 files ±0, 16 suites ±0, 7h 40m 45s ⏱️ (+19m 41s). For more details on these failures, see this check. Results for commit 8661564. ± Comparison against base commit 2286896.
I just have a couple of thoughts, but it's definitely a good thing to do.
Co-authored-by: Martin Durant <martindurant@users.noreply.github.com>
Go ahead and exit immediately in this case before doing anything else.
This is a bit clearer while being just as fast.
Planning to merge end of day tomorrow if no comments
As comparisons were effectively flipped from how they were before, these should have `=`s as a condition as well.
This can be quite a bit faster than `append`ing each value (particularly if resizing of the underlying array needs to occur).
These are basically unused and are expected to be `int`s internally. So just pick default values that are `int`s to start.
Avoid repeated copies while testing that don't add value here.
46c1b8f to 8661564
LGTM
Thanks Mads! 🙏
Thanks all! 🙏 Going to get this in. If anything else comes up, happy to follow up separately.
Currently there are a bunch of copies that occur in `maybe_compress` and `byte_sample`. Some of these are explicit (like calling `ensure_bytes`) and some are implicit (like slicing). In either case it would be good to avoid additional memory allocation and copying in these functions when it is not needed. After all, these code paths can be triggered when sending data over the wire or spilling to disk (either could be occurring due to memory pressure that we don't want to add to).

`pre-commit run --all-files`