Experiment with using bytes::Bytes to back bytes and string fields #190
Conversation
Signed-off-by: Nick Cameron <nrc@ncameron.org>
Coincidentally, I put some time into a yet-unpublished crate to provide a UTF-8 invariant enforcement wrapper for `Bytes`.
Hey @nrc, thanks for the work on this. I'm definitely in support of this direction. I'd prefer if this were made configurable as an option on prost-build, in the same way that HashMap/BTreeMap is, so that the default can stay as the std types and the Bytes-based types can be used if/when perf issues arise. I agree re. not maintaining our own string wrapper over Bytes; it seems like we'll want to use whatever shakes out as the community choice. Re. performance, I'm a bit surprised that this resulted in improvements. My understanding is that this PR isn't using ref-count sharing to clone into the fields, right? If not, what explains the perf difference?
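For illustration, such a configurable option could be driven from a build script roughly like this, modeled on the existing btree_map option; the `bytes` method shown here is hypothetical at this point in the discussion, and the proto paths are made up:

```rust
// build.rs - a sketch, assuming a hypothetical `bytes` option on
// prost_build::Config analogous to the existing `btree_map` option.
fn main() -> std::io::Result<()> {
    let mut config = prost_build::Config::new();
    // Existing option: generate BTreeMap instead of HashMap for these map fields.
    config.btree_map(&["."]);
    // Hypothetical option: generate bytes::Bytes instead of Vec<u8> for these
    // bytes fields (and a Bytes-backed type for string fields).
    config.bytes(&["."]);
    config.compile_protos(&["src/items.proto"], &["src/"])?;
    Ok(())
}
```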
Ah, so I'd been thinking about this entirely from the perspective of decoding performance, but maybe the benchmark is measuring time to create & serialize a message?
```rust
// Before:
// inserted into it the resulting vec are valid UTF-8. We check
// explicitly in order to ensure this is safe.
super::bytes::merge(wire_type, value.as_mut_vec(), buf)?;

// After:
super::bytes::merge(wire_type, value.as_bytes_mut(), buf)?;
str::from_utf8(value.as_bytes())
    .map_err(|_| DecodeError::new("invalid string value: data is not UTF-8 encoded"))?;
```
This looks wrong: `value` is left with broken UTF-8. The previous code was wrong too, FWIW. `String::as_mut_vec` should be banned and replaced with a more principled scope guard API that would sanitize the buffer on unwrap. I've been too lazy to write an RFC on this.
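To make the idea concrete, here is a rough sketch of such a scope guard (my own illustration of the suggestion, not an existing std or prost API; the `StrGuard` name is made up). It exposes a String's buffer as raw bytes while guaranteeing that an unverified tail is discarded on any early exit:

```rust
/// Sketch of a scope guard over a String's buffer: raw byte access is allowed,
/// but any bytes that were never verified as UTF-8 are dropped when the guard
/// goes out of scope (early return, `?`, or panic).
struct StrGuard<'a> {
    buf: &'a mut String,
    // Length of the prefix known to be valid UTF-8.
    valid_len: usize,
}

impl<'a> StrGuard<'a> {
    fn new(buf: &'a mut String) -> Self {
        let valid_len = buf.len();
        StrGuard { buf, valid_len }
    }

    /// Raw access for appending bytes; the new tail is not yet trusted.
    /// (Assumes callers only append, never rewrite the validated prefix.)
    fn as_mut_vec(&mut self) -> &mut Vec<u8> {
        unsafe { self.buf.as_mut_vec() }
    }

    /// Validate the appended bytes; on success they become part of the string.
    fn commit(mut self) -> Result<(), std::str::Utf8Error> {
        std::str::from_utf8(&self.buf.as_bytes()[self.valid_len..])?;
        self.valid_len = self.buf.len();
        Ok(())
    }
}

impl Drop for StrGuard<'_> {
    fn drop(&mut self) {
        // On an early exit or a failed/missing commit, truncate back to the
        // validated prefix so the String never exposes broken UTF-8.
        unsafe { self.buf.as_mut_vec().truncate(self.valid_len) };
    }
}
```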
@mzabaluev that's a great catch. Any interest in sending a PR? I imagine the fix will need to be something along the lines of swapping the string field with an empty string (to retain any allocated capacity), converting it to a `Vec<u8>`, copying the bytes in, and then converting back to a `String` and storing it?

IIUC the only way this could have been caught without a specific unit test is if we ran the fuzzer under Miri, but I'm not sure if that's technically possible. In any case, it should be straightforward to check with a unit test.
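A minimal sketch of that approach, assuming it sits inside prost's string encoding module so that `WireType`, `DecodeError`, `Buf`, and the sibling `bytes::merge` are in scope (illustrative only, not the actual fix in #194):

```rust
use std::mem;

pub fn merge<B>(wire_type: WireType, value: &mut String, buf: &mut B) -> Result<(), DecodeError>
where
    B: Buf,
{
    // Take the field's buffer out as a Vec<u8>, keeping its capacity; `value`
    // is left as a valid (empty) String in the meantime.
    let mut bytes = mem::replace(value, String::new()).into_bytes();
    // Decode into the plain byte vector. If this errors (or panics), no
    // String ever sees the unvalidated bytes.
    super::bytes::merge(wire_type, &mut bytes, buf)?;
    // Store the result back only after it has been validated as UTF-8.
    *value = String::from_utf8(bytes)
        .map_err(|_| DecodeError::new("invalid string value: data is not UTF-8 encoded"))?;
    Ok(())
}
```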
It'd be worthwhile to audit for any other uses of `as_mut_vec()` as well; you are definitely right that that's a footgun of an API.
#194 has a fix for the master branch. That's the only use of `String::as_mut_vec` in the repository.
Just to make sure I understand the problem here: the issue is if there is an error in `merge` and we implicitly return via the `?`. Do we assume that `value` is valid in the error case? I suppose the user is able to do whatever they want with it, so we shouldn't give them invalid data.
@nrc It is UB to expose invalid UTF-8 data inside a `String`. Both the error case, and any panic in the `Buf` implementation, may result in a malformed string being accessed.
```rust
// Before:
    value: &mut Vec<u8>,
    buf: &mut B,
) -> Result<(), DecodeError>

// After:
pub fn merge<B>(wire_type: WireType, value: &mut Bytes, buf: &mut B) -> Result<(), DecodeError>
```
In the current shape, this just replaces `Vec::extend_from_slice` with `Bytes::extend_from_slice`, so it does not really achieve the zero-copy optimization requested in #31 on the decoding side. It's hard to see how this code alone could, though, since the source buffer is behind the `Buf` abstraction. How about changing the `merge` method to extract data from `Bytes`, so that the value representations backed by `Bytes` could split from the buffer?
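For illustration, "splitting from the buffer" could look roughly like the following if the decoder had a concrete `Bytes` source rather than the generic `Buf` abstraction (a sketch under that assumption; `len` stands for the already-decoded length prefix of the field):

```rust
use bytes::Bytes;

/// Zero-copy extraction sketch: the field value becomes a new `Bytes` handle
/// that shares the source buffer's ref-counted allocation instead of copying
/// the payload.
fn split_field(value: &mut Bytes, buf: &mut Bytes, len: usize) {
    // `split_to` removes the first `len` bytes from `buf` and returns them as
    // a `Bytes` view into the same underlying allocation.
    *value = buf.split_to(len);
}
```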
Right, this is why I've been holding off on adding support for `Bytes` fields; as I mentioned in #31, it's not possible to get the ref-counting zero-copy support without specialization. Switching `Bytes` for `Buf` in the `Message` API is not something I want to do.
To be clear, I don't have any issue landing a PR to add support for `Bytes` fields, and if it's technically possible I'm also in favor of supporting zero-copy decoding through a nightly feature flag + specialization. The only thing I'm not willing to do at this point is change the `Message` API to enable zero copy.
The performance improvement is from the improved allocation patterns: 'small' `Bytes` values are allocated inline rather than on the heap, which avoids a lot of small allocations. Since small allocations are slow with some allocators, this causes a big improvement. On platforms with better allocators (or when using a custom allocator) the difference is far less significant.
To clarify, how would you like to proceed? Use …

Good question; there are a couple of concerns here, so I'll try and break them out.
I see, thanks. That makes sense, and is a compelling reason to support this feature even independent of zero-copy decoding.

Would it be hard or unnecessarily complicated to make the choice of representation types for bytes and string fields configurable in prost-build? What would be necessary to implement for the repr types, besides …

I have published strchunk to crates.io now.
The latest bytes has removed the SBO. Is there still a performance improvement?

The SBO was a dubious optimization to begin with, as it caused additional branching, and it was not at all clear that much usage would fit within the inlined buffers even inside a single application.

We'd need to benchmark to know. The SBO made a huge difference with the default allocator, but not too much difference when using jemalloc. It would be interesting to investigate.
I am interested in using `Bytes`. Is this PR still on the radar?

I think the state of things is still the same as what I summarized in https://github.com/danburkert/prost/pull/190/files#r291820981. I'm open to a PR which adds `Bytes` field support. As far as zero-copy deserialization, I think it's still going to require changes to either the …

@danburkert what would you think about using a trait-based approach instead of what is proposed in this PR? Like, if it supports … Or maybe what I'm saying is orthogonal to this -- I'm describing a feature of prost-derive and they are describing a feature of prost-build.
@garbageslam not sure I fully understand, but that sounds orthogonal.

I see, I forgot about this, never mind. Thanks.

Thanks for your suggestions. I implemented this in #337.

Closing accordingly!
This PR uses `bytes::Bytes` as the type for `bytes` and a new type `BytesString` as the type for `string` (rather than `Vec<u8>` and `String`). `BytesString` is a wrapper around `Bytes` in the same way that `String` is a wrapper around `Vec<u8>`. Obviously this is a severe breaking change. This idea has been discussed in #31 and #35.
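For a rough sense of what such a wrapper entails, here is an illustrative sketch (not the PR's actual definition):

```rust
use bytes::Bytes;

/// Sketch: a string type backed by `Bytes` rather than `Vec<u8>`, upholding
/// the invariant that the underlying bytes are always valid UTF-8.
pub struct BytesString {
    bytes: Bytes,
}

impl BytesString {
    pub fn new() -> Self {
        BytesString { bytes: Bytes::new() }
    }

    /// Validate once at construction; reject non-UTF-8 input.
    pub fn from_utf8(bytes: Bytes) -> Result<Self, std::str::Utf8Error> {
        std::str::from_utf8(&bytes)?;
        Ok(BytesString { bytes })
    }

    pub fn as_str(&self) -> &str {
        // Safe because construction validated the bytes and the type never
        // hands out mutable access to them.
        unsafe { std::str::from_utf8_unchecked(&self.bytes) }
    }
}
```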
This PR improves performance by 25% (!) on Mac OS, 9% on Ubuntu, and 2% on CentOS. This was tested using a fairly realistic benchmark derived from TiKV; however, the bytes and string values were all quite small, and I expect the performance improvements would be smaller for larger values. I also tested a multi-threaded version of the benchmark on Ubuntu; there the average improvement was only 1.5%, but the worst-performing thread improved by 14%, so the overall time to completion was significantly improved.
If we should go forward with this PR, then I think there are a few open questions, but I thought I'd post early to get feedback about whether it is possible to land at all, given the back-compat situation.
If we do land, I think we can probably get rid of the `BytesMutString` type and just use `BytesString` - I didn't see any performance or usability benefit in using the `BytesMut` version.

One option might be to land the change to `bytes` but not to `string`. I feel like that is a less invasive change but will get a decent benefit for some users.

@danburkert what do you think?