
Experiment with using bytes::Bytes to back bytes and string fields #190

Closed
wants to merge 4 commits

Conversation

@nrc
Contributor

nrc commented May 30, 2019

This PR uses bytes::Bytes as the type for bytes and a new type, BytesString, as the type for string (rather than Vec<u8> and String). BytesString is a wrapper around Bytes in the same way that String is a wrapper around Vec<u8>. Obviously this is a severe breaking change.
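
For reference, the relationship looks roughly like this (a simplified illustration, not the exact type in the PR):

```rust
use bytes::Bytes;

// Simplified sketch for illustration -- not the exact definition in this PR.
// `BytesString` owns a `Bytes` buffer and maintains the invariant that its
// contents are valid UTF-8, just as `String` maintains that invariant over a
// `Vec<u8>`.
pub struct BytesString {
    bytes: Bytes,
}

impl BytesString {
    /// Validates the buffer once, up front.
    pub fn from_utf8(bytes: Bytes) -> Result<BytesString, std::str::Utf8Error> {
        std::str::from_utf8(&bytes)?;
        Ok(BytesString { bytes })
    }

    /// Cheap view as `&str`; the UTF-8 check happened in the constructor.
    pub fn as_str(&self) -> &str {
        // SAFETY: `from_utf8` is the only constructor, so the invariant holds.
        unsafe { std::str::from_utf8_unchecked(&self.bytes) }
    }
}
```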

This idea has been discussed in #31 and #35.

This PR improves performance by 25% (!) on Mac OS, 9% on Ubuntu, and 2% on CentOS. This was tested using a fairly realistic benchmark derived from TiKV; however, the bytes and string values were all quite small, and I expect the performance improvements would be smaller for larger values. I also tested a multi-threaded version of the benchmark on Ubuntu; there the average improvement was only 1.5%, but the worst-performing thread improved by 14%, so the overall time to completion improved significantly.

If we do go forward with this PR, then I think there are a few open questions, but I thought I'd post early to get feedback on whether it is possible to land at all, given the back-compat situation.

If we do land this, I think we can probably get rid of the BytesMutString type and just use BytesString - I didn't see any performance or usability benefit in the BytesMut version.

One option might be to land the change for bytes but not for string. I feel like that is a less invasive change that would still give a decent benefit for some users.

@danburkert what do you think?

nrc added 3 commits May 30, 2019 15:14
Signed-off-by: Nick Cameron <nrc@ncameron.org>
Signed-off-by: Nick Cameron <nrc@ncameron.org>
Signed-off-by: Nick Cameron <nrc@ncameron.org>
@mzabaluev
Contributor

Coincidentally, I have put some time into a yet-unpublished crate that provides a wrapper enforcing the UTF-8 invariant over Bytes. I intend to use it in a text encoding library for Tokio, but it could be reused here as well, rather than reinventing the string wrapper specifically for prost.

@danburkert
Collaborator

danburkert commented Jun 7, 2019

Hey @nrc thanks for the work on this. I’m definitely in support of this direction.

I’d prefer if this were made configurable as an option on prost-build, in the same way that HashMap/BTreeMap is. That way the default can stay as the std types, and the bytes-based types can be used if/when perf issues arise.

I agree re. not maintaining our own string wrapper over bytes; it seems like we’ll want to use whatever shakes out as the community choice.

Re. performance, I’m a bit surprised that this resulted in improvements. My understanding is that this PR isn’t using ref count sharing to clone into the fields, right? If not, what explains the perf difference?

@danburkert
Collaborator

My understanding is that this PR isn’t using ref count sharing to clone into the fields, right?

Ah, so I'd been thinking about this entirely from the perspective of decoding performance, but maybe the benchmark is measuring time to create & serialize a message?

// (before) …the bytes inserted into the resulting vec are valid UTF-8. We check
// explicitly in order to ensure this is safe.
super::bytes::merge(wire_type, value.as_mut_vec(), buf)?;

// (after)
super::bytes::merge(wire_type, value.as_bytes_mut(), buf)?;
str::from_utf8(value.as_bytes())
    .map_err(|_| DecodeError::new("invalid string value: data is not UTF-8 encoded"))?;
Contributor

This looks wrong: value is left with broken UTF-8. The previous code was wrong too, FWIW.

String::as_mut_vec should be banned and replaced with a more principled scope guard API that would sanitize the buffer on unwind. I've been too lazy to write an RFC on this.

Collaborator

@mzabaluev that's a great catch. Any interest in sending a PR? I imagine the fix will need to be something along the lines of swapping the string field with an empty string (to retain any allocated capacity), converting it to a Vec<u8>, copying the bytes in, then converting back to a String and storing it?
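
Roughly the shape I have in mind -- just a sketch, not the actual patch:

```rust
use prost::DecodeError;

// Sketch of the fix described above: take the String's buffer out (keeping
// its capacity), merge raw bytes into it, and only store it back once it has
// been validated as UTF-8. If the merge errors (or panics), `value` is left
// as an empty -- and therefore valid -- string instead of holding broken data.
fn merge_string_field(
    value: &mut String,
    merge_bytes: impl FnOnce(&mut Vec<u8>) -> Result<(), DecodeError>,
) -> Result<(), DecodeError> {
    let mut bytes = std::mem::take(value).into_bytes();
    merge_bytes(&mut bytes)?;
    match String::from_utf8(bytes) {
        Ok(s) => {
            *value = s;
            Ok(())
        }
        Err(_) => Err(DecodeError::new(
            "invalid string value: data is not UTF-8 encoded",
        )),
    }
}
```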

IIUC the only way this could have been caught without a specific unit test is if we ran the fuzzer under MIRI, but I'm not sure if that's technically possible. In any case, it should be straightforward to check with a unit test.

Collaborator

It'd be worthwhile to audit for any other uses of as_mut_vec() as well; you are definitely right that it's a footgun of an API.

Contributor

#194 has a fix for the master branch. That's the only use of String::as_mut_vec in the repository.

Contributor Author

Just to make sure I understand the problem here: the issue is that if there is an error in merge, we implicitly return via the ?. Do we assume that value is valid in the error case? I suppose the user is able to do whatever they want with it, so we shouldn't give them invalid data.

Contributor

@nrc It is UB to expose invalid UTF-8 data inside a String. Both the error case, and any panic in the Buf implementation, may result in a malformed string being accessed.

// (before)
    value: &mut Vec<u8>,
    buf: &mut B,
) -> Result<(), DecodeError>

// (after)
pub fn merge<B>(wire_type: WireType, value: &mut Bytes, buf: &mut B) -> Result<(), DecodeError>
Contributor

In its current shape, this just replaces Vec::extend_from_slice with Bytes::extend_from_slice, so it does not really achieve the zero-copy optimization requested in #31 on the decoding side. It's hard to see how this code alone could, though, since the source buffer is behind the Buf abstraction. How about changing the merge method to extract data from Bytes, so that value representations backed by Bytes could split from the buffer?
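
To illustrate the distinction (a sketch, not code from this PR):

```rust
use bytes::{Buf, Bytes};

// `merge_by_copy` approximates the generic `Buf`-based path: the field value
// is copied out of the source buffer chunk by chunk. `merge_by_split` is what
// becomes possible when the decode input is known to be a `Bytes`: the field
// is split off in O(1) and shares the input's allocation instead of copying.
fn merge_by_copy<B: Buf>(value: &mut Vec<u8>, buf: &mut B, mut len: usize) {
    // Assumes `len <= buf.remaining()`.
    value.clear();
    value.reserve(len);
    while len > 0 {
        let chunk = buf.chunk();
        let take = chunk.len().min(len);
        value.extend_from_slice(&chunk[..take]);
        buf.advance(take);
        len -= take;
    }
}

fn merge_by_split(value: &mut Bytes, buf: &mut Bytes, len: usize) {
    // Reference-count bump only; no bytes are copied.
    *value = buf.split_to(len);
}
```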

Collaborator

Right, this is why I've been holding off on adding support for Bytes fields; as I mentioned in #31, it's not possible to get the ref-counting zero-copy support without specialization. Switching Bytes for Buf in the Message API is not something I want to do.

Collaborator

To be clear, I don't have any issue landing a PR to add support for Bytes fields, and if it's technically possible I'm also in favor of supporting zero-copy decoding through a nightly feature flag + specialization. The only thing I'm not willing to do at this point is change the Message API to enable zero copy.

@nrc
Contributor Author

nrc commented Jun 9, 2019

Re. performance, I’m a bit surprised that this resulted in improvements. My understanding is that this PR isn’t using ref count sharing to clone into the fields, right? If not, what explains the perf difference?

The performance improvement comes from the improved allocation patterns: 'small' Bytes are stored inline rather than on the heap, which avoids a lot of small allocations. Since small allocations are slow with some allocators, this makes a big difference. On platforms with better allocators (or when using a custom allocator) the difference is far less significant.

@nrc
Contributor Author

nrc commented Jun 9, 2019

I’d prefer if this were made configurable as an option on prost-build, in the same way that HashMap/BTreeMap is. That way the default can stay as the std types, and the bytes-based types can be used if/when perf issues arise.

I agree re. not maintaining our own string wrapper over bytes; it seems like we’ll want to use whatever shakes out as the community choice.

To clarify, how would you like to proceed? Use Bytes for Vec<u8> but do nothing with Strings for now, and with the Bytes stuff behind a Cargo feature?

@danburkert
Collaborator

To clarify, how would you like to proceed? Use Bytes for Vec<u8> but do nothing with Strings for now, and with the Bytes stuff behind a Cargo feature?

Good question; there are a couple of concerns here, so I'll try to break them out.

  • Whether to do only bytes in this PR or also support string as well: I don't have a strong preference; I'd defer to you as the implementer to do whatever is more straightforward.

  • What the default generated types will be: I have a strong preference that a protobuf string translates to a Rust String by default (the status quo). I don't have a strong preference for bytes, but without a compelling reason otherwise I'd err on the side of consistency, so it would default to Vec<u8>.

  • How users will opt in to changing the generated type: a Cargo feature isn't a good fit for the opt-in configuration here, due to the viral nature of Cargo features. Instead, I think it'd be better to follow the lead of prost_build::Config::btree_map and make it configurable at a finer-grained level (see the build.rs sketch below). This will make the feature harder to implement -- it's going to require effectively adding new 'primitive' field types to prost-derive and prost::encoding -- but I feel it's the 'right' choice WRT configurability and avoiding surprising Cargo feature side effects.
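
Concretely, the existing opt-in pattern in a build.rs looks like the sketch below; btree_map is real prost_build::Config API, while the commented-out bytes(...) call (and the proto path) is only hypothetical, showing the shape such an option could take:

```rust
// build.rs
fn main() -> std::io::Result<()> {
    let mut config = prost_build::Config::new();
    // Existing opt-in: generate BTreeMap instead of HashMap for map fields
    // under the matched paths ("." matches everything).
    config.btree_map(&["."]);
    // Hypothetical analogue for this feature: generate bytes::Bytes (and a
    // Bytes-backed string type) for the matched bytes/string fields.
    // config.bytes(&["."]);
    config.compile_protos(&["src/example.proto"], &["src/"])
}
```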

@danburkert
Collaborator

The performance improvement comes from the improved allocation patterns: 'small' Bytes are stored inline rather than on the heap, which avoids a lot of small allocations. Since small allocations are slow with some allocators, this makes a big difference. On platforms with better allocators (or when using a custom allocator) the difference is far less significant.

I see, thanks. That makes sense, and is a compelling reason to support this feature even independent of zero-copy decoding.

@nrc
Contributor Author

nrc commented Jun 11, 2019

Good question; there are a couple of concerns here, so I'll try to break them out.

Cool, thanks for the pointers! I'll work on bringing this PR in line with that. I would like to get #186 and #187 landed before I work more on this (those two are blocking us from using Prost, whereas this one is not).

@mzabaluev
Contributor

Would it be hard or unnecessarily complicated to make the choice of representation types for bytes and string fields configurable in prost-build? What would be necessary to implement for the repr types, besides Default and the traits for encoding and decoding?

@mzabaluev
Contributor

I have published strchunk to crates.io now. StrChunk can be used for string fields instead of a wrapper type specific to this crate.

@quininer
Member

The latest bytes has removed SBO (small buffer optimization). Is there still a performance improvement?

@mzabaluev
Contributor

The latest bytes has removed SBO (small buffer optimization). Is there still a performance improvement?

The SBO was a dubious optimization to begin with, as it caused additional branching and it was not clear at all that much usage would fit within the inlined buffers even inside a single application.
The zero-copy advantage of Bytes persists, though.

@nrc
Contributor Author

nrc commented Nov 27, 2019

We'd need to benchmark to know. SBO made a huge difference with the default allocator, but not too much difference when using jemalloc. Would be interesting to investigate.

@rolftimmermans
Contributor

I am interested in using Bytes instead of Vec<u8> for fields of type bytes. We are dealing with messages where most of the payload is a medium-large blob. It would be great to be able to decode the message without copying the blobs. I'd be happy with any sort of opt-in mechanism (either per field, or as a Cargo feature).

Is this PR still on the radar?

@danburkert
Collaborator

I think the state of things is still the same as what I summarized in https://github.com/danburkert/prost/pull/190/files#r291820981. I'm open to a PR which adds Bytes field generation as an option for protobuf string/bytes fields, but it needs to be opt-in. There's already a pattern for this kind of opt-in codegen change in the repo with HashMap and BTreeMap.

As far as zero-copy deserialization goes, I think it's still going to require either changes to the bytes crate (I think an additional API such as fn Buf::take_bytes(usize) -> Bytes) or specialization. Landing support for Bytes fields will be a good first step; then we can prioritize zero copy from there.
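
A sketch of that hypothetical shape (take_bytes is not an existing bytes API; this is only to illustrate where the zero-copy path would plug in):

```rust
use bytes::{Buf, Bytes};

// Hypothetical extension: a method that hands back a length-delimited field
// as a `Bytes`. For a plain `Bytes` input this can be zero-copy.
trait TakeBytes: Buf {
    fn take_bytes(&mut self, len: usize) -> Bytes;
}

impl TakeBytes for Bytes {
    fn take_bytes(&mut self, len: usize) -> Bytes {
        // Shares the underlying allocation; O(1), no copy.
        self.split_to(len)
    }
}
```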

@cbeck88
Contributor

cbeck88 commented May 12, 2020

@danburkert what would you think about using a trait-based approach instead of what is proposed in this PR? Like, if a type supports From<&[u8]> and AsRef<[u8]> (or possibly TryFrom instead of From?), then it can be used as bytes in prost-derive? prost itself likely doesn't use any Vec-specific stuff, right? Or it could abstract what it does use as a trait?

Or maybe what I'm saying is orthogonal to this -- I'm describing a feature of prost-derive and they are describing a feature of prost-build.

@danburkert
Collaborator

@garbageslam not sure I fully understand, but that sounds orthogonal. prost definitely does 'know' about Vec; the derived code calls into Vec-specific serialization/deserialization routines defined in src/encoding.rs.

@cbeck88
Contributor

cbeck88 commented May 12, 2020

I see, I had forgotten that; never mind. Thanks!

@rolftimmermans
Contributor

I think the state of things is still the same as what I summarized in https://github.com/danburkert/prost/pull/190/files#r291820981. I'm open to a PR which adds Bytes field generation as an option for protobuf string/bytes fields, but it needs to be opt-in. There's already a pattern for this kind of opt-in codegen change in the repo with HashMap and BTreeMap.

As far as zero-copy deserialization goes, I think it's still going to require either changes to the bytes crate (I think an additional API such as fn Buf::take_bytes(usize) -> Bytes) or specialization. Landing support for Bytes fields will be a good first step; then we can prioritize zero copy from there.

Thanks for your suggestions. I implemented this in #337.

@danburkert
Collaborator

Thanks for your suggestions. I implemented this in #337.

Closing accordingly!

@danburkert closed this Nov 15, 2020