Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(quick-protobuf-codec): reduce allocations during encoding #4782

Merged
merged 22 commits into from
Nov 24, 2023

Conversation

thomaseizinger
Copy link
Contributor

@thomaseizinger thomaseizinger commented Nov 2, 2023

Description

We can reduce the number of allocations during the encoding of messages by not depending on the Encoder implementation of unsigned-varint but doing the encoding ourselves.

This allows us to directly plug the BytesMut into the Writer and directly encode into the target buffer. Previously, we were allocating each message as an additional buffer that got immediately thrown-away again.

Related: #4781.

Notes & open questions

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

@thomaseizinger thomaseizinger marked this pull request as draft November 2, 2023 01:55
@thomaseizinger
Copy link
Contributor Author

thomaseizinger commented Nov 2, 2023

Draft because it is stacked on top of another PR, but ready for review.

@joshuef
Copy link
Contributor

joshuef commented Nov 2, 2023

@thomaseizinger I'm trying to have a bash at this for heaptracking, but getting compile errors atm.

Is there a stablish commit here worth testing from?

eg
error[E0277]: the trait bound `quick_protobuf_codec::Codec<proto::structs::Message>: Decoder` is not satisfied --> /Users/josh/.cargo/git/checkouts/rust-libp2p-98135dbcf5b63918/aa08f5a63e91/protocols/autonat/src/protocol.rs:48:43 | 48 | let message = FramedRead::new(io, codec()) | --------------- ^^^^^^^ the trait `Decoder` is not implemented for `quick_protobuf_codec::Codec<proto::structs::Message>` | | | required by a bound introduced by this call | = help: the following other types implement trait `Decoder`: BytesCodec LengthCodec LinesCodec asynchronous_codec::framed_write::FramedWrite2<T>

@thomaseizinger
Copy link
Contributor Author

Try the one before the last merge commit! Sorry for that, I didn't expect there to be compile errors with merging master 😅

@joshuef
Copy link
Contributor

joshuef commented Nov 2, 2023

Ran into issues today and didnt have time to get deeper. Probs will be next week when i come back with heaps i'm afraid as I'm off tomorrow

@thomaseizinger
Copy link
Contributor Author

Perhaps @jxs has time to check whether this reduces allocations for lighthouse?

@jxs
Copy link
Member

jxs commented Nov 4, 2023

@thomaseizinger I'm trying to have a bash at this for heaptracking, but getting compile errors atm.

Is there a stablish commit here worth testing from?

eg error[E0277]: the trait bound `quick_protobuf_codec::Codec<proto::structs::Message>: Decoder` is not satisfied --> /Users/josh/.cargo/git/checkouts/rust-libp2p-98135dbcf5b63918/aa08f5a63e91/protocols/autonat/src/protocol.rs:48:43 | 48 | let message = FramedRead::new(io, codec()) | --------------- ^^^^^^^ the trait `Decoder` is not implemented for `quick_protobuf_codec::Codec<proto::structs::Message>` | | | required by a bound introduced by this call | = help: the following other types implement trait `Decoder`: BytesCodec LengthCodec LinesCodec asynchronous_codec::framed_write::FramedWrite2<T>

Hi @joshuef this seems fixable by running cargo clean and cargo build again.

Perhaps @jxs has time to check whether this reduces allocations for lighthouse?

Sorry Thomas, this fix runs on the latest rust-libp2p master which I could not make compatible with Lighthouse's current transport stack, either because SelectUpgrade doesn't implement InboundConnectionUpgrade or because I couldn't add with_bandwidth_logging to the the new SwarmBuilder with tcp quic and dns, my changes are here if you want to take a look.
Thanks!

@thomaseizinger
Copy link
Contributor Author

Oh, thanks for trying @jxs. I might try and backport it for v0.52 then.

@joshuef
Copy link
Contributor

joshuef commented Nov 5, 2023

Oookay, reupped main and the branch. The difference is only in the libp2p v. Main is on 0.52, the libp2p-branch one is running this (so w/0.53 changes in). All sorted by allocations here.

baseline (main):
main-w_52
main w/ this PR:
mainw-libp2p-branch

Focussing on gossip specifically (main is the top, this PR bottom)
main on top, branch on bottom

So the main allocating functions seem to change. This PR seems like it's allocating a fair bit less...

Though the overall stats main is leaking/allocating less overall 🤔 (same order).

I have the heaps for this if that's helpful (largest is ~12mb i think).

ovarllall maon top, branch bottom

I'm not sure how to account for that difference on the overall. But it's not complete apples-apples as main is on 0.52x as I say...

i'm going to run our main w/ the 0.53 in libp2p/master to get more apples/apples with this PR.

@joshuef
Copy link
Contributor

joshuef commented Nov 5, 2023

Okay, here is our main w/ master libp2p 0.53.

Screenshot 2023-11-05 at 15 07 36 Screenshot 2023-11-05 at 15 07 49 Screenshot 2023-11-05 at 15 07 59

@thomaseizinger
Copy link
Contributor Author

thomaseizinger commented Nov 9, 2023

I've done some more benchmarking of this code. Here are the results:

  • Less allocations. As intended with this refactoring, we are essentially no longer allocating (assuming the target buffer is big enough):

    Baseline This PR
    Screenshot from 2023-11-09 14-02-37 Screenshot from 2023-11-09 14-01-54

    Pay attention to peak heap memory consumption. This is 3MB for the baseline and 2MB for the this version. With a message of 1 MB size, the large_message test needs to allocate the message itself plus the target buffer. In the baseline version, we allocate an additional 1MB for encoding. This is now gone. I've generated this data but running heaptrack against the large_message test binary.

  • Slight performance improvement for encoding. This is expected as we perform less allocations now:
    image

@thomaseizinger thomaseizinger force-pushed the deps/less-allocations-in-codec branch from 7975a8b to 81ef6cc Compare November 14, 2023 02:21
Base automatically changed from dependabot/cargo/asynchronous-codec-0.7.0 to master November 14, 2023 22:22
@thomaseizinger thomaseizinger force-pushed the deps/less-allocations-in-codec branch from 81ef6cc to 7893b81 Compare November 14, 2023 22:26
@thomaseizinger thomaseizinger marked this pull request as ready for review November 14, 2023 22:26
@thomaseizinger
Copy link
Contributor Author

cc @altonen It might be of interest to you as maintainer of unsigned-varint that we are no longer using its Encoder and Decoder impl because it results in lots of temporary allocations. I think the only way this could be avoided is if the codec of unsigned-varint holds another inner codec that directly encoder item to the destination buffer.

In our case, that would be a protobuf-aware codec. With the current impl (prior to this PR), we have to encode the message into a temporary buffer because the codec in unsigned-varint only accepts Buf. After playing around with these impls, I came to the conclusion that is likely better to not use a codec for length-prefixing in the first place but rather provide a single, more elaborate codec that can efficiently encode typed messages.

This is quite significant as it affects every protobuf message sent over libp2p, e.g. kademlia, gossipsub, identify, ...

@altonen
Copy link

altonen commented Nov 15, 2023

Thanks for the ping. I've noticed the same thing as well which is why litep2p is not directly using unsigned-varint's Encode/Decode because it had quite a negative effect on performance in some load testing.

If I ever find time, I'll take a look if the issue can be fixed in unsigned-varint because right now the situation is not good.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned out-of-band, great catch!

misc/quick-protobuf-codec/src/lib.rs Show resolved Hide resolved

This comment was marked as resolved.

mxinden added a commit to mxinden/rust-libp2p that referenced this pull request Nov 21, 2023
… BufMut::put

Builds on top of libp2p#4782.

Removes the requirement for `unsafe` code.
misc/quick-protobuf-codec/src/lib.rs Outdated Show resolved Hide resolved
misc/quick-protobuf-codec/src/lib.rs Outdated Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor Author

@mxinden I incorporated #4911.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the find and the fix!

@mxinden
Copy link
Member

mxinden commented Nov 23, 2023

@mergify refresh

Copy link
Contributor

mergify bot commented Nov 23, 2023

refresh

✅ Pull request refreshed

@mxinden
Copy link
Member

mxinden commented Nov 24, 2023

Reran webrtc job. Seems like it didn't get a runner.

@mergify mergify bot merged commit d851d1b into master Nov 24, 2023
71 checks passed
@mergify mergify bot deleted the deps/less-allocations-in-codec branch November 24, 2023 09:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants