Use QUIC for all communications between peers #8

Open
nkeywal opened this issue Nov 29, 2018 · 12 comments

nkeywal commented Nov 29, 2018

QUIC is a network protocol defined by Google, implemented in Chrome, and used by various Google services such as YouTube and Maps. It covers the same scope as TCP+TLS, but it is implemented on top of UDP. Standardization is in progress at the IETF.

Here is what could be interesting for us:

  • it's supposed to be more efficient than TCP.
  • it's a generic protocol with encrypted communications, and it will be used for HTTPS connections, so Ethereum nodes' traffic will be harder for an external actor to identify or block
  • in some circumstances, low cost for establishing a new communication (0 RTT)

This last point is very interesting, because it makes it possible to connect to a lot of peers. That's especially useful for attesters or block producers: they need to push their signatures/blocks, and contacting more nodes lowers the impact of a Sybil attack at the p2p level (#6). It's also interesting if we want to go the Tor route (GitHub issue to be created). There is no magic in the 0-RTT trick, however: it works by caching the communication keys.
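To make the key-caching point concrete, here is a toy model in Go. It is only a sketch: `resumptionCache` and `handshakeRTTs` are hypothetical names, not part of any QUIC library, and it assumes that a first contact costs one round trip while a resumed contact costs none. Ticket expiry and replay protection are ignored.

```go
package main

import "fmt"

// resumptionCache is a hypothetical per-peer store of key material
// remembered from an earlier handshake (not a real QUIC API).
type resumptionCache map[string][]byte

// handshakeRTTs returns how many round trips are spent before application
// data can be sent to peer, under the simplifying assumption that a first
// contact costs 1 RTT and a resumed contact costs 0.
func handshakeRTTs(cache resumptionCache, peer string) int {
	if _, known := cache[peer]; known {
		return 0 // cached keys: data can ride in the first packet
	}
	// First contact: pay the handshake, then remember the keys.
	cache[peer] = []byte("key-material-for-" + peer)
	return 1
}

func main() {
	cache := resumptionCache{}
	for _, peer := range []string{"peer-a", "peer-b", "peer-a"} {
		fmt.Printf("%s: %d RTT before first data\n", peer, handshakeRTTs(cache, peer))
	}
}
```

The saving only applies to peers already in the cache, so a node that mostly talks to new peers would see little benefit.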

As of today, it's a work in progress: even though it has been used at Google for a while, the standardization is not finished (see this for a high-level picture of the impact: https://blog.cloudflare.com/the-road-to-quic/). The libp2p team is working on an implementation. Other implementations are listed here: https://github.com/quicwg/base-drafts/wiki/Implementations. There is no need to rush, but we can track the progress in this issue. On our side (Consensys/PegaSys) we will give it a first try in December.

Mikerah commented Feb 13, 2019

Have the simulations for using QUIC in sharding been completed? If so, are there any results to share?

nkeywal commented Feb 14, 2019

When we tried in December (with libp2p) we had packaging issues, so we decided to pause it. We're going to try again soon (within ~4 weeks) on Handel.

fjl commented Feb 15, 2019

> in some circumstances, low cost for establishing a new communication (0 RTT)
>
> This last point is very interesting, because it makes it possible to connect to a lot of peers

It would be very interesting to verify how efficient this is for real. Setting up a QUIC connection isn't free. What you can do with zero-roundtrip connects is to send encrypted/authenticated data in the first packet. Setting up an interactive connection will probably still require roundtrips.

Mikerah commented Feb 15, 2019

> Setting up a QUIC connection isn't free

From my understanding, setting up a QUIC connection requires 1 packet, whereas TCP requires a 3-way handshake. It's much easier to send 1 packet to multiple peers than to do a 3-way handshake with each of them.

bkolad commented Mar 12, 2019

We evaluated QUIC (the quic-go implementation) as a transport layer for the Handel framework:
https://github.com/ConsenSys/handel/

We observed a 3x slowdown compared to the UDP-based network (experiments on 500 one-core AWS nodes).
The most important factors we identified are:

  1. The 0-RTT handshake is not supported in quic-go yet (with UDP there is no handshake at all).
  2. QUIC encrypts by default (our UDP communication is not encrypted), and Handel is CPU-intensive (BLS signature verification), so the whole protocol slows down due to CPU overload.

Consensys/handel#4

raulk commented Mar 13, 2019

@marten-seemann and @bkolad have been chatting offline about the QUIC experiment. A slowdown of 3x is unexpected and Marten has provided some guidance about elements to adjust, such as congestion control sizing, preestablishing connections, the AcceptCookie callback (which by default adds 1-RTT) and others.

@bkolad were you able to iterate on those? Is there a stress test in https://github.com/ConsenSys/handel/ that we could use to replicate your setup and test scenario?

raulk commented Mar 13, 2019

I quickly reviewed the QUIC network implementation. Unless I'm mistaken, it seems to be thrashing sessions (opening a QUIC session, reading one packet, then closing the QUIC session).

Renegotiating QUIC sessions on every packet is likely a big cause of slowdown. With this behaviour, the UDP and QUIC versions aren't really comparable.

Could you please keep QUIC sessions open and run the benchmark again?

I filed an issue with details: Consensys/handel#126.
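The difference between thrashing and keeping sessions open can be sketched with a small per-peer pool. This is an illustration only: `session` and `sessionPool` are hypothetical stand-ins rather than quic-go or Handel types, and the handshake is simulated with a counter instead of a real dial.

```go
package main

import (
	"fmt"
	"sync"
)

// session is a hypothetical stand-in for an established QUIC session.
type session struct{ peer string }

// sessionPool keeps one open session per peer so that the handshake is
// paid once per peer instead of once per packet.
type sessionPool struct {
	mu         sync.Mutex
	sessions   map[string]*session
	handshakes int // number of simulated handshakes performed
}

func newSessionPool() *sessionPool {
	return &sessionPool{sessions: make(map[string]*session)}
}

// get returns the cached session for peer and "dials" only on first use.
func (p *sessionPool) get(peer string) *session {
	p.mu.Lock()
	defer p.mu.Unlock()
	if s, ok := p.sessions[peer]; ok {
		return s
	}
	p.handshakes++ // a real implementation would dial and handshake here
	s := &session{peer: peer}
	p.sessions[peer] = s
	return s
}

// handshakesWhenThrashing models opening and closing a session per packet.
func handshakesWhenThrashing(packets []string) int {
	return len(packets)
}

func main() {
	packets := []string{"a", "b", "a", "a", "b"} // destination peer of each packet
	pool := newSessionPool()
	for _, peer := range packets {
		pool.get(peer)
	}
	fmt.Println("thrashing:", handshakesWhenThrashing(packets), "handshakes")
	fmt.Println("pooled:   ", pool.handshakes, "handshakes")
}
```

With a pool, the handshake count grows with the number of distinct peers rather than with the number of packets. A real pool would also need to evict idle or broken sessions.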

bkolad commented Mar 13, 2019

@raulk @marten-seemann
Please see more details here:
Consensys/handel#4

The initial slowdown I reported was 4x. After implementing the AcceptCookie callback it went down to 3x. At that point I was happy with the result, as I think the handshake and encryption overhead are unavoidable (as I pointed out, Handel spends most of its CPU time on BLS signature verification, and the QUIC encryption adds on top of that). I ran the stress tests on our custom test bed of 500 AWS nodes.
I agree the scenario is not directly comparable to the UDP case, and Handel fits the UDP model better. Our intention was not to compare QUIC to UDP but rather to switch to QUIC and check what happens for the Handel protocol (hoping that the 0-RTT handshake would do a miracle).

Thanks for filing the issue. I will give a more detailed answer regarding session management there.

bkolad commented Mar 13, 2019

For the ETH2.0 context, I think we should continue investigating the use of QUIC for communication between peers, as proposed by @nkeywal.

raulk commented Mar 14, 2019

Thanks for the info, @bkolad!

> Our intention was not to compare QUIC to UDP but rather to switch to QUIC and check what happens for the Handel protocol

IIUC, the UDP reification of the network in Handel doesn't set up a secure channel.

If encryption and authentication, parallel conversations (multiplexing), reliability or congestion control are non-requirements, then QUIC is a poor functional fit for this use case.

A more accurate comparison would be UDP + (overlaid multiplexing + encryption + congestion control) vs. QUIC.
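As a rough idea of what the encryption part of such an overlay adds to each UDP datagram, here is a minimal Go sketch using AES-GCM from the standard library. It assumes a pre-shared 32-byte key and random nonces; the helper names `seal` and `open` are made up for this example, and key exchange, multiplexing and congestion control are not shown.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM builds an AES-GCM AEAD from a 16-, 24- or 32-byte key.
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts and authenticates one datagram payload.
// The random nonce is prepended so the receiver can recover it.
func seal(key, payload []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, payload, nil), nil
}

// open verifies and decrypts a datagram produced by seal.
func open(key, datagram []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(datagram) < gcm.NonceSize() {
		return nil, errors.New("datagram too short")
	}
	nonce, ciphertext := datagram[:gcm.NonceSize()], datagram[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ciphertext, nil)
}

func main() {
	key := make([]byte, 32) // assumption: key agreed out of band
	if _, err := rand.Read(key); err != nil {
		panic(err)
	}
	payload := []byte("aggregate signature")
	sealed, err := seal(key, payload)
	if err != nil {
		panic(err)
	}
	plain, err := open(key, sealed)
	fmt.Println(string(plain), err, "overhead bytes:", len(sealed)-len(payload))
}
```

Each datagram grows by 28 bytes (a 12-byte nonce plus a 16-byte authentication tag), and every packet costs one AEAD operation on each side. A UDP baseline with this layer added would carry a per-packet CPU cost of the same kind that QUIC pays.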

In practice, Handel would not run in isolation but on the Serenity network where these aspects are relevant.

> (hoping that the 0-RTT handshake would do a miracle)

Could you elaborate on this? In terms of what? Your UDP variant is not handshaking from what I gather.

bkolad commented Mar 14, 2019

> Could you elaborate on this? In terms of what? Your UDP variant is not handshaking from what I gather.

I was not being clear. For the reasons you pointed out, any stateful protocol (TCP/TLS, QUIC, etc.) would perform worse than UDP in terms of latency. We are thrashing sessions for every packet, so we pay the cost of the handshake every time. My intuition is that the latency should be ordered as:
QUIC > QUIC-0-RTT (when a peer contacts a node it has seen before, we wouldn't pay for the RTT) > UDP, and we thought it would be interesting to see how much 0-RTT helps here (by miracle I meant that the latency would be close to UDP).

> In practice, Handel would not run in isolation but on the Serenity network where these aspects are relevant.

Yes, that's why I think it is an interesting exercise to try out QUIC.

raulk commented Mar 14, 2019

> Yes, that's why I think it is an interesting exercise to try out QUIC.

Yeah, and thanks for spearheading this effort in the Serenity community! I wanted to make sure we drew accurate conclusions out of your experiment, which we seem to agree on now. Cheers!


5 participants