Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Snappy compression for DEVp2p #706

Merged
merged 5 commits into from
Oct 5, 2017
Merged

Conversation

karalabe
Copy link
Member

@karalabe karalabe commented Sep 7, 2017

Ping @fjl @Arachnid @vbuterin @Souptacular @arkpar @holiman.

Open question: Should we enforce compression for all connections running devp2p v5, or should we add a capabilities bitset instead which can be used to signal current (and potentially future) compression capabilities and let clients agree on common ones? The non obvious thing here is the extra complexity if multiple conflicting capabilities match (e.g. multiple compression algos, which one do we chose).

Update:

  • 2017-09-08: Per @gumb0's suggestion and a brief discussion, I've updated the spec to enforce limiting decompressed message sizes to 16MB (2^24-1). This ensures that higher level applications have the same guarantees about network messages as until now and makes the implementation simpler too as it doesn't need lazy decompression any more.
  • 2017-09-08: You can find a list of Snappy compression libraries at https://google.github.io/snappy/. All Ethereum client implementation languages seem to have their counterpart already implemented, so it should be a properly performing and battle-tested compression algorithm to use.

@Souptacular
Copy link
Contributor

Souptacular commented Sep 7, 2017

Sup? I have been pinged. The beacon has been lit.

@karalabe
Copy link
Member Author

karalabe commented Sep 7, 2017

Feedback welcome :)

@Arachnid
Copy link
Contributor

Arachnid commented Sep 7, 2017

👍

@holiman
Copy link
Contributor

holiman commented Sep 7, 2017

👍

Open question: Should devp2p enforce a limit on the decompressed size of a message too?

I think that sounds like a reasonable thing to have, yes.

@karalabe
Copy link
Member Author

karalabe commented Sep 7, 2017

@holiman Any suggestion on a message size limit?

@holiman
Copy link
Contributor

holiman commented Sep 7, 2017

Oh, no I don't know enough about the layer to make a good suggestion.

@pipermerriam
Copy link
Member

Can I get a sense for how this would effect light clients? Would light clients be required to implement this compression/decompression as well?

For Context: We're aiming to have Py-EVM be primary used as a light client and thus are trying to not require any system dependencies. So far we've been able to skirt around all of them, but this would require us to create our own pure python implementation of snappy (as I can't find any existing ones that seem well supported).

@bas-vk
Copy link
Member

bas-vk commented Sep 7, 2017

One concern that I have is that it can reveal size information about the payload if the application protocol is padding messages and trusting on the devp2p layer for encryption. If the compression algorithm is able to compress the added padding it can make the padding useless.

Whisper comes to mind but I believe that whisper does padding and encryption on the application level before pushing it to the network layer. So it probably isn't a problem for whisper. It can also be mitigated by choosing a padding algorithm that the compression algorithm cannot compress efficiently.

@mightypenguin
Copy link

Even with random noise padding there are attacks to try to compromise the encryption :(

@awfm9
Copy link

awfm9 commented Sep 8, 2017

Is there a reason to use snappy over lz4? It seems that it has a slightly better compression rate, while also offering better speed. I think it was also quite a bit faster for decompression, but I couldn't find any reference for that in my quick search to confirm.

@karalabe
Copy link
Member Author

karalabe commented Sep 8, 2017

@pipermerriam This would lie in the devp2p protocol, on top of which all other protocols (including light client) runs. As such, yes.

There would also be the possibility to make compression optional and enable it only if both sides signal via a flag in the protocol handshake. The issue I see is that "optional features" leads to a openssl-type pandorra's box where there are so many different modes of operation to mix and match, that it was extremely hard to make a properly secure combo. So my suggestion would be to enforce it and not allow wiggle-room for implementers to write suboptimal code.

@bas-vk I don't think devp2p should aim to provide darkness itself, rather to be a performant general purpose transport. If Whisper encrypts all its messages, Snappy should not be able to compress it at all, so it should not be able to interfere.

IMHO an applciation layer protocol should not rely on the underlying transport being one kind or another for security. I can imagine eventually having gateways into/out of Ethereum from other protocols, where it won't be devp2p, and app protocols shouldn't break because of it.

@awishformore I haven't done extensive benchmarks or comparisons between different compression algorithms, but Snappy seems to be fast enough (a single i7 core with the Go implementation can encode at 350MB/s text files (a lot of work, but a lot of gain too) and 15.7GB/s jpegs (non compressible)). At these rates compression speed doesn't seem to matter compared to network bandwidth limitations.

The other benefit of Snappy is that it has widely available and battle-tested implementations for just about any programming language out there: https://google.github.io/snappy/. As such, clients have ready tools out of the box that can be incorporated without any effort.

I'm happy to consider other compression algos if it can beat Snappy at both operation speed and implementation support.

@karalabe
Copy link
Member Author

karalabe commented Sep 8, 2017

@mightypenguin I would not consider the underlying devp2p transport a secure element. Purely because these higher level protocols (eth, les, shh, bzz) may run on top of any other transport too, so there's no guarantee that somewhere in the chain the content won't fly in plain-text. As such, if an application protocol needs security, it needs to take care of it itself and not rely on undernets.

@gumb0
Copy link
Member

gumb0 commented Sep 8, 2017

Would it make sense to enforce 16 MB limit for the decompressed size on the devp2p level? That's easier than implementing lazy decompressing (but also may be used together with it)

@karalabe
Copy link
Member Author

karalabe commented Sep 8, 2017

@gumb0 that would indeed be the most conservative solution as it would keep all current limits in place, just performing better transport wise.

@awfm9
Copy link

awfm9 commented Sep 8, 2017

@karalabe just FYI, on the original repository, you can see at https://github.com/lz4/lz4 that it beats Snappy by around 2x for decompression speed. As far as I'm aware of, there are native implementations in Go, C/C++, C#, Java (probably some others) and bindings for pretty much any other language.

@karalabe
Copy link
Member Author

karalabe commented Sep 8, 2017

I can't seem to find pure Go implementation apart from https://github.com/pierrec/lz4, which I don't find reliable enough to trust given that there are no benchmarks vs. Snappy having been developed officially by Google.

@holiman
Copy link
Contributor

holiman commented Sep 8, 2017

I'd also prefer Snappy over Lz4, since snappy is well maintained and battle-tested.

@karalabe
Copy link
Member Author

karalabe commented Sep 8, 2017

@gumb0 We've discussed it briefly with @holiman and @Arachnid and both seem to like your idea, so I'll update the spec to replace lazy decompression with an enforced limit of 16MB.

@axic
Copy link
Member

axic commented Sep 8, 2017

It seems Snappy has a pure Rust and Javascript implementations too (not sure about the speed), so that's a plus.

@arkpar
Copy link

arkpar commented Sep 8, 2017

I'd suggest making this optional on the subprotocol level. During handshake nodes would exchange an extra bit with each capability that tells if they want compression for that capability. If both parties agree on that bit the subprotocol session will be compressed. Some subprotocols, such as parity snapshot download and whisper already exchange compressed or encrypted data, so they would benefit from disabling devp2p compression.

@axic
Copy link
Member

axic commented Sep 8, 2017

@karalabe are you suggesting to also use the Snappy framing format? (Which has support for different chunks, including uncompressed chunks, for incompressible blocks).

I think the framing format is quite useless otherwise, because it has a header and a checksum, all of which are already provided by devp2p I assume.

@karalabe
Copy link
Member Author

karalabe commented Sep 8, 2017

@axic No, I'm suggesting just plain compression of the message payloads.

@axic
Copy link
Member

axic commented Sep 8, 2017

@karalabe perhaps you could make this clear and also link to the spec https://github.com/google/snappy/blob/master/format_description.txt

Also worth mentioning in the "16Mb limit section" that the decompressed length is the first value in the stream:

The stream starts with the uncompressed length (up to a maximum of 2^32 - 1),
stored as a little-endian varint.

@arkpar it seems that up to 4GB of literal data can be stored in a chunk in Snappy, should the compressor detect it is incompressible, it would store it as a literal.

For longer literals, the (len-1) value is stored after the tag byte,
little-endian. The upper six bits of the tag byte describe how
many bytes are used for the length; 60, 61, 62 or 63 for
1-4 bytes, respectively. The literal itself follows after the
length.

@karalabe
Copy link
Member Author

karalabe commented Sep 8, 2017

@arkpar I don't think the complexity is worth the performance save.

  • The current devp2p protocol supports adding new fields to the handshake, but does not support adding new fields to the capabilities of subprotocols. This means you can't just advertise a higher version with extra data in those fields, because old clients will immediately reject the messages and not connect, irrelevant that they could still talk to each other. It took many months of coordination the last time the handshake format was changed to ensure the network doesn't fall apart and all clients had to be updated first, before they even started using the new handshake format (i.e. two release cycles).
  • Snappy "compression" on uncompressable data (i.e. already compressed) is at 15+GB/s on a single core in the Go implementation. Since you're pushing the compressed messages down a network pipe, spending 1/480th extra of your CPU to saturate a gigabit fiber line doesn't seem to make a difference.

The current proposal is easy to add to any client, elegant in that it's immediately useful for all subprotocols and backward compatible with old clients. If you think your suggestion can be done in a similar way I'd be curious to see a reference implementation, but as I highlighted above, I don't think it can be done without a coordinated overhaul of the devp2p protocol once again.

@Arachnid
Copy link
Contributor

Arachnid commented Sep 8, 2017

I agree, and think the extra engineering effort of selective compression isn't worth it; compression is fast and widely useful, and far simpler if just implemented across the board.

@pipermerriam
Copy link
Member

After having a day to think about this I'm going to formally withdraw any objections to this being required or suggestion that we make it optional. Snappy seems well supported and we'll figure out a way to work around the dependency if needed.

@esaulpaugh
Copy link

esaulpaugh commented Sep 23, 2017

Simple snappy test vectors would be convenient but I can't find any. This is what I'm getting for input "Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project." with xerial/snappy-java:

51F05057696B697065646961206973206120667265652C207765622D62617365642C20636F6C6C61626F7261746976652C206D756C74696C696E6775616C20656E6379636C6F70656469612070726F6A6563742E

And in framed stream mode (x-snappy-framed):

FF060000734E6150705901550000609A375657696B697065646961206973206120667265652C207765622D62617365642C20636F6C6C61626F7261746976652C206D756C74696C696E6775616C20656E6379636C6F70656469612070726F6A6563742E

Note that the example compression on the Snappy Wikipedia page is invalid:

CA02F04257696B697065646961206973206120667265652C207765622D62617365642C20636F6C6C61626F7261746976652C206D756C74696C696E6775616C20656E6379636C6F093FF08170726F6A6563742E00000000000000000000000000

@karalabe
Copy link
Member Author

karalabe commented Sep 23, 2017 via email

@karalabe
Copy link
Member Author

@esaulpaugh @axic I've added a note that no framing is used whatsoever due to devp2p already being message based. Also I've provided 2 test vector encodings of a block RLP generated with Go and Python, along with verification utilities in both languages.

@@ -0,0 +1,192 @@
## Preamble

EIP: <to be assigned>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EIP: 706

Copy link
Contributor

@Souptacular Souptacular left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few minor changes and spelling corrections.

## Preamble

EIP: <to be assigned>
Title: devp2p snappy compression
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEVp2p Snappy Compression

Author: Péter Szilágyi <peter@ethereum.org>
Type: Standard Track
Category: Networking
Status: Draft
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status: Final

Created: 2017-09-07

## Abstract
The base networking protocol (devp2p) used by Ethereum currently does not employ any form of compression. This results in a massive amount of bandwidth wasted in the entire network, making both initial sync as well as normal operation slower and laggyer.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slower and laggier


However, most of this data (blocks, transactions) are heavily compressable. By enabling compression at the message payload level, we can reduce the previous numbers to 1.01GB upload / 13.46GB download on the main network, and 46.21MB upload / 463.65MB download on the test network.

The motivation behind doing this at the devp2p level (opposed to eth for example) is that it would enable compression for all sub-protocols (eth, les, bzz) seamlessly, reducing any complexity those protocols might incur in trying to individually optimize for data traffic.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEVp2p


### Avoiding DOS attacks

Currently a devp2p message length is limited to 24 bits, amoutning to a maximum size of 16MB. With the introduction of Snappy compression, care must be taken not to blidly decompress messages, since they may get significantly larger than 16MB.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amounting spelling fix.
blindy spelling fix.


Currently a devp2p message length is limited to 24 bits, amoutning to a maximum size of 16MB. With the introduction of Snappy compression, care must be taken not to blidly decompress messages, since they may get significantly larger than 16MB.

However, Snappy is capable of calculating the decompressed size of an input message without inflating it in memory. This can be used to discard any messages which decompress above some threshold. **The proposal is to use the same limit (16MB) as the threshold for decompressed messages.** This retains the same guarantees that the current devp2p protocol does, so there won't be surprises in application level protocols.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current DEVp2p


**Alternative solutions to data compression that have been brought up and discarded are:**

Extend protocol `xyz` to support compressed messages versus doing it at devp2p level:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEVp2p


Don't explicitly limit the decompressed message size, only the compressed one:

* **Pro**: Allows larger messages to traverse through devp2p.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEVp2p

* **Con**: Needs lazy decompression to allow size limitations without DOS.

## Backwards Compatibility
This proposal is fully backward compatible. Clients upgrading to the proposed devp2p protocol version `5` should still support skipping the compression step for connections that only advertise version `4` of the devp2p protocol.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEVp2p

@karalabe karalabe changed the title Snappy compression for devp2p Snappy compression for DEVp2p Sep 27, 2017
@karalabe
Copy link
Member Author

@Souptacular PTAL


Currently a DEVp2p message length is limited to 24 bits, amounting to a maximum size of 16MB. With the introduction of Snappy compression, care must be taken not to blindy decompress messages, since they may get significantly larger than 16MB.

However, Snappy is capable of calculating the decompressed size of an input message without inflating it in memory. This can be used to discard any messages which decompress above some threshold. **The proposal is to use the same limit (16MB) as the threshold for decompressed messages.** This retains the same guarantees that the current DEVp2p protocol does, so there won't be surprises in application level protocols.
Copy link
Member

@axic axic Sep 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you extend Snappy is capable of calculating the decompressed size of an input message without inflating it in memory with (taken from the spec):

The stream starts with the uncompressed length (up to a maximum of 2^32 - 1),
stored as a little-endian varint.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


* The handshake message is **never** compressed, since it is needed to negotiate the common version.
* Snappy framing is **not** used, since the DEVp2p protocol already message oriented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add here (or the "references" section) the link to the reference spec: https://github.com/google/snappy/blob/master/format_description.txt ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a references section to the end.


* The handshake message is **never** compressed, since it is needed to negotiate the common version.
* Snappy framing is **not** used, since the DEVp2p protocol already message oriented.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also would mention as a feature that should be no concern about compressing already compressed data since Snappy can store uncompressed literals up to 4GB in size should the compressor choose so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note.

@axic
Copy link
Member

axic commented Sep 27, 2017

Also if it was agreed to include this in Constantinople it would make sense changing the README too.

@karalabe
Copy link
Member Author

karalabe commented Oct 2, 2017

@axic Addressed all your comments. Btw, you don't need a hard fork to enable this update. DEVp2p v4 continues to operate without Snappy, and v5 with Snappy. The two can coexist within a node. PTAL.

@Souptacular
Copy link
Contributor

Changes have been applied and the EIP was agreed upon in Core dev meeting 25: ethereum/pm#23

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.