
Declarative RLP Encoding/Decoding #7975

Draft · wants to merge 115 commits into base: master
Conversation

@emlautarom1 (Contributor) commented Dec 26, 2024

Changes

  • Introduce an alternative approach to RLP encoding and decoding, based on a declarative API with support for code generation through Source Generators

Types of changes

What types of changes does your code introduce?

  • Bugfix (a non-breaking change that fixes an issue)
  • New feature (a non-breaking change that adds functionality)
  • Breaking change (a change that causes existing functionality not to work as expected)
  • Optimization
  • Refactoring
  • Documentation update
  • Build-related changes
  • Other: Description

Testing

Requires testing

  • Yes
  • No

If yes, did you write tests?

  • Yes
  • No

Notes on testing

The core library has 100% test coverage. Source generated code might not be fully covered.

Documentation

Requires documentation update

  • Yes
  • No

Requires explanation in Release Notes

  • Yes
  • No

Remarks

When we started working on refactoring our TxDecoder, one thing that came up was how unergonomic it is to work with our current RLP API. We even have comments in the code itself mentioning these difficulties, for example:

/// <summary>
/// We pay a high code quality tax for the performance optimization on RLP.
/// Adding more RLP decoders is costly (time wise) but the path taken saves a lot of allocations and GC.
/// Shall we consider code generation for this? We could potentially generate IL from attributes for each
/// RLP serializable item and keep it as a compiled call available at runtime.
/// It would be slightly slower but still much faster than what we would get from using dynamic serializers.
/// </summary>

/// <summary>
/// We pay a big copy-paste tax to maintain ValueDecoders but we believe that the amount of allocations saved
/// make it worth it. To be reviewed periodically.
/// Question to Lukasz here -> would it be fine to always use ValueDecoderContext only?
/// I believe it cannot be done for the network items decoding and is only relevant for the DB loads.
/// </summary>

This PR introduces a new RLP API based on #7334 (comment) with several improvements:

  • Describe the structure of a record and get encoding and decoding for free. No code duplication required.
  • Records can be described in terms of other records. Supports conditionals, exceptions, function calls, etc.
  • Decoding and encoding are extensible through classes that can be defined anywhere, plus some extension methods.
  • Minimal core library with 100% code coverage.
  • Supports backtracking.
  • All function calls are known ahead of time (no virtual or override). Interfaces are only used to enforce implementations.
  • Despite the extensive usage of lambdas, no closures are required (all lambdas are static). You can still use them if you want to, but overloads are provided to avoid them.
  • Automatically generate the required code through Source Generators.

@emlautarom1 (Contributor Author):
I've added a benchmark that encodes and decodes an AccessList as defined in:

public class AccessList : IEnumerable<(Address Address, AccessList.StorageKeysEnumerable StorageKeys)>

Results on my machine are the following:

| Method  | Mean     | Error   | StdDev  | Ratio |
|-------- |---------:|--------:|--------:|------:|
| Current | 343.9 us | 1.43 us | 1.34 us |  1.00 |
| Fluent  | 834.9 us | 2.34 us | 2.19 us |  2.43 |

There is room for a possible optimization: some records, like Address, have a known, fixed byte size, which we can leverage to avoid processing the bytes twice: once to figure out the length and again to actually copy them.
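The fixed-size idea can be sketched as follows (a hedged Python illustration, not the PR's code): when a type's payload size is a constant, such as the 20 bytes of an address, its RLP length is a constant too, so the measuring pass can be skipped entirely:

```python
ADDRESS_SIZE = 20  # an Ethereum address is always 20 bytes

def generic_length(payload: bytes) -> int:
    # Generic path: build (or walk) the encoding just to measure it.
    encoded = bytes([0x80 + len(payload)]) + payload  # RLP short-string rule
    return len(encoded)

def fixed_address_length() -> int:
    # Fixed-size path: 1 prefix byte + 20 payload bytes, known statically,
    # so no serialization pass is needed to compute the length.
    return 1 + ADDRESS_SIZE
```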

@emlautarom1 (Contributor Author):
Replacing Marshal.SizeOf<T>() with sizeof(T) and some unsafe annotations gives quite the boost at no cost:

| Method  | Mean     | Error   | StdDev  | Ratio | RatioSD |
|-------- |---------:|--------:|--------:|------:|--------:|
| Current | 359.8 us | 5.03 us | 4.70 us |  1.00 |    0.02 |
| Fluent  | 626.2 us | 2.90 us | 2.42 us |  1.74 |    0.02 |

var size = sizeof(T);
Span<byte> bigEndian = stackalloc byte[size];
value.WriteBigEndian(bigEndian);
bigEndian = bigEndian.TrimStart((byte)0);
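The snippet's intent, sketched in Python (hypothetical helper name; the C# version writes into a stackalloc span and trims with TrimStart):

```python
def to_trimmed_big_endian(value: int, size: int) -> bytes:
    # Write the full fixed-size big-endian representation first...
    big_endian = value.to_bytes(size, "big")
    # ...then drop leading zero bytes, mirroring TrimStart((byte)0).
    return big_endian.lstrip(b"\x00")
```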
emlautarom1 (Contributor Author):
TrimStart does not seem to be heavily optimized. There might be something better we can use, especially considering that we're removing leading zeros.

Member:
Looks like .NET doesn't use SIMD for it!
We could write a Vector-based way to find the start index.
@benaadams
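The chunked-scan idea behind a Vector-based search can be sketched in Python (illustrative only; a .NET version would compare Vector<byte> chunks against zero rather than 8-byte integers):

```python
def first_nonzero_index(data: bytes, word: int = 8) -> int:
    # Scan word-sized chunks first (the idea behind a Vector/SIMD scan):
    # a whole chunk of zero bytes compares equal to integer 0 in one step.
    i = 0
    while i + word <= len(data) and int.from_bytes(data[i:i + word], "big") == 0:
        i += word
    # Fall back to a byte-wise scan inside the first non-zero chunk.
    while i < len(data) and data[i] == 0:
        i += 1
    return i  # equals len(data) when every byte is zero
```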

@emlautarom1 emlautarom1 requested review from Scooletz and LukaszRozmej and removed request for Scooletz January 2, 2025 19:17
@Scooletz (Contributor) commented Jan 3, 2025:

> Replacing Marshal.SizeOf<T>() with sizeof(T) and some unsafe annotations gives quite the boost at no cost: [benchmark table quoted above]

2x slower. Quite a lot. Can you add the ASM diagnoser and memory diagnoser? Would be nice to compare it more.

@emlautarom1 (Contributor Author):
After running some benchmarks I found that UInt256 was getting boxed due to the use of default interface method implementations. Fixing that issue improves performance and drastically reduces memory allocations (added [MemoryDiagnoser] as requested by @Scooletz):

| Method  | Mean     | Error   | StdDev  | Ratio | Gen0     | Gen1     | Gen2     | Allocated  | Alloc Ratio |
|-------- |---------:|--------:|--------:|------:|---------:|---------:|---------:|-----------:|------------:|
| Current | 359.5 us | 0.98 us | 0.86 us |  1.00 | 166.5039 | 166.5039 | 166.5039 | 1033.87 KB |        1.00 |
| Fluent  | 530.1 us | 4.67 us | 4.14 us |  1.47 |  51.7578 |  51.7578 |  51.7578 |   617.5 KB |        0.60 |

var decoder = Eip2930.AccessListDecoder.Instance;

var length = decoder.GetLength(_current, RlpBehaviors.None);
var stream = new RlpStream(length);
Contributor:
This line is responsible for all the allocations, as it allocates a new array underneath.

Is this the case we want to benchmark, or should a reused RlpStream be used here?

emlautarom1 (Contributor Author):
To be comparable with the FluentRlp approach we should allocate a new buffer. Since both allocate a buffer of the same size, it should not matter.

@Scooletz (Contributor) commented Jan 4, 2025:
I see. If they allocate the same buffer, what makes the current approach allocate over 400 kB more, then? Is it the different return type (Nethermind.Core.Eip2930.AccessList vs. AccessList in the new one), or something else? With 400 kB more, the current approach is greatly penalized.

emlautarom1 (Contributor Author):

Interestingly, they're not allocating the same buffer size: the fluent approach uses a buffer of 170,850 bytes while the current one uses 172,845. That does not account for the 400 kB you mention, though. Rider's profiler is not giving me anything useful, so I'm kind of stuck now.

Maybe we should add other objects (ex. LogEntry, BlockInfo, etc.) to get more accurate benchmarks.

Member:
You can use NettyRlpStream, which uses arena memory.

@emlautarom1 emlautarom1 force-pushed the feature/declarative-rlp branch from dcd6d85 to 6dc21f4 Compare January 6, 2025 18:35
- Return value is now `ReadOnlyMemory<byte>`
- Add overloads for reading `ReadOnlyMemory<byte>`
- Add `FluentAssertions` extensions
…ature/declarative-rlp

# Conflicts:
#	src/Nethermind/Nethermind.Serialization.FluentRlp/Rlp.cs
/// <param name="capacity">The capacity of the underlying buffer.</param>
public FixedArrayBufferWriter(int capacity)
{
_buffer = new T[capacity];
Member:
ArrayPool?

emlautarom1 (Contributor Author):
By default I think we should go with a plain array to match the ArrayBufferWriter behavior. At the end of the day, the RLP static class is like a "safe default" API.

If we want more control over the buffers we use we can write a custom IBufferWriter as you suggested earlier while using RlpReader and RlpWriter directly.
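As a rough illustration of the pooling trade-off discussed above (a toy Python analogue, not .NET's actual ArrayPool<T> semantics — the real pool uses size buckets and does not clear buffers by default):

```python
from collections import defaultdict

class SimpleArrayPool:
    """Toy analogue of renting/returning buffers so hot paths
    stop allocating a fresh array on every encode."""

    def __init__(self):
        self._free = defaultdict(list)  # size -> list of free buffers

    def rent(self, size: int) -> bytearray:
        bucket = self._free[size]
        return bucket.pop() if bucket else bytearray(size)

    def return_buffer(self, buf: bytearray) -> None:
        buf[:] = bytes(len(buf))  # clear before reuse
        self._free[len(buf)].append(buf)
```

A plain array remains the simpler "safe default" matching ArrayBufferWriter; a pool like this only pays off when the same sizes are rented repeatedly on a hot path.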
