-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize message body encoding #8085
Conversation
Thanks for all the PRs! Do you mind annotating some of the changes in GitHub using review comments to include some reasoning, etc behind the changes and adding benchmark results, if you have any? |
_currentSpan[_bufferPos++] = value; | ||
if ((uint)_bufferPos < (uint)_currentSpan.Length) | ||
{ | ||
Unsafe.AddByteOffset(ref MemoryMarshal.GetReference(_currentSpan), (uint)_bufferPos) = value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not necessary, best codegen is when written as
public void WriteByte(byte value)
{
int bufferPos = _bufferPos;
if ((uint)bufferPos < (uint)_currentSpan.Length)
{
_currentSpan[bufferPos] = value;
_bufferPos = bufferPos + 1;
}
else
{
WriteByteSlow(value);
}
}
- there's no bound check emitted
_bufferPos
is read and written only once (memory accesses), while with_bufferPos++
it's 2 reads + 1 write (for theinc
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your input is appreciated, @gfoidl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's still going to be a bounds check unless _currentSpan
is also first copied to a local variable, but that introduces additional overhead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: on what target platform did you see this? Is this after inlining?
Asking because I checked the codegen and didn't see the bound check -- so the JIT should be taught to eliminate that one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested on x64 and after inlining.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well this seems to be dotnet/runtime#72004 (actually opened by myself, and I forgot about this 😉).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should link to that issue and this comment from code so that it can be re-assessed later
@@ -415,7 +431,7 @@ public void WriteVarUInt32(uint value) | |||
// Since this method writes a ulong worth of bytes unconditionally, ensure that there is sufficient space. | |||
EnsureContiguous(sizeof(ulong)); | |||
|
|||
var pos = _bufferPos; | |||
nuint pos = (uint)_bufferPos; | |||
var neededBytes = BitOperations.Log2(value) / 7; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var neededBytes = BitOperations.Log2(value) / 7; | |
var neededBytes = (int)((uint)BitOperations.Log2(value) / 7); |
More ugly code, but shorter machine code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work. I wonder if this results in measurable differences.
There are serialization microbenchmarks in the test/Benchmarks directory which can be invoked with
dotnet run -c Release -f net7.0 -- suite --filter *erializeBenchmark
to run the serialization comparison benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did two runs of the benchmarks (current main, then changed to the uint-division). The runs aren't comparable 1:1, but judging based on Ratio-column it improved a bit, which I expected because of less code and the mul instead of imul.
Current main branch
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Payload | Allocated | Alloc Ratio |
|------------------ |-----------:|---------:|---------:|------:|--------:|-------:|--------:|----------:|------------:|
| Orleans | 107.1 ns | 2.12 ns | 3.17 ns | 1.00 | 0.00 | - | 20 B | - | NA |
| Utf8Json | 169.6 ns | 3.23 ns | 7.50 ns | 1.59 | 0.09 | - | 154 B | - | NA |
| SystemTextJson | 571.5 ns | 5.31 ns | 4.97 ns | 5.29 | 0.14 | 0.0019 | 154 B | 56 B | NA |
| MessagePackCSharp | 115.0 ns | 0.84 ns | 0.71 ns | 1.06 | 0.03 | 0.0014 | 10 B | 40 B | NA |
| ProtobufNet | 283.3 ns | 3.69 ns | 2.88 ns | 2.61 | 0.06 | - | 18 B | - | NA |
| Hyperion | 165.0 ns | 3.33 ns | 7.66 ns | 1.53 | 0.06 | 0.0019 | 39 B | 56 B | NA |
| NewtonsoftJson | 1,700.0 ns | 33.79 ns | 78.31 ns | 16.21 | 0.87 | 0.0763 | 154 B | 2080 B | NA |
| SpanJson | 192.5 ns | 2.09 ns | 2.64 ns | 1.79 | 0.05 | 0.0067 | 154 B | 184 B | NA |
With unsigned divsion
| Method | Mean | Error | StdDev | Ratio | RatioSD | Gen0 | Payload | Allocated | Alloc Ratio |
|------------------ |------------:|----------:|----------:|------:|--------:|-------:|--------:|----------:|------------:|
| Orleans | 95.51 ns | 0.351 ns | 0.311 ns | 1.00 | 0.00 | - | 20 B | - | NA |
| Utf8Json | 156.21 ns | 1.141 ns | 0.953 ns | 1.64 | 0.01 | - | 154 B | - | NA |
| SystemTextJson | 537.95 ns | 4.033 ns | 3.575 ns | 5.63 | 0.04 | 0.0019 | 154 B | 56 B | NA |
| MessagePackCSharp | 118.38 ns | 2.122 ns | 1.985 ns | 1.24 | 0.02 | 0.0014 | 10 B | 40 B | NA |
| ProtobufNet | 273.29 ns | 1.479 ns | 1.311 ns | 2.86 | 0.02 | - | 18 B | - | NA |
| Hyperion | 152.09 ns | 1.320 ns | 1.170 ns | 1.59 | 0.01 | 0.0019 | 39 B | 56 B | NA |
| NewtonsoftJson | 1,594.22 ns | 15.754 ns | 14.736 ns | 16.70 | 0.17 | 0.0763 | 154 B | 2080 B | NA |
| SpanJson | 189.82 ns | 1.633 ns | 1.528 ns | 1.99 | 0.02 | 0.0067 | 154 B | 184 B | NA |
Full results:
Benchmarks.zip
@@ -415,7 +431,7 @@ public void WriteVarUInt32(uint value) | |||
// Since this method writes a ulong worth of bytes unconditionally, ensure that there is sufficient space. | |||
EnsureContiguous(sizeof(ulong)); | |||
|
|||
var pos = _bufferPos; | |||
nuint pos = (uint)_bufferPos; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These casts get rid of some signed extension instructions.
(closed by mistake) |
PooledResponse<T>
wrapper types for normal response messages, only encodeT
.Writer
andField/Tag
optimizations.Microsoft Reviewers: Open in CodeFlow