Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize message body encoding #8085

Merged
merged 3 commits into from
Oct 29, 2022
Merged

Optimize message body encoding #8085

merged 3 commits into from
Oct 29, 2022

Conversation

pentp
Copy link
Contributor

@pentp pentp commented Oct 27, 2022

  • Don't encode PooledResponse<T> wrapper types for normal response messages, only encode T.
  • Fix some more expected-type handling corner cases.
  • Some small Writer and Field/Tag optimizations.
Microsoft Reviewers: Open in CodeFlow

@ReubenBond
Copy link
Member

Thanks for all the PRs! Do you mind annotating some of the changes in GitHub using review comments to include some reasoning, etc behind the changes and adding benchmark results, if you have any?

src/Orleans.Serialization/Buffers/Writer.cs Outdated Show resolved Hide resolved
_currentSpan[_bufferPos++] = value;
if ((uint)_bufferPos < (uint)_currentSpan.Length)
{
Unsafe.AddByteOffset(ref MemoryMarshal.GetReference(_currentSpan), (uint)_bufferPos) = value;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not necessary, best codegen is when written as

public void WriteByte(byte value)
{
    int bufferPos = _bufferPos;
    if ((uint)bufferPos < (uint)_currentSpan.Length)
    {
        _currentSpan[bufferPos] = value;
        _bufferPos = bufferPos + 1;
    }
    else
    {
        WriteByteSlow(value);
    }
}
  • there's no bound check emitted
  • _bufferPos is read and written only once (memory accesses), while with _bufferPos++ it's 2 reads + 1 write (for the inc)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your input is appreciated, @gfoidl

Copy link
Contributor Author

@pentp pentp Oct 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still going to be a bounds check unless _currentSpan is also first copied to a local variable, but that introduces additional overhead.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious: on what target platform did you see this? Is this after inlining?
Asking because I checked the codegen and didn't see the bound check -- so the JIT should be taught to eliminate that one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested on x64 and after inlining.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well this seems to be dotnet/runtime#72004 (actually opened by myself, and I forgot about this 😉).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should link to that issue and this comment from code so that it can be re-assessed later

@@ -415,7 +431,7 @@ public void WriteVarUInt32(uint value)
// Since this method writes a ulong worth of bytes unconditionally, ensure that there is sufficient space.
EnsureContiguous(sizeof(ulong));

var pos = _bufferPos;
nuint pos = (uint)_bufferPos;
var neededBytes = BitOperations.Log2(value) / 7;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var neededBytes = BitOperations.Log2(value) / 7;
var neededBytes = (int)((uint)BitOperations.Log2(value) / 7);

More ugly code, but shorter machine code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work. I wonder if this results in measurable differences.
There are serialization microbenchmarks in the test/Benchmarks directory which can be invoked with
dotnet run -c Release -f net7.0 -- suite --filter *erializeBenchmark to run the serialization comparison benchmarks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did two runs of the benchmarks (current main, then changed to the uint-division). The runs aren't comparable 1:1, but judging based on Ratio-column it improved a bit, which I expected because of less code and the mul instead of imul.

Current main branch

|            Method |       Mean |    Error |   StdDev | Ratio | RatioSD |   Gen0 | Payload | Allocated | Alloc Ratio |
|------------------ |-----------:|---------:|---------:|------:|--------:|-------:|--------:|----------:|------------:|
|           Orleans |   107.1 ns |  2.12 ns |  3.17 ns |  1.00 |    0.00 |      - |    20 B |         - |          NA |
|          Utf8Json |   169.6 ns |  3.23 ns |  7.50 ns |  1.59 |    0.09 |      - |   154 B |         - |          NA |
|    SystemTextJson |   571.5 ns |  5.31 ns |  4.97 ns |  5.29 |    0.14 | 0.0019 |   154 B |      56 B |          NA |
| MessagePackCSharp |   115.0 ns |  0.84 ns |  0.71 ns |  1.06 |    0.03 | 0.0014 |    10 B |      40 B |          NA |
|       ProtobufNet |   283.3 ns |  3.69 ns |  2.88 ns |  2.61 |    0.06 |      - |    18 B |         - |          NA |
|          Hyperion |   165.0 ns |  3.33 ns |  7.66 ns |  1.53 |    0.06 | 0.0019 |    39 B |      56 B |          NA |
|    NewtonsoftJson | 1,700.0 ns | 33.79 ns | 78.31 ns | 16.21 |    0.87 | 0.0763 |   154 B |    2080 B |          NA |
|          SpanJson |   192.5 ns |  2.09 ns |  2.64 ns |  1.79 |    0.05 | 0.0067 |   154 B |     184 B |          NA |

With unsigned divsion

|            Method |        Mean |     Error |    StdDev | Ratio | RatioSD |   Gen0 | Payload | Allocated | Alloc Ratio |
|------------------ |------------:|----------:|----------:|------:|--------:|-------:|--------:|----------:|------------:|
|           Orleans |    95.51 ns |  0.351 ns |  0.311 ns |  1.00 |    0.00 |      - |    20 B |         - |          NA |
|          Utf8Json |   156.21 ns |  1.141 ns |  0.953 ns |  1.64 |    0.01 |      - |   154 B |         - |          NA |
|    SystemTextJson |   537.95 ns |  4.033 ns |  3.575 ns |  5.63 |    0.04 | 0.0019 |   154 B |      56 B |          NA |
| MessagePackCSharp |   118.38 ns |  2.122 ns |  1.985 ns |  1.24 |    0.02 | 0.0014 |    10 B |      40 B |          NA |
|       ProtobufNet |   273.29 ns |  1.479 ns |  1.311 ns |  2.86 |    0.02 |      - |    18 B |         - |          NA |
|          Hyperion |   152.09 ns |  1.320 ns |  1.170 ns |  1.59 |    0.01 | 0.0019 |    39 B |      56 B |          NA |
|    NewtonsoftJson | 1,594.22 ns | 15.754 ns | 14.736 ns | 16.70 |    0.17 | 0.0763 |   154 B |    2080 B |          NA |
|          SpanJson |   189.82 ns |  1.633 ns |  1.528 ns |  1.99 |    0.02 | 0.0067 |   154 B |     184 B |          NA |

Full results:
Benchmarks.zip

@@ -415,7 +431,7 @@ public void WriteVarUInt32(uint value)
// Since this method writes a ulong worth of bytes unconditionally, ensure that there is sufficient space.
EnsureContiguous(sizeof(ulong));

var pos = _bufferPos;
nuint pos = (uint)_bufferPos;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These casts get rid of some signed extension instructions.

@ReubenBond
Copy link
Member

(closed by mistake)

@ReubenBond ReubenBond merged commit df82177 into dotnet:main Oct 29, 2022
@pentp pentp deleted the message-coding branch October 30, 2022 14:12
ReubenBond added a commit to ReubenBond/orleans that referenced this pull request Nov 1, 2022
ReubenBond added a commit to ReubenBond/orleans that referenced this pull request Nov 1, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Dec 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants