Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix perf regression in System.Net.NetworkInformation.PhysicalAddress #40571

Closed
wants to merge 2 commits into from
Closed

Fix perf regression in System.Net.NetworkInformation.PhysicalAddress #40571

wants to merge 2 commits into from

Conversation

tkp1n
Copy link
Contributor

@tkp1n tkp1n commented Aug 8, 2020

Closes #39720 by:

Method Job Toolchain Mean Error StdDev Ratio RatioSD
PAShort Job-CSJLAA PR 11.90 ns 0.215 ns 0.191 ns 0.84 0.02
PAShort Job-RPOJLS master 14.17 ns 0.302 ns 0.335 ns 1.00 0.00

/cc @danmosemsft @DrewScoggins

@Dotnet-GitSync-Bot
Copy link
Collaborator

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@danmoseley danmoseley added this to the 5.0.0 milestone Aug 8, 2020
@ghost
Copy link

ghost commented Aug 8, 2020

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

@@ -83,6 +83,7 @@ public static void ToCharsBuffer(byte value, Span<char> buffer, int startingInde
buffer[startingIndex] = (char)(packedResult >> 8);
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this not getting inlined already? I wonder why

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Methods with loops are generally too large to inline by default.

To confirm, you can try:

  • BDN's inlining diagnoser
  • enable and look at the jit inlining events under perfview
  • Use a checked jit and set COMPlus_JitPrintInlinedMethods=<class_name>.<method name>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this not getting inlined already? I wonder why

It is considered unprofitable... (as per BDN inlining diagnoser)

--------------------
Inliner: System.HexConverter+<>c.<ToString>b__4_0 - instance void  (value class System.Span`1<wchar>,value class System.ValueTuple`3<int,int32,value class Casing>)
Inlinee: System.HexConverter.EncodeToUtf16 - void  (value class System.ReadOnlySpan`1<unsigned int8>,value class System.Span`1<wchar>,value class Casing)
Fail Reason: unprofitable inline
--------------------

@danmoseley
Copy link
Member

@tkp1n thanks for this!

Copy link
Member

@danmoseley danmoseley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems ok to me. @dotnet/ncl ?

@@ -90,7 +90,7 @@ public override bool Equals(object? comparand)

public override string ToString()
{
return Convert.ToHexString(_address.AsSpan());
return HexConverter.ToString(_address.AsSpan(), HexConverter.Casing.Upper);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's concerning if the brand new public API is insufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not concerted as we are talking about nanoseconds here.
A public API with input validation will never be a match for an internal method that gets straight to business. Especially with such short input values (16 bytes).

There is also #39702 to improve things (perf). The initial PR adding the public API was kept simple by design (to get it in for .NET 5).

@stephentoub
Copy link
Member

stephentoub commented Aug 9, 2020

Is this addressing a regression from 3.1 or just intra-5.0?

While I appreciate the effort here, I'm inclined to just accept the small regression and close this, especially if it's no worse than 3.1. The aggressive inlining will affect all call sites of that internal helper, this internal helper wasn't otherwise being used by this assembly and thus will add to places it's included in shipped binaries, and I never like using internal instead of public APIs when there's a public API that is meant to suffice. On top of that, I'm skeptical there are places where the cited 3ns in PhysicalAddress.ToString will make a real difference (but I'd like to better understand such scenarios if there is one).

Thanks.

@geoffkizer
Copy link
Contributor

If we are really concerned about the perf here, then it seems like we should just take the perf PR you mentioned (#39702), which would benefit everyone using this new API.

@tkp1n
Copy link
Contributor Author

tkp1n commented Aug 9, 2020

There are actually two regressions if we look at the entire time window since 3.1:

  1. With 3.1 PhysicalAddress.ToString had custom hex encoding logic. This was replaced by the shared but slower branch-free encoding method in HexConverter.
  2. The introduction of the public API caused a method to longer be inlined and added some overhead due to input validation.
Method Job Runtime Toolchain Mean Error StdDev Ratio RatioSD
PAShort 3.1 .NET Core 3.1 Default 9.290 ns 0.2043 ns 0.2187 ns 0.65 0.02
PAShort Job-YOCJAS .NET Core 5.0 PR 12.163 ns 0.1314 ns 0.1229 ns 0.86 0.01
PAShort Job-MOXHEH .NET Core 5.0 master 14.203 ns 0.1427 ns 0.1265 ns 1.00 0.00

As the combined regression since 3.1 is quite substantial I would suggest the following:

  1. We add the AggressiveInlining with this PR to restore how things were before Add methods to convert between hexadecimal strings and bytes #37546 but stick to using the public API (accepting its slight overhead).
  2. Focus on getting improvements (either the vectorization proposed in Add vectorized implementation of hex encoding/decoding #39702 or just plain faster scalar code) made to HexConverter to bring faster perf for every caller of the internal and public APIs.

@stephentoub
Copy link
Member

Is the regression only on "PAShort"? That's a one-byte PhysicalAddress. Where does the throughput of using PhysicalAddress.ToString with a one-byte address show up as meaningful?

@tkp1n
Copy link
Contributor Author

tkp1n commented Aug 9, 2020

I would argue that neither PAShort nor PAShort are particularly relevant in the real world, assuming the typical usage would be 6-byte MAC addresses.
PALong seems to be bound by GC with high jitter on my machine and no regression compared to 3.1 outside of margin of error.

@stephentoub
Copy link
Member

Is there an example you believe is real-world, does that exhibit a meaningful regression, and can we reasonably construct a use-case where such a regression matters? That's what I'm struggling with. In what situation is someone using PhysicalAddress.ToString on a hot path, and in such a use case, do we actually see a measurable regression?

@tkp1n
Copy link
Contributor Author

tkp1n commented Aug 10, 2020

I can't come up with a reason why anyone would use this on the hot path... but then again, I assume this benchmark was added to dotnet/performance for a reason.
Should it end up on the hot path of some application, I would assume the inherent that GC pressure caused by the created strings is worse issue than this 3ns regression.

@stephentoub
Copy link
Member

stephentoub commented Aug 10, 2020

but then again, I assume this benchmark was added to dotnet/performance for a reason

I believe I added them since some work was being done to improve the performance by a community member in a previous release, and perf tests are needed to help validate and lock in such efforts. But just because a small regression in a perf test is identified, doesn't mean it needs to be fixed... it just means the test alerted us to something, at which point we can decide whether it's meaningful. My opinion on this would be different if, for example, the regression was 10x rather than 3ns. Until we can come up with a real scenario where the 3ns matters, I'd prefer to keep the code simple, using the same public APIs we ask everyone else to use, keeping assembly size smaller, and not risking regressions in the other places that consume the method being attributed as aggressive inlining.

@danmoseley
Copy link
Member

OK, seems we have consensus to leave the code. @tkp1n apologies for leading you down this dead end.

@danmoseley danmoseley closed this Aug 10, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Perf -19%] System.Net.NetworkInformation.Tests.PhysicalAddressTests.PAShort
6 participants