-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Miscellaneous performance fixes for Crossgen2 #758
Miscellaneous performance fixes for Crossgen2 #758
Conversation
Create comparison functions for Enum (Enum.CompareTo boxes since it takes an Object). In ModuleToken, use the existing MetadataReader instead of making a new one each time. Use Dictionary instead of ImmutableDictionary in CoreRTNameMangler to reduce allocations. In CopiedFieldRvaNode, only read as much of the section as we need to copy.
cc @dotnet/crossgen-contrib |
@@ -56,7 +56,7 @@ public override void AppendMangledName(NameMangler nameMangler, Utf8StringBuilde | |||
public override int CompareToImpl(ISortableNode other, CompilerComparer comparer) | |||
{ | |||
FieldFixupSignature otherNode = (FieldFixupSignature)other; | |||
int result = _fixupKind.CompareTo(otherNode._fixupKind); | |||
int result = _fixupKind.CompareEnum(otherNode._fixupKind); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
int result = _fixupKind.CompareEnum(otherNode._fixupKind); | |
int result = ((int)_fixupKind).CompareTo((int)otherNode._fixupKind); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just _fixupKind - otherNode._fixupKind
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to call CompareTo
. Just do the subtraction here directly as @huoyaoyuan suggests (and typecast if necessary).
Applies to all other places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the subtraction does not work to compare arbitrary integers. It only works if the integers are small. It is probably the case here, but then you need to think about all the time about whether you are dealing with small integers or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be an opportunity to improve the codegen for int.CompareTo()
. It's currently not branch-free. For instance, these two variants that works with arbitrary integers could be used instead of the branchy version:
Convert.ToInt32(a > b) - Convert.ToInt32(a < b)
should be something likexor eax, eax; xor ecx,ecx; cmp edi, esi; seg al; setl cl; sub eax, ecx
(Might also be able to use a conditional move, although it doesn't seem that necessary.)
namespace System | ||
{ | ||
/// <summary> | ||
/// Enum.CompareTo takes an Object, so using that requires boxing the enum we are comparing to (which is slow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think it makes sense to have these in the shared file.
Instead, I would just add casts to int to every place where this is used (like what I have suggested below).
string mangledName; | ||
if (_mangledTypeNames.TryGetValue(type, out mangledName)) | ||
return mangledName; | ||
lock (this) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we break down the locking hierarchy so that there's not a single giant lock for type names, method names, field names, and mangled strings, but instead there's a lock for each?
In general, I'm worried that we're probably not using this from multithreaded code at all right now, so this looks like an improvement, but the locks will become a bottleneck as soon as we start doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about using the lock free concurrent dictionaries here, to avoid locking while performing the lookup?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MichalStrehovsky We are using this from multithreaded code at the moment. I tried using more granulated locking and it didn't seem to make a performance difference. @davidwrighton and I thought that, since there's no performance difference right now, it would be better to keep it in this form where it's clear to people reading it that there are no deadlocks.
@fadimounir That's an option, but then it gets a little hard to analyze because it's hard to reason about whether each of the GetOrAdd functions must only be run once and so has to be either locked or wrapped in a Lazy. This + the limited time left is why we didn't end up using that right now, but perhaps @davidwrighton can provide more direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because it's hard to reason about whether each of the GetOrAdd functions must only be run once
What do you mean? We can always do a .TryGetValue outside of a lock, and .TryAdd inside a lock (Dictionary
has the same APIs too so we can use it as well). Locking while retrieving a value is not ideal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see what you mean now. Yes I could change it to do that, but I'm not sure whether we even have contention there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When using Dictionary you cannot use the TryGetValue apis outside of a lock. This worked before with ImmutableDictionary as that api allows lock-free reads. My expectation is that ConcurrentDictionary would be better in performance here than Dictionary, but I don't want to see us use that without seeing the performance impact in a profiler. I think we should check Anubhav's work in here as written, and explore finding even more performance in the future.
return (int)first - (int)second; | ||
} | ||
|
||
public static int CompareEnum(this ImportThunk.Kind first, ImportThunk.Kind second) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the type is not in Common, code using the type should not be in Common.
+1 to Jan's advice.
lock (this) | ||
{ | ||
Utf8String mangledName; | ||
if (_unqualifiedMangledMethodNames.TryGetValue(method, out mangledName)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here... if we make _unqualifiedMangledMethodNames a lock-free dictionary, we can avoid locking while looking up. We only need locking while adding new entries
@@ -76,18 +75,20 @@ private unsafe byte[] GetRvaData(int targetPointerSize) | |||
int currentSize = field.FieldType.GetElementSize().AsInt; | |||
if (currentSize > size) | |||
{ | |||
if (currentSize > memBlock.Length) | |||
throw new BadImageFormatException(); | |||
|
|||
// We need to handle overlapping fields by reusing blobs based on the rva, and just update | |||
// the size and contents | |||
size = currentSize; | |||
result = new byte[AlignmentHelper.AlignUp(size, targetPointerSize)]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also move the array allocation outside of the loop. No need to allocate and resize now
The new perf numbers look pretty nice! Thank you! |
Remove EnumExtensions in favor of casting to int and using CompareTo. Move array allocation out of loop in CopiedFieldRvaNode.
2a33bb3
to
964ec9d
Compare
// Licensed to the .NET Foundation under one or more agreements. | ||
// The .NET Foundation licenses this file to you under the MIT license. | ||
// See the LICENSE file in the project root for more information. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that you removed usage of these apis, but this file seems to still be here. Please remove this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh oops, yes.
@SrivastavaAnubhav, these changes are breaking determinism. See: https://dev.azure.com/dnceng/public/_build/results?buildId=454679&view=ms.vss-test-web.build-test-results-tab&runId=14512252&resultId=102657&paneView=attachments |
Overall, these changes speed up Crossgen2 compilation of CoreLib by ~34%. Most of the improvement comes from fewer/smaller allocations => need to do less GC.
Before changes:
After changes: