Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscellaneous performance fixes for Crossgen2 #758

Merged
merged 5 commits into from
Dec 13, 2019

Conversation

SrivastavaAnubhav
Copy link
Contributor

Overall, these changes speed up Crossgen2 compilation of CoreLib by ~34%. Most of the improvement comes from fewer/smaller allocations => need to do less GC.

  • Create comparison functions for Enum (Enum.CompareTo boxes since it takes an Object)
  • In ModuleToken, use the existing MetadataReader instead of making a new one each time
  • Use Dictionary instead of ImmutableDictionary in CoreRTNameMangler to reduce allocations
  • In CopiedFieldRvaNode, only read as much of the section as we need to copy

Before changes:

  • Total average compilation time: 9522.11 ms
  • Phase breakdown (average):
    • Loading time: 682.91 ms
    • Graph processing time: 4189.59 ms
      • Added 211462 nodes to mark
      • Dependency analysis time: 622.58
      • Wall clock JIT time: 2988.49
      • Total JIT time: 23079.96 ms (sum of all threads)
      • 22680 methods JITed
    • Emitting time: 4646.94 ms

After changes:

  • Total average compilation time: 6245.38 ms
  • Phase breakdown (average):
    • Loading time: 697.13 ms
    • Graph processing time: 3670.02 ms
      • Added 211462 nodes to mark stack
      • Dependency analysis time: 631.47 ms
      • Wall clock JIT time: 2483.27 ms
      • Total JIT time: 19068.50 ms (sum of all threads)
      • 22680 methods JITed
    • Emitting time: 1875.50 ms

Create comparison functions for Enum (Enum.CompareTo boxes since it takes an Object).
In ModuleToken, use the existing MetadataReader instead of making a new one each time.
Use Dictionary instead of ImmutableDictionary in CoreRTNameMangler to reduce allocations.
In CopiedFieldRvaNode, only read as much of the section as we need to copy.
@SrivastavaAnubhav
Copy link
Contributor Author

cc @dotnet/crossgen-contrib

@@ -56,7 +56,7 @@ public override void AppendMangledName(NameMangler nameMangler, Utf8StringBuilde
public override int CompareToImpl(ISortableNode other, CompilerComparer comparer)
{
FieldFixupSignature otherNode = (FieldFixupSignature)other;
int result = _fixupKind.CompareTo(otherNode._fixupKind);
int result = _fixupKind.CompareEnum(otherNode._fixupKind);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int result = _fixupKind.CompareEnum(otherNode._fixupKind);
int result = ((int)_fixupKind).CompareTo((int)otherNode._fixupKind);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just _fixupKind - otherNode._fixupKind ?

Copy link
Contributor

@fadimounir fadimounir Dec 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to call CompareTo. Just do the subtraction here directly as @huoyaoyuan suggests (and typecast if necessary).

Applies to all other places

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the subtraction does not work to compare arbitrary integers. It only works if the integers are small. It is probably the case here, but then you need to think about all the time about whether you are dealing with small integers or not.

Copy link
Contributor

@lpereira lpereira Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be an opportunity to improve the codegen for int.CompareTo(). It's currently not branch-free. For instance, these two variants that works with arbitrary integers could be used instead of the branchy version:

  • Convert.ToInt32(a > b) - Convert.ToInt32(a < b) should be something like xor eax, eax; xor ecx,ecx; cmp edi, esi; seg al; setl cl; sub eax, ecx

(Might also be able to use a conditional move, although it doesn't seem that necessary.)

namespace System
{
/// <summary>
/// Enum.CompareTo takes an Object, so using that requires boxing the enum we are comparing to (which is slow
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think it makes sense to have these in the shared file.

Instead, I would just add casts to int to every place where this is used (like what I have suggested below).

string mangledName;
if (_mangledTypeNames.TryGetValue(type, out mangledName))
return mangledName;
lock (this)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we break down the locking hierarchy so that there's not a single giant lock for type names, method names, field names, and mangled strings, but instead there's a lock for each?

In general, I'm worried that we're probably not using this from multithreaded code at all right now, so this looks like an improvement, but the locks will become a bottleneck as soon as we start doing that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using the lock free concurrent dictionaries here, to avoid locking while performing the lookup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MichalStrehovsky We are using this from multithreaded code at the moment. I tried using more granulated locking and it didn't seem to make a performance difference. @davidwrighton and I thought that, since there's no performance difference right now, it would be better to keep it in this form where it's clear to people reading it that there are no deadlocks.

@fadimounir That's an option, but then it gets a little hard to analyze because it's hard to reason about whether each of the GetOrAdd functions must only be run once and so has to be either locked or wrapped in a Lazy. This + the limited time left is why we didn't end up using that right now, but perhaps @davidwrighton can provide more direction.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it's hard to reason about whether each of the GetOrAdd functions must only be run once

What do you mean? We can always do a .TryGetValue outside of a lock, and .TryAdd inside a lock (Dictionary has the same APIs too so we can use it as well). Locking while retrieving a value is not ideal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see what you mean now. Yes I could change it to do that, but I'm not sure whether we even have contention there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When using Dictionary you cannot use the TryGetValue apis outside of a lock. This worked before with ImmutableDictionary as that api allows lock-free reads. My expectation is that ConcurrentDictionary would be better in performance here than Dictionary, but I don't want to see us use that without seeing the performance impact in a profiler. I think we should check Anubhav's work in here as written, and explore finding even more performance in the future.

return (int)first - (int)second;
}

public static int CompareEnum(this ImportThunk.Kind first, ImportThunk.Kind second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the type is not in Common, code using the type should not be in Common.

+1 to Jan's advice.

lock (this)
{
Utf8String mangledName;
if (_unqualifiedMangledMethodNames.TryGetValue(method, out mangledName))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here... if we make _unqualifiedMangledMethodNames a lock-free dictionary, we can avoid locking while looking up. We only need locking while adding new entries

@@ -76,18 +75,20 @@ private unsafe byte[] GetRvaData(int targetPointerSize)
int currentSize = field.FieldType.GetElementSize().AsInt;
if (currentSize > size)
{
if (currentSize > memBlock.Length)
throw new BadImageFormatException();

// We need to handle overlapping fields by reusing blobs based on the rva, and just update
// the size and contents
size = currentSize;
result = new byte[AlignmentHelper.AlignUp(size, targetPointerSize)];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can also move the array allocation outside of the loop. No need to allocate and resize now

@fadimounir
Copy link
Contributor

The new perf numbers look pretty nice! Thank you!

Remove EnumExtensions in favor of casting to int and using CompareTo.
Move array allocation out of loop in CopiedFieldRvaNode.
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that you removed usage of these apis, but this file seems to still be here. Please remove this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh oops, yes.

@fadimounir
Copy link
Contributor

@SrivastavaAnubhav, these changes are breaking determinism. See: https://dev.azure.com/dnceng/public/_build/results?buildId=454679&view=ms.vss-test-web.build-test-results-tab&runId=14512252&resultId=102657&paneView=attachments

@fadimounir fadimounir merged commit d850517 into dotnet:master Dec 13, 2019
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants