Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the multicore performance of mapping from an instruction pointer to a MethodDesc #79021

Merged
merged 36 commits into from
Mar 4, 2023

Conversation

davidwrighton
Copy link
Member

@davidwrighton davidwrighton commented Nov 30, 2022

Mapping from an instruction pointer to a MethodDesc is a critical component of the runtime used in many places, notably diagnostics, slow path delegate creation, and stackwalking.

This change provides a dramatic improvement in the performance of that logic through several techniques.

  1. The mapping from IP to range of interesting memory is done via a tree structure resembling that of a page table. The result of this is that in addition to reducing the locking logic, the cost of looking up in the presence of many different loaded assemblies will be reduced to a nearly constant time.
  2. That tree structure is configured so that when accessing memory which will never be freed in the lifetime of the process, the memory can be accessed without taking any locks whatsoever.
  3. The code map is enhanced to include not only the code generated by the JIT/R2R code, but also to include the fixup and stub precode stubs.
  4. In addition, performance improvement was done to improve the performance of slow path delegate creation in particular, by reordering which checks are done, and by writing a simplified signature parse routine for computing the number of arguments to a function.

Performance of this was tested in both the EH stackwalking scenario as well as the delegate slow path creation scenario, but the delegate logic is what will be most visible when this is checked in. (See PR #79471 for details of the additional changes necessary to take advantage of this work when doing EH) (#8393 describes the potential product impact of just improving the delegate slow path creation code)

For the delegate creation scenario the perf impact was measured to be quite substantial. These numbers are in ms to create a constant number of delegates on each thread used. Smaller is better. The test was run on a 6 core machine.

Threads Without fix With this PR
1 840 313
2 1977 316
3 3290 325
4 4259 344
5 5140 351
6 5872 374
7 6463 442
8 7046 499
9 7648 547
10 8241 627
11 8851 749
12 9595 773

Fixes #8393
Replaces the controversial logic in PR #68607

@EgorBo
Copy link
Member

EgorBo commented Nov 30, 2022

Nice! Does it make ExecutionManager::IsReadyToRun(PCODE) cheap? (I wanted to rely on it being cheap in some of my experiments)

@davidwrighton
Copy link
Member Author

@EgorBo, yes, I do expect that this will make IsReadyToRun very cheap. (Probably 5-6 memory loads from separate pages plus some math, so repeated calls should be extremely cheap, and random calls will be somewhat expensive, but not bad.) The implementation as written has a risk of entering a lock under some circumstances, but after sleeping on it, I think we could get rid of that for the IsReadyToRun logic, and be assured that IsReadyToRun is cheap. What do you think you can optimize?

@EgorBo
Copy link
Member

EgorBo commented Nov 30, 2022

What do you think you can optimize?

I'd like to sort of de-prioritize R2R'd code for tier1 promotion in favor of non R2R'd one.

Now that it is profileable, there are tweaks to be made
@davidwrighton
Copy link
Member Author

davidwrighton commented Dec 2, 2022

@EgorBo, fyi, ExecutionManager::IsReadyToRun is slower than ExecutionManager::FindReadyToRunModule. Also, I've found there may be a slight bug in the current implementation of the locking strategy for those 2 methods, so I'm addressing that as well. ExecutionManager::IsReadyToRun validates that the pointer is actually a pointer to real compiled code, and return false for thunks and other things from a ReadyToRun image. Its possible we may want to change the definition of IsReadyToRun to make more sense.

- Implement a fast path for determining that an instruction pointer is a FixupPrecode/StubPrecode. It turns out those are more likely, than actual jitted code (as we typically have wrapper stubs of some form at this point).
- Implement a fast path for computing the number of arguments to a method instead of using MetaSig.
- Remove fancy locking scheme from RangeSectionMap. The existing fancy locking scheme works fine.
- Fix bug in ExecutionManager::IsReadyToRun and ExecutionManager::FindReadyToRunModule where locks were not used
@davidwrighton davidwrighton changed the title Experiment with changing the implementation strategy for RangeListMap Fix the multicore performance of mapping from an instruction pointer to a MethodDesc Dec 9, 2022
@davidwrighton davidwrighton requested a review from jkotas December 10, 2022 00:09
src/coreclr/vm/codeman.h Outdated Show resolved Hide resolved
src/coreclr/vm/comdelegate.cpp Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.h Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.h Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.h Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.h Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.h Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.h Outdated Show resolved Hide resolved
src/coreclr/vm/codeman.h Show resolved Hide resolved
src/coreclr/vm/codeman.h Outdated Show resolved Hide resolved
Copy link
Member

@jkotas jkotas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo feedback. Nice work!

src/coreclr/vm/codeman.h Outdated Show resolved Hide resolved
@davidwrighton davidwrighton merged commit 766d8c1 into dotnet:main Mar 4, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Apr 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ExecutionManager::GetRangeSection performs a linear search over list of loaded assemblies
3 participants