Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Search native code in all R2R images in version bubble #68607

Closed
wants to merge 3 commits into from

Conversation

y-yamshchikov
Copy link
Contributor

@y-yamshchikov y-yamshchikov commented Apr 27, 2022

This PR fixes part of #44948 and fixes #46160.

This code simply traverses through assemblies in the application
domain. For each assembly (module) it realizes is it Ready To Run and is
it in the same bubble with (does it deliberately bubbling the) module
from which generic function originates. If so, it makes request for code
is Ready To Run and hopes there is some in the module. If the request
succeeds it proceeds with found pointer to the bare native code.

Generic instantiations like System.Collections.Concurrent.ConcurrentDictionary`2[System.Int32,System.__Canon] contained within same version bubble are either a bug in crossgen2 or should be taken care of by having PGO data.

Now such methods use their Ready To Run code.

We have got significant performance gain on startup: 7% average on our representative set on Tizen.

This PR worked out notices in PRs below:
#47269
about linear search through the set of RangeSections. In this new PR we propose storing of RangeSections in sorted array (with number of optimizations inspired by prior linked list based solution).
#57277

about lockfree requirements

We have extensively tested this PR on armel/Tizen platform so in this case we are confident in reliability and profitability of the solution.

We have also implemented high coverage unit-testing model for RangeSection optimization and used it in single-processor single-thread load, single-processor multi-thread load and multi-processor multi-thread load with all combination of simultaneous add and/or read and/or delete (with and without two simultaneous readers and/or two simultaneous adders and/or two simultaneous deleters). Order of adding has had switches between ascending, descending and random modes. All these switches have been tested under debug and release setups of the algorithm. Strictly, we have performed full-factor experiment under selected options in the unit-test. The test can be accessed by the following link: https://github.com/y-yamshchikov/rs-model. The unit-test had been performed for linux-x64.
Main concept of the test is 1-to-1 copying of the code of RangeSection storage with decoupling from the CoreCLR by the least significant line (as far as it is possible in the case it is not in a separate file), so we can be sure an error-prone surface is smallest.
Of course, we have performed priority1 tests and performance tests after all.

In this PR (vs #57277) we have implemented new synchronization mechanism for RangeSection storage. It grants lock-free (but not wait-free) access and Log(n) complexity. Primitives have the same names: ReaderLockHolder and WriterLockHolder, but theirs behavior and encapsulation slightly differ.

Dear colleagues @jkotas @alpencolt, please take a look.

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Apr 27, 2022
@y-yamshchikov y-yamshchikov force-pushed the opt-r2r-rs branch 3 times, most recently from 03e04a3 to 5056171 Compare May 4, 2022 11:18
@danmoseley
Copy link
Member

@davidwrighton

Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the idea of how this is supposed to work, but I do not believe the locking is not correct. In any case, if the locking is correct, the set of comments describing the locking has not been updated, and is certainly incorrect. @jkotas do you agree?

@@ -1927,13 +1927,6 @@ HRESULT GetFunctionInfoInternal(LPCBYTE ip, EECodeInfo * pCodeInfo)

if (ShouldAvoidHostCalls())
{
ExecutionManager::ReaderLockHolder rlh(NoHostCalls);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ReaderLockHolder logic needs to stay in place. GetFunctionInfoInternal from the profiler api may be called while other threads are suspended via OS thread suspension apis, and may be used when alloc/free/new/delete are not permitted to be used. However, the technique you're using for locking looks like it will always successfully acquire, so you don't need to restore the Acquired() api, and call below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the following perspective
#68607 (comment)
#68607 (comment)
it is possibble to stop writers here and pass NoHostCalls to the codeman by call below
//TODO

// Returns whether the reader lock is acquired
if (count == 1)// h->count == 0
{
//we are here in relatively rare case
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are in a NoHostCalls situation, we should not delete anything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I can leave this code only in wlh destructor (as writers are about to need allocate or delete something). Now this code is both in rlh and wlh destructors.
//TODO

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no reason to do that. Just disable the code when running in NoHostCalls.

@@ -4475,7 +4493,6 @@ BOOL ExecutionManager::IsManagedCodeWithLock(PCODE currentPC)
GC_NOTRIGGER;
} CONTRACTL_END;

ReaderLockHolder rlh;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it ok to remove this lock?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that GetRangeSection will unconditionally take the lock, but it will return a RangeSection, that itself isn't protected from unload. Please restore the original structure of how the locks are created.

Copy link
Contributor Author

@y-yamshchikov y-yamshchikov May 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I can understand your comment, the function of rlh there is not for protecting the RS list from possibly inconsistence but instead for protecting an RS using in IsManagedCodeWorker from deletion.

In my proposal, as described in my answer to you at #68607 (comment) rlh and wlh are used only for maintain RS storage consistency, but the schema leaves us an opportunity to restore the RS instance protecting lock structure.
//TODO

@@ -4496,7 +4513,6 @@ BOOL ExecutionManager::IsManagedCode(PCODE currentPC, HostCallPreference hostCal
}

ReaderLockHolder rlh(hostCallPreference);
if (!rlh.Acquired())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By removing the if statement, we are unconditionally going down the failure path, why are you doing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As soon as the schema described in my answer below (#68607 (comment)) is correct, there are no reader's locks at all. rlh is needed only for counting references and make proper swaps of reader's array and writer's array. There is no case the "lock" is not acquired: it is always "acquired" either instantly or a few cycles later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that isn't what I'm saying. In this case, line 4515 unconditionally takes the lock, and then as the if statement is removed, the new code path will unconditionally return FALSE, and set *pfFailedReaderLock to TRUE.

Copy link
Contributor Author

@y-yamshchikov y-yamshchikov May 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I certainly have to remove all the code under the if body, not only if statement.
//TODO

RangeSection *pCurr = pHead;
RangeSection *pLast = NULL;

ReaderLockHolder rlh;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need whatever lock we are using to protect not only the rangesection lists, but also the RangeSection instances themselves. This is not an appropriate place to take the lock.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we become agree on
#68607 (comment)
in perspective of
#68607 (comment)
this annotation becomes resolved.

iHigh = iMid;
iMid = (iHigh + iLow)/2;
}
else if (addr >= array[iMid+1].LowAddress)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code here is doing a binary search of the RangeSection list, but I don't see how its protected from a separate thread writing to the sorted list. In particular, if there is an Add going on, which may be performing a memcpy of the list to insert something in the middle, the structure of the reader lock as written will allow a reader to enter, and read while the list is in an inconsistent state.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate writing thread uses the copy of an array. Schema is as follows:

  1. Readers can simultaneously perform any number of search runs over reader's copy of an array (I call it reader's array or reader's header (as the meta of an array resides in a header structure). There is only fast interlocked reference counting of how many readers are running. Once a reader leaves the reader array there is fast interlocked decrement of the reference counter. Reading is asymptotically log2 N by the nature of an algorithm or unit by the nature of last_used_index optimization (depending on the load model).

  2. Writers can perform O(N) (due to memcpy of the whole array) simultaneously with any number of readers traversing the reader's array. Writers use the copy of the reader's array (I call it writer's array of writer's header). There are only one writer at a time using writer's array, but O(N) writing is NOT preventing readers from traversing reader's array. Once a writer finishes preparing proper writer's array with all necessary changes, the WriterLockHolder (wlh) goes to it's destructor and performs the substitution of writer's array header pointer to the reader's one. Till now, all old readers still run over the old reader's array and decrement old reader's array reference counter, and all new readers will run over the new reader's array incrementing new reader's reference counter on the start. Once the latest old-reader leaves old-reader-array, rlh destructor returns reader's array header pointer to the writer's slot, and the swap of arrays completes (the wlh destructor can perform same action if there are no old-readers when it substitutes reader's <- writer's header pointers).

So we have almost seamless reader's copy and some writing work in a background.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I missed that. That highlights the concern I have with the comments, etc. ALL of those need to be updated to accurately describe the new model. Also, recent experience has shown that its much safer to use explicit volatile load/store methods instead, so the particular memory operations can easily be seen and examined.

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label May 5, 2022
@jkotas
Copy link
Member

jkotas commented May 5, 2022

the set of comments describing the locking has not been updated, and is certainly incorrect. @jkotas do you agree?

I agree. I have double-checked some of the comments made by @davidwrighton and I agree that they highlight real problems.

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label May 6, 2022
static Volatile<LONG> m_dwReaderCount;
static Volatile<LONG> m_dwWriterLock;
static volatile RangeSectionHandleHeader * m_RangeSectionHandleReaderHeader;
static volatile RangeSectionHandleHeader * m_RangeSectionHandleWriterHeader;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plain volatile is not safe to use within our codebase. It has subtly different meanings on various compilers. Please either use the Volatile type, or use explicit VolatileLoad/VolatileStore apis.

@davidwrighton
Copy link
Member

// release the writer and yield to wait for the readers to be done.

Comments like this throughout the codebase need to be updated with this change.


Refers to: src/coreclr/vm/codeman.cpp:651 in 5056171. [](commit_id = 5056171, deletion_comment = False)

int count;
RangeSectionHandleHeader *old_rh = (RangeSectionHandleHeader*)m_RangeSectionHandleReaderHeader;
h->count = 1; //EM's unit
m_RangeSectionHandleReaderHeader = h;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be a VolatileStore with a barrier.

@y-yamshchikov y-yamshchikov force-pushed the opt-r2r-rs branch 5 times, most recently from 472e200 to ac3d401 Compare June 3, 2022 17:27
@davidwrighton
Copy link
Member

@y-yamshchikov Please let me know when you are ready for me to review this again.

@y-yamshchikov
Copy link
Contributor Author

/azp run "runtime (CoreCLR Product Build Linux x86 checked)"

@azure-pipelines
Copy link

Commenter does not have sufficient privileges for PR 68607 in repo dotnet/runtime

@y-yamshchikov
Copy link
Contributor Author

@y-yamshchikov Please let me know when you are ready for me to review this again.

@davidwrighton please take a look
you can find exhaustive comment on top of the vm/codeman.h document

@alpencolt
Copy link

cc @HJLeee @wscho77

Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a viable locking strategy.

  1. The ReaderLock isn't safe to take unless the ForbidDeleteLock is held. (The process of entering into the ReaderLock, reads a value, that may be deleted before any other operation occurs)
  2. The ForbidDeletion lock is simply a reader/writer lock. If held, it makes the read side of the system safe, but should use standard reader/writer locks if that is what is wanted.
  3. There appear to be code paths which rely solely on a ReaderLock to protect them. However, by the description, while the reader lock is held, in theory the writer is allowed to replace the current copy of the reader data structure. If this happens twice while the reader is operating, then the reader may be using a copy of the reader array which can be freed (as the ReaderLockHolder only locks the reader array present at holder creation, but other code such as GetRangeSection will read the current reader array, and not the one that was locked.
  4. The ShouldAvoidHostCalls pathway in GetFunctionInfoInternal is not wait-free. That pathway relies on the ability to perform its operations when the rest of the runtime is suspended at any point of code, including while holding locks. The logic you've written does avoid host calls, but it does so by introducing spinning, which is not an improvement, and may deadlock if the other threads are suspended in the wrong spot.

@ghost ghost added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 2, 2022
@y-yamshchikov
Copy link
Contributor Author

@davidwrighton

  1. ForbidDeletion lock is designed to protect all deletions of any element from the storage, so it should be held if one can expect deletion. Maybe you mean WriterLockHolder could delete something ReaderLockHolder is trying to read on acquiring, but it surely will not: the reader (and writer) arrays are separated in two parts: one with meta and one with an array itself. Metas (one for reader and one for writer) are never deleted. rlh works only with meta on acquiring.

  2. With ForbidDeletion lock we could block readers only if we want to delete (which seems to be the case on AppDomain unloading) and let readers seamlessly flow in case of ordinary (if it is actually more ordinary to add an RS then to delete) adding. In the proposal where we need to sync only on just a very moment of interchanging pointers to metas (which in turn point to arrays) it seems costly and unnecessary to yield.

  3. Maybe I had not highlighted this in the description. It won't happen twice. Second (call it so) writer will wait till oldest reader leaves and puts it's pointer to writer's slot. As it happens, Second writer becomes new First, and so on. I understand it is hard to pick from code review and it was the main reason I have developed huge unit test mentioned in topic. The payload model for the test was not so subtle to capture issues like you mentioned in (4), but very intensive to detect errors you suspect in (3). If I'm still not clear, please tell me and I'll made an attempt to prove it formally.

  4. OK, it is trivial to return logic like it was making one-shot FDH version (tries one time to enter and returns status).

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 18, 2022
@mangod9
Copy link
Member

mangod9 commented Aug 29, 2022

Checking in on the current state of this PR. Is this still being considered?

@davidwrighton
Copy link
Member

I've been busy with getting .NET 7 out the door, and I'm now going to take another deep look at this now that we've forked for .NET 8.

@davidwrighton davidwrighton self-assigned this Oct 13, 2022
@davidwrighton
Copy link
Member

I've taken a closer look at this, and while I think the concept is good, the locking mechanism is complex enough that even if it is correct (and I'm not entirely convinced of that, but I don't actually see incorrect behavior anywhere I look) there is no practical way to maintain confidence that it remains correct over time. In addition, while I'm confident the performance is good for the scenarios that your team sees, we have some customers that have many thousands of these range sections, and it is plausible that this scheme will have outsize performance problems in those scenarios. As such, I'd like to investigate different approaches to this that don't require such a monumentally complex to analyze locking scheme. In the next month or so I will try to put together a different approach to the problem that doesn't require such complex analysis. My current thought is to mimic some form of a page table, which will allow us to implement most of the complexity in terms of atomic operations which do not require complex locks.

@davidwrighton
Copy link
Member

Its taken a while longer than I hoped, but the work to fix the controversial portion of this change is well in progress, and shows excellent characteristics. I'll be closing this for now, but once #79021 is merged, please take another look at your scenario and see what more changes you'd like to have, and we'll review them.

@ghost ghost locked as resolved and limited conversation to collaborators Jan 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-crossgen2-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Version Bubble implementation flaw: some methods are forced to rejit
7 participants