Search native code in all R2R images in version bubble #68607
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.
03e04a3 to 5056171
I really like the idea of how this is supposed to work, but I do not believe the locking is correct. In any case, even if the locking is correct, the set of comments describing the locking has not been updated, and is certainly incorrect. @jkotas do you agree?
@@ -1927,13 +1927,6 @@ HRESULT GetFunctionInfoInternal(LPCBYTE ip, EECodeInfo * pCodeInfo)

    if (ShouldAvoidHostCalls())
    {
        ExecutionManager::ReaderLockHolder rlh(NoHostCalls);
This ReaderLockHolder logic needs to stay in place. GetFunctionInfoInternal from the profiler API may be called while other threads are suspended via OS thread suspension APIs, and may be used when alloc/free/new/delete are not permitted. However, the technique you're using for locking looks like it will always successfully acquire, so you don't need to restore the Acquired() API and the call to it below.
From the perspective of #68607 (comment) and #68607 (comment), it is possible to stop writers here and pass NoHostCalls to codeman via the call below.
//TODO
// Returns whether the reader lock is acquired
if (count == 1)// h->count == 0
{
    //we are here in relatively rare case
If we are in a NoHostCalls situation, we should not delete anything.
OK, I can leave this code only in the wlh destructor (as writers will need to allocate or delete something anyway). Currently this code is in both the rlh and wlh destructors.
//TODO
There's no reason to do that. Just disable the code when running in NoHostCalls.
@@ -4475,7 +4493,6 @@ BOOL ExecutionManager::IsManagedCodeWithLock(PCODE currentPC)
        GC_NOTRIGGER;
    } CONTRACTL_END;

    ReaderLockHolder rlh;
Why is it ok to remove this lock?
I see that GetRangeSection will unconditionally take the lock, but it will return a RangeSection that itself isn't protected from unload. Please restore the original structure of how the locks are created.
As I understand your comment, the purpose of rlh there is not to protect the RS list from possible inconsistency, but to protect the RS used in IsManagedCodeWorker from deletion.
In my proposal, as described in my answer to you at #68607 (comment), rlh and wlh are used only to maintain RS storage consistency, but the schema leaves us an opportunity to restore the lock structure that protects RS instances.
//TODO
@@ -4496,7 +4513,6 @@ BOOL ExecutionManager::IsManagedCode(PCODE currentPC, HostCallPreference hostCal
    }

    ReaderLockHolder rlh(hostCallPreference);
    if (!rlh.Acquired())
By removing the if statement, we are unconditionally going down the failure path, why are you doing this?
Provided the schema described in my answer below (#68607 (comment)) is correct, there are no reader locks at all. rlh is needed only for counting references and for properly swapping the reader's array and the writer's array. There is no case where the "lock" is not acquired: it is always "acquired" either instantly or a few cycles later.
No, that isn't what I'm saying. In this case, line 4515 unconditionally takes the lock, and then as the if statement is removed, the new code path will unconditionally return FALSE, and set *pfFailedReaderLock to TRUE.
I certainly have to remove all the code under the if body, not only the if statement.
//TODO
src/coreclr/vm/codeman.cpp
Outdated
RangeSection *pCurr = pHead;
RangeSection *pLast = NULL;

ReaderLockHolder rlh;
We need whatever lock we are using to protect not only the rangesection lists, but also the RangeSection instances themselves. This is not an appropriate place to take the lock.
If we come to agree on #68607 (comment) in light of #68607 (comment), this annotation will be resolved.
    iHigh = iMid;
    iMid = (iHigh + iLow)/2;
}
else if (addr >= array[iMid+1].LowAddress)
The code here is doing a binary search of the RangeSection list, but I don't see how it's protected from a separate thread writing to the sorted list. In particular, if there is an Add going on, which may be performing a memcpy of the list to insert something in the middle, the structure of the reader lock as written will allow a reader to enter and read while the list is in an inconsistent state.
A separate writing thread uses a copy of the array. The schema is as follows:
- Readers can simultaneously perform any number of searches over the reader's copy of the array (I call it the reader's array or the reader's header, as the array's metadata resides in a header structure). The only synchronization is a fast interlocked reference count of how many readers are running: it is incremented when a reader enters and decremented when a reader leaves the reader's array. A lookup is O(log2 N) by the nature of the algorithm, or O(1) when the last_used_index optimization applies (depending on the load model).
- Writers perform O(N) work (due to the memcpy of the whole array) simultaneously with any number of readers traversing the reader's array. Writers use a copy of the reader's array (I call it the writer's array or the writer's header). Only one writer at a time uses the writer's array, but the O(N) write does NOT prevent readers from traversing the reader's array. Once a writer finishes preparing the writer's array with all necessary changes, the WriterLockHolder (wlh) destructor runs and substitutes the writer's array header pointer for the reader's one. From then on, all old readers still run over the old reader's array and decrement the old reader's array reference counter, while all new readers run over the new reader's array, incrementing the new reader's reference counter on entry. Once the last old reader leaves the old reader's array, the rlh destructor returns the old reader's array header pointer to the writer's slot, and the swap of arrays completes (the wlh destructor can perform the same action if there are no old readers at the moment it substitutes the reader's header pointer with the writer's).
So we have an almost seamless reader's copy, with the writing work done in the background.
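The copy-and-publish schema described above can be sketched in standard C++. This is a minimal sketch under stated assumptions, not the PR's implementation: `Range` and `RangeTable` are hypothetical names, and `std::shared_ptr` reference counting stands in for the PR's hand-rolled interlocked counter and header swap. The `std::atomic_load`/`std::atomic_store` overloads for `shared_ptr` are deprecated since C++20 and are usually not lock-free themselves, so the sketch illustrates the snapshot lifetime rules rather than the PR's performance characteristics.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a RangeSection: a half-open address range.
struct Range {
    std::uintptr_t low;
    std::uintptr_t high;  // exclusive
};

class RangeTable {
    using Snapshot = std::shared_ptr<const std::vector<Range>>;

public:
    RangeTable() : current_(std::make_shared<const std::vector<Range>>()) {}

    // Reader: pin the current snapshot, then binary-search it.
    // The pinned shared_ptr keeps that array alive even if writers
    // publish newer snapshots while the search is running.
    bool Contains(std::uintptr_t addr) const {
        Snapshot snap = std::atomic_load(&current_);
        auto it = std::upper_bound(
            snap->begin(), snap->end(), addr,
            [](std::uintptr_t a, const Range& r) { return a < r.low; });
        return it != snap->begin() && addr < std::prev(it)->high;
    }

    // Writer: copy the array (O(N)), edit the private copy, publish it.
    // The mutex only serializes writers; readers never take it.
    void Add(Range r) {
        std::lock_guard<std::mutex> one_writer(writer_mutex_);
        auto next = std::make_shared<std::vector<Range>>(*std::atomic_load(&current_));
        auto pos = std::upper_bound(
            next->begin(), next->end(), r.low,
            [](std::uintptr_t a, const Range& x) { return a < x.low; });
        next->insert(pos, r);
        std::atomic_store(&current_, Snapshot(std::move(next)));
    }

private:
    Snapshot current_;
    std::mutex writer_mutex_;
};
```

In this sketch a reader only ever touches the snapshot it pinned, so the old array is freed when its last reader releases it rather than at a moment chosen by the writer.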
Ah, I missed that. That highlights the concern I have with the comments, etc. ALL of those need to be updated to accurately describe the new model. Also, recent experience has shown that it's much safer to use explicit volatile load/store methods instead, so the particular memory operations can easily be seen and examined.
I agree. I have double-checked some of the comments made by @davidwrighton and I agree that they highlight real problems.
src/coreclr/vm/codeman.h
Outdated
static Volatile<LONG> m_dwReaderCount;
static Volatile<LONG> m_dwWriterLock;
static volatile RangeSectionHandleHeader * m_RangeSectionHandleReaderHeader;
static volatile RangeSectionHandleHeader * m_RangeSectionHandleWriterHeader;
Plain volatile is not safe to use within our codebase. It has subtly different meanings on various compilers. Please either use the Volatile type, or use explicit VolatileLoad/VolatileStore apis.
src/coreclr/vm/codeman.cpp
Outdated
int count;
RangeSectionHandleHeader *old_rh = (RangeSectionHandleHeader*)m_RangeSectionHandleReaderHeader;
h->count = 1; //EM's unit
m_RangeSectionHandleReaderHeader = h;
This needs to be a VolatileStore with a barrier.
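For illustration, the publication pattern this comment asks for can be sketched with standard C++ atomics. This is only an analogy under assumptions: `std::atomic` with release/acquire ordering stands in for CoreCLR's `VolatileStore`/`VolatileLoad` helpers, and `Header`, `Publish`, and `Observe` are hypothetical names, not code from the PR.

```cpp
#include <atomic>

// Hypothetical header type; only the publication pattern matters here.
struct Header {
    int count;
    int size;
};

// Shared slot that readers load and the writer stores.
std::atomic<Header*> g_readerHeader{nullptr};

// Writer: finish initializing the header, then publish it.
// The release store keeps the field writes above from being reordered
// after the point where the pointer becomes visible to other threads.
void Publish(Header* h, int size) {
    h->size = size;
    h->count = 1;
    g_readerHeader.store(h, std::memory_order_release);
}

// Reader: the acquire load pairs with the release store, so a non-null
// result implies the fields written before publication are visible.
const Header* Observe() {
    return g_readerHeader.load(std::memory_order_acquire);
}
```

With a plain assignment to a `volatile` pointer, whether the `count` write is ordered before the pointer write depends on the compiler and target, which is the portability concern raised in the review.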
472e200 to ac3d401
@y-yamshchikov Please let me know when you are ready for me to review this again.
264dd57 to 000a4df
000a4df to eab49b9
/azp run "runtime (CoreCLR Product Build Linux x86 checked)"
Commenter does not have sufficient privileges for PR 68607 in repo dotnet/runtime
@davidwrighton please take a look
This is not a viable locking strategy.
- The ReaderLock isn't safe to take unless the ForbidDeleteLock is held. (The process of entering the ReaderLock reads a value that may be deleted before any other operation occurs.)
- The ForbidDeletion lock is simply a reader/writer lock. If held, it makes the read side of the system safe, but should use standard reader/writer locks if that is what is wanted.
- There appear to be code paths which rely solely on a ReaderLock to protect them. However, by the description, while the reader lock is held, in theory the writer is allowed to replace the current copy of the reader data structure. If this happens twice while the reader is operating, then the reader may be using a copy of the reader array which can be freed (as the ReaderLockHolder only locks the reader array present at holder creation, but other code such as GetRangeSection will read the current reader array, and not the one that was locked).
- The ShouldAvoidHostCalls pathway in GetFunctionInfoInternal is not wait-free. That pathway relies on the ability to perform its operations when the rest of the runtime is suspended at any point of code, including while holding locks. The logic you've written does avoid host calls, but it does so by introducing spinning, which is not an improvement, and may deadlock if the other threads are suspended in the wrong spot.
Checking in on the current state of this PR. Is this still being considered?
I've been busy with getting .NET 7 out the door, and I'm now going to take another deep look at this now that we've forked for .NET 8.
I've taken a closer look at this, and while I think the concept is good, the locking mechanism is complex enough that even if it is correct (and I'm not entirely convinced of that, but I don't actually see incorrect behavior anywhere I look) there is no practical way to maintain confidence that it remains correct over time. In addition, while I'm confident the performance is good for the scenarios that your team sees, we have some customers that have many thousands of these range sections, and it is plausible that this scheme will have outsize performance problems in those scenarios. As such, I'd like to investigate different approaches to this that don't require such a monumentally complex to analyze locking scheme. In the next month or so I will try to put together a different approach to the problem that doesn't require such complex analysis. My current thought is to mimic some form of a page table, which will allow us to implement most of the complexity in terms of atomic operations which do not require complex locks.
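The page-table idea mentioned above might look roughly like the following. This is a hypothetical sketch, not the design that was eventually implemented: `RangeMap`, `Section`, the 32-bit address space, the 64 KiB page size, and the one-section-per-page simplification are all assumptions chosen to keep the example short.

```cpp
#include <array>
#include <atomic>
#include <cstdint>

// Hypothetical section descriptor: half-open range [low, high), high > low.
struct Section {
    std::uint32_t low;
    std::uint32_t high;
};

// Two-level table over 32-bit addresses with 64 KiB pages:
// bits 31..24 select a leaf, bits 23..16 select a slot in that leaf.
class RangeMap {
    struct Leaf {
        std::array<std::atomic<const Section*>, 256> slots{};
    };
    std::array<std::atomic<Leaf*>, 256> top_{};

    // Lazily create a leaf; racing creators resolve with one CAS.
    Leaf* GetOrCreateLeaf(std::uint32_t addr) {
        std::atomic<Leaf*>& slot = top_[addr >> 24];
        Leaf* leaf = slot.load(std::memory_order_acquire);
        if (leaf == nullptr) {
            Leaf* fresh = new Leaf();
            if (slot.compare_exchange_strong(leaf, fresh,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
                leaf = fresh;
            } else {
                delete fresh;  // another thread won; leaf now holds its pointer
            }
        }
        return leaf;
    }

public:
    ~RangeMap() {
        for (auto& slot : top_) delete slot.load();
    }

    // Point every 64 KiB page overlapped by the section at it.
    void Insert(const Section* s) {
        const std::uint64_t last = (std::uint64_t(s->high) - 1) >> 16;
        for (std::uint64_t page = s->low >> 16; page <= last; ++page) {
            const std::uint32_t addr = static_cast<std::uint32_t>(page << 16);
            GetOrCreateLeaf(addr)->slots[(addr >> 16) & 0xFF].store(
                s, std::memory_order_release);
        }
    }

    // Lookup performs two atomic loads: no locks, no allocation, no spinning.
    const Section* Lookup(std::uint32_t addr) const {
        const Leaf* leaf = top_[addr >> 24].load(std::memory_order_acquire);
        if (leaf == nullptr) return nullptr;
        const Section* s =
            leaf->slots[(addr >> 16) & 0xFF].load(std::memory_order_acquire);
        return (s != nullptr && addr >= s->low && addr < s->high) ? s : nullptr;
    }
};
```

In this sketch the read path never allocates or waits, which is the property the NoHostCalls discussion earlier in the thread asks for; removal and safe reclamation of sections are deliberately left out.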
It's taken a while longer than I hoped, but the work to fix the controversial portion of this change is well in progress, and shows excellent characteristics. I'll be closing this for now, but once #79021 is merged, please take another look at your scenario and see what more changes you'd like to have, and we'll review them.
This PR fixes part of #44948 and fixes #46160.
This code simply traverses the assemblies in the application
domain. For each assembly (module), it checks whether the module is Ready To Run and
whether it is in the same version bubble as (that is, deliberately bubbles with) the module
from which the generic function originates. If so, it requests Ready To Run code
from that module, hoping the module contains some. If the request
succeeds, it proceeds with the returned pointer to the bare native code.
Now such methods use their Ready To Run code.
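The search described above can be sketched as a simple loop. This is a simplified model under stated assumptions, not the PR's code: `Module`, `versionBubbleId`, `precompiled`, and `FindPrecompiledCode` are hypothetical names, and a string key stands in for however the runtime identifies a generic instantiation.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified model; none of these types are CoreCLR's.
struct Module {
    bool isReadyToRun;
    int versionBubbleId;  // assumption: modules compiled together share an id
    std::unordered_map<std::string, const void*> precompiled;  // key -> native code
};

// Returns precompiled code for `key` from any Ready To Run module in the
// same version bubble as `origin`, or nullptr when no such code exists.
const void* FindPrecompiledCode(const std::vector<Module>& loaded,
                                const Module& origin,
                                const std::string& key) {
    for (const Module& m : loaded) {
        // Skip modules that are not R2R or not in the originating bubble.
        if (!m.isReadyToRun || m.versionBubbleId != origin.versionBubbleId) {
            continue;
        }
        auto it = m.precompiled.find(key);
        if (it != m.precompiled.end()) {
            return it->second;  // pointer to the bare native code
        }
    }
    return nullptr;
}
```

A null result means no module in the bubble carries code for the instantiation, so the caller keeps whatever path it used before this change.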
We have got significant performance gain on startup: 7% average on our representative set on Tizen.
This PR addresses the feedback on the PRs below:
- #47269, about the linear search through the set of RangeSections. In this new PR we propose storing RangeSections in a sorted array (with a number of optimizations inspired by the prior linked-list-based solution).
- #57277, about lock-free requirements.
We have extensively tested this PR on the armel/Tizen platform, so in this case we are confident in the reliability and benefit of the solution.
We have also implemented a high-coverage unit-testing model for the RangeSection optimization and used it under single-processor single-thread load, single-processor multi-thread load, and multi-processor multi-thread load, with all combinations of simultaneous add and/or read and/or delete (with and without two simultaneous readers and/or two simultaneous adders and/or two simultaneous deleters). The order of adding was switched between ascending, descending, and random modes. All these switches were tested under debug and release builds of the algorithm. Strictly speaking, we performed a full-factorial experiment over the selected options in the unit test. The test can be accessed at the following link: https://github.com/y-yamshchikov/rs-model. The unit test was run on linux-x64.
The main concept of the test is a 1-to-1 copy of the RangeSection storage code, decoupled from CoreCLR along the least invasive boundary (as far as that is possible given that the code is not in a separate file), so the surface for introducing errors is as small as possible.
Of course, we have also run the priority1 tests and performance tests.
In this PR (vs #57277) we have implemented a new synchronization mechanism for RangeSection storage. It grants lock-free (but not wait-free) access and O(log n) complexity. The primitives have the same names, ReaderLockHolder and WriterLockHolder, but their behavior and encapsulation differ slightly.
Dear colleagues @jkotas @alpencolt, please take a look.