Completely lock-free ClassLoader::LookupTypeKey #61346
Conversation
@davidwrighton @jkotas - please take a look. Thanks!
@hoyosjs - could you take a look at DAC casts in the new code?
-CrstAvailableParamTypes,
-(CrstFlags)(CRST_UNSAFE_ANYMODE | CRST_DEBUGGER_THREAD));
+CrstAvailableParamTypes,
+CRST_DEBUGGER_THREAD);
@hoyosjs - If the debugger only does lookups, then it will no longer take this lock, and then CRST_DEBUGGER_THREAD is not needed.
Can the debugger load/publish types?
I know we can't load assemblies, but I believe it's possible that a FuncEval can load a type.
Is funceval running on the debugger thread?
I assumed that CRST_DEBUGGER_THREAD does not cover funceval, since funceval could JIT, and the JIT does all kinds of stuff (including loading assemblies). If that is attributed to the debugger thread, then the debugger thread is not different from anything else.
I am not very familiar with how that all works though.
CRST_DEBUGGER_THREAD is not a big nuisance here. I was just not sure it is still necessary.
I've tried removing CRST_DEBUGGER_THREAD and running both regular and diagnostics tests (with Chk bits) - everything runs as before. Not sure if that is enough proof that CRST_DEBUGGER_THREAD is unnecessary.
I just noticed that CRST_DEBUGGER_THREAD is always paired with CRST_UNSAFE_ANYMODE (or sometimes CRST_GC_NOTRIGGER_WHEN_TAKEN), so since we are removing the other one, maybe we do not need CRST_DEBUGGER_THREAD either.
But I do not think CRST_DEBUGGER_THREAD is a big nuisance either way - it basically increments/decrements a counter that is checked in a couple of asserts.
src/coreclr/vm/dacenumerablehash.inl
Outdated
// slot [1] will contain the next version of the table if it resizes
S_SIZE_T cbNewBuckets = S_SIZE_T(cNewBuckets + 2) * S_SIZE_T(sizeof(PTR_VolatileEntry));

// REVIEW: I need a temp array here, relatively small (under 1K elements typically), is this the right heap for that?
You should use the regular C/C++ heap. The loader heaps are not designed for allocating temporary arrays like this.
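For illustration only - a minimal sketch of the suggested direction, with placeholder names (ResizeWithScratch and cEntries are not from the PR): a short-lived scratch buffer comes from the regular C/C++ heap and is freed right after the resize pass, unlike loader-heap memory, which lives until the loader allocator is destroyed.

#include <new>

static void ResizeWithScratch(size_t cEntries)
{
    // Regular C/C++ heap; the allocation may fail, so treat failure as "skip the resize".
    void** pScratch = new (std::nothrow) void*[cEntries];
    if (pScratch == nullptr)
        return;

    // ... fill and consume the scratch array during the resize pass ...

    delete[] pScratch;  // freed immediately; nothing lingers on a loader heap
}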
Are the tails long enough to make it worth it? Would it be better to just walk them as necessary?
I was thinking about the old table, which would be at a 2x load factor at this point, and that we are holding up somebody's progress (most likely the JIT), so we should use a tail array as a guarantee against degenerate cases.
However, these chains would be in the new table, and the load factor while resizing will be 0 - 0.5, so buckets should be short. There is always a possibility of a case where a lot of items form a chain, but after tweaking the hash function in the previous change, it will be statistically rare.
Also, a doubling-up resize can really happen only a few times in the life of the module, so I think the risk of causing an observable pause is low.
I will switch this to just walking.
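A minimal, self-contained sketch of the "just walk it" idea with placeholder types (Entry and AppendToBucket are illustrative, not the DacEnumerableHashTable code): append an entry to its destination bucket by walking the chain to its tail, instead of buffering entries in a temporary array.

struct Entry
{
    Entry*   m_pNext;
    unsigned m_iHashValue;
};

static void AppendToBucket(Entry** ppBucket, Entry* pNew)
{
    pNew->m_pNext = nullptr;

    if (*ppBucket == nullptr)
    {
        *ppBucket = pNew;           // empty bucket: the new entry becomes the head
        return;
    }

    Entry* pTail = *ppBucket;       // walk the (typically short) chain to its end
    while (pTail->m_pNext != nullptr)
        pTail = pTail->m_pNext;

    pTail->m_pNext = pNew;
}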
In System.Linq.Expressions.Tests, which liberally uses generics, the max walk during individual resizes is generally 1 - 5, but every run has some outliers with max walk > 50. I think it is acceptable.
It makes me wonder though if there is a cheap way to bring the clustering further down by improving the hash.
Hmm, the outliers are all from a different hashtable - EEClassHashTable, which shares the base implementation with EETypeHashTable, but has different elements and a different hash function.
Its resize factor is 4x, triggered at the same 2x load factor. In theory it should have even shorter buckets at resize.
I will have to look closer.
We are hashing types by namespace/name and see a lot of types with an empty namespace and names like "<>c".
EEClassHashTable seems to be intentionally placing same-named nested classes in the same bucket.
I think walking the chain is still acceptable - if these chains are acceptable for lookups, then they are ok for very rare 4x resizes.
EEClassHashTable could use some clean-up in a separate change.
I have a change that fixes the nested type collision issue for EEClassHashTable. It will be a separate PR.
src/coreclr/vm/ceeload.inl
Outdated
@@ -35,6 +35,7 @@ void LookupMap<TYPE>::SetValueAt(PTR_TADDR pValue, TYPE value, TADDR flags)

value = dac_cast<TYPE>((dac_cast<TADDR>(value) | flags));

// REVIEW: why this is not VolatileStore? What guarantees that we do not have concurrent readers?
I think LookupMap<TYPE> can be read concurrently while the writer has not left the write lock yet (so on ARM the reader may occasionally not see all the writes). I will look at this separately. Maybe it is ok for some subtle reason.
It would be a separate issue anyway.
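For reference, a hedged sketch of the shape such a fix could take if readers do turn out to be possible (assuming CoreCLR's VolatileStore helper is appropriate here; this is not the committed code):

// Hypothetical sketch, not the committed change: if lock-free readers can
// observe this slot while the writer is still inside its lock, the value
// would need to be published with release semantics, roughly:
VolatileStore(pValue, dac_cast<TADDR>(value));
// so that a reader on weakly ordered hardware (ARM) never sees the slot
// before the data it refers to is fully written.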
PR review suggestion Co-authored-by: Jan Kotas <jkotas@microsoft.com>
src/coreclr/vm/dacenumerablehash.inl
Outdated
DWORD cNewBuckets = NextLargestPrime(cBuckets * SCALE_FACTOR);
// two extra slots - slot [0] contains the length of the table,
// slot [1] will contain the next version of the table if it resizes
S_SIZE_T cbNewBuckets = S_SIZE_T(cNewBuckets + 2) * S_SIZE_T(sizeof(PTR_VolatileEntry));
-S_SIZE_T cbNewBuckets = S_SIZE_T(cNewBuckets + 2) * S_SIZE_T(sizeof(PTR_VolatileEntry));
+S_SIZE_T cbNewBuckets = (S_SIZE_T(cNewBuckets) + S_SIZE_T(2)) * S_SIZE_T(sizeof(PTR_VolatileEntry));
src/coreclr/vm/dacenumerablehash.inl
Outdated
PTR_VolatileEntry *pNewBuckets = (PTR_VolatileEntry*)(void*)GetHeap()->AllocMem_NoThrow(cbNewBuckets);
if (!pNewBuckets)
    return;

// element 0 stores the length of the table
((size_t*)pNewBuckets)[0] = cNewBuckets;
Would it make sense to define constants or something for the fake 0 and 1 bucket indices to make this easier to understand?
I think making "Length" and "Next" helpers that take an array could make this clearer.
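A hedged sketch of what such helpers might look like, mirroring the GetNext shape that appears later in this thread (SLOT_LENGTH and GetLength are assumptions for illustration; only SLOT_NEXT/GetNext are confirmed by the PR):

// Name the two special slots instead of using bare 0 and 1 indices.
static const DWORD SLOT_LENGTH = 0;   // slot [0] stores the bucket count
static const DWORD SLOT_NEXT   = 1;   // slot [1] stores the next (resized) table

static DWORD GetLength(DPTR(PTR_VolatileEntry) buckets)
{
    // The length is stored as a raw integer in a pointer-sized slot.
    return (DWORD)dac_cast<TADDR>(buckets[SLOT_LENGTH]);
}

static DPTR(PTR_VolatileEntry) GetNext(DPTR(PTR_VolatileEntry) buckets)
{
    return dac_cast<DPTR(PTR_VolatileEntry)>(buckets[SLOT_NEXT]);
}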
src/coreclr/vm/dacenumerablehash.inl
Outdated
@@ -129,21 +130,24 @@ void DacEnumerableHashTable<DAC_ENUM_HASH_ARGS>::BaseInsertEntry(DacEnumerableHa
// Remember the entry hash code.
pVolatileEntry->m_iHashValue = iHash;

// Compute which bucket the entry belongs in based on the hash.
DWORD dwBucket = iHash % m_cBuckets;
auto curBuckets = GetBuckets();
I think that uses of auto like this do not help with code readability.
We have a rule against it in the repo C# coding conventions. We do not have explicit repo C++ coding conventions, but I think it is still best to use explicit types when the type is not obvious from the right-hand side.
src/coreclr/vm/dacenumerablehash.inl
Outdated
// Compute which bucket the entry belongs in based on the hash.
DWORD dwBucket = iHash % VolatileLoad(&m_cBuckets);
PTR_VolatileEntry* curBuckets = GetBuckets();
I do not think that this works with DAC. The cast from DPTR(PTR_VolatileEntry) to PTR_VolatileEntry* will only marshal the first array entry. This needs to pass DPTR(PTR_VolatileEntry) all the way through to the places where we index into it.
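A hedged sketch of the shape being described (local names like cBuckets are assumed; the point is that the DPTR type is kept at the indexing site so DAC marshals the element actually being read):

// Hypothetical sketch: do not collapse the DAC pointer to a host array pointer.
DPTR(PTR_VolatileEntry) curBuckets = GetBuckets();
DWORD dwBucket = iHash % cBuckets;                 // cBuckets: assumed local count
PTR_VolatileEntry pEntry = curBuckets[dwBucket];   // marshals just this slot under DAC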
We should probably run this through debugger tests.
src/coreclr/vm/dacenumerablehash.inl
Outdated
VolatileLoadBarrier();

// in a case if resize is in progress, look in the new table as well.
auto nextBuckets = ((DPTR(PTR_VolatileEntry)*)pContext->m_curTable)[1];
@hoyosjs - I assume this is correct? (modulo auto). In this case I want to index on the remote side. Is this the right way to do this?
-auto nextBuckets = ((DPTR(PTR_VolatileEntry)*)pContext->m_curTable)[1];
+auto nextBuckets = dac_cast<DPTR(PTR_VolatileEntry)>(pContext->m_curTable)[1];
I made GetNext a common helper. With that it is more convenient if m_curTable is DPTR(PTR_VolatileEntry). Added a comment about a cast.
src/coreclr/vm/dacenumerablehash.h
Outdated
static DPTR(PTR_VolatileEntry) GetNext(DPTR(PTR_VolatileEntry) buckets)
{
    return (DPTR(PTR_VolatileEntry))dac_cast<TADDR>(buckets[SLOT_NEXT]);
@hoyosjs - is this correct, or does it need to be another dac_cast from TADDR?
Like:
static DPTR(PTR_VolatileEntry) GetNext(DPTR(PTR_VolatileEntry) buckets)
{
    return dac_cast<DPTR(PTR_VolatileEntry)>(dac_cast<TADDR>(buckets[SLOT_NEXT]));
}
If I understand this right, buckets[SLOT_NEXT] points to a new hash table that is itself an array of linked lists? Then it's
return dac_cast<DPTR(PTR_VolatileEntry)>(dac_cast<TADDR>(buckets[SLOT_NEXT]));
which can be simplified to
return dac_cast<DPTR(PTR_VolatileEntry)>(buckets[SLOT_NEXT]);
Yes. buckets[SLOT_NEXT] will be pointing to the next, larger table if a resize happened while you are doing a lookup. It will be rare. It is possible, even though extremely rare, to see more than one new version.
Every new one is 2x larger than the previous, so there can't be too many even in the ultimate worst case.
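For illustration, a hedged sketch of how a lookup can follow the chain of table versions (names are assumed, and GetLength/GetNext are the hypothetical helpers sketched earlier in this thread; this is not the PR's exact code):

DPTR(PTR_VolatileEntry) curBuckets = GetBuckets();
while (curBuckets != NULL)
{
    // ... search the bucket chain for iHash in curBuckets
    //     (the two special slots at [0] and [1] are not buckets) ...

    // Non-NULL only if a resize published a newer table; each new table is
    // 2x larger, so even the worst case is only a handful of hops.
    curBuckets = GetNext(curBuckets);
}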
I have run diagnostics tests with bits right before my change and after (multiple times). I do not see any regressions due to the change.
The new test failure on Win arm64 is caused by #61548 - it is not because of these changes.
LGTM. Thanks!
DAC changes lgtm
Thanks!!
The goal is to simplify the use of m_AvailableTypesLock.
Currently the m_AvailableTypesLock lock may be taken in lookups, and lookups may happen in a GC_NOTRIGGER scope. Thus the lock is CRST_UNSAFE_ANYMODE, with an additional requirement that callers, with the exception of GC threads, must switch to COOP mode before using this lock - to avoid deadlocks during GC.
The requirement is fragile. Besides, all other uses of m_AvailableTypesLock are in preemptive mode, and it is better when preemptive mode can stay preemptive.
Once we do not need to take locks in Lookup, we can make m_AvailableTypesLock just an ordinary lock.
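A hedged sketch of what the simplified writer side could look like (assuming the standard CrstHolder pattern; illustrative, not the PR's code):

// With lookups fully lock-free, writers can take m_AvailableTypesLock as an
// ordinary Crst from preemptive mode - no CRST_UNSAFE_ANYMODE and no forced
// switch to cooperative mode.
{
    CrstHolder ch(&m_AvailableTypesLock);
    // ... insert/publish the new type ...
}   // lock released when the holder goes out of scope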