
Completely lock-free ClassLoader::LookupTypeKey #61346

Merged · 16 commits · Nov 14, 2021

Conversation

@VSadov (Member) commented Nov 9, 2021

The goal is to simplify use of m_AvailableTypesLock

Currently the m_AvailableTypesLock lock may be taken during lookups, and lookups may happen in a GC_NOTRIGGER scope.
Thus the lock is CRST_UNSAFE_ANYMODE, with an additional requirement that callers, with the exception of GC threads, must switch to COOP mode before taking the lock, to avoid deadlocks during GC.
This requirement is fragile. Besides, all other uses of m_AvailableTypesLock are in preemptive mode, and it is better when preemptive code can stay preemptive.

Once we do not need to take locks in Lookup, we can make m_AvailableTypesLock just an ordinary lock.
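As a rough illustration of the lookup scheme this PR moves to (this is not the CoreCLR code; `Entry`, `Table`, and `Lookup` here are simplified stand-ins), a reader can search without any lock by walking the current bucket array and then any newer array that a concurrent resize has published:

```cpp
// Simplified model of a lock-free lookup over a versioned bucket table.
// NOT the CoreCLR implementation; names and types are illustrative.
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

struct Entry {
    size_t key;
    int    value;
    Entry* next; // hash-chain link within one bucket
};

struct Table {
    std::atomic<Table*> next{nullptr};        // newer table, if a resize published one
    std::vector<std::atomic<Entry*>> buckets; // head of each hash chain
    explicit Table(size_t n) : buckets(n) {}
};

// Readers never take a lock: walk the current table, then any newer
// versions that a concurrent resize may have published.
inline bool Lookup(std::atomic<Table*>& root, size_t key, int* out) {
    for (Table* t = root.load(std::memory_order_acquire); t != nullptr;
         t = t->next.load(std::memory_order_acquire)) {
        std::atomic<Entry*>& bucket = t->buckets[key % t->buckets.size()];
        for (Entry* e = bucket.load(std::memory_order_acquire); e != nullptr; e = e->next) {
            if (e->key == key) {
                *out = e->value;
                return true;
            }
        }
    }
    return false; // not found in any published version
}
```

Writers (insert and resize) would still serialize on an ordinary lock; only the readers are lock-free, which is what allows dropping CRST_UNSAFE_ANYMODE.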

@VSadov (Member, Author) commented Nov 9, 2021

@davidwrighton @jkotas - please take a look. Thanks!

@VSadov (Member, Author) commented Nov 9, 2021

@hoyosjs - could you take a look at DAC casts in the new code?
They compile, but I am not highly confident it is all correct. A look by someone who knows that better would be helpful.

CrstAvailableParamTypes,
(CrstFlags)(CRST_UNSAFE_ANYMODE | CRST_DEBUGGER_THREAD));
CrstAvailableParamTypes,
CRST_DEBUGGER_THREAD);
@VSadov (Member, Author):

@hoyosjs - If the debugger only does lookups, then it will no longer take this lock, and CRST_DEBUGGER_THREAD is not needed.
Can debugger load/publish types?

Member:

I know we can't load assemblies, but it's possible that a FuncEval can load a type, I believe.

@VSadov (Member, Author), Nov 10, 2021:

Is funceval running on debugger thread?

I assumed that CRST_DEBUGGER_THREAD does not cover funceval, since funceval could JIT, and the JIT does all kinds of stuff (including loading assemblies). If that is attributed to the debugger thread, then the debugger thread is not different from anything else.
I am not very familiar with how that all works though.

CRST_DEBUGGER_THREAD is not a big nuisance here. I was just not sure it is still necessary.

@VSadov (Member, Author), Nov 13, 2021:

I've tried removing CRST_DEBUGGER_THREAD and running both regular and diagnostics tests (with Chk bits) - everything runs as before. Not sure if that is enough proof that CRST_DEBUGGER_THREAD is unnecessary.

I just noticed that CRST_DEBUGGER_THREAD is always paired with CRST_UNSAFE_ANYMODE (or sometimes CRST_GC_NOTRIGGER_WHEN_TAKEN), so since we are removing the other one, maybe we do not need CRST_DEBUGGER_THREAD either.

But I do not think CRST_DEBUGGER_THREAD is a big nuisance either way: it basically increments/decrements a counter that is checked in a couple of asserts.

// slot [1] will contain the next version of the table if it resizes
S_SIZE_T cbNewBuckets = S_SIZE_T(cNewBuckets + 2) * S_SIZE_T(sizeof(PTR_VolatileEntry));

// REVIEW: I need a temp array here, relatively small (under 1K elements typically), is this the right heap for that?
Member:

You should use regular C/C++ heap. The loader heaps are not designed for allocating temporary arrays like this.

Member:

Are the tails long enough to make it worth it? Would it be better to just walk them as necessary?

@VSadov (Member, Author):

I was thinking about the old table, which would be at a 2x load factor at this point, and that we are holding up somebody's progress (most likely the JIT), so we should use a tail array as a guarantee against degenerate cases.

However, these chains are in the new table, and the load factor while resizing will be 0 to 0.5, so buckets should be short. There is always a possibility of a case where a lot of items form a chain, but after tweaking the hash function in the previous change, that will be statistically rare.
Also, a doubling resize can really happen only a few times in the life of the module, so I think the risk of causing an observable pause is low.

I will switch this to just walking.

@VSadov (Member, Author):

In System.Linq.Expressions.Tests, which liberally uses generics, the max walk during individual resizes is generally 1 to 5, but every run has some outliers with a max walk > 50. I think it is acceptable.

It makes me wonder though if there is a cheap way to bring the clustering further down by improving the hash.

@VSadov (Member, Author):

Hmm, the outliers are all from a different hashtable - EEClassHashTable, which shares the base implementation with EETypeHashTable, but has different elements and hash function.

Its resize factor is 4x, triggered at the same 2x load factor. In theory it should have even shorter buckets at resize.
I will have to look closer.

@VSadov (Member, Author):

We are hashing types by namespace/name and see a lot of types with an empty namespace and names like "<>c".

@VSadov (Member, Author):

EEClassHashTable seems to be intentionally placing same-named nested classes in the same bucket.
I think walking the chain is still acceptable: if these chains are acceptable for lookups, then they are OK for very rare 4x resizes.

EEClassHashTable could use some clean-up in a separate change.

@VSadov (Member, Author):

I have a change that fixes the nested type collision issue for EEClassHashTable. It will be a separate PR.

@@ -35,6 +35,7 @@ void LookupMap<TYPE>::SetValueAt(PTR_TADDR pValue, TYPE value, TADDR flags)

value = dac_cast<TYPE>((dac_cast<TADDR>(value) | flags));

// REVIEW: why this is not VolatileStore? What guarantees that we do not have concurrent readers?
@VSadov (Member, Author), Nov 10, 2021:

I think LookupMap<TYPE> can be read concurrently while the writer has not yet left the write lock (so on ARM a reader may occasionally not see all the writes). I will look at this separately. Maybe it is OK for some subtle reason.

It would be a separate issue anyway.
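The publication hazard raised in this thread can be sketched with standard atomics (a generic illustration, not the LookupMap code; the VM's VolatileStore/VolatileLoad provide comparable release/acquire semantics): if a writer initializes an object and then publishes its pointer with a plain store, a reader on a weakly ordered CPU such as ARM may observe the pointer before the initialization.

```cpp
// Publication with release/acquire ordering: a generic sketch of why a
// plain (non-volatile) store is suspect when concurrent readers exist.
#include <atomic>
#include <cassert>

struct Payload {
    int a;
    int b;
};

std::atomic<Payload*> g_slot{nullptr};

void Publish(Payload* p) {
    p->a = 1;
    p->b = 2;
    // Release store: the writes to *p above become visible to any thread
    // that observes the pointer with an acquire load.
    g_slot.store(p, std::memory_order_release);
}

Payload* Consume() {
    // Acquire load: if we see a non-null pointer, we also see the
    // initialized fields behind it.
    return g_slot.load(std::memory_order_acquire);
}
```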

@VSadov (Member, Author) commented Nov 10, 2021

@jkotas @hoyosjs - any more comments on this PR?

PR review suggestion

Co-authored-by: Jan Kotas <jkotas@microsoft.com>
@VSadov (Member, Author) commented Nov 12, 2021

@jkotas @hoyosjs - any more comments on this PR?

DWORD cNewBuckets = NextLargestPrime(cBuckets * SCALE_FACTOR);
// two extra slots - slot [0] contains the length of the table,
// slot [1] will contain the next version of the table if it resizes
S_SIZE_T cbNewBuckets = S_SIZE_T(cNewBuckets + 2) * S_SIZE_T(sizeof(PTR_VolatileEntry));
Member:

Suggested change
S_SIZE_T cbNewBuckets = S_SIZE_T(cNewBuckets + 2) * S_SIZE_T(sizeof(PTR_VolatileEntry));
S_SIZE_T cbNewBuckets = (S_SIZE_T(cNewBuckets) + S_SIZE_T(2)) * S_SIZE_T(sizeof(PTR_VolatileEntry));

PTR_VolatileEntry *pNewBuckets = (PTR_VolatileEntry*)(void*)GetHeap()->AllocMem_NoThrow(cbNewBuckets);
if (!pNewBuckets)
return;

// element 0 stores the length of the table
((size_t*)pNewBuckets)[0] = cNewBuckets;
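The point of the suggested change above is that S_SIZE_T-style checked arithmetic only guards operations performed on the checked type, so `cNewBuckets + 2` computed in plain integer math can wrap before any tracking starts. A hypothetical stand-in (`ChkSize` here, not the real S_SIZE_T) illustrates the idea:

```cpp
// ChkSize: a toy overflow-tracking size type, illustrating what
// S_SIZE_T-style arithmetic guards against. Hypothetical, simplified.
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ChkSize {
    size_t v;   // the (possibly wrapped) value
    bool   ovf; // sticky overflow flag carried through arithmetic
    explicit ChkSize(size_t x) : v(x), ovf(false) {}

    ChkSize operator+(ChkSize o) const {
        ChkSize r(v + o.v);
        // unsigned wraparound makes the sum smaller than an operand
        r.ovf = ovf || o.ovf || r.v < v;
        return r;
    }
    ChkSize operator*(ChkSize o) const {
        ChkSize r(v * o.v);
        // if the product wrapped, dividing back does not recover v
        r.ovf = ovf || o.ovf || (o.v != 0 && r.v / o.v != v);
        return r;
    }
};
```

With this shape, `(ChkSize(c) + ChkSize(2)) * ChkSize(sizeof(void*))` tracks overflow through both the addition and the multiplication, whereas `ChkSize(c + 2)` could already hold a wrapped value.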
Member:

Would it make sense to define constants or something for the fake 0 and 1 bucket indices to make this easier to understand?

@VSadov (Member, Author):

I think making "Length" and "Next" helpers that take the array could make this clearer.
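Such helpers might look roughly like this (a sketch only: the `SLOT_LENGTH`/`SLOT_NEXT`/`SLOT_FIRST` names are illustrative, and the real code deals in DAC-aware DPTR types rather than raw pointers):

```cpp
// Sketch of named-slot helpers over a raw bucket array whose first two
// slots hold metadata, per the layout in the diff. Hypothetical names.
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef void* PTR_VolatileEntry; // stand-in for the real entry pointer type

enum BucketSlots : size_t {
    SLOT_LENGTH = 0, // slot [0]: number of real buckets
    SLOT_NEXT   = 1, // slot [1]: next table version, if resizing
    SLOT_FIRST  = 2  // first real bucket
};

static size_t GetLength(PTR_VolatileEntry* buckets) {
    return (size_t)(uintptr_t)buckets[SLOT_LENGTH];
}

static PTR_VolatileEntry* GetNext(PTR_VolatileEntry* buckets) {
    return (PTR_VolatileEntry*)buckets[SLOT_NEXT];
}
```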

@@ -129,21 +130,24 @@ void DacEnumerableHashTable<DAC_ENUM_HASH_ARGS>::BaseInsertEntry(DacEnumerableHa
// Remember the entry hash code.
pVolatileEntry->m_iHashValue = iHash;

// Compute which bucket the entry belongs in based on the hash.
DWORD dwBucket = iHash % m_cBuckets;
auto curBuckets = GetBuckets();
Member:

I think that uses of auto like this do not help code readability.

We have a rule against it in the repo's C# coding conventions. We do not have explicit repo C++ coding conventions, but I think it is still best to use explicit types when the type is not obvious from the right-hand side.


// Compute which bucket the entry belongs in based on the hash.
DWORD dwBucket = iHash % VolatileLoad(&m_cBuckets);
PTR_VolatileEntry* curBuckets = GetBuckets();
Member:

I do not think that this works with DAC. The cast from DPTR(PTR_VolatileEntry) to PTR_VolatileEntry* will only marshal the first array entry. This needs to pass DPTR(PTR_VolatileEntry) all the way through to the places we index into it.

@hoyosjs (Member) left a comment:

We should probably run this through debugger tests.

src/coreclr/vm/dacenumerablehash.inl (outdated, resolved)
src/coreclr/vm/dacenumerablehash.inl (outdated, resolved)
VolatileLoadBarrier();

// in case a resize is in progress, look in the new table as well.
auto nextBuckets = ((DPTR(PTR_VolatileEntry)*)pContext->m_curTable)[1];
@VSadov (Member, Author):

@hoyosjs - I assume this is correct (modulo the auto)? In this case I want to index on the remote side. Is this the right way to do it?

Member:

Suggested change
auto nextBuckets = ((DPTR(PTR_VolatileEntry)*)pContext->m_curTable)[1];
auto nextBuckets = dac_cast<DPTR(PTR_VolatileEntry)>(pContext->m_curTable)[1];

@VSadov (Member, Author):

I made GetNext a common helper. With that, it is more convenient if m_curTable is DPTR(PTR_VolatileEntry).


static DPTR(PTR_VolatileEntry) GetNext(DPTR(PTR_VolatileEntry) buckets)
{
return (DPTR(PTR_VolatileEntry))dac_cast<TADDR>(buckets[SLOT_NEXT]);
@VSadov (Member, Author):

@hoyosjs - is this correct, or must it be another dac_cast from TADDR?

@VSadov (Member, Author):

Like:

    static DPTR(PTR_VolatileEntry) GetNext(DPTR(PTR_VolatileEntry) buckets)
    {
        return dac_cast<DPTR(PTR_VolatileEntry)>(dac_cast<TADDR>(buckets[SLOT_NEXT]));
    }

Member:

If I understand this right, buckets[SLOT_NEXT] points to a new hash table that is an array of linked lists? Then it's

return dac_cast<DPTR(PTR_VolatileEntry)>(dac_cast<TADDR>(buckets[SLOT_NEXT]));

which can be simplified to

return dac_cast<DPTR(PTR_VolatileEntry)>(buckets[SLOT_NEXT]);

@VSadov (Member, Author), Nov 13, 2021:

Yes. buckets[SLOT_NEXT] will point to the next, larger table if a resize happened while you are doing a lookup. It will be rare. It is possible, even though extremely rare, to see more than one new version.
Every new one is 2x larger than the previous, so there can't be too many even in the ultimate worst case.
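That bound can be made concrete: with each published version at least 2x the previous, the number of live versions a lookup can encounter grows only logarithmically with the final table size. A small illustration (hypothetical helper, not from the PR):

```cpp
// Worst-case count of live table versions when each resize at least
// doubles the bucket count. Hypothetical helper for illustration.
#include <cassert>
#include <cstddef>

size_t MaxVersions(size_t initialBuckets, size_t finalBuckets) {
    // assumes initialBuckets > 0; one version exists before any resize
    size_t versions = 1;
    for (size_t n = initialBuckets; n < finalBuckets; n *= 2)
        ++versions; // each doubling publishes one more version
    return versions;
}
```

Growing from 16 to over a thousand buckets still leaves only a handful of versions for a concurrent reader to walk.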

@VSadov (Member, Author) commented Nov 13, 2021

I have run the diagnostics tests with bits from right before my change and after (multiple times). I do not see any regressions due to the change.

@VSadov (Member, Author) commented Nov 13, 2021

The new test failure on Win arm64 is caused by #61548 - it is not because of these changes.

@jkotas (Member) left a comment:

LGTM. Thanks!

@hoyosjs (Member) left a comment:

DAC changes LGTM.

@VSadov (Member, Author) commented Nov 14, 2021

Thanks!!
