[Mono]: Fix infrequent infinite loop on Mono EventPipe streaming thread. #72517
As observed by #59296, the EventPipe streaming thread could infrequently enter an infinite loop on Mono when cleaning up the stack hash map in `ep_rt_stack_hash_remove_all`, called from `ep_file_write_sequence_point` while flushing buffer memory into the file stream. The issue only occurred on Release builds, has so far only been observed on OSX, and reproduced in around 1 of 100 runs of the test suite.
After debugging at the assembly level when hitting the hang, it turns out that one item in the hash map has a hash key that doesn't correspond to its hash bucket. This should not be possible, since items are placed into buckets based on a hash key value that doesn't change for the lifetime of the item. It indicates that the key is being corrupted after the item has been added to the hash map.
After some more instrumentation it turns out that an insert into the hash map infrequently triggers a replace. The Mono hash table used in EventPipe is set up to insert without replace, meaning it keeps the old key but swaps in the new value and frees the old one. The stack hash map uses the same memory for its key and value, so freeing the old value also frees the key. Since the old key is kept, it now points into freed memory, and any future reuse of that memory region corrupts the hash table key.
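The effect of insert-without-replace on an entry whose key and value share one allocation can be sketched in isolation. This is a minimal illustration, not the Mono hash table code: `entry_t`, `slot_t` and `slot_insert_keep_key` are hypothetical names, and a single slot stands in for a hash bucket.

```c
#include <stdlib.h>

/* Hypothetical entry type: the same allocation serves as both key and value. */
typedef struct { int id; } entry_t;

/* Hypothetical one-slot stand-in for a hash table bucket. */
typedef struct {
    entry_t *key;
    entry_t *value;
} slot_t;

/*
 * Insert without replace: on a key collision the old key pointer is kept,
 * the old value is freed, and the new value is stored.
 * Returns 1 if the kept key pointed at the memory that was just freed.
 */
static int slot_insert_keep_key(slot_t *slot, entry_t *key, entry_t *value)
{
    if (slot->key == NULL) {
        /* Empty slot: plain insert, nothing is freed. */
        slot->key = key;
        slot->value = value;
        return 0;
    }
    int key_shares_value = (slot->key == slot->value);
    free(slot->value);      /* also frees the key when they share memory */
    slot->value = value;    /* slot->key is left untouched and may now dangle */
    return key_shares_value;
}
```

Inserting the same logical key twice, each time with key and value pointing at one allocation, makes the second call return 1: the slot still holds the first key pointer, but that memory has been released.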
This scenario should not be possible, since EventPipe code only adds to the hash map if the item is not already in it. After some further investigation it turns out that the call to `ep_rt_stack_hash_lookup` reports false, while the call to `ep_rt_stack_hash_add` for the same key hits the replace scenario in `g_hash_table_insert_replace`. `g_hash_table_insert_replace` finds the item in the hash map using the callbacks for hash and equal of hash keys. It turns out that the equal callback is defined to return `gboolean`, while the callback implementation used in EventPipe is defined to return `bool`. `gboolean` is typed as `int32_t` on Mono, and this is the root cause of the whole issue. On an optimized OSX build (and potentially on other platforms) the callback does a `memcmp` (updating the full `eax` register), and when returning, the callback only updates the first byte of `eax` to 0/1, keeping the upper bits. So if `memcmp` returns a negative value, or a positive value that doesn't fit in the first byte, `eax` will contain garbage in bytes 2, 3 and 4. Since Mono's `g_hash_table_insert_replace` expects a `gboolean`, it looks at the complete `eax` content, meaning that if any of the bits in bytes 2, 3 or 4 are still set, the condition is still true even if byte 1 is 0 (false). This incorrectly triggers the replace logic, freeing the old value and key and opening up for future corruption of the key, which now references freed memory.

Fix is to make sure the callback signatures used with hash map callbacks match the expected signatures of the underlying container implementation. The fix also adds a checked build assert to the hash map's add implementation on Mono, validating that the added key is not already contained in the hash map, which forces callers to check for existence before calling add.
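The register behavior described above can be modeled in portable C without relying on a mismatched function pointer call. The sketch below is an illustration under assumptions, not the code from this PR: `model_mismatched_return`, `gboolean_like` and `stack_key_equal` are hypothetical names, and the extra `size` parameter is added only to keep the example self-contained.

```c
#include <stdint.h>
#include <string.h>

/* Assumption for illustration: mirrors gboolean being typed as int32_t on Mono. */
typedef int32_t gboolean_like;

/*
 * Model of the mismatch: the callee (declared to return bool) writes only the
 * low byte of the return register, while the caller (expecting a 32-bit
 * gboolean) reads all four bytes. leftover_eax is whatever memcmp left behind.
 */
static uint32_t model_mismatched_return(uint32_t leftover_eax, int keys_equal)
{
    uint32_t low_byte = keys_equal ? 1u : 0u;
    return (leftover_eax & 0xFFFFFF00u) | low_byte;
}

/*
 * A callback whose return type matches what the container expects: the
 * comparison result is normalized to 0/1 across the full 32-bit value.
 */
static gboolean_like stack_key_equal(const void *a, const void *b, size_t size)
{
    return memcmp(a, b, size) == 0;
}
```

With a negative `memcmp` result left in the register (all upper bits set) and unequal keys, `model_mismatched_return` yields a nonzero value, which a caller testing a 32-bit `gboolean` would treat as true. A small positive leftover such as 5 yields 0, which is consistent with the bug only showing up in some runs.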
NOTE, CoreCLR is not affected by this, since the issue is in the Mono-specific EventPipe layer and the custom hash map callbacks are not even used by CoreCLR. Instead it uses the underlying C++ hash map, with `EventPipeCoreCLRStackHashTraits` implementing the needed functionality.
Fixes #59296
Fixes #54801