[tests] System.Text.Json.Tests segfault, for Libraries Test Run release coreclr OSX x64 Release
#47805
Comments
Tagging subscribers to this area: @eiriktsarpalis, @layomia Issue Details
Seems similar to #46100.
@ViktorHofer thoughts? I can't find any crash dumps (via an "attachments" tab?) or other information. It may be related to #46100, but I can't tell which test(s) were running. I previously logged an issue about the lack of information available when a test run crashes; at a minimum, knowing which test(s) were active at the time of the crash would be valuable. As a last resort, I could try running it repeatedly on a local MacBook and hope it triggers the exception (with a dump).
crash dump: #46100 (comment)
@steveharter did the crash dump that @danmoseley shared above work for you?
We have some fresh dumps for this now, and thanks to recent effort by @safern, getting set up to debug them is practically a single command. I used this query -- let failed = Jobs
I then clicked the link in the most recent hit. It pulls up how-to-debug-dump.md, which gives the magic commands to paste in to pull down the dump, symbols, and binaries, and then open the dump in lldb, e.g.
dotnet tool install --global runfo
dotnet tool update --global runfo
runfo get-helix-payload -j a18c2113-8e70-40f3-967c-ada5df3ab122 -w System.Text.Json.Tests -o ~/helix_payload/System.Text.Json.Tests
lldb --core ~/helix_payload/System.Text.Json.Tests/workitems/System.Text.Json.Tests/core.41566 ~/helix_payload/System.Text.Json.Tests/shared/Microsoft.NETCore.App/6.0.0/dotnet
It really is super easy now. Unfortunately I don't have a Mac, so I couldn't actually do the lldb part. @steveharter could you please take a look and tell us how this goes for you? cc @jkotas
cc @stephentoub in case he's interested in how easy it is to get to dumps on all OSes by just copy/pasting a couple of commands out of the log.
I have looked at 3 recent crash dumps. My notes are at https://gist.github.com/jkotas/e2221257fd59e90a7860a77b06dc0d03
All 3 dumps show a consistent pattern:
We crash while dispatching a virtual interface call from
The crash is caused by bit 0x2 being set on the MethodTable* of the object we are dispatching on.
Also, there is a background GC running inside
@Maoni0 I believe that this is a problem introduced by #43021. Could you please take a look?
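To make the failure mode above concrete, here is a minimal C++ sketch of why a stray 0x2 bit on an object's MethodTable* breaks dispatch. It is an illustration only: the struct names, the mask value, and the helper functions are hypothetical and are not the runtime's actual definitions. It assumes only that the method table pointer is aligned, so its low bits are normally zero and can be borrowed for GC flags.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; not the runtime's real types.
struct alignas(8) MethodTable {
    int type_id;
};

struct Object {
    // Holds a MethodTable* whose spare low bits the GC may temporarily use.
    std::uintptr_t raw_method_table;
};

// The 0x2 bit discussed in the thread.
constexpr std::uintptr_t kMarkedByFgc = 0x2;
// Assumption for this sketch: only the low two bits are ever used as flags.
constexpr std::uintptr_t kGcFlagMask = 0x3;

// What code outside the GC effectively does: trust the pointer as stored.
inline MethodTable* method_table_unmasked(const Object& o) {
    return reinterpret_cast<MethodTable*>(o.raw_method_table);
}

// What GC-internal code does while flags may be set: strip them first.
inline MethodTable* method_table_masked(const Object& o) {
    return reinterpret_cast<MethodTable*>(o.raw_method_table & ~kGcFlagMask);
}

// True when the flag is still present on the stored pointer.
inline bool has_leaked_flag(const Object& o) {
    return (o.raw_method_table & kMarkedByFgc) != 0;
}
```

With the flag set, method_table_unmasked returns an address two bytes past the real method table, so any dispatch through it reads garbage. That is consistent with the crash pattern in the dumps, although the sketch does not model the dispatch stub itself.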
Tagging subscribers to this area: @dotnet/gc Issue Details
Runfo Tracking Issue: system.text.json.tests OSX failure
Displaying 100 of 583 results Build Result Summary
We are trying to get a local repro. @PeterSolMS will be doing the investigation.
I've been able to get a repro on Windows/amd64 by setting the bit more eagerly (ignoring whether background GC needs it or not), and by checking whether BGC needs the background mark bit set at the place where we test the bit. This makes it fail, apparently because the object is right before a pinned plug, and we save the object and restore it later. At the place where we test the bit in make_free_list_in_brick, it apparently has not been restored yet.
Ahh right, we are not doing this for sweep like we do for compact_plug. But I think we should be able to make this case simpler for sweep by making sure we clear these bits when we recover.
I've been able to catch 3 crash dumps so far, and all of them have the bad object right in front of a pinned plug. In two cases the size of the bad object was 32 bytes; in one case it was 24 bytes (a boxed System.Boolean). I wonder if we can simply generalize your fix for is_plug_padded, clearing the bits when we save to saved_pre_plug or saved_post_plug, but not for the copy saved to saved_pre_plug_reloc or saved_post_plug_reloc.
@PeterSolMS yes I think that would work!
Proposed fix in PR #52183. With this fix in place, the issue has not repro'd in days.
Fix issue #47805 - in this case, the BGC_MARKED_BY_FGC bit (0x2) set in the method table leaked out and caused issues for the user program.
In the cases that I've been able to repro, this happened because the bit got set for a short object right in front of a pinned plug, and then saved away by enque_pinned_plug. Later on, in the case of mark & sweep, we check for the bit and reset it, but later we copy the saved object back by calling recover_saved_pinned_info which calls recover_plug_info to do the actual work. This isn't a problem for the compact case, because we copy the saved object back during compact_plug, turn the bit off, then save it again at the end of compact_plug.
The fix is to turn off the extra bits at the beginning of enque_pinned_plug and save_post_plug_info for the copy that is later restored in mark & sweep (there are actually two copies saved, one for use during compact and one for use during mark & sweep). This builds on an earlier fix by Maoni for a similar problem with another bit.
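The sequence described in the PR can be modelled in a few lines of C++. This is a loose sketch under stated assumptions, not the GC's code: Header, SavedPlugInfo, and every function below are hypothetical names, and the object is reduced to a single header word. The field comments tie the two copies to saved_pre_plug and saved_pre_plug_reloc only by inference from the comments in this thread.

```cpp
#include <cassert>
#include <cstdint>

// The bit named BGC_MARKED_BY_FGC (0x2) in the thread.
constexpr std::uintptr_t kMarkedByFgc = 0x2;

// Hypothetical one-word model of the short object in front of a pinned plug.
struct Header {
    std::uintptr_t method_table;
};

// Hypothetical model of the two saved copies the thread mentions.
struct SavedPlugInfo {
    Header for_sweep;    // restored in mark & sweep (assumed: saved_pre_plug)
    Header for_compact;  // used while compacting (assumed: saved_pre_plug_reloc)
};

// Models saving the object; the fix clears the bit on the sweep copy only.
SavedPlugInfo save_plug_info(const Header& live, bool clear_bits_on_save) {
    SavedPlugInfo saved{live, live};
    if (clear_bits_on_save) {
        saved.for_sweep.method_table &= ~kMarkedByFgc;
    }
    return saved;
}

// Models the sweep path testing the bit on the live object and resetting it.
void sweep_reset_bit(Header& live) {
    live.method_table &= ~kMarkedByFgc;
}

// Models restoring the saved object over the live one after sweep.
void recover_for_sweep(Header& live, const SavedPlugInfo& saved) {
    live = saved.for_sweep;
}

// Runs the whole sequence and reports whether the bit survives into user code.
bool bit_leaks(bool clear_bits_on_save) {
    Header live{0x1000 | kMarkedByFgc};          // bit set before the save
    SavedPlugInfo saved = save_plug_info(live, clear_bits_on_save);
    sweep_reset_bit(live);                       // cleared on the live object...
    recover_for_sweep(live, saved);              // ...then the stale copy returns
    return (live.method_table & kMarkedByFgc) != 0;
}
```

In this model, bit_leaks(false) reproduces the leak because the restore overwrites the cleaned header with the stale copy, while bit_leaks(true) does not, since the copy used for sweep never carried the bit. The compact copy is left untouched, matching the suggestion not to clear the bits on the reloc copy.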
PR has been merged.
@PeterSolMS Thank you!
log https://helixre107v0xdeko0k025g8.blob.core.windows.net/dotnet-runtime-refs-heads-master-9a64c530bf744066ac/System.Text.Json.Tests/console.f709eb52.log?sv=2019-07-07&se=2021-02-22T19%3A41%3A51Z&sr=c&sp=rl&sig=xU%2FU9Zqz9syAbZ7PJnLwtfBYk6DtHDt4CTs5OIRB6dI%3D
build https://dev.azure.com/dnceng/public/_build/results?buildId=975991&view=logs&j=05e92ac1-194e-59cf-664a-fa72d1cdd19b&t=caea1d4b-c90c-5be1-e57d-c4635079c333