[tests] System.Text.Json.Tests segfault, for Libraries Test Run checked coreclr Linux_musl x64 Debug
#46100
Crash during background GC:
@jkotas is it possible to say where this should go: JIT, JSON, ...?
We need more dumps to see the pattern.
@jkotas I think I see the same problem on Linux x64 with .NET Core 3.1.8:
In another crash, I have the same backtrace, but the heap is corrupted:
Crash with this stack trace is a very generic symptom of a GC hole or GC heap corruption. It is unlikely that the problem you are seeing has the same root cause as the problem that this issue is tracking.
It will likely be necessary to run the STJ tests in a tight loop on a local Linux or OSX environment in an attempt to repro the crash and narrow down the tests and call stacks. FWIW, here are commits from the two weeks prior to the first reported failure on 12/15/2020 containing keywords including "JIT", "GC", and "OSX": 7696202 Auto-generate all C++ header overrides of the entire jit interface (#45920)
Do we have more dumps for this crash now? Note that this is a crash during background GC, so the test that triggered the problem has likely finished running by the time the process crashes. The best way to investigate crashes like this one is the stresslog.
No new dumps that I'm aware of, although we have been getting other failures that may be related: #47805
To make progress on this, we really need to fix the CI to collect the dumps for crashes like this one. I believe that it used to work at one point in the not-so-distant past. Why is the CI not collecting the crash dumps anymore?
If I go to the last failure listed in #47805: https://dev.azure.com/dnceng/public/_build/results?buildId=982059&view=ms.vss-test-web.build-test-results-tab&runId=30778306&resultId=173593&paneView=dotnet-dnceng.dnceng-build-release-tasks.helix-test-information-tab Does that work?
What is a little more troublesome is going from Kusto (i.e., a query of failures for the tests) to debugging the dump, because there is not a good way to get the dump file without a link to the build. I know @safern was working on a change so that …
I'm about to put a PR into runfo 😄
Nice! After that we can update how-to-debug-dump.md to match.
Are there any dumps for the musl x64 failure that this issue is about? (I am not set up to debug the OSX x64 dump at the moment.)
Here's what I find with a Jobs query; then looking at just Alpine, I find only two jobs, but both of these URLs give me:
<Error>
<Code>AuthenticationFailed</Code>
<Message>Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature. RequestId:f76a8b7b-b01e-00e3-71e3-0ba8a1000000 Time:2021-02-26T02:03:35.7795736Z</Message>
<AuthenticationErrorDetail>Signed expiry time [Mon, 18 Jan 2021 14:40:26 GMT] must be after signed start time [Fri, 26 Feb 2021 02:03:35 GMT]</AuthenticationErrorDetail>
</Error>
@MattGal what's up there? If it's deleted, why doesn't it just give me a 404? If I comment out the "Source" line (so we get PRs as well, which may have their own bugs) I get 23 hits, but only 2 ConsoleUri URLs work. The second one looks most promising as it's x64. Of course, it may be a PR issue. @safern I will need your help getting to the dump.
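The query text itself didn't survive in this thread; below is a minimal sketch of what a Helix Kusto query of this shape might look like. The Jobs and WorkItems tables and the Source, QueueName, ExitCode, and ConsoleUri columns are assumptions based on the discussion above, not the exact query that was run.

```kusto
// Hypothetical sketch only: find recent Alpine (musl x64) jobs for the
// System.Text.Json test runs and list failed work items with their console logs.
Jobs
| where Type contains "System.Text.Json"        // assumed filter on the test payload
| where Source contains "dotnet/runtime"        // the "Source" line mentioned above; exact value is a guess
| where QueueName contains "alpine"             // musl x64 queues
| join kind=inner WorkItems on JobId
| where ExitCode != 0                           // crashed or failed work items
| project Finished, QueueName, FriendlyName, ExitCode, ConsoleUri
| order by Finished desc
```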
BTW, no evidence anything started on 12/15. Adding …
As promised: jaredpar/runfo#98. Once that is merged and a new version is published I'll update …
@danmoseley I couldn't find any crash for these tests on Alpine, after my improvement, where the crash wasn't caused by the PR change itself. (The query started with: let failed = Jobs …)
So should we close this issue if we do not see these crashes anymore?
I assume that it is the OSX crash (#47805). Is that right?
Yes, you are right. The Infrastructure health report has both, but this one is falling off the list; it's the other one that is hot. I think we can close it. I confirm what @safern saw with this Jobs query: there are two very old hits.
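The confirmation query wasn't preserved either; a sketch of checking whether the failure has gone quiet, under the same assumed schema as the sketch above, could look like:

```kusto
// Hypothetical sketch: bucket the remaining Alpine failures by week to see
// whether the crash is still occurring or has fallen off the report.
Jobs
| where Type contains "System.Text.Json"
| where QueueName contains "alpine"
| join kind=inner WorkItems on JobId
| where ExitCode != 0
| summarize Hits = count() by Week = bin(Finished, 7d)
| order by Week desc
```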
Segfault seen in a test run: https://dev.azure.com/dnceng/public/_build/results?buildId=922206&view=ms.vss-test-web.build-test-results-tab&runId=29288458&paneView=debug&resultId=173384 on #46048.