[HTTP/3] SIGABRT in stress tests #72696
Comments
Tagging subscribers to this area: @dotnet/ncl
Issue Details
Today's stress test run crashed with a segmentation fault after 28 mins: https://dev.azure.com/dnceng/public/_build/results?buildId=1897576&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=1451f5f3-0108-5a08-5b92-e984b2a85bbd&l=1570
Funnily enough, this run was scheduled with this ed5aa3d commit on top 😄 @rzikm (might be totally unrelated)
I'm not able to reproduce it, and the crash dumps from the pipeline are useless since the native code is a release build.
@CarnaViire can you please write down when the first hit was, and how often it happens / happened?
It turned out that only the first occurrence was a segfault (exit code 139); all the others are SIGABRT (exit code 134). So it is still happening, and it was not fixed by Mana's copying. 7/18-8/9 we've had ~32 non-crashing runs (30 min) and
UPD: new occurrences since 8/9
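For reference, the exit codes quoted above follow the common shell convention of reporting a signal-terminated process as 128 plus the signal number. A minimal sketch of that decoding, assuming the pipeline reports exit codes this way (the helper name `describe_exit_code` is made up for illustration and is not part of the stress test tooling):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a shell-style exit code to a signal name.

    Assumes the 128 + N convention used by POSIX shells for processes
    killed by signal N; codes of 128 or below are treated as normal exits.
    """
    if code > 128:
        number = code - 128
        try:
            return signal.Signals(number).name
        except ValueError:
            # The number does not correspond to a signal known on this platform.
            return f"unknown signal {number}"
    return "no signal (normal exit)"


if __name__ == "__main__":
    for exit_code in (139, 134):
        print(exit_code, describe_exit_code(exit_code))
```

On Linux, 139 - 128 = 11 is SIGSEGV and 134 - 128 = 6 is SIGABRT, which matches the segfault and abort reported in the runs.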
SIGABRT is likely coming from an assert or an unhandled exception. That should be visible from the dump, IMHO.
The abort is coming from an MsQuic assert in MsQuicStreamSend: https://github.com/microsoft/msquic/blob/d50ee4b831fca4f057031c50d0b31a7484e7208b/src/core/api.c#L1089
It seems like pthread_mutex_lock returned a non-zero error code: https://github.com/microsoft/msquic/blob/a7e98a6bbc609efb5fb2f7ef484827a0f453e816/src/inc/quic_platform_posix.h#L346
Update: SIGABRT is most likely a result of native heap corruption. Address Sanitizer has caught
The same heap corruption most likely manifests as INVALID_PARAMETER in #73688. I am investigating further.
Does this fix meet the bar to get backported to RC1? One of the backport PRs hit this failure there.
I believe it does, @carlossanlop. This is a significant reliability issue.
While I think I have caught all the send-buffer-related problems (I'll put up a PR shortly), there are still some remaining problems that also result in crashes. Address Sanitizer catches this:
@nibanks do you possibly have any insights/hints on what could have caused this?
My initial guess is that you're calling
Reopening for 7.0 backport