segfault in libcoreclr.so`CallDescrWorkerWithHandler #66970

Closed
wfurt opened this issue Mar 22, 2022 · 8 comments · Fixed by #68311
Labels
area-VM-coreclr test-enhancement Improvements of test source code

@wfurt
Member

wfurt commented Mar 22, 2022

https://dev.azure.com/dnceng/public/_build?definitionId=690&_a=summary
It seems to have started failing consistently around the end of last week.
There is not much info, but xunit did not even start.
Unfortunately this seems to run only on selected PRs, so there is no good tracking of when it started failing.
I feel we should add periodic runs and guard against infrastructure changes.

  System.Net.Http.Enterprise.Tests -> /repo/artifacts/bin/System.Net.Http.Enterprise.Tests/Debug/net7.0-unix/System.Net.Http.Enterprise.Tests.dll
  ----- start Mon Mar 21 23:47:19 UTC 2022 =============== To repro directly: =====================================================
  pushd /repo/artifacts/bin/System.Net.Http.Enterprise.Tests/Debug/net7.0-unix
  /repo/artifacts/bin/testhost/net7.0-Linux-Debug-x64/dotnet exec --runtimeconfig System.Net.Http.Enterprise.Tests.runtimeconfig.json --depsfile System.Net.Http.Enterprise.Tests.deps.json xunit.console.dll System.Net.Http.Enterprise.Tests.dll -xml testResults.xml -nologo -notrait category=OuterLoop -notrait category=failing 
  popd
  ===========================================================================================================
  /repo/artifacts/bin/System.Net.Http.Enterprise.Tests/Debug/net7.0-unix /repo/src/libraries/System.Net.Http/tests/EnterpriseTests
  /repo/artifacts/bin/System.Net.Http.Enterprise.Tests/Debug/net7.0-unix/RunTests.sh: line 168: 23527 Segmentation fault      (core dumped) "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.Net.Http.Enterprise.Tests.runtimeconfig.json --depsfile System.Net.Http.Enterprise.Tests.deps.json xunit.console.dll System.Net.Http.Enterprise.Tests.dll -xml testResults.xml -nologo -notrait category=OuterLoop -notrait category=failing $RSP_FILE
  /repo/src/libraries/System.Net.Http/tests/EnterpriseTests
  ----- end Mon Mar 21 23:47:20 UTC 2022 ----- exit code 139 ----------------------------------------------------------
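
For readers decoding the log above: a POSIX shell reports a child killed by a signal as 128 plus the signal number, so exit code 139 corresponds to signal 11, which is SIGSEGV on Linux and matches the "Segmentation fault" line. A minimal Python sketch of that arithmetic follows; the helper name `describe_exit_code` is hypothetical and not part of RunTests.sh.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Translate a shell-reported exit code into a readable description."""
    # Shells such as bash report death-by-signal as 128 + signal number.
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            # The number does not map to a signal known on this platform.
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(139))
```

The mapping from number to name is platform dependent, so the sketch asks the local `signal` module rather than hard-coding a table.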

cc: @dotnet/ncl

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Mar 22, 2022
@ghost

ghost commented Mar 22, 2022

Tagging subscribers to this area: @dotnet/runtime-infrastructure
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: wfurt
Assignees: -
Labels:

area-Infrastructure

Milestone: -

@ghost

ghost commented Mar 28, 2022

Tagging subscribers to this area: @dotnet/area-infrastructure-libraries
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: wfurt
Assignees: -
Labels:

area-Infrastructure-libraries, area-Infrastructure, untriaged

Milestone: -

@agocke agocke added area-System.Net.Http and removed area-Infrastructure-libraries area-Infrastructure untriaged New issue has not been triaged by the area owner labels Mar 28, 2022
@ghost

ghost commented Mar 28, 2022

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: wfurt
Assignees: -
Labels:

area-System.Net.Http, area-Infrastructure, untriaged

Milestone: -

@agocke agocke added the untriaged New issue has not been triaged by the area owner label Mar 28, 2022
@wfurt
Member Author

wfurt commented Mar 28, 2022

Why did you change the label, @agocke? I feel this was broken by changes outside of HTTP, and HTTP itself is not the culprit.

@agocke
Member

agocke commented Mar 28, 2022

Since this is specific to those tests, it's likely it will need product expertise in those areas to investigate, as opposed to the broader runtime. If this is specific to coreeng infra, then runtime doesn't seem like the right place for this issue.

@karelz karelz added this to the 7.0.0 milestone Mar 29, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Mar 29, 2022
@karelz karelz added the test-enhancement Improvements of test source code label Mar 29, 2022
@wfurt
Member Author

wfurt commented Apr 13, 2022

This looks like a regression from #65738.
Thanks @jkotas for the pointer. cc: @janvorli

Basically, when coreclr starts it fails with SIGSEGV:

(lldb) bt
* thread #1, name = 'dotnet', stop reason = signal SIGSEGV: invalid address (fault address: 0x0)
  * frame #0: 0x00007fff7c81ea08
    frame #1: 0x00007ffff5fa458a libcoreclr.so`CallDescrWorkerWithHandler(pCallDescrData=0x00007fffffffd888, fCriticalCall=NO) at callhelpers.cpp:67:5
    frame #2: 0x00007ffff5fa52f2 libcoreclr.so`MethodDescCallSite::CallTargetWorker(this=0x00007fffffffdb40, pArguments=0x00007fffffffdc60, pReturnValue=0x0000000000000000, cbReturnValue=0) at callhelpers.cpp:538:9
    frame #3: 0x00007ffff5cffe33 libcoreclr.so`MethodDescCallSite::Call(this=0x00007fffffffdb40, pArguments=0x00007fffffffdc60) at callhelpers.h:458:9
    frame #4: 0x00007ffff5da0893 libcoreclr.so`CorHost2::CreateAppDomainWithManager(this=0x0000555555600fe0, wszFriendlyName=u"clrhost", dwFlags=0, wszAppDomainManagerAssemblyName=0x0000000000000000, wszAppDomainManagerTypeName=0x0000000000000000, nProperties=8, pPropertyNames=0x000055555559e650, pPropertyValues=0x000055555559e510, pAppDomainID=0x00007fffffffdf24) at corhost.cpp:630:15
    frame #5: 0x00007ffff5ce708e libcoreclr.so`::coreclr_initialize(exePath="/repo/artifacts/bin/testhost/net7.0-Linux-Debug-x64/dotnet", appDomainFriendlyName="clrhost", propertyCount=8, propertyKeys=0x0000555555596f50, propertyValues=0x0000555555593cc0, hostHandle=0x00007fffffffdf28, domainId=0x00007fffffffdf24) at exports.cpp:254:16
    frame #6: 0x00007ffff7f3e1ef libhostpolicy.so`___lldb_unnamed_symbol26$$libhostpolicy.so + 783
    frame #7: 0x00007ffff7f4dd21 libhostpolicy.so`___lldb_unnamed_symbol118$$libhostpolicy.so + 385
    frame #8: 0x00007ffff7f4d5da libhostpolicy.so`corehost_main + 154
    frame #9: 0x00007ffff7fa6a24 libhostfxr.so`___lldb_unnamed_symbol47$$libhostfxr.so + 1812
    frame #10: 0x00007ffff7fa5129 libhostfxr.so`___lldb_unnamed_symbol45$$libhostfxr.so + 665
    frame #11: 0x00007ffff7fa05db libhostfxr.so`hostfxr_main_startupinfo + 171
    frame #12: 0x000055555556b65a dotnet`___lldb_unnamed_symbol126$$dotnet + 938
    frame #13: 0x000055555556bad0 dotnet`___lldb_unnamed_symbol127$$dotnet + 144
    frame #14: 0x00007ffff6ca1bf7 libc.so.6`__libc_start_main(main=(dotnet`___lldb_unnamed_symbol127$$dotnet), argc=15, argv=0x00007fffffffe4c8, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffe4b8) at libc-start.c:310
    frame #15: 0x0000555555560029 dotnet`___lldb_unnamed_symbol1$$dotnet + 41
(lldb) ip2md 0x00007fff7c81ea08
Failed to request MethodData, not in JIT code range
IP2MD 0x00007fff7c81ea08  failed
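
In the backtrace above, frame #0 has an address but no module or symbol, and `ip2md` reports that it is outside the JIT code range, so the faulting instruction pointer is not attributed to any loaded library or to jitted code. A rough Python sketch for spotting such frames in saved lldb output follows; the regular expression assumes the `frame #N: 0xADDR module`symbol` layout shown above, and the function name `unresolved_frames` is hypothetical.

```python
import re

# Matches lines like "frame #1: 0x00007ffff5fa458a libcoreclr.so`CallDescr...".
# The module`symbol part is optional, as in frame #0 of the trace above.
FRAME_RE = re.compile(r"frame #(\d+): (0x[0-9a-fA-F]+)(?:\s+([^\s`]+)`)?")


def unresolved_frames(backtrace: str) -> list[tuple[int, str]]:
    """Return (frame index, address) for frames with no module attribution."""
    result = []
    for line in backtrace.splitlines():
        m = FRAME_RE.search(line)
        if m and m.group(3) is None:
            result.append((int(m.group(1)), m.group(2)))
    return result


sample = """\
  * frame #0: 0x00007fff7c81ea08
    frame #1: 0x00007ffff5fa458a libcoreclr.so`CallDescrWorkerWithHandler(pCallDescrData=0x00007fffffffd888, fCriticalCall=NO) at callhelpers.cpp:67:5
"""
print(unresolved_frames(sample))
```

This only flags candidates for disassembly; it does not say why the address is unmapped or unrecognized.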

The steps are tedious, but it is easy to reproduce once the containers are all built. I don't know whether the main trigger is /p:NativeOptimizationDataSupported=false; since we fail to build PGO, we build the bits without optimization.

https://github.com/dotnet/runtime/blob/main/eng/pipelines/libraries/enterprise/linux.yml

Please let me know, @janvorli, if you have any thoughts or if I can help with the investigation in any way.

@danmoseley danmoseley changed the title from "runtime-libraries enterprise-linux is consistently failing to run" to "segfault in libcoreclr.so`CallDescrWorkerWithHandler" Apr 13, 2022
@janvorli
Member

@wfurt can you please disassemble the code at 0x00007fff7c81ea08?

@wfurt
Copy link
Member Author

wfurt commented Apr 20, 2022

This seems like a compiler bug. The container had the old llvm 3.9. When we rebuild with llvm-10, the crash is gone.
I'll update the container and try it again.
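
Since the crash tracked the compiler version in the container, one way to catch this kind of drift earlier would be a toolchain check before the build. A minimal Python sketch follows, assuming the check parses `clang --version` style output. The minimum of 10 is taken only from the observation that the llvm-10 build did not crash, not from any documented requirement, and the function names and sample version strings are hypothetical.

```python
import re


def parse_clang_version(version_output: str) -> tuple[int, ...]:
    """Extract the numeric version from `clang --version` style output."""
    m = re.search(r"clang version (\d+(?:\.\d+)*)", version_output)
    if m is None:
        raise ValueError("could not find a clang version string")
    return tuple(int(part) for part in m.group(1).split("."))


def is_toolchain_new_enough(version_output: str,
                            minimum: tuple[int, ...] = (10,)) -> bool:
    """Compare the parsed version against an assumed minimum major version."""
    return parse_clang_version(version_output) >= minimum


# Illustrative version strings, not captured from the actual containers.
print(is_toolchain_new_enough("clang version 3.9.1-4ubuntu3"))
print(is_toolchain_new_enough("Ubuntu clang version 10.0.0-4ubuntu1"))
```

Comparing tuples of integers avoids the string-comparison trap where "3.9" would sort after "10".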

@wfurt wfurt self-assigned this Apr 20, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 21, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 21, 2022
@ghost ghost locked as resolved and limited conversation to collaborators May 21, 2022

5 participants