System.IO.Tests crash in CI (Linux arm64) #100441

Open
jkotas opened this issue Mar 29, 2024 · 7 comments
Labels
area-System.IO blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab
Milestone

Comments

@jkotas
Member

jkotas commented Mar 29, 2024

  Discovering: System.IO.Tests (method display = ClassAndMethod, method display options = None)
  Discovered:  System.IO.Tests (found 736 of 744 test cases)
  Starting:    System.IO.Tests (parallel test collections = on [2 threads], stop on fail = off)
./RunTests.sh: line 180:    20 Killed                  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.IO.Tests.runtimeconfig.json --depsfile System.IO.Tests.deps.json xunit.console.dll System.IO.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
/root/helix/work/workitem/e
----- end Fri Mar 29 11:20:29 UTC 2024 ----- exit code 137 ----------------------------------------------------------

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=623676
Build error leg or test failing: System.IO.Tests.WorkItemExecution
Pull request: #100433

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": ["arm64", "System.IO.Tests", "Killed", "-- exit code 137 --"],
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=623676
Error message validated: [arm64 System.IO.Tests Killed -- exit code 137 --]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 3/29/2024 2:52:42 PM UTC

Report

| Build | Definition | Test | Pull Request |
| --- | --- | --- | --- |
| 793852 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #106924 |
| 793027 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #107147 |
| 792370 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #107133 |
| 791827 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #107117 |
| 790653 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #107064 |
| 790524 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #107058 |
| 790074 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #107038 |
| 789864 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #106924 |
| 789035 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #106909 |
| 789026 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #106988 |
| 788782 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #106787 |
| 788768 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #106854 |
| 786915 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #106735 |
| 786597 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #105771 |
| 785793 | dotnet/runtime | System.IO.Tests.WorkItemExecution | #106765 |

Summary

| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
| --- | --- | --- |
| 0 | 0 | 15 |
@jkotas jkotas added blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' Known Build Error Use this to report build issues in the .NET Helix tab labels Mar 29, 2024
Contributor

Tagging subscribers to this area: @dotnet/area-system-io
See info in area-owners.md if you want to be subscribed.

@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Mar 29, 2024
@jozkee jozkee added this to the 9.0.0 milestone Jul 3, 2024
@jozkee jozkee removed the untriaged New issue has not been triaged by the area owner label Jul 3, 2024
@adamsitnik
Member

Exit code 137 means the process was killed with SIGKILL (128 + 9), which usually means out of memory. The tests started to fail not only in main but also in older branches where we have not touched the code at all: #100558

@dotnet/area-infrastructure-libraries Is it possible that the test VMs simply have less memory available now?
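
For readers unfamiliar with the 137 exit code, here is a minimal sketch of how a shell exit status maps to a signal. The helper name `describe_exit_code` is hypothetical and not part of RunTests.sh. It only shows that 137 corresponds to SIGKILL; it cannot tell whether the kernel OOM killer or something else sent the signal.

```shell
#!/bin/sh
# Hypothetical helper: decode a shell exit status.
# By convention, a status above 128 means "terminated by signal (status - 128)".
describe_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    # `kill -l N` prints the name of signal number N without the SIG prefix.
    echo "killed by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exited with status $code"
  fi
}

describe_exit_code 137   # prints: killed by signal 9 (SIGKILL)
```

SIGKILL is what the Linux OOM killer sends, which is why 137 is commonly read as out of memory, but any other sender of SIGKILL produces the same status.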

@ViktorHofer
Member

ViktorHofer commented Jul 18, 2024

I don't think we have access to that information for a Helix test client. It might make sense to print some diagnostics in the RunTests.sh/cmd script, e.g. available RAM and disk space.
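
A rough sketch of what such diagnostics could look like on the Linux side, assuming the script runs in a container where `/proc/meminfo` is readable. The function name `print_resource_diagnostics` and the cgroup paths checked are assumptions for illustration, not something RunTests.sh contains today.

```shell
#!/bin/sh
# Hypothetical diagnostics helper, to be called before the test run starts.
print_resource_diagnostics() {
  echo "----- resource diagnostics -----"
  # Total and available RAM as reported by the Linux kernel (absent elsewhere).
  if [ -r /proc/meminfo ]; then
    grep -E '^(MemTotal|MemAvailable|SwapTotal|SwapFree):' /proc/meminfo || true
  fi
  # Free disk space on the filesystem holding the current working directory.
  df -h . || true
  # Container memory limit, if exposed (cgroup v2 path first, then cgroup v1).
  for f in /sys/fs/cgroup/memory.max /sys/fs/cgroup/memory/memory.limit_in_bytes; do
    if [ -r "$f" ]; then
      echo "cgroup memory limit ($f): $(cat "$f")"
    fi
  done
  echo "----- end resource diagnostics -----"
}

print_resource_diagnostics
```

Logging the cgroup limit next to MemAvailable matters in a docker container, because the container can be killed at its own limit while the host still reports plenty of free memory.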

@carlossanlop
Member

carlossanlop commented Jul 18, 2024

> Is it possible that the test VMs simply have less memory available now?

@adamsitnik I'd be surprised if something like that happened, but we can double check: @dotnet/dnceng do you know?

The thing is, this OOM failure is only happening in System.IO and System.IO.Net5Compat. I am pretty sure I don't see it anywhere else.

One thing that could help: this failure is also happening in 6.0 and 8.0, which suggests something got backported and would let you narrow down the check-ins, since we don't modify System.IO often. Never mind, you already answered that above.

@carlossanlop
Member

This is an intermittent issue, so maybe widen the date range a bit more? When was the last time a System.IO change happened in servicing before April?

@jkotas
Member Author

jkotas commented Jul 18, 2024

The failure was most likely triggered by a Linux kernel update, a docker container update, or a test infra update. These updates are rolled out regularly in the background. I do not think it is a good use of time to try to find the exact update that triggered this failure months ago. We won't be able to do much with that information.

The failure is likely triggered by a test that consumes too many resources. It does not have to be direct memory use. For example, the test can be creating too many file handles, which then manifests as exit code 137. I think we should try to find the offending test or tests, e.g. by trying to reproduce the failure with verbose logging.
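
One way to correlate resource use with test progress during a repro, sketched under the assumption of a Linux host with `/proc` mounted: periodically sample the resident memory and open file descriptor count of the test process. The helper `sample_process` is hypothetical; the sampling interval and how the PID is obtained are left to whoever runs the repro.

```shell
#!/bin/sh
# Hypothetical helper: print resident memory and open fd count for one PID.
sample_process() {
  pid=$1
  if [ ! -d "/proc/$pid" ]; then
    echo "pid=$pid not found under /proc"
    return 1
  fi
  # VmRSS in /proc/<pid>/status is the resident set size in kB.
  rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status")
  # Each entry in /proc/<pid>/fd is one open file descriptor.
  fds=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l | tr -d ' ')
  echo "pid=$pid rss_kb=${rss:-unknown} open_fds=$fds"
}

# Example: sample the current shell itself.
sample_process "$$" || true
```

Running this in a background loop next to the test process, with timestamps, would show whether memory or handle counts climb right before the kill, and which tests were executing at that point if verbose test output is also enabled.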

@jeffhandley jeffhandley modified the milestones: 9.0.0, 10.0.0 Aug 6, 2024
@JulieLeeMSFT
Member

Failed for the leg below in runtime-coreclr libraries-pgo/20240810.1

 net9.0-linux-Release-arm64-fullpgo_random_gdv-(Ubuntu.2004.Arm64.Open)Ubuntu.2004.Armarch.Open@mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-20.04-helix-arm64v8
  Starting:    System.IO.Tests (parallel test collections = on [2 threads], stop on fail = off)
./RunTests.sh: line 182:    26 Killed                  "$RUNTIME_PATH/dotnet" exec --runtimeconfig System.IO.Tests.runtimeconfig.json --depsfile System.IO.Tests.deps.json xunit.console.dll System.IO.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing $RSP_FILE
