NativeAOT legs timing out in CI #102239

Closed
stephentoub opened this issue May 15, 2024 · 13 comments
Labels
area-NativeAOT-coreclr
blocking-clean-ci: Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms'
Known Build Error: Use this to report build issues in the .NET Helix tab
os-linux: Linux OS (any supported distro)

Comments

@stephentoub
Member

stephentoub commented May 15, 2024

Build Information

Build: https://dev.azure.com/dnceng-public/cbb18261-c48f-4abb-8651-8cdcb5474649/_build/results?buildId=675463
Build error leg or test failing: Build / linux-x64 Debug NativeAOT
Pull request: #102176

Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "ran longer than the maximum time of 120 minutes",
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Known issue validation

Build: 🔎 https://dev.azure.com/dnceng-public/public/_build/results?buildId=675463
Error message validated: [ran longer than the maximum time of 120 minutes]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 5/15/2024 3:00:04 AM UTC

Report

Build Definition Step Name Console log Pull Request
714256 dotnet/runtime linux-x64 Debug NativeAOT Log #103228
714245 dotnet/runtime linux-x64 Release NativeAOT Log #103705
714236 dotnet/runtime linux-x64 Release NativeAOT Log #103574
714225 dotnet/runtime linux-x64 Release NativeAOT Log #103765
714215 dotnet/runtime linux-x64 Release NativeAOT Log #103555
714212 dotnet/runtime linux-x64 Release NativeAOT Log #103738
714171 dotnet/runtime linux-x64 Debug NativeAOT Log #103752
714143 dotnet/runtime linux-x64 Debug NativeAOT Log #100334
714100 dotnet/runtime linux-x64 Release NativeAOT Log #103761
2478036 dotnet-runtime windows-x64 release CoreCLR
2477886 dotnet-runtime windows-x86 release CoreCLR
713594 dotnet/runtime linux-x64 Release NativeAOT Log #103737
713590 dotnet/runtime linux-x64 Release NativeAOT Log #103617
713355 dotnet/runtime linux-x64 Debug NativeAOT Log #103724
713583 dotnet/runtime linux-x64 Debug NativeAOT Log #103607
713565 dotnet/runtime linux-x64 Release NativeAOT Log #103728
713560 dotnet/runtime linux-x64 Release NativeAOT Log #103663
713553 dotnet/runtime linux-x64 Debug NativeAOT Log #103673
713550 dotnet/runtime linux-x64 Release NativeAOT Log #103735
2477796 dotnet-runtime osx-arm64 release CrossAOT_Mono crossaot
713460 dotnet/runtime linux-x64 Release NativeAOT Log #103731
713456 dotnet/runtime linux-x64 Release NativeAOT Log #103104
712017 dotnet/runtime linux-x64 Release NativeAOT Log #103527
713414 dotnet/runtime linux-x64 Release NativeAOT Log #103555
713368 dotnet/runtime linux-x64 Release NativeAOT Log #103725
713359 dotnet/runtime linux-x64 Release NativeAOT Log
713343 dotnet/runtime linux-x64 Release NativeAOT Log #103723
713329 dotnet/runtime linux-x64 Debug NativeAOT Log #103704
713315 dotnet/runtime linux-x64 Release NativeAOT Log #103701
713309 dotnet/runtime linux-x64 Debug NativeAOT Log #103181
713295 dotnet/runtime linux-x64 Release NativeAOT Log #103721
713254 dotnet/runtime linux-x64 Release NativeAOT Log #103617
713243 dotnet/runtime linux-x64 Release NativeAOT Log #102403
713168 dotnet/runtime linux-x64 Debug NativeAOT Log #100334
713158 dotnet/runtime linux-x64 Debug NativeAOT Log #103681
713150 dotnet/runtime linux-x64 Release NativeAOT Log #103444
713124 dotnet/runtime linux-x64 Debug NativeAOT Log #103184
712223 dotnet/runtime linux-x64 Debug NativeAOT Log #103540
712399 dotnet/runtime linux-arm64 Release NativeAOT Log #103680
712339 dotnet/runtime linux-x64 Release NativeAOT Log #103648
711761 dotnet/runtime linux-x64 Release NativeAOT Log #103498
712283 dotnet/runtime linux-x64 Debug NativeAOT Log #103667
712286 dotnet/runtime linux-x64 Release NativeAOT Log #103676
711726 dotnet/runtime linux-x64 Release NativeAOT Log #103612
711695 dotnet/runtime linux-x64 Release NativeAOT Log #103646
712239 dotnet/runtime linux-x64 Release NativeAOT Log #103638
712210 dotnet/runtime linux-x64 Release NativeAOT Log #103673
712203 dotnet/runtime linux-x64 Release NativeAOT Log #103144
712188 dotnet/runtime linux-x64 Release NativeAOT Log #102611
711809 dotnet/runtime linux-x64 Release NativeAOT Log #103649
712154 dotnet/runtime linux-x64 Debug NativeAOT Log #103361
712077 dotnet/runtime linux-x64 Debug NativeAOT Log #103607
712047 dotnet/runtime linux-x64 Release NativeAOT Log #103444
712025 dotnet/runtime linux-x64 Release NativeAOT Log
711038 dotnet/runtime linux-x64 Release NativeAOT Log #103540
711991 dotnet/runtime linux-x64 Release NativeAOT Log #103667
711980 dotnet/runtime linux-x64 Release NativeAOT Log #103668
711952 dotnet/runtime linux-x64 Release NativeAOT Log #103665
711941 dotnet/runtime linux-x64 Release NativeAOT Log #103663
711932 dotnet/runtime linux-x64 Release NativeAOT Log #103594
711915 dotnet/runtime linux-x64 Release NativeAOT Log #103661
711892 dotnet/runtime linux-x64 Release NativeAOT Log #103659
711882 dotnet/runtime linux-x64 Debug NativeAOT Log #103657
711431 dotnet/runtime linux-x64 Release NativeAOT Log #103631
711832 dotnet/runtime linux-x64 Release NativeAOT Log #103620
711814 dotnet/runtime linux-x64 Release NativeAOT Log #103559
711758 dotnet/runtime linux-x64 Release NativeAOT Log #103654
711705 dotnet/runtime linux-x64 Release NativeAOT Log #103648
711619 dotnet/runtime linux-x64 Release NativeAOT Log #101796
711611 dotnet/runtime linux-x64 Release NativeAOT Log #103564
711602 dotnet/runtime linux-x64 Release NativeAOT Log #103560
711587 dotnet/runtime linux-x64 Debug NativeAOT Log #103626
711541 dotnet/runtime linux-x64 Release NativeAOT Log #103574
711514 dotnet/runtime linux-x64 Debug NativeAOT Log #103635
711489 dotnet/runtime linux-x64 Release NativeAOT Log #103634
711454 dotnet/runtime linux-x64 Release NativeAOT Log #103412
711413 dotnet/runtime linux-x64 Debug NativeAOT Log #100334
2476470 dotnet-runtime Performance osx x64 release iOSMono JIT ios_scenarios perfiphone12mini NoJS False False True net9.0 Log
711283 dotnet/runtime windows-x64 Release NativeAOT Log #103570
711048 dotnet/runtime linux-x64 Release NativeAOT Log #103317
710999 dotnet/runtime linux-x64 Release NativeAOT Log #103610
710912 dotnet/runtime CoreCLR Product Build OSX x64 checked Log #103603
710917 dotnet/runtime Build OSX x64 release Runtime_Release Log #103603
709734 dotnet/runtime linux-arm64 Release NativeAOT Log #103274
2475197 dotnet-runtime windows-x64 release CoreCLR
2475087 dotnet-runtime windows-x86 release CoreCLR
709580 dotnet/runtime osx-arm64 Release NativeAOT Log #100334
709579 dotnet/runtime windows-arm64 Release NativeAOT Log
709562 dotnet/runtime windows-arm64 Release NativeAOT Log #103458
709426 dotnet/runtime osx-x64 Release NativeAOT Log #99555
709383 dotnet/runtime osx-x64 Release NativeAOT Log #103339
709363 dotnet/runtime osx-arm64 Release NativeAOT Log #103412
709146 dotnet/runtime windows-x64 Debug NativeAOT Log #103503
708771 dotnet/runtime windows-x64 Debug NativeAOT Log #103504
708213 dotnet/runtime CoreCLR Product Build OSX arm64 release Log #103260
708200 dotnet/runtime CoreCLR Product Build OSX arm64 checked Log
707755 dotnet/runtime windows-x64 Release NativeAOT Log #103326
707735 dotnet/runtime windows-x64 Release NativeAOT Log #103469
2473604 dotnet-runtime windows-arm64 release CoreCLR
2473438 dotnet-runtime windows-x86 release CoreCLR
Displaying 100 of 300 results

Summary

24-Hour Hit Count: 31
7-Day Hit Count: 100
1-Month Count: 300
stephentoub added the blocking-clean-ci and Known Build Error labels May 15, 2024
dotnet-policy-service bot added the untriaged (New issue has not been triaged by the area owner) label May 15, 2024
Contributor

Tagging subscribers to this area: @dotnet/area-infrastructure-libraries
See info in area-owners.md if you want to be subscribed.

@MichalStrehovsky
Member

Clicking through, the problem is always the same: we finish the product build in 20 minutes and send 5 work items to Helix (each of which takes less than a minute to run). We then wait 100 minutes for these to finish, and then we time out. Two more hours later, the Helix work items finally get scheduled and finish.

Digging into Helix logs, it always looks something like this:

    "Delay": "04:17:08.6820000",
    "Duration": "00:00:17.8290000",

We could increase the timeout to 5 hours but that feels excessive.
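
To make the arithmetic concrete, here is a minimal sketch that parses the two values quoted above and compares them with the time left for Helix. The 120-minute limit and 20-minute product build come from this thread; the `parse_timespan` helper is a hypothetical name, and it assumes the values always have the `hh:mm:ss.fffffff` shape shown here (no day component).

```python
from datetime import timedelta


def parse_timespan(text: str) -> timedelta:
    """Parse an 'hh:mm:ss.fffffff' string such as the values quoted above.

    Assumption: no day component is present in the value.
    """
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))


# Values quoted from the Helix log above.
delay = parse_timespan("04:17:08.6820000")
duration = parse_timespan("00:00:17.8290000")

# Figures from this thread: 120-minute job limit, roughly 20-minute product build.
job_timeout = timedelta(minutes=120)
product_build = timedelta(minutes=20)
helix_budget = job_timeout - product_build  # time left to wait on Helix

total_wait = delay + duration
print(f"queue delay:    {delay}")
print(f"run duration:   {duration}")
print(f"helix budget:   {helix_budget}")
print(f"over budget by: {total_wait - helix_budget}")
```

Under these assumptions the work item itself is negligible; the queue delay alone is more than twice the remaining budget.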

@agocke
Member

agocke commented May 16, 2024

@markwilkie Could you comment on what "Delay" means here? Is there something holding up the run?

@markwilkie markwilkie assigned markwilkie and unassigned markwilkie May 16, 2024
@markwilkie
Member

markwilkie commented May 16, 2024

@chcosta - any thoughts as to what 'Delay' means here?

"Delay": "04:17:08.6820000",
"Duration": "00:00:17.8290000",

@chcosta
Member

chcosta commented May 16, 2024

looking

@chcosta
Member

chcosta commented May 17, 2024

I haven't had much time to dig into this yet

Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@agocke
Member

agocke commented Jun 10, 2024

@chcosta any update here?

@chcosta
Member

chcosta commented Jun 10, 2024

Sadly, no. I had time to dig in a little, but only got as far as confirming what @MichalStrehovsky was seeing. I couldn't find any additional insight into what caused the Delay, only that it represents the amount of time between queue time and start time.

@steveisok
Member

In https://dev.azure.com/dnceng-public/public/_build/results?buildId=705793&view=logs&j=ddb4415b-4613-5bce-e937-0da25336f8b9&t=d2b408ad-4ef2-5ad9-4f1a-57f7ca85d7e0 , I see all the helix jobs completing super fast, but it still times out.

Could this be an issue with the test results relay or something post-run?

If you look at the helix job list, each one shows nothing abnormal.

https://helix.dot.net/api/jobs/448572db-1838-4ee6-b27c-fb612c6cf3b9/workitems?api-version=2019-06-17

@agocke
Member

agocke commented Jun 20, 2024

Yeah, I'm still confused about why AOT is being hit more frequently.

@agocke
Member

agocke commented Jun 20, 2024

Sven figured it out. It's because the timeout for the Native AOT tests is 120 minutes, which is different from many other tests (libraries have 180, for example). This issue is just specifically catching Native AOT tests because the message includes the timeout.

The real cause of all of this is queue overload. It has nothing to do with Native AOT.
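
The matching effect described above can be sketched as follows. This is an illustration only: it assumes the known-issue check is a plain substring test against the log, and that the timeout message differs between legs only in the number of minutes. The leg names and the `timeout_log_line` and `matches_known_issue` helpers are hypothetical; the two timeout values are the ones mentioned in this thread.

```python
# Error text from the known-issue JSON in this issue.
KNOWN_ISSUE_MESSAGE = "ran longer than the maximum time of 120 minutes"

# Timeouts mentioned in the thread; the leg names are illustrative labels.
leg_timeouts = {"NativeAOT": 120, "libraries": 180}


def timeout_log_line(minutes: int) -> str:
    # Assumed shape of the timeout message, modeled on the text quoted in the issue.
    return f"The job ran longer than the maximum time of {minutes} minutes"


def matches_known_issue(log_line: str) -> bool:
    # Assumption: the known-issue check is a plain substring test.
    return KNOWN_ISSUE_MESSAGE in log_line


matched = {
    leg: matches_known_issue(timeout_log_line(minutes))
    for leg, minutes in leg_timeouts.items()
}
print(matched)
```

A leg with a 180-minute limit that starves in the same queue would produce a different message and never be counted here, which is why the hit list skews toward the 120-minute legs.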

agocke closed this as completed Jun 20, 2024
dotnet-policy-service bot removed the untriaged (New issue has not been triaged by the area owner) label Jun 20, 2024
github-actions bot locked and limited conversation to collaborators Jul 21, 2024
8 participants