[wasm] runtime tests timing out on Build Tests step for rolling builds #93134

Closed
radical opened this issue Oct 6, 2023 · 10 comments

@radical
Member

radical commented Oct 6, 2023

The runtime tests job has been timing out on rolling builds, specifically on the Build Tests step. The last passing rolling build was https://dev.azure.com/dnceng-public/public/_build/results?buildId=428248, and the first failing one was https://dev.azure.com/dnceng-public/public/_build/results?buildId=429084.

The changes between them are 52aca8c...9dfbb61.

There were 3 PRs with changes in src/tests:

cc @fanyang-mono @trylek @akoeplinger

Known Issue Error Message

Fill the error message using step by step known issues guidance.

{
  "ErrorMessage": "",
  "ErrorPattern": "",
  "BuildRetry": false,
  "ExcludeConsoleLog": false
}

Report

Summary

24-Hour Hit Count: 0, 7-Day Hit Count: 0, 1-Month Count: 0
@radical radical added arch-wasm WebAssembly architecture blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' labels Oct 6, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 6, 2023
@ghost

ghost commented Oct 6, 2023

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.


@radical radical changed the title [wasm] runtime tests failing timing out on rolling builds [wasm] runtime tests timing out on Build Tests steps for rolling builds Oct 6, 2023
@radical
Member Author

radical commented Oct 6, 2023

@trylek The runtime tests job on your PR #92029 timed out too, but the PR was merged before the job was completed. Could you please take a look at this?

@radical radical changed the title [wasm] runtime tests timing out on Build Tests steps for rolling builds [wasm] runtime tests timing out on Build Tests step for rolling builds Oct 6, 2023
@akoeplinger
Member

An important thing to notice here is this message which gets shown in the root log:

##[warning]Agent  was purged, cancelling the pipeline.
,##[warning]Received request to deprovision: The request was cancelled by the remote provider.

As far as I know this means the AzDO agent was killed somehow. Maybe we're hitting OOM with the AOT compiler due to the increased number of tests in the merged tests setup? But it's weird given this is a monointerpreter job so I assume it should do little AOTing.

@radical
Member Author

radical commented Oct 11, 2023

@lewing we might be hitting container limits for runtime tests again. These have been failing since Oct 5, so we are not running any runtime tests.

@radical
Member Author

radical commented Oct 11, 2023

> An important thing to notice here is this message which gets shown in the root log:
>
> ##[warning]Agent  was purged, cancelling the pipeline.
> ,##[warning]Received request to deprovision: The request was cancelled by the remote provider.
>
> As far as I know this means the AzDO agent was killed somehow. Maybe we're hitting OOM with the AOT compiler due to the increased number of tests in the merged tests setup? But it's weird given this is a monointerpreter job so I assume it should do little AOTing.

Actually, if it were hitting container limits, it should have failed much sooner, but the job failed/got-cancelled after 4h 1m 9s!

@radical
Member Author

radical commented Oct 11, 2023

@akoeplinger is there any way to get the console log from that, even if it might be truncated near the end?

@akoeplinger
Member

@radical I don't think so, it looks like AzDO doesn't preserve the log in this case...

> but the job failed/got-cancelled after 4h 1m 9s!

The time varies wildly between runs; I've also seen it happen after 1h 50m.

@radical
Member Author

radical commented Oct 11, 2023

@trylek Could you please take a look at this? I think #92029 might be the one causing this. This has been happening consistently since Oct 5.
cc @lewing

Meanwhile, I'm trying a revert: #93362.

radical added a commit to radical/runtime that referenced this issue Oct 13, 2023
The runtime tests build for wasm has been timing out on CI. It seems that the
webcil conversion part is causing the memory limits on the container to
be hit. Disable this temporarily to get the runtime tests building
again.

In a follow up PR, this part can be moved to helix, same as we do for
non-merged test runners.

Issue: dotnet#93134
@radical radical removed the untriaged New issue has not been triaged by the area owner label Oct 18, 2023
@lewing
Member

lewing commented Oct 18, 2023

This is super frustrating: not only does the build get cancelled by the infrastructure with no details, but the GitHub integration also doesn't reliably see the cancellation, so the build gets stuck in an inconsistent state. @JulieLeeMSFT @agocke is there a way to deal with the build analysis/infrastructure issue here?

radical added a commit to radical/runtime that referenced this issue Oct 18, 2023
Runtime tests build for wasm started timing out recently, with the job
getting cancelled, likely due to the build hitting the container limits.

This changes the inner test builds to not run in parallel at all, with
`/m:1`, which unblocks the CI at least.

Issue: dotnet#93134
radical added a commit that referenced this issue Oct 18, 2023
* [wasm] Fix runtime tests build on CI

Runtime tests build for wasm started timing out recently, with the job
getting cancelled, likely due to the build hitting the container limits.

This changes the inner test builds to not run in parallel at all, with
`/m:1`, which unblocks the CI at least.

Issue: #93134

* Disable runtime tests that require native libraries, which are not currently built for wasm
@radical
Member Author

radical commented Oct 20, 2023

Fixed by #93646.

@radical radical closed this as completed Oct 20, 2023
@radical radical removed the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Oct 20, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Nov 20, 2023