-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
runtime-dev-innerloop leg is frequently timing out on PRs #49309
Comments
Tagging subscribers to this area: @dotnet/runtime-infrastructure Issue Detailse.g.
|
Sad to see that some of the devices on CI can't build dotnet/runtime in under an hour. As a short term mitigation we probably want to bump the timeout to 90min. |
Actually by looking at the timestamps it seems like the build hung after creating the Microsoft.NETCore.App.Runtime package.
It spent 6 mins doing nothing there. |
Why do you think it's doing nothing? That could be crossgen as part of the runtime pack or any other component sequenced into the pack or after it which doesn't log anything. The binlog might tell. |
Hmm I didn't expect crossgen to take 6 mins but based on the binlog it seems like that is what was running before the build was killed. Is it intentional that ReadyToRunCompiler task is invoked per assembly we want to compile? https://github.com/dotnet/sdk/blob/94d0b50de0b3e58242c7d1e2ec606e0f408a37ee/src/Tasks/Microsoft.NET.Build.Tasks/targets/Microsoft.NET.Publish.targets#L404 -- rather than invoking it once with the whole list? cc: @trylek |
Yes, I believe this pause corresponds to Crossgen2. Right now RunReadyToRunCompiler is designed so that it processes just a single file - and if it did support multiple files, it would still need to spin a loop over the assemblies calling the compiler individually as the compiler itself doesn't support multiple output files (I've always been told that doesn't belong to the compiler as it's a duplication of build orchestration msbuild should be used for). While Crossgen2 does support internal parallelism, it's much less efficient than running multiple instances of the compiler process so I'd ideally expect RunReadyToRunCompiler to be run using the msbuild parallel process build; I'm not sure if it actually happens. |
The shared framework invokes the ReadyToRunCompiler per runtime package once for all assemblies in it: https://github.com/dotnet/arcade/blob/6cc4c1e9e23d5e65e88a8a57216b3d91e9b3d8db/src/Microsoft.DotNet.SharedFramework.Sdk/targets/sharedfx.targets#L70. Meaning the ReadyToRunCompiler task in the SDK will only be invoked once and gets all assemblies as an input. I believe we have two options:
@trylek thoughts? |
Hmm, I must admit I don't yet understand clearly where the loop over the input assemblies is getting expanded. The RunReadyToRunCompiler task apparently expects just a single CompilationEntry as its input: I guess it may be the case that this is somehow automatically expanded by msbuild when running the task from this place, when it recognizes that is passes multiple items in If we want to achieve parallelization, I guess there are two ways to go about it: first is rewriting the bit in At the first glance I think it would be much easier to modify the script to use parallel msbuild subproject build as the entire implementation of ToolTask / RunReadyToRunCompiler is centered around running a single compilation process and changing this concept would require basically throwing away all the basic underpinnings provided by ToolTask and manually open code the parallel process execution loop. |
FYI, we plan to improve logging when running crossgen2; see the second item in dotnet/sdk#15466. |
Still happening very frequently so we should apply the short term fix (bumping the timeout). |
Mitigates reported issues: #49309.
@safern you are right, I should have double checked. The _CreateR2RImages target batching happens sequentially which results in the non parallel behavior. I wonder if that behavior is by design or just something that hasn't been optimized yet. Aside from the crossgen discussion, we should diff builds that time out and ones from a few weeks ago that don't and see if perf somehow unintentionally regressed. |
In the absence of parallelization it is easily possible that Crossgen2 compilation of the framework assemblies is slower - according to perf repo measurements, for larger assemblies like Corelib we can be actually faster than Crossgen1 due to the internal parallelism but for small builds the managed startup will likely always be outperformed by the native Crossgen1 app, at least until we start selfhosting (compiling Crossgen2 with itself) and optimizing the startup path. For now I'm afraid that a certain degree of compilation throughput regression is known and expected, of course if it's 3 times slower, that wouldn't be expected and likely signifies a more substantial problem. I switched the framework over in this PR: |
Mitigates reported issues: #49309.
The timeout was bumped to 90 minutes and a root cause analysis showed that machines got slower (probably related to core count reduction). There was no regression in the actual build of dotnet/runtime apart from the usual churn. Let's track potential optimizations to crossgen2 invocation in a separate issue. Closing. |
Thanks Viktor, that sounds good to me especially in light of the fact that last Monday we found out that in the offending time period Crossgen2 wasn't being used at all due to a bug I had made in the original check-in of the publishing switchover, it was only last Wednesday I merged in the fix. |
Do you know which version of the SDK the fix will be in? Preview 3? |
The fix is actually in the runtime repo - the installer script wasn't passing the property |
e.g.
https://dev.azure.com/dnceng/public/_build/results?buildId=1027241&view=logs&jobId=e93c566c-b6be-5acb-6fc2-de2972d01fec
#49270 (comment)
The text was updated successfully, but these errors were encountered: