Opt Into Perf Logging #6274
Conversation
…ript for official builds
LGTM. Thanks!
Note to self: Contact dnceng and ask about the failures in copying perf logging artifacts. https://devdiv.visualstudio.com/DevDiv/_build/results?buildId=4565213&view=results shows a failure where it couldn't find the files to copy. Is BuildConfiguration not set at this point in
Perf logs can fail to publish if msbuild nodes hang onto them.
Not the end of the world. Looks like CoreOnWindows published its perf logs. CoreOnMac and CoreOnLinux don't seem to have captured any perf logs. @brianrob is there any reason they shouldn't have?
@benvillalobos, I would expect that if the environment variable is set to a writeable location, then it should just work. Are you able to reproduce this locally? If not, it might be worth it to add some
…for macos and linux
Apparently it isn't common to set paths in environment variables with quotes.
Is disabling nodereuse not closing out nodes fast enough?
The environment variable is being set properly, and logs are being created and populated. The publish step is running into issues because of a lock on a log file.

Uploading perf logs status:
- Windows full: consistently fails

There's typically a lock on a log file, which is strange because msbuild runs with
I tried to get the publish task to allow multiple retries, but it looks like that parameter just isn't being used: microsoft/azure-pipelines-tasks#11451

Some ideas:

Total side note: I noticed our cibuilds still try to publish netcoreapp2.1 logs, when our logs are net5.0 now. I don't think those publish steps are necessary.
Interesting. Ideally we don't make upload of the log conditional, as this is a nice functional test of the whole system. I think it could be worth scripting a check to see what process is locking the file. It's going to be an MSBuild process, but perhaps the process is still alive even though we think it shouldn't be. The file name includes the PID for the process that owns the log. Could you fetch the command line for the PID and print it to the console? Since this is limited to Windows, you could do something like:
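A minimal sketch of that idea (not the snippet from the original comment), assuming the perf log's base name embeds the owning PID after a hyphen; the naming convention and folder path are assumptions borrowed from the script that appears later in the thread:

# Sketch: for each leftover perf log, pull the PID out of the file name and
# ask WMI for the command line of the process that owns it.
Get-ChildItem "artifacts\log\Debug\PerformanceLogs\*.log" | ForEach-Object {
    $procId = $_.BaseName.Split('-')[1]    # assumes a "<prefix>-<PID>" file name
    Write-Host "Log $($_.Name) appears to be owned by PID $procId"
    Get-CimInstance Win32_Process -Filter "ProcessId = $procId" |
        Select-Object ProcessId, CommandLine
}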
{
    Console.WriteLine("!!! We threw!\nInner Exception: {0}\nMessage: {1}", e.InnerException, e.Message);
Are Console.WriteLines the right way to log? If so, are they meant to be user-facing? If so, shouldn't we localize them? Either way, why so many exclamation points? Also, this should be much simpler if it's actually needed:
Console.WriteLine("!!! We threw!\nInner Exception: {0}\nMessage: {1}", e.InnerException, e.Message); | |
Console.WriteLine(e.Message + "Inner Exception: " + e.InnerException); |
This was part of a debug commit that will be reverted when the PR is ready to go in.
Unfortunately, it looks like passing parameters to this task (i.e., retry 5 times) isn't supported, as it's deprecated. And moving to a newer task would require updating the pool of machines we're running on. I'm not sure how involved that process would be, but I assume it's a yak to be shaved.
@brianrob

Get-ChildItem -Path "artifacts\log\Debug\PerformanceLogs\*" | Where-Object {$_.Extension -eq '.log'} | ForEach-Object {
$s = $_.BaseName.Split('-')[1];
Write-Host "Checking Process ID: $s"
Get-WmiObject -Query "SELECT CommandLine FROM Win32_Process WHERE ProcessID = $s"
}

Details from the process that's locking the file:

Windows Full
Looks legit.

And this one may be a newer process that spawned and happened to take the same PID.
Looks legit.

Windows Core
Looks like it caught the previous powershell script running? Or is it catching itself? Likely catching itself.

This one looks legit.
I have a quick-and-dirty idea: we could insert a script that does this after the build script runs, or change the timing where a
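A minimal sketch of what such a script could do, assuming the goal is simply to make sure no leftover MSBuild worker node still has a perf log open before the publish step (the wait loop and timings are illustrative, not the PR's actual change):

# Give lingering MSBuild worker nodes a short window to exit on their own,
for ($i = 0; $i -lt 6 -and (Get-Process msbuild -ErrorAction SilentlyContinue); $i++) {
    Start-Sleep -Seconds 5
}
# then force-kill anything that is still holding a perf log.
if (Get-Process msbuild -ErrorAction SilentlyContinue) {
    taskkill /f /im msbuild.exe
}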
Ready for review. Recommend reviewing the diff rather than going commit by commit; lots of test commits were used, and the final diff looks a lot cleaner. I think it's good to go unless we have other ideas for getting around the failed perf-log upload.
I don't love the taskkill, but this LGTM other than that!
.vsts-dotnet-ci.yml (outdated)

@@ -34,6 +34,10 @@ jobs:
    mergeTestResults: true
    continueOnError: true
    condition: always()
  - powershell: |
      taskkill /f /im msbuild.exe
This very much looks like a workaround. Why are there hanging processes in the first place? Can we fix that? Also, this makes it a little cleaner:
- taskkill /f /im msbuild.exe
+ taskkill /f /im msbuild.exe /im vbcscompiler.exe
I didn't know you can pass /im multiple times!
One thing to clarify here: my expectation is that there were always processes hanging around; I don't believe that this change modifies process lifetime. You could try running dotnet build-server shutdown to kill the compiler server. One thing to confirm: are we getting full perf logs from these runs, or are they truncated because of the call to taskkill?
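A sketch of what that could look like as a pre-publish step, assuming it runs after the build and before the perf-log upload; the completeness check at the end is an assumption, keying off the MSBuildExeStop event discussed below:

# Ask the long-lived build servers (MSBuild worker nodes, VBCSCompiler, Razor)
# to exit gracefully so perf logs are flushed and released before publishing.
dotnet build-server shutdown

# Rough completeness check: list the logs that contain an MSBuildExeStop event.
Select-String -Path "artifacts\log\Debug\PerformanceLogs\*.log" -Pattern "MSBuildExeStop" |
    Select-Object -ExpandProperty Filename -Unique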
If we kill the processes, the file that they generate isn't completed:
[2021-04-05T19:37:33.6451866Z] Event=Microsoft-Build/MSBuildExeStart ProcessID=7688 ThreadID=1 commandLine="D:\a\1\s\artifacts\bin\Microsoft.Build.Engine.UnitTests\Debug\net472\MSBuild.exe /nologo /nodemode:1 /nodeReuse:true /low:false"
[2021-04-05T19:37:34.0831946Z] Event=Microsoft-Build/RequestThreadProcStart ProcessID=7688 ThreadID=11
[2021-04-05T19:37:34.0872085Z] Event=Microsoft-Build/BuildProjectStart ProcessID=7688 ThreadID=11 projectPath="C:\Users\VssAdministrator\AppData\Local\Temp\041qbmta.yej\tmp4965ff255b434b4e8c54fa1b6dc9cf10.tmp"
[2021-04-05T19:37:34.0962009Z] Event=Microsoft-Build/LoadDocumentStart ProcessID=7688 ThreadID=11 fullPath="C:\Users\VssAdministrator\AppData\Local\Temp\041qbmta.yej\tmp4965ff255b434b4e8c54fa1b6dc9cf10.tmp"
[2021-04-05T19:37:34.1042039Z] Event=Microsoft-Build/LoadDocumentStop ProcessID=7688 ThreadID=11 fullPath="C:\Users\VssAdministrator\AppData\Local\Temp\041qbmta.yej\tmp4965ff255b434b4e8c54fa1b6dc9cf10.tmp"
[2021-04-05T19:37:34.1072967Z] Event=Microsoft-Build/ParseStart ProcessID=7688 ThreadID=11 projectFileName="C:\Users\VssAdministrator\AppData\Local\Temp\041qbmta.yej\tmp4965ff255b434b4e8c54fa1b6dc9cf10.tmp"
[2021-04-05T19:37:34.1142095Z] Event=Microsoft-Build/ParseStop ProcessID=7688 ThreadID=11 projectFileName="C:\Users\VssAdministrator\AppData\Local\Temp\041qbmta.yej\tmp4965ff255b434b4e8c54fa1b6dc9cf10.tmp"
[2021-04-05T19:37:34.3042076Z] Event=Microsoft-Build/TargetStart ProcessID=7688 ThreadID=11 targetName="msbuild"
[2021-04-05T19:37:34.3582095Z] Event=Microsoft-Build/LoadDocumentStart ProcessID=7688 ThreadID=11 fullPath="D:\a\1\s\artifacts\bin\Microsoft.Build.Engine.UnitTests\Debug\net472\Microsoft.Common.tasks"
[2021-04-05T19:37:34.3602064Z] Event=Microsoft-Build/LoadDocumentStop ProcessID=7688 ThreadID=11 fullPath="D:\a\1\s\artifacts\bin\Microsoft.Build.Engine.UnitTests\Debug\net472\Microsoft.Common.tasks"
[2021-04-05T19:37:34.3602064Z] Event=Microsoft-Build/ParseStart ProcessID=7688 ThreadID=11 projectFileName="D:\a\1\s\artifacts\bin\Microsoft.Build.Engine.UnitTests\Debug\net472\Microsoft.Common.tasks"
[2021-04-05T19:37:34.3621959Z] Event=Microsoft-Build/ParseStop ProcessID=7688 ThreadID=11 projectFileName="D:\a\1\s\artifacts\bin\Microsoft.Build.Engine.UnitTests\Debug\net472\Microsoft.Common.tasks"
[2021-04-05T19:37:34.4402082Z] Event=Microsoft-Build/TargetStop ProcessID=7688 ThreadID=11 targetName="msbuild"
[2021-04-05T19:37:34.4442095Z] Event=Microsoft-Build/BuildProjectStop ProcessID=7688 ThreadID=11 projectPath="C:\Users\VssAdministrator\AppData\Local\Temp\041qbmta.yej\tmp4965ff255b434b4e8c54fa1b6dc9cf10.tmp" targets="msbuild"
[2021-04-05T19:37:34.4462038Z] Event=Microsoft-Build/RequestThreadProcStop ProcessID=7688 ThreadID=11
[2021-04-05T19:37:34.5532097Z] Event=Microsoft-Build/RequestThreadProcStart ProcessID=7688 ThreadID=11
[2021-04-05T19:37:34.5532097Z] Event=Microsoft-Build/BuildProjectStart ProcessID=7688 ThreadID=11 projectPath="C:\Users\VssAdministrator\AppData\Local\Temp\041qbmta.yej\tmp4965ff255b434b4e8c54fa1b6dc9cf10.tmp"
[2021-04-05T19:37:34.5571985Z] Event=Microsoft-Build/TargetStart ProcessID=7688 ThreadID=11 targetName="msbuild"
[2021-04-05T19:37:34.5612210Z] Event=Microsoft-Build/TargetStop ProcessID=7688 ThreadID=11 targetName="msbuild"
[2021-04-05T19:37:34.5612210Z] Event=Microsoft-Build/BuildProjectStop ProcessID=7688 ThreadID=11 projectPath="C:\Users\VssAdministrator\AppData\Local\Temp\041qbmta.yej\tmp4965ff255b434b4e8c54fa1b6dc9cf10.tmp" targets="msbuild"
[2021-04-05T19:37:34.5612210Z] Event=Microsoft-Build/RequestThreadProcStop ProcessID=7688 ThreadID=11
[2021-04-05T19:40:57.3420042Z] Event=Microsoft-Build/RequestThreadProcStart ProcessID=7688 ThreadID=11
[2021-04-05T19:40:57.3420042Z] Event=Microsoft-Build/BuildProjectStart ProcessID=7688 ThreadID=11 projectPath="C:\Users\VssAdministrator\AppData\Local\Temp\041qbmta.yej\tmpe37eba57a280484fb0a6015308bebaff.tmp"
[2021-04-05T19:40:57.3430119Z] Event=Microsoft-Build/LoadDocumentStart ProcessID=7688 ThreadID=11 fullPath="C:\Users\VssAdministrator\App
Notice how it ends on LoadDocumentStart and has no MSBuildExeStop. I'll try dotnet build-server shutdown, which I didn't realize was an option!
If certain events hadn't fired yet, wouldn't a shutdown still mean those events never fire, since it tries to get out of the build as fast as possible?
Right. I'd like to get this into a state that is "good enough" if we want to see this perf data and act on it, even if it's not the perfect solution. Part of me suspects digging into this node issue could take longer than expected; I've been digging into it without much result so far. The current idea is to set the MSBUILDDISABLENODEREUSE env var to 1 and see if the node will realize it needs to die.
Notes from PR review: We want to get Rainer's take on it.
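A minimal sketch of that idea (setting MSBUILDDISABLENODEREUSE before the build), assuming the variable is exported in the CI script before the build is invoked; the build entry point is illustrative, and the perf-log path is taken from earlier in the thread:

# Worker nodes exit as soon as the build finishes instead of lingering
# and holding the perf log open.
$env:MSBUILDDISABLENODEREUSE = "1"

# Where the perf logger writes; the publish step uploads from here.
$env:DOTNET_PERFLOG_DIR = "$PWD\artifacts\log\Debug\PerformanceLogs"

.\build.cmd    # illustrative entry point for the CI build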
… for the killed process. Verify it logged everything it needed to
Current status: setting MSBUILDDISABLENODEREUSE to 1 with no custom scripts, and the node properly shuts down. Obviously the bigger issue here is a node being passed nodeReuse: true in the first place. I'm running this a second time to be safe, and I'd be fine merging it as-is and investigating further.
When did we switch from "waiting a week is OK" to merging despite known issues?
@Forgind What? We didn't. I want this in a state that is "acceptable to merge" should we decide that we want to, so we can act on the perf data. My last comment can be interpreted as "the nodereuse thing looks like a bug, but setting the environment variable will allow us to act on this perf data while I dig into what the real issue here is."
This sounded to me like merging early. If you meant to say that you think it's ready to merge but still intend to wait a week, then sure, though I'm less confident.
After mulling this over some, I don't think this should be blocked on the hanging process. It sheds light on a potential issue but isn't the cause. Locally run
Feedback from PR review: Maybe the hanging node is actually spawned from a unit test. I didn't consider this and it sounds very plausible! |
… environment variable
This reverts commit 6a4a460.
Resolved the issue by disabling DOTNET_PERFLOG_DIR for all unit test assemblies. Created an issue tracking hanging assemblies here: #6344. Confirmed locally that no extra perf logs get generated from tests. CI should succeed with no issues.
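As a rough illustration of the idea (not necessarily how the PR implements it), the tests just need to run with the variable unset:

# Illustrative only: clear DOTNET_PERFLOG_DIR for the test run so unit-test
# MSBuild nodes never create (and lock) perf logs of their own.
Remove-Item Env:DOTNET_PERFLOG_DIR -ErrorAction SilentlyContinue
dotnet test    # hypothetical test invocation; the repo's real test entry point may differ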
Fixes #5900
Context
It's time we opt into performance logging: #5861
Changes Made
- Create the DOTNET_PERFLOG_DIR directory if it doesn't exist, in eng\cibuild_bootstrapped_msbuild.cmd and the .sh equivalent (a sketch of the create-if-missing step follows this list).
- taskkill /f /im msbuild.exe and taskkill /f /im VBCSCompiler.exe (these tasks were seen as holding a lock on some generated perf files). If we don't kill these tasks before trying to copy over the perf logs, the entire perf-log upload fails because the out-of-proc node (node mode 1) is still holding a lock on the file.
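As a sketch of the first bullet's create-if-missing step (the real change lives in the .cmd/.sh bootstrap scripts, so this PowerShell version only illustrates the idea):

# Assumes DOTNET_PERFLOG_DIR has already been set to the desired perf-log folder.
if (-not (Test-Path $env:DOTNET_PERFLOG_DIR)) {
    New-Item -ItemType Directory -Path $env:DOTNET_PERFLOG_DIR | Out-Null
}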
Testing
We should see perf logs under artifacts/log/<configuration>/PerformanceLogs.
Notes
Don't review commit by commit 😬. Needless to say, this PR should be squashed.