Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MSBuildWorkspace.OpenSolutionAsync hangs indefinitely on 4.9.0+ in Windows 2022 Azure CI/CD #75967

Open
kamronbatman opened this issue Nov 19, 2024 · 14 comments
Labels
Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead

Comments

@kamronbatman
Copy link

kamronbatman commented Nov 19, 2024

Version Used:

        <PackageReference Include="Microsoft.Build.Locator" Version="1.7.1" />
        <PackageReference Include="Microsoft.CodeAnalysis.CSharp" Version="4.11.0" />
        <PackageReference Include="Microsoft.CodeAnalysis.CSharp.Workspaces" Version="4.11.0" />
        <PackageReference Include="Microsoft.CodeAnalysis.Workspaces.MSBuild" Version="4.11.0" />

Steps to Reproduce:

  1. Writing a dotnet tool that opens a solution to run through all projects. This code hangs indefinitely:
        MSBuildLocator.RegisterDefaults();

        using var workspace = MSBuildWorkspace.Create();
        workspace.WorkspaceFailed += (_, args) => Console.WriteLine(args.Diagnostic.Message);

        var solution = await workspace.OpenSolutionAsync(solutionPath);

Expected Behavior:

The code should move past OpenSolutionAsync, returning the workspace and finish running the application.

Actual Behavior:

Specifically on Azure Pipelines, latest Windows 2022, with .NET 9:

jobs:
- job: BuildWindows
  displayName: 'Windows Server 2022'
  pool:
    vmImage: 'windows-latest'

  steps:
  - task: UseDotNet@2
    displayName: 'Install .NET 9'
    inputs:
      useGlobalJson: true

The code above hangs indefinitely.

Link to the PR in our repo that replicates this:
https://github.com/modernuo/SerializationGenerator/pull/43/files#diff-f2b522070e7bfd744704829ee17ae88ea259383f12f3a45fb2d860a2a0c3e19c

In that PR, I also tried to clean up any possible deadlocks, but nothing changed.

Repo that uses this tool and it's PR:

https://github.com/modernuo/ModernUO/pull/1984/files#diff-7afd3bcf0d0c06d6f87c451ef06b321beef770ece4299b3230bb280470ada2f6

The azure pipelines yaml file we use is here:
https://github.com/modernuo/ModernUO/pull/1984/files#diff-7915b9b726a397ae7ba6af7b9703633d21c031ebf21682f3ee7e6a4ec52837a5

The tool is run through a cmd file as part of the build process:
https://github.com/modernuo/ModernUO/blob/1d08572926ad3e6376fbebb1ec169624f9cfdb74/publish.cmd#L46

I cannot replicate this locally on Windows 11, nor on other operating systems. If I had to guess, it is related to how .NET is installed/path'd on that docker

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead labels Nov 19, 2024
@kamronbatman
Copy link
Author

Per conversation on discord, @dibarbet you said you will try to take a look. Please contact me or let me know how I can help. I haven't had time to set up another azure pipelines just for a minimal reproducible snippet, sorry.

@kamronbatman kamronbatman changed the title MSBuildWorkspace.OpenSolutionAsync hangs indefinitely on 4.9+ in Windows 2022 Azure CI/CD MSBuildWorkspace.OpenSolutionAsync hangs indefinitely on 4.9.0+ in Windows 2022 Azure CI/CD Nov 19, 2024
@dibarbet
Copy link
Member

Been taking a look at this and nothing obvious is standing out. Can you try the following:

Update the code to pass in the progress parameter to OpenSolutionAsync that writes to the console:

public static async Task<IEnumerable<Project>> GetProjectsAsync(string solutionPath)
{
    ...
    var solution = await workspace.OpenSolutionAsync(solutionPath, progress: new Progress());
    ...
}

private class Progress : IProgress<ProjectLoadProgress>
{
    public void Report(ProjectLoadProgress value)
    {
        Console.WriteLine($"{value.Operation} completed for {value.FilePath} ({value.TargetFramework}) in {value.ElapsedTime.TotalMilliseconds}ms");
    }
}

That should write logs similar to

Running Migrations for C:\Users\dabarbet\source\repos\trash\SerializationGenerator\SerializationGenerator.sln
Evaluate completed for C:\Users\dabarbet\source\repos\trash\SerializationGenerator\ModernUO.Serialization.Generator\ModernUO.Serialization.Generator.csproj () in 636.8839ms
Build completed for C:\Users\dabarbet\source\repos\trash\SerializationGenerator\ModernUO.Serialization.Generator\ModernUO.Serialization.Generator.csproj () in 276.8386ms
Resolve completed for C:\Users\dabarbet\source\repos\trash\SerializationGenerator\ModernUO.Serialization.Generator\ModernUO.Serialization.Generator.csproj (netstandard2.0) in 141.4972ms
...

Beyond that I think we'll have to get a hang dump (probably via procdump)

@dibarbet
Copy link
Member

I also wasn't able to get a minimal repro. For example I created this dotnet tool that loads a solution successfully on windows-2022 github action images - https://github.com/dibarbet/MSBuildWorkspaceConsole/actions/runs/11923292319/job/33231361423

Based on this link - https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/hosted?view=azure-devops&tabs=yaml , the github image should be the same as the azure devops one.

I would guess something related to the specific solution or environment being setup is causing an issue, though not sure what that would be.

@kamronbatman
Copy link
Author

kamronbatman commented Nov 20, 2024

Been taking a look at this and nothing obvious is standing out. Can you try the following:

Update the code to pass in the progress parameter to OpenSolutionAsync that writes to the console:

public static async Task<IEnumerable> GetProjectsAsync(string solutionPath)
{
...
var solution = await workspace.OpenSolutionAsync(solutionPath, progress: new Progress());
...
}

private class Progress : IProgress
{
public void Report(ProjectLoadProgress value)
{
Console.WriteLine($"{value.Operation} completed for {value.FilePath} ({value.TargetFramework}) in {value.ElapsedTime.TotalMilliseconds}ms");
}
}
That should write logs similar to

Running Migrations for C:\Users\dabarbet\source\repos\trash\SerializationGenerator\SerializationGenerator.sln
Evaluate completed for C:\Users\dabarbet\source\repos\trash\SerializationGenerator\ModernUO.Serialization.Generator\ModernUO.Serialization.Generator.csproj () in 636.8839ms
Build completed for C:\Users\dabarbet\source\repos\trash\SerializationGenerator\ModernUO.Serialization.Generator\ModernUO.Serialization.Generator.csproj () in 276.8386ms
Resolve completed for C:\Users\dabarbet\source\repos\trash\SerializationGenerator\ModernUO.Serialization.Generator\ModernUO.Serialization.Generator.csproj (netstandard2.0) in 141.4972ms
...

Beyond that I think we'll have to get a hang dump (probably via procdump)

I'll do this and see what happens. Regarding a procdump, if you have steps I can throw together for the CI/CD to do that, let me know.

Note that with no change to code or solution that is opened, v4.8.0 worked fine.

EDIT: With the progress change, here is the run:
https://dev.azure.com/modernuo/modernuo/_build/results?buildId=6447

@kamronbatman
Copy link
Author

kamronbatman commented Nov 20, 2024

So the run shows:

Generating serialization migration schema...
Running Migrations for ModernUO.sln
Evaluate completed for D:\a\1\s\Projects\Server\Server.csproj () in 1587.7229ms

Contrast to locally (or other CI/CD runs like github actions with linux/macos):

Running Migrations for C:\Repositories\ModernUO\ModernUO.sln
Evaluate completed for C:\Repositories\ModernUO\Projects\Server\Server.csproj () in 418.6091ms
Build completed for C:\Repositories\ModernUO\Projects\Server\Server.csproj () in 537.2811ms
Resolve completed for C:\Repositories\ModernUO\Projects\Server\Server.csproj (net9.0) in 101.3583ms
Evaluate completed for C:\Repositories\ModernUO\Projects\UOContent\UOContent.csproj () in 303.8839ms
Build completed for C:\Repositories\ModernUO\Projects\UOContent\UOContent.csproj () in 463.5592ms
Resolve completed for C:\Repositories\ModernUO\Projects\UOContent\UOContent.csproj (net9.0) in 54.8129ms
Evaluate completed for C:\Repositories\ModernUO\Projects\Server.Tests\Server.Tests.csproj () in 52.6765ms
Build completed for C:\Repositories\ModernUO\Projects\Server.Tests\Server.Tests.csproj () in 198.41ms
Evaluate completed for C:\Repositories\ModernUO\Projects\Application\Application.csproj () in 28.9806ms
Build completed for C:\Repositories\ModernUO\Projects\Application\Application.csproj () in 63.5944ms
Resolve completed for C:\Repositories\ModernUO\Projects\Application\Application.csproj (net9.0) in 4.6206ms
Resolve completed for C:\Repositories\ModernUO\Projects\Server.Tests\Server.Tests.csproj (net9.0) in 108.1894ms
Evaluate completed for C:\Repositories\ModernUO\Projects\UOContent.Tests\UOContent.Tests.csproj () in 37.7079ms
Build completed for C:\Repositories\ModernUO\Projects\UOContent.Tests\UOContent.Tests.csproj () in 101.6578ms
Resolve completed for C:\Repositories\ModernUO\Projects\UOContent.Tests\UOContent.Tests.csproj (net9.0) in 5.6494ms
Completed migrations for Server

@dibarbet
Copy link
Member

@kamronbatman interesting note here - I took your .NET9 PR and ran it against github actions - appeared to have no issues there
https://github.com/dibarbet/ModernUO/tree/actions_only - I don't have access to an azure pipeline to run it against. I'll see if I can figure out procdump instructions tomorrow.

@dibarbet
Copy link
Member

@kamronbatman
Copy link
Author

@kamronbatman interesting note here - I took your .NET9 PR and ran it against github actions - appeared to have no issues there https://github.com/dibarbet/ModernUO/tree/actions_only - I don't have access to an azure pipeline to run it against. I'll see if I can figure out procdump instructions tomorrow.

Good to know, thanks. I stopped using GitHub actions because it wasn't part of the unlimited actions for public repos. There was also another issue in the past but I don't remember what it was.

@dibarbet
Copy link
Member

@kamronbatman try something similar to this - dibarbet/ModernUO@202ae64

You may have to tweak it a bit to make sure it captures the dump on hang and uploads correctly.

It is incredibly mystifying that the github actions run passes, but fails on AzDo with the exact same image.

@kamronbatman
Copy link
Author

kamronbatman commented Nov 21, 2024

@dibarbet - The tool never exits, so can we put a 5minutes timer on the proc dump PS1 and kill the process? Otherwise I don't see how this would work. 🤔

PR Changes for the azure run:
https://github.com/modernuo/ModernUO/pull/1984/files#diff-7915b9b726a397ae7ba6af7b9703633d21c031ebf21682f3ee7e6a4ec52837a5

https://dev.azure.com/modernuo/modernuo/_build/results?buildId=6452&view=logs&j=3e8797c7-5b0a-5f2c-07a4-1bc5e60a122e&t=8bbe3afb-863e-5d44-0040-a95fcabd47cd

  • It's "hung" and doesn't dump.

EDIT: I added messages, and it never gets past running the schema generator. We might need to tweak the ps1 so it's non-blocking?

@dibarbet
Copy link
Member

dibarbet commented Nov 21, 2024

@kamronbatman procdump should collect a hang dump after the process doesn't respond for 5 seconds. IIRC it should also exit after that (and we only wait on the procdump process exiting). Either procdump doesn't actually exit after it collects a dump or procdump isn't detecting a hang. How we fix it depends on which one is happening.

I've adjusted the script so it should live write the process output to the console, instead of waiting until the end. This should help us tell if procdump is collecting anything or not - dibarbet/ModernUO@52a27e1

@kamronbatman
Copy link
Author

@kamronbatman procdump should collect a hang dump after the process doesn't respond for 5 seconds. IIRC it should also exit after that (and we only wait on the procdump process exiting). Either procdump doesn't actually exit after it collects a dump or procdump isn't detecting a hang. How we fix it depends on which one is happening.

I've adjusted the script so it should live write the process output to the console, instead of waiting until the end. This should help us tell if procdump is collecting anything or not - dibarbet/ModernUO@52a27e1

Generating serialization migration schema...
Procdump folder: D:\a\1\s\/procDump
Downloading procdump
Running schema generator
Running procdump
Waiting for proc dump exit
Running Migrations for ModernUO.sln

ProcDump v11.0 - Sysinternals process dump utility
Copyright (C) 2009-2022 Mark Russinovich and Andrew Richards
Sysinternals - www.sysinternals.com

Process:               dotnet.exe (5328)
Process image:         C:\hostedtoolcache\windows\dotnet\dotnet.exe
CPU threshold:         n/a
Performance counter:   n/a
Commit threshold:      n/a
Threshold seconds:     10
Hung window check:     Enabled
Log debug strings:     Disabled
Exception monitor:     Disabled
Exception filter:      [Includes]
                       *
                       [Excludes]
Terminate monitor:     Disabled
Cloning type:          Disabled
Concurrent limit:      n/a
Avoid outage:          n/a
Number of dumps:       1
Dump folder:           D:\a\1\s\/procDump\
Dump filename/mask:    PROCESSNAME_YYMMDD_HHMMSS
Queue to WER:          Disabled
Kill after dump:       Disabled


Press Ctrl-C to end monitoring without terminating the process.

Evaluate completed for D:\a\1\s\Projects\Server\Server.csproj () in 1578.1493ms

@dibarbet
Copy link
Member

Thanks - it looks like procdump isn't detecting a hang and not collecting any dumps.

Try this instead - it will collect a dump every 5 seconds up to 5 total. You may have to adjust the time depending on how long it takes to get to the broken state - dibarbet/ModernUO@14ea3c6

@kamronbatman
Copy link
Author

@dibarbet I sent you a link to the dumps via discord.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-IDE untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

No branches or pull requests

2 participants