Improve coreclr windows native build time #33872

Closed
jashook opened this issue Mar 20, 2020 · 31 comments

Comments

@jashook
Contributor

jashook commented Mar 20, 2020

Some ideas floating around are moving off of the msbuild cmake project generator and onto the ninja or nmake generators.

Moving off of the msbuild generator allows us to remove another full framework dependency in our build. In addition, this is an attempt to improve our native build parallelism on windows. This work will hopefully allow our native build on windows to compete with our linux builds.

This is related to #33510.

/cc @janvorli @jkoritzinsky @jkotas

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Mar 20, 2020
@jashook
Contributor Author

jashook commented Mar 20, 2020

See #33510 (comment)

@jashook jashook removed the untriaged New issue has not been triaged by the area owner label Mar 20, 2020
@jashook
Contributor Author

jashook commented Mar 20, 2020

With the changes @jkoritzinsky has to build using ninja I have managed to build the coreclr runtime under windows docker with a minimal vs installation.

@jashook
Contributor Author

jashook commented Mar 20, 2020

Docker files:

FROM mcr.microsoft.com/dotnet/framework/sdk:4.8-windowsservercore-ltsc2019

# Setup package management
RUN powershell -Command \
	Set-ExecutionPolicy Bypass -Scope Process -Force; iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))

RUN powershell -Command \
	Install-PackageProvider -Name chocolatey -Force

# escape=`

# Use the latest Windows Server Core image with .NET Framework 4.8.
FROM jashook/chocolatey:latest

# Restore the default Windows shell for correct batch processing.
SHELL ["cmd", "/S", "/C"]

# Download the Build Tools bootstrapper.
ADD https://aka.ms/vs/16/release/vs_buildtools.exe C:\TEMP\vs_buildtools.exe

# Install Build Tools excluding workloads and components with known issues.
RUN C:\TEMP\vs_buildtools.exe --quiet --wait --norestart --nocache `
    --installPath C:\BuildTools `
    --add Microsoft.VisualStudio.Workload.VCTools --includeRecommended

RUN DEL C:\TEMP\vs_buildtools.exe

# Default to PowerShell if no other command specified.
CMD ["powershell.exe", "-NoLogo", "-ExecutionPolicy", "Bypass"]

FROM jashook/vs2019-build-tools:latest

# Install coreclr dependencies
RUN choco install cmake -y
RUN choco install python3 -y
RUN choco install git -y

@jashook
Contributor Author

jashook commented Mar 20, 2020

Note that "minimal" is still quite large...

@jashook
Contributor Author

jashook commented Mar 20, 2020

/cc @lpereira

@jkotas
Member

jkotas commented Mar 20, 2020

What is the "build-runtime.cmd" wall-clock time using msbuild vs. Ninja on Windows on the same machine?

Moving off of the msbuild generator allows us to remove another full framework dependency in our build

I do not see this as a good reason. .NET Framework is going to be part of Windows forever.

@janvorli
Member

The benefit of using the msbuild generator is also that you can then open the generated solution / projects in visual studio with all the symbol definitions etc being correct, so intellisense can provide the best results.
So I think that if we added support for ninja on Windows, we should still keep the msbuild support.

@jkoritzinsky
Member

Visual Studio has support for opening CMake projects directly with their open-folder feature. We should investigate if that gives a comparable experience to the MSBuild projects for development.

@jashook
Contributor Author

jashook commented Mar 20, 2020

If not, I do not mind living with multiple project generators.

@jashook
Contributor Author

jashook commented Mar 20, 2020

I do not see this as a good reason. .NET Framework is going to be part of Windows forever.

Removing dependencies on Full framework in our build is general goodness. Reducing our dependencies in our build is a measurable goal that will hopefully enable us to use leaner windows docker files. Although I will admit, at the moment, until msvc supports installing outside vs, we still cannot fully drop the dependency.

@jkotas
Member

jkotas commented Mar 20, 2020

Although I will admit, at the moment, until msvc supports installing outside vs, we still cannot fully drop the dependency.

Right. Also, msvc and all other tooling we use on Windows would need to work on the leaner Windows images. That is a lot of problems to solve.

@am11
Member

am11 commented Mar 23, 2020

Generally, other build systems also come with additional dependencies and their own set of gotchas. For example, arguments that meson and gyp are better than cmake because of their leaner syntax do not always win, because both depend on python, while cmake (with its share of weaknesses) is (mainly) a build system for C/C++ projects, written in C/C++, without dozens of random dependencies. I do not feel strongly about ninja, but it is itself an additional dependency, and if there is no significant improvement on Windows (in wall-clock time, as @jkotas put it), then it is not as appealing as advertised. At some point this option was added for Unix, but it was phased out over time:

__UseNinja=0

(and nobody seems to miss it, Mac build still takes < 5 minutes to compile the runtime native parts and scales out nicely to all available cores)

@jashook
Contributor Author

jashook commented Mar 26, 2020

Here is some preliminary timing data.

Time in minutes

| Generator | Machine | .\src\coreclr\build-runtime.cmd x64 checked (average) |
| --- | --- | --- |
| MsBuild (clean) | F16_v2 (2.4 GHz Intel Xeon® E5-2673 v3 16 core) | 4 |
| Ninja (clean) | F16_v2 (2.4 GHz Intel Xeon® E5-2673 v3 16 core) | 2.5 |
| MsBuild (rebuild) | F16_v2 (2.4 GHz Intel Xeon® E5-2673 v3 16 core) | 0.5 |
| Ninja (rebuild) | F16_v2 (2.4 GHz Intel Xeon® E5-2673 v3 16 core) | 0.2 |

@jashook
Contributor Author

jashook commented Mar 26, 2020

Note that the timing data on the experiment brings it close to the linux build time

@jkoritzinsky
Member

That timing for an MSBuild clean build seems really fast.

@jashook
Contributor Author

jashook commented Mar 26, 2020

Will run 10 times and average.
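
A minimal sketch of the "run N times and average" measurement, assuming the build is wrapped in a Python callable. The helper name `average_wall_clock` and the stand-in workload are illustrative and not part of the repo's tooling; a real measurement would invoke the build script from `run_once` and do a clean build each time.

```python
import statistics
import time


def average_wall_clock(run_once, runs=10):
    """Call run_once() `runs` times; return (mean, stdev) of wall-clock seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples.append(time.perf_counter() - start)
    # stdev needs at least two samples, so fall back to 0.0 for a single run.
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread


if __name__ == "__main__":
    # Stand-in workload (assumption): replace with a call that runs the build.
    mean, spread = average_wall_clock(lambda: sum(range(100_000)), runs=10)
    print(f"mean {mean:.4f}s, stdev {spread:.4f}s")
```

Reporting the spread next to the mean makes it easier to tell whether a surprisingly fast run is noise or a real effect.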

@jashook
Contributor Author

jashook commented Mar 26, 2020

Also working on 2 core and 4 core vm build times. Will post probably early tomorrow.

@am11
Member

am11 commented Mar 26, 2020

It would be nice if we also compare the degree of parallelism. Other build systems might be using higher -j, while we seem to be missing /maxcpucount or /m switch for msbuild or /MP for cl. Can be captured with runtime\build.cmd -cmakeargs -DCMAKE_VERBOSE_MAKEFILE=ON.

@jashook
Contributor Author

jashook commented Apr 6, 2020

| f16_v2 MsBuild | f16_v2 Ninja | d2_v3 MsBuild | d2_v3 Ninja | d4_v3 MsBuild | d4_v3 Ninja |
| --- | --- | --- | --- | --- | --- |
| 4.642 | 3.039 | 33.576 | 25.516 | 19.144 | 12.602 |

Speedup

Percentage of how much faster the Ninja build is, compared to the msbuild generator.

| f16_v2 | d2_v3 | d4_v3 |
| --- | --- | --- |
| 35% | 28% | 34% |
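
The thread does not state how these percentages were derived. As a sketch, assuming "faster" means the reduction in build time relative to the MSBuild baseline, the posted averages can be checked as follows (the function name `percent_faster` is illustrative):

```python
def percent_faster(msbuild_time, ninja_time):
    """Reduction in build time relative to the MSBuild baseline, in percent."""
    return (msbuild_time - ninja_time) / msbuild_time * 100.0


# Averages posted in the timing table: machine -> (MSBuild, Ninja)
timings = {
    "f16_v2": (4.642, 3.039),
    "d2_v3": (33.576, 25.516),
    "d4_v3": (19.144, 12.602),
}

results = {
    machine: round(percent_faster(msbuild, ninja))
    for machine, (msbuild, ninja) in timings.items()
}

if __name__ == "__main__":
    for machine, pct in results.items():
        print(f"{machine}: {pct}%")
```

Under this assumed definition the f16_v2 and d4_v3 values match the posted 35% and 34%, while d2_v3 comes out at about 24% rather than 28%, so that figure may have been computed differently or from other data.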

@jashook
Contributor Author

jashook commented Apr 6, 2020

@am11 the msbuild/ninja comparison is difficult to make in the way you describe, because the msbuild generator has two levels of parallelization: the msbuild invocation and the cl invocations. Generally we choose to run msbuild serially, so that each vcxproj can fan out in the native build of that project. The largest downside is that each vcxproj has to wait for its fan-out and fan-in to finish before the next vcxproj build can start.

The ninja generator does not have this problem, which is why it was interesting to prototype.
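
A toy model of the difference, not a measurement: it compares building projects one after another (each fanning out over all workers, with a barrier between projects) against scheduling every compile task from one global pool. The project sizes and worker count are made up, and the model ignores dependencies between projects and link steps.

```python
import heapq


def makespan(durations, workers):
    """Greedy list scheduling: each task goes to the worker that frees up first."""
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + duration)
    return max(free_at)


def per_project_barriers(projects, workers):
    """Projects run serially; each one must fan back in before the next starts."""
    return sum(makespan(tasks, workers) for tasks in projects)


def single_global_graph(projects, workers):
    """All compile tasks share one pool, with no barrier between projects."""
    return makespan([d for tasks in projects for d in tasks], workers)


# Hypothetical workload: one large project and three small ones, 1 time unit per file.
projects = [[1.0] * 20, [1.0] * 3, [1.0] * 3, [1.0] * 2]
workers = 8

if __name__ == "__main__":
    print("per-project barriers:", per_project_barriers(projects, workers))
    print("single global graph: ", single_global_graph(projects, workers))
```

In this example the small projects leave most workers idle while they build, which is where the barrier version loses time.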

@jkotas
Member

jkotas commented Apr 6, 2020

we choose to run msbuild serially,

What prevents us from choosing to run msbuild in parallel?

@jkoritzinsky
Member

We do run MSBuild in parallel. The current setup in master runs MSBuild and cl both in parallel.

@jashook
Contributor Author

jashook commented Apr 6, 2020

We do run MSBuild in parallel. The current setup in master runs MSBuild and cl both in parallel.

My information is old then, thank you @jkoritzinsky :)

This will certainly cause issues with overusing the machine resources during the build then, as each msbuild will also pass maxcpucount to each vcxproj invocation.
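
As a rough worst-case illustration of that concern, assuming both levels of parallelism default to the core count and every project has enough source files to saturate its compiler fan-out (the function name is illustrative):

```python
def peak_compiler_processes(msbuild_nodes, cl_processes_per_project):
    """Upper bound on concurrent compiler processes with two nested levels of parallelism."""
    return msbuild_nodes * cl_processes_per_project


cores = 16  # core count of the F16_v2 machine mentioned earlier in the thread
peak = peak_compiler_processes(cores, cores)

if __name__ == "__main__":
    print(f"{cores} cores, up to {peak} compiler processes in the worst case")
```

Real builds rarely hit this bound, because most projects are too small to fill their share, but it shows why the two settings cannot be tuned independently.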

@lpereira
Contributor

lpereira commented Apr 6, 2020

This will certainly cause issues with overusing the machine resources during the build then, as each msbuild will also pass maxcpucount to each vcxproj invocation.

Is there a way to limit the number of processes msbuild spawns, regardless of -maxCpuCount?

GNU make has a command line parameter to specify that it shouldn't spawn new processes if the system load is beyond a certain threshold (-l), so a common trick of calling make in parallel in these situations is: nice --adjustment=20 make -l $NUMBER_OF_CPUS -j. The highest nice value makes the system more responsive during build, the -j option without a parameter is for unbounded parallelism, which is then curbed by -l $NUMBER_OF_CPUS (because a load of N is usually fine on Linux on a system with N CPUs). If I'm not mistaken, ninja does a similar thing.
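
The load-limit rule described here can be sketched as a small predicate. This mirrors the documented behaviour of `make -l` (no new job is started while other jobs are running and the load average is at least the limit); it is an illustration, not make's actual implementation, and the function name is made up.

```python
import os


def may_start_job(load_average, max_load, running_jobs):
    """Return True if a scheduler following the make -l rule may start another job."""
    if running_jobs == 0:
        # With nothing running, one job is always started so the build can progress.
        return True
    return load_average < max_load


if __name__ == "__main__":
    limit = os.cpu_count() or 1
    # os.getloadavg() is only available on Unix-like systems.
    if hasattr(os, "getloadavg"):
        one_minute_load = os.getloadavg()[0]
        print("may start another job:", may_start_job(one_minute_load, limit, running_jobs=1))
```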

@am11
Member

am11 commented Apr 7, 2020

Is there a way to limit the number of processes msbuild spawns

/MP and CL_MPCount are our friends. I remember reading a blog article from the VCR team on degree of parallelism a couple of years ago, but I can't find the link anymore. In my experience, as verbose as their XML syntax may be, we can do [m]any things with msbuild and the vcxproj project system, and cmake has means to pass those values to the underlying build system.

@jkoritzinsky
Member

The problem is that we have a very weird balance between large vcxprojs and small vcxprojs, so we'd have to dynamically determine the best balance or do a ton of additional manual testing to figure out how to balance MSBuild parallelism with CL.exe parallelism. I've done tests, and doing one or the other causes a massive drop in build performance on an 8 core machine (40-50% time increase on the CoreCLR native build last time I tried).

And honestly, I feel that the time investment spent on that to make it dynamically adjust correctly for the core count of the machine is a waste when we can instead switch to Ninja and significantly increase performance (with the 25-35% improvement as observed above) for significantly less work and more reliable results.

@jkotas
Member

jkotas commented Apr 7, 2020

Can we work with the msbuild and C++ teams to find a solution for this? I expect that a number of other projects have the same problem.

@jashook
Contributor Author

jashook commented Apr 7, 2020

They are aware of the issue we face. /cc @jaredpar

@jkotas
Member

jkotas commented Apr 7, 2020

What's their recommendation to solve it? Do they plan to do anything about it?

@danmoseley
Member

danmoseley commented May 1, 2020

(Not sure which MSBuild issue to comment on.) At one time there was a prototype where MSBuild would dump the graph at the end of the build, which could then seed choices in the next build. That seeding could maybe have been committed into the repo. Much of the time the scheduler has a whole bunch of projects to pick from to go next, but it doesn't know which depend on which, nor how large each is. If there are a small number of large, depended-on projects and lots of small, end-of-chain projects, the choices may have room for improvement. I am not sure whether they are considering that or whether the code has changed enough that it would be more work now than it was then. Also there is a big difference between a working prototype and a solid shippable feature. And I've no doubt the scheduler has improved substantially since back then. I think we had some simple heuristics in (such as prefer .proj over .*proj). But the coreclr problem reminded me of this approach.
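
One way to picture the seeding idea is a critical-path heuristic: given per-project durations recorded from a previous build and the dependency graph, prefer the ready project with the longest chain of work behind it. This is only a sketch of that general technique, not MSBuild's scheduler or the prototype mentioned above; the project names, durations, and function names are hypothetical, and the graph is assumed to be acyclic.

```python
from functools import lru_cache


def critical_path_priorities(durations, dependents):
    """Map each project to its own duration plus the longest chain of dependents.

    durations:  project -> seconds recorded in a previous build
    dependents: project -> list of projects that depend on it
    """

    @lru_cache(maxsize=None)
    def priority(project):
        downstream = [priority(d) for d in dependents.get(project, ())]
        return durations[project] + max(downstream, default=0)

    return {project: priority(project) for project in durations}


def pick_next(ready, priorities):
    """Among the projects that are ready to build, pick the one on the longest chain."""
    return max(ready, key=lambda project: priorities[project])


# Hypothetical recorded data from a previous build.
durations = {"big_lib": 60, "mid_lib": 40, "tool_a": 5, "tool_b": 5}
dependents = {"big_lib": ["mid_lib", "tool_a"], "mid_lib": ["tool_b"]}

if __name__ == "__main__":
    priorities = critical_path_priorities(durations, dependents)
    print(priorities)
    print("build first:", pick_next(["mid_lib", "tool_a"], priorities))
```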

@trylek
Member

trylek commented Nov 30, 2020

@jkoritzinsky can this be closed now that you merged in the Ninja support?

@ghost ghost locked as resolved and limited conversation to collaborators Dec 30, 2020