Use better logic for picking which node to build first #376

Closed
evmar opened this issue Jul 27, 2012 · 17 comments

@evmar
Collaborator

evmar commented Jul 27, 2012

See discussion in #60

@RedX2501
Contributor

What is the status of #60 and #232? Have any of the changes proposed there been merged? Has a different mechanism been implemented?

@nico
Collaborator

nico commented Sep 22, 2014

https://github.com/nico/ninja/compare/cpsched?expand=1 is a proof-of-concept implementation of letting Plan::FindWork use a priority list for scheduling, and implements a critical-path scheduling algorithm to build the priority list. I've only tested it on ninja itself so far.
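For readers who don't want to dig through the branch, here is a rough Python sketch of the core idea (the branch itself is C++ inside Plan::FindWork; the names below are illustrative, not taken from it): once each edge carries a precomputed critical-path priority, picking work becomes a pop from a max-heap of ready edges instead of taking whatever comes first.

```python
import heapq

# Illustrative sketch only: a ready-queue keyed by a precomputed priority,
# e.g. each edge's critical-path time. Not ninja's actual data structures.
class ReadyQueue:
    def __init__(self):
        self._heap = []    # entries are (-priority, tie_breaker, edge)
        self._count = 0    # monotonic counter keeps equal priorities stable

    def push(self, edge, priority):
        heapq.heappush(self._heap, (-priority, self._count, edge))
        self._count += 1

    def find_work(self):
        """Return the ready edge with the highest priority, or None."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```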

@orenbenkiki

Is this likely to be merged? I have a pretty involved old-style Makefile I would like to speed up. My main problem is critical-path scheduling - I have some tests that take a long time (almost a minute) to run, and it is nearly impossible to trick make -j into scheduling these ahead of the many smaller tasks that could happen in parallel. I can live with the ~10 seconds of overall make tax, but waiting another minute for the build to finish because of bad scheduling is starting to get to me. I was hoping ninja could solve my problem, but if it doesn't do critical-path analysis either, then it seems all I'll gain is the 10 seconds or so - not worth the effort of converting the Makefile...

BTW, about #60 - it is easy to provide this functionality within a critical-path algorithm by artificially increasing the criticality of tasks that failed when they were last attempted. The "correct" amount to increase by would make a good PhD thesis (involving things like the probability that the new build will succeed, how long it takes to edit the sources to fix the error, and the phase of the moon); in practice, adding some configurable amount should be fine.
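As a concrete (entirely made-up) illustration of that boost, assuming tasks are already ranked by critical-path time:

```python
# Hypothetical sketch: FAILURE_BOOST and both helper callables are invented
# names, not ninja API. A previously failed task is treated as if it sat on
# a longer critical path, so it gets retried early in the next build.
FAILURE_BOOST = 30.0  # the configurable amount, in seconds

def effective_priority(task, critical_path_time, failed_last_run):
    boost = FAILURE_BOOST if failed_last_run(task) else 0.0
    return critical_path_time(task) + boost
```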

@nico
Collaborator

nico commented Feb 5, 2015

I tried that on larger projects, and in practice it seemed to do much worse than the current scheduling approach, maybe because it's worse for the disk cache. In practice, most processes seem to be short-lived and there are many of them, so being nice to the disk cache (which the current behavior is) seems more important than being smart about the order. If there are long-running tasks, as in your case, the tradeoff is different, though.

Have you tried using the branch linked above and measured how much it helps?

@orenbenkiki

What I did is log (out of the existing Makefile) when each task starts and ends, and how much CPU % it consumes (this would be lower for an IO-intensive task). I then visualized this in a diagram. I can see that my few long-running, high-CPU tasks come "too late", while a ton of short-lived, low-CPU % tasks come at the start (where make doesn't even manage to spawn enough of them to saturate the CPUs). If I could manually rearrange the order they are executed in, I could reduce my 3.5-minute build time by about 1 minute (the dependencies allow for that). Also, if I could say "run more of the low-CPU % tasks in parallel", I could probably get it down to 2 minutes for a full build. I think I should be able to do that by using pools in Ninja...

The Makefile is pretty involved (several hundred lines of rules, using GNU make macros and wildcard expansions and so on). Converting it to Ninja would take "some effort" - I'm not even certain what the best way would be (would I be able to control everything I want, such as pools, using CMake? Should I just write some shell/python/ruby script that directly spits out the Ninja rules? etc.).

There are other things I can do to speed things up. For example, thanks to my "wonderful" IT department, the compiler and the libraries are installed on an NFS disk... sigh. I could use precompiled headers to reduce some of that pain, but there's a GCC bug about such headers that prevents us from doing so (I opened a bug report with GCC; I guess it might be fixed in a few years...). I could do a private install of GCC on a local disk on each and every machine...

The problem is that this is my "day job" and I have a lot on my plate. My build time is 3.5 minutes for a full build, and a "nothing to do" build is almost 20 seconds. These numbers aren't enough for me to justify taking a lot of time off my development tasks to "goof off playing with the build system". But it gets annoying, and my build times will only grow as we pour more code into the system. I need to be careful deciding where to spend my efforts. Switching to Ninja, even speculatively, would depend on whether I have a good expectation that it will actually help.

Your point about the disk cache is interesting. It would make sense that "immediately" using the results of a previous step is more efficient than returning to it after a "long time". Also, using the same source files in parallel steps would be more efficient than running those steps at different times. And, of course, doing a critical-path analysis would also help...

Optimal scheduling is NP-complete on the best of days. The general solution is "critical chain analysis", which takes resource contention into account (e.g., the effect of pools). There are reasonable, well-known heuristics for it. But it doesn't take into account issues like timing differences due to caching, as you pointed out. The heuristics could be extended to cover these, but the hard part would be estimating the effects of caching on a given schedule (do we saturate the disk cache here? how much will it cost? etc.).

Perhaps one could get "most of" the goodness by combining simple tricks. At any given time, you have a list of nodes you can schedule. Suppose you had a modular scheduler, allowing different (possibly competing) heuristics to be plugged in as tie-breakers. Say each gives a score to each node, and you decide by the sum of the scores. A configuration file would enable/disable each heuristic, give it parameters, and apply a linear scale to its results, to calibrate it against the others.

You could then provide heuristics like "prefer to consume task results as soon as they are created", "prefer to read the same source files in parallel tasks that run together", "prefer to run tasks that failed in a previous build earlier", and also classic critical path/chain analysis. People would be able to easily contribute new heuristics and to tweak their configuration files to get an optimal schedule for their projects.
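A minimal sketch of that pluggable-scorer idea (all heuristic names and node attributes below are invented for illustration, not ninja API):

```python
# Each heuristic maps a schedulable node to a score; a per-heuristic weight,
# which would come from the configuration file, calibrates it against the
# others. The node attributes used here are hypothetical.
def freshest_inputs(node):
    return 1.0 if node.has_just_built_input else 0.0

def previously_failed(node):
    return 1.0 if node.failed_last_build else 0.0

def critical_path(node):
    return node.critical_path_seconds

WEIGHTS = {
    freshest_inputs: 2.0,
    previously_failed: 5.0,
    critical_path: 1.0,
}

def score(node):
    return sum(w * h(node) for h, w in WEIGHTS.items())

def pick_next(ready_nodes):
    # The scheduler picks the ready node with the highest combined score.
    return max(ready_nodes, key=score)
```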

Being obsessed with build speed is what Ninja is all about, right? :-)

@evmar
Collaborator Author

evmar commented Feb 5, 2015

One approach is to continue to use Makefiles but do more work ahead of time in a script that generates your Makefiles. This script can easily be changed to use Ninja at some point if it looks promising.

(CMake can be thought of as one choice for the language to write your script in. But if your Makefiles are complex then maybe you'll think it's easier to write your own Perl script or whatever.)

@maximuska
Contributor

> The Makefile is pretty involved (several hundred lines of rules, using GNU make macros and wildcard expansions and so on). Converting it to Ninja would take "some effort" - I'm not even certain what the best way would be (would I be able to control everything I want using CMake, such as pools etc.? Should I just write some shell/python/ruby script that directly spits out the Ninja rules? etc.).

I had a similar dilemma once upon a time. It took me ~3 evenings to prototype a replacement of our proprietary build system with ninja, to get a feeling for how fast it would be. I suggest you give that approach a try to see if you want to proceed to a complete solution.

There is a makefile debugging hack (http://www.drdobbs.com/tools/debugging-makefiles/197003338) which you can probably abuse to prototype a ninja manifest from a makefile: override the SHELL variable to dump the commands and their dependencies to a file. E.g., add to your top makefile: `SHELL = $(warning === $@ : ($^) ===) /bin/bash -v`, then postprocess the output of a clean build.
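A rough sketch of the postprocessing step, under the assumption that the hack above makes each job print a warning line like `Makefile:42: === foo.o : (foo.c foo.h) ===` followed by the command that `bash -v` echoes (a quick prototype aid, not a robust converter):

```python
import re
import sys

# Turn the captured make output into a crude ninja manifest. Assumes each
# "=== target : (deps) ===" warning is followed by its command on the next
# non-empty line; filenames with spaces and multi-line recipes will break it.
PATTERN = re.compile(r'=== (\S+) : \((.*)\) ===')

def make_log_to_ninja(lines):
    chunks = ['rule run\n  command = $cmd\n']
    pending = None
    for line in lines:
        m = PATTERN.search(line)
        if m:
            pending = (m.group(1), m.group(2))   # (target, dependencies)
        elif pending and line.strip():
            target, deps = pending
            chunks.append(f'build {target}: run {deps}\n'
                          f'  cmd = {line.strip()}\n')
            pending = None
    return '\n'.join(chunks)

if __name__ == '__main__':
    sys.stdout.write(make_log_to_ninja(sys.stdin))
```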

One of the best (IMO) decisions I made in that respect was not to rewrite the manifest manually, but rather to "enhance" the original build system to generate a ninja manifest instead of running the build. This turned out to be important because not everybody was happy to switch to ninja, and this hack lets us use ninja without any extra maintenance, as updates to the legacy build system definitions are reflected automatically in the generated manifest. (However, if you are certain you can convince everybody to replace the build system entirely, go 'all in', of course.)

Maxim

@orenbenkiki

I wish make had a --ninja flag that would just dump a Ninja file (it is, after all, pretty much a syntax for the internal structures that make creates after expanding all macros etc.). The shell trick is nice; I can use it as a basis. But I have dynamic lists of rules (based on file lists extracted from the file system, etc.), so it can't be a permanent solution.

Ideally I need a 2-phase system - a slow regeneration of the Ninja file when the file lists change (rarely), and a fast build (using Ninja). The 1st phase can be "almost anything".
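A toy Python sketch of such a phase-1 generator, using ninja's `generator = 1` convention so the manifest regenerates itself (the file names such as configure.py and the compile rule are placeholders, not taken from any real project):

```python
#!/usr/bin/env python
# Phase 1: scan the tree and emit build.ninja; phase 2 is plain `ninja`.
import glob

def write_manifest(path='build.ninja'):
    with open(path, 'w') as f:
        f.write('rule cc\n'
                '  command = gcc -MMD -MF $out.d -c $in -o $out\n'
                '  depfile = $out.d\n\n')
        # The dynamic file lists are recomputed here, where it's cheap.
        for src in glob.glob('src/*.c'):
            obj = src.replace('src/', 'obj/').replace('.c', '.o')
            f.write(f'build {obj}: cc {src}\n')
        # Declaring build.ninja as a build output makes ninja regenerate it
        # first whenever configure.py changes; generator = 1 marks the rule
        # as a generator (e.g. its output is exempt from `ninja -t clean`).
        f.write('\nrule regen\n'
                '  command = python configure.py\n'
                '  generator = 1\n')
        f.write(f'build {path}: regen configure.py\n')

if __name__ == '__main__':
    write_manifest()
```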

@maximuska
Contributor

It will be interesting to hear the conclusions, if you make any progress with that!

@nyh

nyh commented Mar 31, 2015

I have run into exactly the same issue described in #60: I have a complex, slow-to-compile C++ project where a full compilation takes almost 10 minutes, a lot of the code is in header files, and changing one header file causes 50 source files to need recompiling.
In my usual development cycle, I modify one .hh and one .cc file (for example) and recompile. I'd like that .cc file to be compiled first - instead of compiling 50 other .cc files first, and only then realizing I have a syntax error in the single .cc I actually changed.

@nathanaelg

I have a case where one task in my build is (unfortunately) completely serial, has no build-time dependencies, and produces one library which is needed at link time by one executable. What I want is to make sure it is the first thing run (if it needs to run at all), but because it is single-threaded I want to run other stuff at the same time as well. Any ideas as to how to achieve that?

@maximuska
Contributor

Wouldn't adding a build order dependency on this target do that?


@nathanaelg

If I make everything dependent on the slow task, then it will definitely start first, but then everything else will only start once it has completed (which is a waste of time). If I make nothing dependent on it except the executable, it seems to start quite late in the build.

@jgunthorpe

Seconded @nyh - I also have a large C++ project, and have the same desire.

Have an option for ninja to sort the current set of runnable targets by the newest modification date of their first dependency, or something similar. The goal is to compile the files I've actually been changing first, and then go on to compile the files that haven't changed.

A secondary great idea would be to record failures and have the next ninja run sort previously failed targets first, i.e. if a header file is changed to fix a build error, I want to know quickly that the fix is good instead of waiting for the failed .cc to eventually come up.

To be clear, this is optimizing the compile, fix-error, recompile cycle (trying to minimize time to error), not minimizing total build time, as others have been discussing.
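A small sketch of both suggestions combined (the target attributes and the failure lookup are invented; this is not ninja code): among ready targets, previously failed ones come first, then those whose inputs changed most recently.

```python
import os

def sort_key(target, failed_last_run):
    # Newest modification time across the target's inputs.
    newest_input = max((os.path.getmtime(p) for p in target.inputs),
                       default=0.0)
    # Python sorts ascending, so negate the mtime: failures sort before
    # non-failures, then recently edited inputs before stale ones.
    return (0 if failed_last_run(target) else 1, -newest_input)

def order_ready(ready_targets, failed_last_run):
    return sorted(ready_targets, key=lambda t: sort_key(t, failed_last_run))
```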

@comicfans

comicfans commented Sep 20, 2017

Currently I have one very long build job (10 minutes to complete linking) among other short jobs, and it seems ninja schedules this job too late, which makes the total build time longer than necessary, as the profiler shows:
[profiler chart: full build]
But if I build the longest target only, it has only a few dependencies:
[profiler chart: longest target only]
I think ninja could improve this by:

  1. when there is a history build log (and the user didn't give an explicit target order), trying to build the longest path first (we already have complete history data)
  2. when the user provides explicit targets on the command line, giving earlier-listed targets higher priority, so the user can guide ninja to a better choice (when there is no history build log, or only partial history)

I think both can be implemented by simply walking the dependency graph in reverse to find the longest path. For the first one, we can:

  1. mark all final exit nodes (those not used as inputs to any rule) as starting points with distance 0
  2. walk back to their input nodes and mark each node's distance = rule build time + output distance; if a node is already marked, the longer distance wins (if there is no history build time for a rule, just use 1 unit)
  3. a rule's priority is the same as its output nodes' distance; by always choosing to build the highest one, we always take the edge that reaches the current longest path earliest

For the second one, we can repeat the previous steps, but exclude all targets listed on the command line from the first step. Once all nodes are marked, set the last command-line target's output distance to the current longest distance and mark from that target again; repeat until every listed target has been used as a starting point. Paths marked in later passes always have a higher distance than earlier ones, so earlier-listed targets always take higher priority. This also works when targets share a common subgraph: edges in the subgraph are always marked highest, so once the subgraph is built, the higher-priority target is reached earlier. (A sketch of the backward pass follows below.)
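Here is a rough Python sketch of that backward pass (Edge here is a minimal stand-in with inputs, outputs, and a recorded build_time, not ninja's class):

```python
from collections import deque

def critical_distances(edges):
    produced_by = {}   # node -> the edge that outputs it
    consumers = {}     # node -> edges that consume it as an input
    for e in edges:
        for out in e.outputs:
            produced_by[out] = e
        for inp in e.inputs:
            consumers.setdefault(inp, []).append(e)

    # Step 1: final exit nodes (outputs consumed by no rule) start at 0.
    dist = {n: 0 for n in produced_by if n not in consumers}
    queue = deque(dist)

    # Step 2: walk back; input distance = rule build time + output distance,
    # keeping the longer value when a node is reached more than once.
    while queue:
        node = queue.popleft()
        edge = produced_by.get(node)
        if edge is None:
            continue  # a plain source file; nothing produces it
        for inp in edge.inputs:
            d = (edge.build_time or 1) + dist[node]  # 1 unit if no history
            if d > dist.get(inp, -1):
                dist[inp] = d
                queue.append(inp)

    # Step 3: an edge's priority is its output nodes' distance; the
    # scheduler always picks the ready edge with the highest priority.
    return {e: max(dist.get(o, 0) for o in e.outputs) for e in edges}
```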

----- update
I've created a test branch implementing this, mostly based on nico's cpsched branch. With an explicit build order, my build time decreased from 22 min to 14 min, much closer to the time for the longest build job alone (13 min).
[profiler chart: improved schedule]

comicfans referenced this issue in comicfans/ninja Jan 5, 2018
When targets are given on the command line, give higher build priority to targets that appear earlier. For example, if you know longestjob1 and longjob2 in the project have not been built before, you can run
ninja longestjob1 longjob2 all
to guide ninja to schedule them first and improve overall build time. (This is only needed when such long jobs have never been built before; ninja always auto-schedules the best build order for targets that have been built before.)
This is implemented by assigning weight * priority as the initial value when computing an edge's critical time. The weight is computed as the "total time of these targets if built serially", so it is big enough that forward edges of a higher-priority target always get a higher value than edges of lower-priority ones.
@travisdowns

I can't wait for the PR in #2019 to be merged; it will be awesome.

In many cases it would cut our build times by about a third, because we have one file (with no input task dependencies) which takes 2x as long as most files: this file should be built immediately at the start, but ninja chooses to build it near the end, adding about 1 minute to a 3-minute build. Almost any scheduling algorithm examining .ninja_log would solve this immediately.

@jhasse
Collaborator

jhasse commented Mar 1, 2024

fixed by #2177

@jhasse jhasse closed this as completed Mar 1, 2024