Use better logic for picking which node to build first #376

Closed
evmar opened this issue Jul 27, 2012 · 17 comments

@evmar
Collaborator

evmar commented Jul 27, 2012

See discussion in #60

@RedX2501
Contributor

What is the status of #60 and #232? Have any of the changes proposed there been merged? Has a different mechanism been implemented?

@nico
Collaborator

nico commented Sep 22, 2014

https://github.com/nico/ninja/compare/cpsched?expand=1 is a proof-of-concept implementation of letting Plan::FindWork use a priority list for scheduling, and implements a critical-path scheduling algorithm to build the priority list. I've only tested it on ninja itself so far.
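For readers who don't want to dig through the branch, here is a rough Python sketch of the core idea (the branch itself is C++ inside Plan::FindWork; the names below are illustrative, not taken from it): once each edge carries a precomputed critical-path priority, picking work becomes a pop from a max-heap of ready edges instead of taking whatever comes first.

```python
import heapq

# Illustrative sketch only: a ready-queue keyed by a precomputed priority,
# e.g. each edge's critical-path time. Not ninja's actual data structures.
class ReadyQueue:
    def __init__(self):
        self._heap = []    # entries are (-priority, tie_breaker, edge)
        self._count = 0    # monotonic counter keeps equal priorities stable

    def push(self, edge, priority):
        heapq.heappush(self._heap, (-priority, self._count, edge))
        self._count += 1

    def find_work(self):
        """Return the ready edge with the highest priority, or None."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]
```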

@orenbenkiki

Is this likely to be merged? I have a pretty involved old-style Makefile I would like to speed up. My main problem is critical-path scheduling - I have some tests that take a long time (almost a minute) to run, and it is nearly impossible to trick make -j into scheduling these ahead of the many smaller tasks that could happen in parallel. I can live with the ~10 seconds of overall make tax, but waiting another minute for the build to finish because of bad scheduling is starting to get to me. I was hoping ninja could solve my problem, but if it doesn't do critical-path analysis either, then it seems all I'll gain is the 10 seconds or so - not worth the effort of converting the Makefile...

BTW, about #60 - it is easy to provide this functionality within a critical-path algorithm by artificially increasing the criticality of tasks that failed when they were last attempted. The "correct" amount to increase by would make a good PhD thesis (involving things like the probability that the new build will succeed, how long it takes to edit the sources to fix the error, and the phase of the moon); in practice, adding some configurable amount should be fine.
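As a concrete (entirely made-up) illustration of that boost, assuming tasks are already ranked by critical-path time:

```python
# Hypothetical sketch: FAILURE_BOOST and both helper callables are invented
# names, not ninja API. A previously failed task is treated as if it sat on
# a longer critical path, so it gets retried early in the next build.
FAILURE_BOOST = 30.0  # the configurable amount, in seconds

def effective_priority(task, critical_path_time, failed_last_run):
    boost = FAILURE_BOOST if failed_last_run(task) else 0.0
    return critical_path_time(task) + boost
```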

@nico
Collaborator

nico commented Feb 5, 2015

I tried that on larger projects, and in practice it seemed to do much worse than the current scheduling approach, maybe because it's worse for the disk cache. In practice, most processes seem to be short-lived and there are many of them, so being nice to the disk cache (which the current behavior is) seems more important than being smart about the order. If there are long-running tasks, as in your case, the tradeoff is different, though.

Have you tried using the branch linked above and measured how much it helps?

@orenbenkiki

What I did is log (out of the existing Makefile) when each task starts and ends, and how much CPU % it consumes (this would be lower for an IO-intensive task). I then visualized this in a diagram. I can see that my few long-running, high-CPU tasks come "too late", while a ton of short-lived, low-CPU % tasks come at the start (where make doesn't even manage to spawn enough of them to saturate the CPUs). If I could manually rearrange the order they are executed in, I could reduce my 3.5-minute build time by about 1 minute (the dependencies allow for that). Also, if I could say "run more of the low-CPU % tasks in parallel", I could probably get it down to 2 minutes for a full build. I think I should be able to do that by using pools in Ninja...

The Makefile is pretty involved (several hundred lines of rules, using GNU make macros and wildcard expansions and so on). Converting it to Ninja would take "some effort" - I'm not even certain what the best way would be (would I be able to control everything I want, such as pools, using CMake? Should I just write some shell/python/ruby script that directly spits out the Ninja rules? etc.).

There are other things I can do to speed things up. For example, thanks to my "wonderful" IT department, the compiler and the libraries are installed on an NFS disk... sigh. I could use precompiled headers to reduce some of that pain, but there's a GCC bug about such headers that prevents us from doing so (I opened a bug report with GCC; I guess it might be fixed in a few years...). I could do a private install of GCC on a local disk on each and every machine...

The problem is that this is my "day job" and I have a lot on my plate. My build time is 3.5 minutes for a full build, and a "nothing to do" build is almost 20 seconds. These numbers aren't enough for me to justify taking a lot of time off my development tasks to "goof off playing with the build system". But it gets annoying, and my build times will only grow as we pour more code into the system. I need to be careful deciding where to spend my efforts. Switching to Ninja, even speculatively, would depend on whether I have a good expectation that it will actually help.

Your point about the disk cache is interesting. It would make sense that "immediately" using the results of a previous step is more efficient than returning to it after a "long time". Also, using the same source files in parallel steps would be more efficient than running those steps at different times. And, of course, doing a critical-path analysis would also help...

Optimal scheduling is NP-complete on the best of days. The general solution is "critical chain analysis", which takes resource contention into account (e.g., the effect of pools). There are reasonable, well-known heuristics for it. But it doesn't take into account issues like timing differences due to caching, as you pointed out. The heuristics could be extended to cover these, but the hard part would be estimating the effects of caching on a given schedule (do we saturate the disk cache here? how much will it cost? etc.).

Perhaps one could get "most of" the goodness by combining simple tricks. At any given time, you have a list of nodes you can schedule. Suppose you had a modular scheduler, allowing different (possibly competing) heuristics to be plugged in as tie-breakers. Say each gives a score to each node, and you decide by the sum of the scores. A configuration file would enable/disable each heuristic, give it parameters, and apply a linear scale to its results, to calibrate it against the others.

You could then provide heuristics like "prefer to consume task results as soon as they are created", "prefer to read the same source files in parallel tasks that run together", "prefer to run tasks that failed in a previous build earlier", and also classic critical path/chain analysis. People would be able to easily contribute new heuristics and to tweak their configuration files to get an optimal schedule for their projects.
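A minimal sketch of that pluggable-scorer idea (all heuristic names and node attributes below are invented for illustration, not ninja API):

```python
# Each heuristic maps a schedulable node to a score; a per-heuristic weight,
# which would come from the configuration file, calibrates it against the
# others. The node attributes used here are hypothetical.
def freshest_inputs(node):
    return 1.0 if node.has_just_built_input else 0.0

def previously_failed(node):
    return 1.0 if node.failed_last_build else 0.0

def critical_path(node):
    return node.critical_path_seconds

WEIGHTS = {
    freshest_inputs: 2.0,
    previously_failed: 5.0,
    critical_path: 1.0,
}

def score(node):
    return sum(w * h(node) for h, w in WEIGHTS.items())

def pick_next(ready_nodes):
    # The scheduler picks the ready node with the highest combined score.
    return max(ready_nodes, key=score)
```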

Being obsessed with build speed is what Ninja is all about, right? :-)

@evmar
Collaborator Author

evmar commented Feb 5, 2015

One approach is to continue to use Makefiles but do more work ahead of time in a script that generates your Makefiles. This script can easily be changed to use Ninja at some point if it looks promising.

(CMake can be thought of as one choice for the language to write your script in. But if your Makefiles are complex then maybe you'll think it's easier to write your own Perl script or whatever.)

@maximuska
Contributor

> The Makefile is pretty involved (several hundred lines of rules, using GNU make macros and wildcard expansions and so on). Converting it to Ninja would take "some effort" - I'm not even certain what the best way would be (would I be able to control everything I want using CMake, such as pools etc.? Should I just write some shell/python/ruby script that directly spits out the Ninja rules? etc.).

I had a similar dilemma once upon a time. It took me ~3 evenings to prototype a replacement of our proprietary build system with ninja, to get a feeling for how fast it would be. I suggest you give that approach a try to see if you want to proceed to a complete solution.

There is a makefile debugging hack (http://www.drdobbs.com/tools/debugging-makefiles/197003338) which you can probably abuse to prototype a ninja manifest from a makefile: override the SHELL variable to dump the commands and their dependencies to a file. E.g., add to your top makefile: `SHELL = $(warning === $@ : ($^) ===) /bin/bash -v`, then postprocess the output of a clean build.
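A rough sketch of the postprocessing step, under the assumption that the hack above makes each job print a warning line like `Makefile:42: === foo.o : (foo.c foo.h) ===` followed by the command that `bash -v` echoes (a quick prototype aid, not a robust converter):

```python
import re
import sys

# Turn the captured make output into a crude ninja manifest. Assumes each
# "=== target : (deps) ===" warning is followed by its command on the next
# non-empty line; filenames with spaces and multi-line recipes will break it.
PATTERN = re.compile(r'=== (\S+) : \((.*)\) ===')

def make_log_to_ninja(lines):
    chunks = ['rule run\n  command = $cmd\n']
    pending = None
    for line in lines:
        m = PATTERN.search(line)
        if m:
            pending = (m.group(1), m.group(2))   # (target, dependencies)
        elif pending and line.strip():
            target, deps = pending
            chunks.append(f'build {target}: run {deps}\n'
                          f'  cmd = {line.strip()}\n')
            pending = None
    return '\n'.join(chunks)

if __name__ == '__main__':
    sys.stdout.write(make_log_to_ninja(sys.stdin))
```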

One of the best (IMO) decisions I made in that respect was not to rewrite the manifest manually, but rather to "enhance" the original build system to generate a ninja manifest instead of running the build. This turned out to be important because not everybody was happy to switch to ninja, and this hack lets us use ninja without any extra maintenance, as updates to the legacy build system definitions are reflected automatically in the generated manifest. (However, if you are certain you can convince everybody to replace the build system entirely, go 'all in', of course.)

Maxim

@orenbenkiki

I wish make had a --ninja flag that would just dump a Ninja file (it is, after all, pretty much a syntax for the internal structures that make creates after expanding all macros etc.). The shell trick is nice; I can use it as a basis. But I have dynamic lists of rules (based on file lists extracted from the file system, etc.), so it can't be a permanent solution.

Ideally I need a 2-phase system - a slow regeneration of the Ninja file when the file lists change (rarely), and a fast build (using Ninja). The 1st phase can be "almost anything".
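A toy Python sketch of such a phase-1 generator, using ninja's `generator = 1` convention so the manifest regenerates itself (the file names such as configure.py and the compile rule are placeholders, not taken from any real project):

```python
#!/usr/bin/env python
# Phase 1: scan the tree and emit build.ninja; phase 2 is plain `ninja`.
import glob

def write_manifest(path='build.ninja'):
    with open(path, 'w') as f:
        f.write('rule cc\n'
                '  command = gcc -MMD -MF $out.d -c $in -o $out\n'
                '  depfile = $out.d\n\n')
        # The dynamic file lists are recomputed here, where it's cheap.
        for src in glob.glob('src/*.c'):
            obj = src.replace('src/', 'obj/').replace('.c', '.o')
            f.write(f'build {obj}: cc {src}\n')
        # Declaring build.ninja as a build output makes ninja regenerate it
        # first whenever configure.py changes; generator = 1 marks the rule
        # as a generator (e.g. its output is exempt from `ninja -t clean`).
        f.write('\nrule regen\n'
                '  command = python configure.py\n'
                '  generator = 1\n')
        f.write(f'build {path}: regen configure.py\n')

if __name__ == '__main__':
    write_manifest()
```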

@maximuska
Contributor

It will be interesting to hear the conclusions, if you make any progress with that!

@nyh

nyh commented Mar 31, 2015

I have run into exactly the same issue described in #60: I have a complex, slow-to-compile C++ project where a full compilation takes almost 10 minutes, a lot of the code is in header files, and changing one header file causes 50 source files to need recompiling.
In my usual development cycle, I modify one .hh and one .cc file (for example) and recompile. I'd like that .cc file to be compiled first - instead of compiling 50 other .cc files first, and only then realizing I have a syntax error in the single .cc I actually changed.

@nathanaelg

I have a case where one task in my build is (unfortunately) completely serial, has no build-time dependencies, and produces one library which is needed at link time by one executable. What I want is to make sure it is the first thing run (if it needs to run at all), but because it is single-threaded I want to run other stuff at the same time as well. Any ideas as to how to achieve that?

@maximuska
Contributor

Wouldn't adding a build order dependency on this target do that?


@nathanaelg

If I make everything dependent on the slow task, then it will definitely start first, but then everything else will only start once it has completed (which is a waste of time). If I make nothing dependent on it except the executable, it seems to start quite late in the build.

@jgunthorpe

Seconded @nyh - I also have a large C++ project, and have the same desire.

Have an option for ninja to sort the current set of runnable targets by the newest modification date of their first dependency, or something similar. The goal is to compile the files I've actually been changing first, and then go on to compile the files that haven't changed.

A secondary great idea would be to record failures and have the next ninja run sort previously failed targets first, i.e. if a header file is changed to fix a build error, I want to know quickly that the fix is good instead of waiting for the failed .cc to eventually come up.

To be clear, this is optimizing the compile, fix-error, recompile cycle (trying to minimize time to error), not minimizing total build time, as others have been discussing.
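A small sketch of both suggestions combined (the target attributes and the failure lookup are invented; this is not ninja code): among ready targets, previously failed ones come first, then those whose inputs changed most recently.

```python
import os

def sort_key(target, failed_last_run):
    # Newest modification time across the target's inputs.
    newest_input = max((os.path.getmtime(p) for p in target.inputs),
                       default=0.0)
    # Python sorts ascending, so negate the mtime: failures sort before
    # non-failures, then recently edited inputs before stale ones.
    return (0 if failed_last_run(target) else 1, -newest_input)

def order_ready(ready_targets, failed_last_run):
    return sorted(ready_targets, key=lambda t: sort_key(t, failed_last_run))
```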

@comicfans

comicfans commented Sep 20, 2017

Currently I have one very long build job (10 minutes to complete linking) among other short jobs, and it seems ninja schedules this job too late, which makes the total build time longer than necessary, as the profiler shows:
[profiler chart: full build]
But if I build the longest target only, it has only a few dependencies:
[profiler chart: longest target only]
I think ninja could improve this by:

  1. when there is a history build log (and the user didn't give an explicit target order), trying to build the longest path first (we already have complete history data)
  2. when the user provides explicit targets on the command line, giving earlier-listed targets higher priority, so the user can guide ninja to a better choice (when there is no history build log, or only partial history)

I think both can be implemented by simply walking the dependency graph in reverse to find the longest path. For the first one, we can:

  1. mark all final exit nodes (those not used as inputs to any rule) as starting points with distance 0
  2. walk back to their input nodes and mark each node's distance = rule build time + output distance; if a node is already marked, the longer distance wins (if there is no history build time for a rule, just use 1 unit)
  3. a rule's priority is the same as its output nodes' distance; by always choosing to build the highest one, we always take the edge that reaches the current longest path earliest

For the second one, we can repeat the previous steps, but exclude all targets listed on the command line from the first step. Once all nodes are marked, set the last command-line target's output distance to the current longest distance and mark from that target again; repeat until every listed target has been used as a starting point. Paths marked in later passes always have a higher distance than earlier ones, so earlier-listed targets always take higher priority. This also works when targets share a common subgraph: edges in the subgraph are always marked highest, so once the subgraph is built, the higher-priority target is reached earlier. (A sketch of the backward pass follows below.)
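Here is a rough Python sketch of that backward pass (Edge here is a minimal stand-in with inputs, outputs, and a recorded build_time, not ninja's class):

```python
from collections import deque

def critical_distances(edges):
    produced_by = {}   # node -> the edge that outputs it
    consumers = {}     # node -> edges that consume it as an input
    for e in edges:
        for out in e.outputs:
            produced_by[out] = e
        for inp in e.inputs:
            consumers.setdefault(inp, []).append(e)

    # Step 1: final exit nodes (outputs consumed by no rule) start at 0.
    dist = {n: 0 for n in produced_by if n not in consumers}
    queue = deque(dist)

    # Step 2: walk back; input distance = rule build time + output distance,
    # keeping the longer value when a node is reached more than once.
    while queue:
        node = queue.popleft()
        edge = produced_by.get(node)
        if edge is None:
            continue  # a plain source file; nothing produces it
        for inp in edge.inputs:
            d = (edge.build_time or 1) + dist[node]  # 1 unit if no history
            if d > dist.get(inp, -1):
                dist[inp] = d
                queue.append(inp)

    # Step 3: an edge's priority is its output nodes' distance; the
    # scheduler always picks the ready edge with the highest priority.
    return {e: max(dist.get(o, 0) for o in e.outputs) for e in edges}
```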

----- update
I've created a test branch implementing this, mostly based on nico's cpsched branch. With an explicit build order, my build time decreased from 22 min to 14 min, much closer to the time for the longest build job alone (13 min).
[profiler chart: improved schedule]

comicfans referenced this issue in comicfans/ninja Jan 5, 2018
When targets are given on the command line, give higher build priority to targets that appear earlier. For example, if you know longestjob1 and longjob2 in the project have not been built before, you can run
ninja longestjob1 longjob2 all
to guide ninja to schedule them first and improve overall build time. (This is only needed when such long jobs have never been built before; ninja always auto-schedules the best build order for targets that have been built before.)
This is implemented by assigning weight * priority as the initial value when computing an edge's critical time. The weight is computed as the "total time of these targets if built serially", so it is big enough that forward edges of a higher-priority target always get a higher value than edges of lower-priority ones.
@travisdowns

I can't wait for the PR in #2019 to be merged; it will be awesome.

In many cases it would cut our build times by about a third, because we have one file (with no input task dependencies) which takes 2x as long as most files: this file should be built immediately at the start, but ninja chooses to build it near the end, adding about 1 minute to a 3-minute build. Almost any scheduling algorithm examining .ninja_log would solve this immediately.

@jhasse
Collaborator

jhasse commented Mar 1, 2024

fixed by #2177

@jhasse jhasse closed this as completed Mar 1, 2024