
Long running compile commands can cause incorrect dirty status computation #1162

Closed
jdrouhard opened this issue Jun 6, 2016 · 8 comments · Fixed by #1753 or #1943

@jdrouhard
Contributor

This was a bit of a head-scratcher for us and took us a while to nail down. Ninja appears to look purely at the mtimes of inputs and outputs to determine whether an edge is dirty. That isn't quite sufficient: during a long-running compile command, an input may change after the compiler has read it but before the output is written. The output then ends up with a newer mtime than the most recent input, even though it is actually stale, since it was built from an older version of one of its inputs.

This is difficult to reproduce, but I did the following:

  1. Create a gcc wrapper that splits compile commands into two: a preprocess step and a compile step.

  2. Add a sleep between the preprocess and compile to simulate a long running compile command.

  3. Create a simple build.ninja file that just compiles a single test file into an executable.

    # build.ninja
    rule cxx
      command = /path/to/gcc-wrapper -o $out -c $in
    rule linker
      command = /path/to/gcc-wrapper $in -o $out
    build test.c.o: cxx /path/to/test.c
    build repro: linker test.c.o
    
    // test.c
    int main() { return 0; }
  4. Run ninja using the wrapper. While it's sleeping, change the source file (e.g., make it return a different value).
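
The race can also be demonstrated without a compiler at all. The following is a minimal sketch (file names and timings are made up, and plain file writes stand in for the gcc wrapper) of why the mtime comparison passes even though the output is stale:

```python
import os
import time
import tempfile

# Simulated build: the "compiler" reads its input, the input is edited
# mid-build, and the output is still the newest file on disk.
d = tempfile.mkdtemp()
src = os.path.join(d, "test.c")
obj = os.path.join(d, "test.c.o")

with open(src, "w") as f:
    f.write("int main() { return 0; }\n")

with open(src) as f:
    snapshot = f.read()            # the "compiler" reads the input here
time.sleep(1.1)                    # simulate a long-running compile
with open(src, "w") as f:
    f.write("int main() { return 1; }\n")   # user edits the file mid-build
time.sleep(1.1)
with open(obj, "w") as f:
    f.write(snapshot)              # output written from the stale snapshot

# A pure mtime check sees the output as newer than the input, i.e. "clean"...
print(os.path.getmtime(obj) > os.path.getmtime(src))   # True
# ...even though the output was built from content that no longer matches.
print(open(obj).read() == open(src).read())            # False
```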

After it finishes linking, running the executable will return the original value (as expected), but subsequent ninja runs will not detect that the output is dirty and will report nothing to do, even though it is stale.

This matters to us because we have a lot of large translation units and a rapid modify/build/test cycle.

The following code is where it checks the mtimes of the most recent input against the output:

ninja/src/graph.cc

Lines 161 to 184 in b231274

      // Dirty if the output is older than the input.
      if (most_recent_input && output->mtime() < most_recent_input->mtime()) {
        TimeStamp output_mtime = output->mtime();

        // If this is a restat rule, we may have cleaned the output with a restat
        // rule in a previous run and stored the most recent input mtime in the
        // build log.  Use that mtime instead, so that the file will only be
        // considered dirty if an input was modified since the previous run.
        bool used_restat = false;
        if (edge->GetBindingBool("restat") && build_log() &&
            (entry = build_log()->LookupByOutput(output->path()))) {
          output_mtime = entry->restat_mtime;
          used_restat = true;
        }

        if (output_mtime < most_recent_input->mtime()) {
          EXPLAIN("%soutput %s older than most recent input %s "
                  "(%d vs %d)",
                  used_restat ? "restat of " : "", output->path().c_str(),
                  most_recent_input->path().c_str(),
                  output_mtime, most_recent_input->mtime());
          return true;
        }
      }

I'm not sure what the best (fastest) way to actually solve this would be, but I'm willing to take a crack at it if we come up with a decent approach.

@jimon

jimon commented Jun 6, 2016

I guess it's fine for any build system (ninja, make, etc.) to assume "inputs are not changing while a command is running", so the build system should record the mtimes of the inputs before it starts executing a command, and then gracefully pick up any new changes on the next run. At least this is what I'd expect from any build system on the market.

@evmar
Collaborator

evmar commented Jun 6, 2016

Two thoughts:

  1. Editing files while you compile is a recipe for confusing build outputs, even if the build system is correct. I think the Google build system (bazel) actually aborts your build if it detects that you modified a file while it was building. We could make Ninja do that by re-stat()ing all inputs after a command completes and deleting the output if they changed.
  2. You probably don't like that answer. Making this just work is probably more involved. I have come to understand that using mtimes at all for time comparisons is a mistake. A better approach is to compute some sort of hash of "all the input stuff" -- the set of input files, their mtimes, and the command line -- and then rebuild whenever that hash changes. (That model would also make it trivial to extend into a system that ignores mtimes entirely and just hashes the inputs' contents, which is a feature people have asked for.) We in fact have something sorta like this already in place for handling command-line changes -- see the HashCommand right below, and imagine also feeding it all the input mtimes. However, implementing this would be a pretty involved change to Ninja, and I just thought of the idea right now, so I haven't yet spotted the flaws in it.
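
Idea 2a might be sketched roughly like this (the function names and the stored-hash scheme are hypothetical illustrations, not ninja's actual HashCommand machinery):

```python
import hashlib
import os

def input_state_hash(command, input_paths):
    """Hash the command line plus each input's path and mtime.
    Any change to the input set, an input mtime, or the command line
    changes the hash, forcing a rebuild on the next run."""
    h = hashlib.sha256(command.encode())
    for path in sorted(input_paths):       # sort for a stable digest
        h.update(path.encode())
        h.update(str(os.path.getmtime(path)).encode())
    return h.hexdigest()

def needs_rebuild(stored_hash, command, input_paths):
    # Compare against the hash recorded when the edge last ran; the mtimes
    # would be sampled before the command starts, so a mid-build edit
    # changes the hash and triggers a rebuild next time.
    return stored_hash != input_state_hash(command, input_paths)
```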

@dpranke

dpranke commented Jun 6, 2016

I think it is very difficult, if not impossible, to do (2) and have things still be fast, unless you keep the build tool running and watch for file changes. My vague understanding is that this is one of the reasons chrome (or NaCl?) moved away from SCons in the first place (because SCons used MD5 hashes by default), but that was before my time ...

-- Dirk

@evmar
Collaborator

evmar commented Jun 6, 2016

Sorry, I mixed two ideas there: there's 2a (mix file mtimes into the hash of "all inputs"), which should be about as fast as the current code and will solve this bug.

And then I mentioned content hashing as idea 2b that this would enable, but it's something to consider another time. (It definitely has speed challenges but they are surmountable.)

@jdrouhard
Contributor Author

@martine: What about a simpler solution (for now, at least), where the log file keeps track of the mtime of the most recent input for each output at the time the build started? This would require a log version bump, but I've got it working locally. I can clean it up and submit a pull request if you're interested in seeing it.

Basically, in Builder::FinishCommand(), I loop through the edge inputs (similar to the loop used when restat is in effect) and store the most recent input's timestamp on the LogEntry. Then, in DependencyScan::RecomputeOutputDirty(), I use that stored timestamp, if available, instead of the output's actual mtime when comparing against the new most recent input's mtime. This fixes the repro described above.

@jdrouhard
Contributor Author

Or instead of storing the most recent input's time, we could just store the time that the edge started building and use that as the output's mtime on subsequent runs. The most recent input's time doesn't quite work with generated inputs.
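
The revised approach could be sketched as follows (a plain dict stands in for ninja's build log, and all names here are hypothetical):

```python
import os
import time

build_log = {}  # output path -> time at which the edge started building

def start_edge(output):
    # Record the start time just before the command runs; in the real fix
    # this would be persisted in the build log's entry for the output.
    build_log[output] = time.time()

def output_dirty(output, inputs):
    # Prefer the logged start time over the output's on-disk mtime, so an
    # input edited while the command ran still compares as newer next run.
    output_mtime = build_log.get(output)
    if output_mtime is None:
        output_mtime = os.path.getmtime(output)
    return any(os.path.getmtime(i) > output_mtime for i in inputs)
```

Because the recorded time predates the command, an input edited mid-build always compares newer on the next scan, so the edge is correctly re-run.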

@maximuska
Contributor

maximuska commented Jun 8, 2016

Rather than changing ninja, wouldn't it be sufficient to wrap your build commands in a script that records the mtime (or md5) of the input files before running the tool, verifies they are unchanged after the tool has written the output, and touches any changed inputs to make them newer than the output (or deletes the output and fails the command, if you consider this an error)?

This won't be quite trivial with implicit dependencies (header files that are only discovered while building the target, via depfiles), but none of your proposals seems to address that either.
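
Such a wrapper might look roughly like this, sketched in Python rather than shell (the function name and the fail-on-change policy flag are invented for illustration):

```python
import os
import subprocess
import sys

def run_guarded(command, inputs, output, fail_on_change=False):
    """Run `command`; if any input's mtime changed while it ran, either
    delete the output and fail, or touch the changed inputs so the next
    build sees them as newer than the output."""
    before = {p: os.path.getmtime(p) for p in inputs}
    subprocess.run(command, check=True)
    changed = [p for p in inputs if os.path.getmtime(p) != before[p]]
    if changed:
        if fail_on_change:
            os.remove(output)
            sys.exit("inputs changed during build: %s" % ", ".join(changed))
        for p in changed:
            os.utime(p)  # bump changed inputs past the output's mtime
    return changed
```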

@jdrouhard
Contributor Author

Just to update this issue: I resolved the bug a few months back with the pull request referenced above. This could technically be solved with a wrapper script, but that would be unnecessarily complex and error-prone; the build tool should be able to handle cases like this itself.

My fix resolves all the issues I listed, plus the concern @maximuska brought up. On subsequent builds, it will always use the time the output began building as its "mtime" so any implicit dependency discovery or input file differences will be compared against the time the output began building instead of comparing against its actual filesystem mtime.

It makes ninja's dirty-output checking actually correct by removing the race condition that occurs when an input changes between the start of the output's build and the output being written to disk.

If there are any qualms about my pull request, we can start a discussion there.
