
Record most recent input's mtime in log file for correct dirty status computation #1165

Closed
jdrouhard wants to merge 1 commit into master from log_input_mtime

Conversation

@jdrouhard (Contributor) commented Jun 9, 2016

This adds support for recording an edge's most recent input file's mtime in the build log. This fixes #1162, by giving ninja a better metric for determining whether an edge's outputs are dirty.

If an edge's output mtimes are compared against the most recent input's mtime, an edge can be judged clean even when it is actually dirty. If an input is updated while the edge's rule is still running, before the outputs are written to disk, subsequent runs will see outputs that are newer than the inputs, even though the inputs have changed and may differ from the ones used to produce those outputs. This can be disastrous if the resulting object files still link, since some of the object files might be out of date.

Instead of comparing output mtimes, ninja will add the most recent input's mtime to the build log file (captured when the edge first computes its dirty status) when an edge completes. On subsequent runs, ninja will compare this value to the edge's most recent input's mtime to determine whether the outputs are dirty, which is provably more correct and safer, since outputs are then guaranteed to actually be up to date.

A test is included that verifies the correct input_time is recorded in the log file and that subsequent ninja runs use that value correctly.
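
To make the mechanism concrete, here is a minimal sketch of the proposed dirty check, not the actual patch: the names (`LogEntry::input_mtime`, `OutputsAreDirty`) are hypothetical, and `TimeStamp` stands in for ninja's integer mtime type.

```cpp
#include <algorithm>
#include <vector>

typedef long long TimeStamp;  // stand-in for ninja's mtime type

struct Input { TimeStamp mtime; };

struct LogEntry {
  TimeStamp input_mtime;  // most recent input mtime recorded in the
                          // build log when the edge last built (new field)
};

// Returns true if the edge's outputs must be rebuilt.
bool OutputsAreDirty(const std::vector<Input>& inputs, const LogEntry* entry) {
  if (!entry)
    return true;  // never built before: always dirty
  TimeStamp most_recent = 0;
  for (const Input& in : inputs)
    most_recent = std::max(most_recent, in.mtime);
  // Compare against the snapshot taken when the previous build computed
  // its dirty status, rather than against the outputs' on-disk mtimes.
  // An input touched while the rule was running is newer than the
  // recorded value, so the edge correctly stays dirty.
  return most_recent > entry->input_mtime;
}
```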

@jdrouhard (Contributor Author)

rebased on current origin/master

@jdrouhard (Contributor Author)

rebased on current origin/master

@jdrouhard (Contributor Author)

@evmar @nico Is there any way you guys could take a look at this and get it merged in? We're still experiencing the negative effects of ninja without this patch and it's been over a year now since I originally opened the pull request. Thank you for all the hard work you put in to make such a great build tool.

@jdrouhard force-pushed the log_input_mtime branch 2 times, most recently from 8b2e64d to b40919b on June 23, 2017 15:50
@nico (Collaborator) commented Oct 2, 2017

This adds a whole bunch of machinery for something most people don't seem to need, and that has a workaround that doesn't require ninja changes (see discussion on #1162). So I'm a bit reluctant to merge this.

@jdrouhard (Contributor Author)

The workaround doesn’t actually work. Wrapping the build commands seems like a good idea initially, but it doesn’t capture implicit dependencies.

This pull request removes an inherent race condition between the time an output starts building and the time it finishes. I don't think this code is so complex that it shouldn't be merged, given that it actually makes ninja produce more correct builds. Correctness is a good thing, and worth the tiny bit of extra complexity to ensure it.

@jseba commented Oct 9, 2017

Wrappers are a nightmare. They introduce a layer of indirection that "hopefully" doesn't cause any sort of conflict with the tool. Besides, the only good use of a wrapper is to provide a nicer interface around the underlying tool to reduce its complexity (see Clang's and GCC's drivers).

Also, as I understand the reason for the change, Ninja right now cannot prove that its build state is valid. If there is any way for a file to be modified after a job for an output has started, the build state becomes indeterminate, since ninja has no way of detecting that the input files changed between when the job started and when it finished. This change makes it so that the mtime is snapshotted as the job starts, which means that any further changes outside Ninja's control can't leave it in an invalid state.

I would think that a build tool that can't prove that it's in a valid state isn't very useful.

@jdrouhard (Contributor Author)

@nico I understand your reluctance to merge this. However, I want to implore you to reconsider the discussion in #1162; the workarounds mentioned there do not work.

If we consider the number one goal of a build system, isn't it to ensure that all builds are correct and repeatable? Without my fix, ninja is fundamentally broken, as you can't truly guarantee that outputs are up to date (try running the simple test case I added without my fix in place). This fix isn't even that invasive: it adds a new field to the log file, and that's about the most complex thing it does.

Please consider merging this early in the release cycle so it can get as much testing as possible. I think we can all agree that a provably correct build system is worth it. I think everyone needs this even if they don't realize it.

@neumannsteve

This would be a very useful change, as a build system that you can't trust is... suboptimal. The change itself appears to be fairly minimal.

@jdrouhard (Contributor Author)

@nico @evmar I am currently working on updating this pull request to use the new high-resolution timestamps that were recently merged.

@jdrouhard changed the title from "Record edge's build start time in log file" to "Record most recent input's mtime in log file for correct dirty status computation" on Oct 22, 2017
@jdrouhard (Contributor Author)

I changed this PR's implementation a bit after going through the changes that introduced nanosecond timestamp precision. Instead of taking a snapshot of the system time when an edge begins generating its outputs, the log file will now simply record the mtime of the most recent input, as calculated when ninja determines whether the edge is dirty.

This is even more correct than my original PR, and dovetails nicely with #1219 since this improvement is based solely on mtimes of actual files (instead of also trying to compare to a snapshot of the current system time). I updated the initial description of the PR to reflect the new implementation and to explain how this fixes the original problem.

@jdrouhard force-pushed the log_input_mtime branch 3 times, most recently from 1a01827 to 5ace915 on October 22, 2017 21:02
Record most recent input's mtime in log file for correct dirty status computation

If an edge's output mtimes are compared against the most recent
input's mtime, an edge can be judged clean even when it is actually
dirty. If an input is updated while the edge's rule is still running,
before the outputs are written to disk, subsequent runs will see
outputs that are newer than the inputs, even though the inputs have
changed and may differ from the ones used to produce those outputs.

Instead of comparing output mtimes, ninja will add the most recent
input's mtime to the build log file when an edge completes, provided
all implicit dependencies were stat'd before the output began to build
(i.e. all implicit dependencies were already loaded from the deps log
file). It looks at both the explicit and implicit dependencies to find
the most recent input. On subsequent runs, if a valid mtime was
recorded in the log file, ninja will compare it to the edge's most
recent input's mtime to determine whether the outputs are dirty.
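
As an illustration of that recording rule, here is a hedged sketch of the decision logic; the `Edge` fields and the `-1` sentinel are assumptions made for this example, not the patch's actual representation.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

typedef int64_t TimeStamp;

struct Edge {
  // Assumed flag: true only if every implicit dependency was already
  // known (loaded from the deps log) before the edge started building.
  bool all_deps_known_before_build;
  std::vector<TimeStamp> input_mtimes;  // explicit + implicit inputs
};

// Decide what to write into the build log when the edge completes.
TimeStamp InputMtimeToRecord(const Edge& edge) {
  if (!edge.all_deps_known_before_build)
    return -1;  // sentinel: no trustworthy input mtime; later runs
                // fall back to the old output-mtime comparison
  TimeStamp most_recent = 0;
  for (TimeStamp m : edge.input_mtimes)
    most_recent = std::max(most_recent, m);
  return most_recent;
}
```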
@dkrikun commented Dec 13, 2017

Hello, any chance this could be merged soon? Thank you!

@jhasse (Collaborator) commented Nov 16, 2018

Hello, any chance this could be merged soon?

I will review this soon, but it definitely won't make it into 1.9, sorry.

@jdrouhard (Contributor Author)

@jhasse Thank you for taking some time to look at it. Before you get too far in reviewing, I'd like to hear your thoughts on the two possible implementations to solve #1162.

This PR currently does not solve the issue 100% of the time, as it requires all inputs to already be in the deplog before it will use the input mtimes for dirty status checking. This means the very first ninja run, or any run that introduces new input dependencies, is still subject to the race condition.

The other solution (which was the first approach I took) is to take a system timestamp right when a build begins and stamp that time in the deplog for each generated output. This makes ninja correct 100% of the time, but might require various system-dependent mechanisms for getting the most accurate (filesystem-mtime-equivalent) system time. Linux, Windows, and FreeBSD each have their own syscalls and interfaces for this.

I actually prefer (2) here, but before I modify the PR to do it this way, I'd like to get consensus. (2) will require some additional platform-specific code for getting the current system time, but would make ninja 100% correct.
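
For a sense of what that platform-specific code might look like, here is a minimal sketch of capturing a high-resolution wall-clock timestamp on Windows and POSIX systems. The nanoseconds-since-epoch convention is an assumption chosen to match the high-resolution mtime work, and `HighResNow` is a hypothetical name; the two OS calls themselves are real.

```cpp
#include <cstdint>

typedef int64_t TimeStamp;  // assumed: nanoseconds since the Unix epoch

#ifdef _WIN32
#include <windows.h>
TimeStamp HighResNow() {
  FILETIME ft;
  GetSystemTimePreciseAsFileTime(&ft);  // 100ns ticks since 1601-01-01
  uint64_t ticks =
      (uint64_t(ft.dwHighDateTime) << 32) | ft.dwLowDateTime;
  // Shift the epoch from 1601 to 1970, then convert ticks to ns.
  const uint64_t kEpochDiff = 11644473600ULL * 10000000ULL;
  return TimeStamp((ticks - kEpochDiff) * 100);
}
#else
#include <ctime>
TimeStamp HighResNow() {
  timespec ts;
  clock_gettime(CLOCK_REALTIME, &ts);  // wall clock, ns resolution
  return TimeStamp(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}
#endif
```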

@nico (Collaborator) commented Nov 18, 2018

FWIW I'd still recommend that you maintain this on your fork as mentioned above. Re "fundamentally broken": That's true for this use case, but that use case isn't common, and it doesn't mean ninja isn't still useful. Make has had the same issue for decades from what I understand. (mtime-based tools are also "fundamentally broken" on NFS shares, but if you build on an NFS share you already have bigger problems, etc.)

@jhasse (Collaborator) commented Apr 17, 2019

That's true for this use case, but that use case isn't common

I use ninja in my IDE. Often, after starting a build, I see an error and fix it while the build is still running. I have to make sure not to save the file while ninja is still running. I think it's a common scenario.

What isn't common, though, is that this leads to a bug that someone notices. But that makes it even more relevant in my opinion: hard-to-reproduce bugs are the worst ;)

I actually prefer (2) here, but before I modify the PR to do it this way, I'd like to get consensus. (2) will require some additional platform-specific code for getting the current system time, but would make ninja 100% correct.

Not sure if that would make it 100% correct, as there's the difference between the system time and the time the filesystem gets updated (i.e. is the filesystem operation atomic? And are all the clocks synchronized?).

@jdrouhard (Contributor Author)

I actually prefer (2) here, but before I modify the PR to do it this way, I'd like to get consensus. (2) will require some additional platform-specific code for getting the current system time, but would make ninja 100% correct.

Not sure if that would make it 100% correct, as there's the difference between the system time and the time the filesystem gets updated (i.e. is the filesystem operation atomic? And are all the clocks synchronized?).

100% is probably a bit optimistic. Of course it depends on the timestamp granularity of the filesystem being used, and on how the filesystem decides to record the mtime when writing files. If we let ninja record the most granular timestamp we can glean from the OS when running a command for an output, we can dramatically (and, depending on the filesystem, completely) remove the race condition that this PR tries to solve. Your point about the difference between the system time and the time the filesystem gets updated shouldn't matter with my second solution, since the system timestamp (which is recorded in the deplog) is taken before the job even starts, and the only time that matters on the other end is the time the filesystem uses to stamp the mtime attribute.

Scenarios:

1. The timestamp taken by ninja is more accurate than the filesystem timestamp, so the ninja timestamp appears to be further in the future than the mtime of the output file: ninja produces a false positive (a spurious rebuild) the next time the build is run. I prefer this to missing a necessary rebuild.
2. The filesystem rounds times down (it might write a time in the near past): same situation as above. Ninja might produce a false positive and cause a spurious rebuild when one isn't necessary. Still preferred.
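
With assumed numbers, a tiny check makes both scenarios concrete; the values below are invented purely for illustration.

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  // Assumed values: ninja snapshots the clock just before the job runs,
  // but the filesystem stores the output's mtime at coarser granularity.
  int64_t snapshot_ns = 1000000500;      // nanosecond-precise snapshot
  int64_t output_mtime_ns = 1000000000;  // mtime truncated by the fs
  // snapshot > mtime: the next run cannot prove the output is up to
  // date, so it rebuilds once. A spurious rebuild, never a missed one.
  std::printf("rebuild: %s\n",
              snapshot_ns > output_mtime_ns ? "yes" : "no");
  return 0;
}
```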

@jhasse (Collaborator) commented Apr 21, 2019

Sounds good :)

@jdrouhard (Contributor Author)

Ok, I'll work on adjusting this PR to go down that route. Stay tuned.

@jhasse added this to the 1.11.0 milestone on Aug 30, 2019
@jdrouhard (Contributor Author)

@jhasse I plan on getting back to this soon. Life kinda got in the way last summer and this fell down the priority list.

@evmar (Collaborator) commented Mar 3, 2020

Is it a guarantee that the timestamps used by the file system match the system time? I wouldn't naively expect them to be necessarily related -- e.g. stuff like different timestamp resolution, network file systems, delayed writes and so on. I think if you're making a decision based on fs-reported timestamps the correct thing is to use those timestamps as if they were opaque.

(disclaimer: haven't touched ninja in like 8 years at this point)

@jdrouhard (Contributor Author)

@evmar I've had a while to think about this problem again. I agree that using the system time to compare against filesystem times is probably not the ideal solution, for many reasons, one of which is (as you mention) that we can't reliably interpret a filesystem-reported mtime, since it could be the system time of another machine, or not a time at all.

Avery Pennarun has a great summary of the problems with mtime comparisons for build tools in his blog: https://apenwarr.ca/log/20181113

I think I'm going to close this PR and think about possible ways to incorporate some of redo's ideas into ninja.

@evmar (Collaborator) commented Mar 3, 2020

My take: the best option is to hash the inputs -- not their contents, but rather the names and mtimes of the inputs treated as a blob. Then if anything changes (time forwards or backwards) we rebuild.
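
A minimal sketch of that idea follows, assuming an FNV-1a hash and a hypothetical `InputStat` record (neither is from ninja or this patch); extra fields such as size or inode, as discussed in the reply below, would be mixed in the same way.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct InputStat {
  std::string path;  // input file name
  int64_t mtime_ns;  // input mtime, treated as opaque bits
};

// Hash the names and mtimes of the inputs as one opaque blob; any
// change, whether time moves forwards or backwards, changes the hash
// and therefore triggers a rebuild.
uint64_t HashInputManifest(const std::vector<InputStat>& inputs) {
  uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
  auto mix = [&h](const void* data, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    for (std::size_t i = 0; i < len; i++) {
      h ^= p[i];
      h *= 0x100000001b3ULL;  // FNV-1a 64-bit prime
    }
  };
  for (const InputStat& in : inputs) {  // assumed sorted by path
    mix(in.path.data(), in.path.size());
    mix(&in.mtime_ns, sizeof(in.mtime_ns));
  }
  return h;  // rebuild if this differs from the recorded hash
}
```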

@jdrouhard (Contributor Author)

That's essentially where I'm going with it, but I might hash/track more than the name and mtime of inputs. I'm considering inode number and file size as well.

The issue I'm still trying to resolve is how to do initial dependency discovery. The first time ninja builds an output, it doesn't know all of its dependencies yet. The compiler can generate this information (e.g. the -MMD flag, which writes a .d file like the one below), but until we have that, we can't hash the input information. This means that the first time we build an output, we have the same issue: one of the inputs might change before the output is written to disk, leaving an output that was created from now-stale inputs. I believe redo fixes this by doing a separate dependency-discovery step before running the rule that creates the final output.
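
For reference, a depfile produced by something like `cc -MMD -c foo.c` is just a Makefile-style rule listing the discovered headers; the file names here are invented examples:

```
# foo.d, generated by the compiler alongside foo.o (paths are examples)
foo.o: foo.c foo.h util/common.h
```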

I want to make the final solution resilient against all of these types of issues.

@jdrouhard (Contributor Author)

@jhasse I have another version of this PR that resolves this issue rather cleanly. It can't be perfect (due to the nature of implicit dependency discovery via depfiles), but it gets close. And when used in combination with @bradking's dyndep functionality, it will be 100% correct.

Do you want me to force push this new implementation to this PR, or close this one and create a new one?

@jhasse (Collaborator) commented Mar 15, 2020

Close this one and create a new one :)

Successfully merging this pull request may close these issues:

Long running compile commands can cause incorrect dirty status computation (#1162)