Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider enabling unstable fingerprint check for cache-loaded query results too #85783

Open
michaelwoerister opened this issue May 28, 2021 · 4 comments
Labels
A-incr-comp Area: Incremental compilation A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@michaelwoerister
Copy link
Member

Correctness of incremental compilation relies on the assumption that a query always yields the same result for the same set of inputs, i.e. that queries are deterministic functions. We recently discovered that checking fingerprint stability of query results can detect violations of this assumption and that breaking this assumption can lead to miscompilations (see #84970).

This issue argues that a similar set of circumstances can arise in the code path where we have not yet enabled the fingerprint stability check by default.

First some background on how miscompilations can happen: When the query system is asked to execute a query Q in incremental mode it will first walk the dependency graph of the query, trying to mark Q's dep-node as green. If all of Q's inputs are green, Q itself is green too and the system assumes that its result is the same as in the previous compilation session. While walking the graph for marking, all the nodes encountered are also marked as green -- however, their corresponding queries are not necessarily executed. Only later if another query actually needs the result of one such intermediate query Q_i, it will be executed. If at that point the already green query yields a different result than what we previously relied on, we end up with an inconsistent (and therefore dangerous) state: Some dependents of Q_i have loaded their result from the cache under the assumption that Q_i's result was the same as in the previous session; but other dependents of Q_i have computed their own result with a different value for Q_i as input:

                 +-------- Q1 (assumes Q_i == x)
                 |
                 v 
(...) <-------- Q_i 
                 ^
                 |
                 +-------- Q2 (computes its result with Q_i == x' as input)

Assuming for example that Q1 is a vtable layout with methods in the order m1, m2, m3 and Q2 is an indirect method call to m2 -- assuming that m2 is at index 0 because Q_i yielded the vtable order m2, m3, m1 when re-executed -- then we can see how this can lead to a miscompilation where the method call that Q2 represents calls the wrong method.

Now, #83007 turned on fingerprint stability checks for all cases where a query is re-executed by the query system. This will catch violations of the query determinism assumption because we compare the fingerprint on disk (corresponding to Q_i == x) to the fingerprint of the newly computed value (corresponding to Q_i == x'). If the fingerprints differ we know there is a dangerous mismatch and abort the compilation session.

Note, however, that the same kind of mismatch can occur when the value for Q_i is loaded from disk: if Q_i is marked as green and Q1 thus loads its result from disk under the assumption that Q_i == x but then we later load the result of Q_i from disk and -- because of a bug in query result serialization -- we get Q_i == x', then we can end up with the same kind of dangerous inconsistency as in the other case. The diagram above would apply to this case too, no modification needed.

In summary: both unstable query provider implementations and buggy query result (de-)serialization can lead to the same kind of dangerous inconsistencies during incremental compilation, but we currently only have a safety check for the first of the two cases.

I personally believe that the risk of buggy serialization code is smaller than the risk of unstable query provider implementations but I still think it is prudent to check both cases.

@Aaron1011, do you agree with my analysis of the situation?

cc @rust-lang/compiler

@michaelwoerister michaelwoerister added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-incr-comp Area: Incremental compilation C-bug Category: This is a bug. labels May 28, 2021
@Aaron1011
Copy link
Member

@michaelwoerister I agree with everything you wrote. The only reason that I didn't turn on hash verification for deserislized query results was because of the performance impact - hopefully, we can speed up the re-hashing and comparison.

@michaelwoerister
Copy link
Member Author

Great, thanks for taking a look, @Aaron1011!

@wesleywiser
Copy link
Member

Perhaps we could start by enabling this for rustbuild? At least then, we will get some testing during rustc development.

@inquisitivecrystal
Copy link
Contributor

@rustbot label +A-query-system

@rustbot rustbot added the A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) label Jun 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants