Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

no_hash queries bypass -Z verify-ich checks #85857

Open
michaelwoerister opened this issue May 31, 2021 · 7 comments
Open

no_hash queries bypass -Z verify-ich checks #85857

michaelwoerister opened this issue May 31, 2021 · 7 comments
Labels
A-incr-comp Area: Incremental compilation A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@michaelwoerister
Copy link
Member

It was recently shown that "incremental compilation hash verification" (i.e. -Z verify-ich) is an important tool for finding potentially dangerous bugs in query provider implementations.

However, this verification of query result hashes naturally can only work if we actually compute hashes for the results of a given query. This is not the case for queries marked with the no_hash attribute. As a consequence, for such queries, an indeterministic query provider implementation cannot be detected by the measures we recently put in place.

There are two reasons for making a query no_hash:

  • Performance: hashing some query results is a waste of time because
    1. it is highly unlikely that a re-computed query can turn green or
    2. the result of the query is effectively hashed a second time via cheap "projection queries".
  • Necessity: The result of the query does not have a HashStable implementation and thus cannot be hashed.

There is also some good news here: if a query is marked as eval_always in addition to no_hash we don't have to worry about the kind of inconsistency bug that caused us to make the 1.52.1 point release because in that case there's always a single source of truth for the value of a query result. These also seem to be the kinds of queries that fall into the "necessity" category.

However, there still might be queries that are no_hash but not eval_always. There's usual a good (performance-related) reason for marking them as such. The question is: how can we make them still be checked by -Zverify-ich?

(@rust-lang/wg-incr-comp, @Aaron1011, or @cjgillot, please double check my thinking here, especially whether the combination of eval_always and no_hash is indeed safe)

cc #85783

@jonas-schievink jonas-schievink added the A-incr-comp Area: Incremental compilation label May 31, 2021
@Aaron1011
Copy link
Member

I don't think a no_hash but not eval_always query can actually lead to miscompilations. Any query that directly depends on a no_hash query always gets re-run, since we have no way of knowing whether or not the no_hash query is green. As a result, we will never implicitly re-use the result of the query - if the queries which directly depend on it see a changed result, their result will change as well (making them red).

I think no_hash effectively implies eval_always - any query that depends (directly or indirectly) on a no_hash query will cause that no_hash query to get re-run, since it is always marked as red. However, trying to check these queries could still be useful for detecting queries that accidentally depend on untracked global state. While such queries cannot cause a miscompilation, they can cause unnecessarily instability (e.g. the vtable order changing globally and consistently when we could have made it stay the same), and uncessarily invalidation of other queries (which would have ended up with a green result had the no_hash query produced the same result).

@Aaron1011
Copy link
Member

The problem, of course, is that using no_hash for performance reasons becomes pointless if we need to compute the hash anyway for -Z verify-ich.

@michaelwoerister
Copy link
Member Author

I think no_hash effectively implies eval_always - any query that depends (directly or indirectly) on a no_hash query will cause that no_hash query to get re-run, since it is always marked as red.

That's not actually the case, I think. Let's make sure we talk about the same thing and that I'm not overlooking something. Here's the diagram from #85783 again:

                +-------- Q1 (assumes Q_i == x)
                 |
                 v 
(...) <-------- Q_i 
                 ^
                 |
                 +-------- Q2 (computes its result with Q_i == x' as input)

Let's assume that Q_i is no_hash but not eval_always. That means that when we start marking Q1, we'll try to mark Q_i as green. It being no_hash has no influence on red-green marking at this point. If we can mark all of Q_i's inputs (denoted as (...) in the diagram) as green, we'll just mark Q_i as green too. Later, when Q2 needs the actual value of Q_i the query is re-executed. At no point did no_hash come into play here. The attribute only makes a difference if we can't mark a dep-node green and have to re-run the query for that reason. Does that make sense?

The problem, of course, is that using no_hash for performance reasons becomes pointless if we need to compute the hash anyway for -Z verify-ich.

Yes indeed 😃 I think we'll need to find a wider solution (that can also take care of #85783). That might involve turning on these checks just for nightly or creating large automated test suites that run with all possible checks and assertions turned on. We just have to find some way to catch these kinds of bugs early.

@cjgillot
Copy link
Contributor

My understanding is the same as @michaelwoerister's: no_hash queries can be marked as green iff all their dependencies are green. no_hash + eval_always essentially means "always red".

There is an order-dependency I don't like: if we try to mark green before computing the result, the query may be green (iff all its dependencies are). If we compute the result before trying to mark green, the query will always be red, no matter the state of the dependencies.

The issue is perf. This is the case for instance with thir_body.

@Aaron1011
Copy link
Member

It looks like my understanding of no_hash was incorrect - thanks @michaelwoerister and @cjgillot for correcting me.

Based on the previous incremental issues we've seen, I think the best way to catch bugs is to have end users running the compiler with the relevant assertions. Ideally, we could improve the hashing performance to the point where no_hash is no longer needed for perf, but I don't think that's very likely.

@michaelwoerister
Copy link
Member Author

There is an order-dependency I don't like: if we try to mark green before computing the result, the query may be green (iff all its dependencies are). If we compute the result before trying to mark green, the query will always be red, no matter the state of the dependencies.

Interesting. I don't think we do that though, that is, we'll always try to mark a query green before re-executing or loading it from the cache, right? If that is not the case anymore we'll need to restore that protocol again (and put assertions in place to keep it that way).

@inquisitivecrystal
Copy link
Contributor

@rustbot label +A-query-system +T-compiler

@rustbot rustbot added A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-incr-comp Area: Incremental compilation A-query-system Area: The rustc query system (https://rustc-dev-guide.rust-lang.org/query.html) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

6 participants