update new trait solver docs (rust-lang#1802)

* rewrite requirements/invariants * add some more info about the trait solver * CI * review
matthiaskrgr · Oct 5, 2023 · b98af7d · b98af7d
1 parent b7e20c4
commit b98af7d
Show file tree

Hide file tree

Showing 4 changed files with 212 additions and 78 deletions.
diff --git a/src/SUMMARY.md b/src/SUMMARY.md
@@ -124,6 +124,7 @@
         - [Goals and clauses](./traits/goals-and-clauses.md)
         - [Canonical queries](./traits/canonical-queries.md)
     - [Next-gen trait solving](./solve/trait-solving.md)
+        - [Invariants of the type system](./solve/invariants.md)
         - [The solver](./solve/the-solver.md)
         - [Canonicalization](./solve/canonicalization.md)
         - [Coinduction](./solve/coinduction.md)

diff --git a/src/solve/invariants.md b/src/solve/invariants.md
@@ -0,0 +1,145 @@
+# Invariants of the type system
+
+FIXME: This file talks about invariants of the type system as a whole, not only the solver
+
+There are a lot of invariants - things the type system guarantees to be true at all times -
+which are desirable or expected from other languages and type systems. Unfortunately, quite
+a few of them do not hold in Rust right now. This is either a fundamental to its design or
+caused by bugs and something that may change in the future.
+
+It is important to know about the things you can assume while working on - and with - the
+type system, so here's an incomplete and inofficial list of invariants of
+the core type system:
+
+- ✅: this invariant mostly holds, with some weird exceptions, you can rely on it outside
+of these cases
+- ❌: this invariant does not hold, either due to bugs or by design, you must not rely on
+it for soundness or have to be incredibly careful when doing so
+
+### `wf(X)` implies `wf(normalize(X))` ✅
+
+If a type containing aliases is well-formed, it should also be
+well-formed after normalizing said aliases. We rely on this as
+otherwise we would have to re-check for well-formedness for these
+types.
+
+This is unfortunately broken for `<fndef as FnOnce<..>>::Output` due to implied bounds,
+resulting in [#114936].
+
+### applying inference results from a goal does not change its result ❌
+
+TODO: this invariant is formulated in a weird way and needs to be elaborated.
+Pretty much: I would like this check to only fail if there's a solver bug:
+https://github.com/rust-lang/rust/blob/2ffeb4636b4ae376f716dc4378a7efb37632dc2d/compiler/rustc_trait_selection/src/solve/eval_ctxt.rs#L391-L407
+
+If we prove some goal/equate types/whatever, apply the resulting inference constraints,
+and then redo the original action, the result should be the same.
+
+This unfortunately does not hold - at least in the new solver - due to a few annoying reasons.
+
+### The trait solver has to be *locally sound* ✅
+
+This means that we must never return *success* for goals for which no `impl` exists. That would
+mean we assume a trait is implemented even though it is not, which is very likely to result in
+actual unsoundness. When using `where`-bounds to prove a goal, the `impl` will be provided by the
+user of the item.
+
+This invariant only holds if we check region constraints. As we do not check region constraints
+during implicit negative overlap check in coherence, this invariant is broken there. As this check
+relies on *completeness* of the trait solver, it is not able to use the current region constraints
+check - `InferCtxt::resolve_regions` - as its handling of type outlives goals is incomplete.
+
+### Normalization of semantically equal aliases in empty environments results in a unique type ✅
+
+Normalization for alias types/consts has to have a unique result. Otherwise we can easily 
+implement transmute in safe code. Given the following function, we have to make sure that
+the input and output types always get normalized to the same concrete type.
+
+```rust
+fn foo<T: Trait>(
+    x: <T as Trait>::Assoc
+) -> <T as Trait>::Assoc {
+    x
+}
+```
+
+Many of the currently known unsound issues end up relying on this invariant being broken.
+It is however very difficult to imagine a sound type system without this invariant, so
+the issue is that the invariant is broken, not that we incorrectly rely on it.
+
+### Generic goals and their instantiations have the same result ✅
+
+Pretty much: If we successfully typecheck a generic function concrete instantiations
+of that function should also typeck. We should not get errors post-monomorphization.
+We can however get overflow errors at that point.
+
+TODO: example for overflow error post-monomorphization
+
+This invariant is relied on to allow the normalization of generic aliases. Breaking
+it can easily result in unsoundness, e.g. [#57893](https://github.com/rust-lang/rust/issues/57893)
+
+### Trait goals in empty environments are proven by a unique impl ✅
+
+If a trait goal holds with an empty environment, there should be a unique `impl`,
+either user-defined or builtin, which is used to prove that goal. This is
+necessary to select a unique method. It 
+
+We do however break this invariant in few cases, some of which are due to bugs,
+some by design:
+- *marker traits* are allowed to overlap as they do not have associated items
+- *specialization* allows specializing impls to overlap with their parent
+- the builtin trait object trait implementation can overlap with a user-defined impl:
+[#57893]
+
+### The type system is complete ❌
+
+The type system is not complete, it often adds unnecessary inference constraints, and errors
+even though the goal could hold.
+
+- method selection
+- opaque type inference
+- handling type outlives constraints
+- preferring `ParamEnv` candidates over `Impl` candidates during candidate selection
+in the trait solver
+
+#### The type system is complete during the implicit negative overlap check in coherence ✅
+
+During the implicit negative overlap check in coherence we must never return *error* for
+goals which can be proven. This would allow for overlapping impls with potentially different
+associated items, breaking a bunch of other invariants.
+
+This invariant is currently broken in many different ways while actually something we rely on.
+We have to be careful as it is quite easy to break:
+- generalization of aliases
+- generalization during subtyping binders (luckily not exploitable in coherence)
+
+### Trait solving must be (free) lifetime agnostic ✅
+
+Trait solving during codegen should have the same result as during typeck. As we erase
+all free regions during codegen we must not rely on them during typeck. A noteworthy example
+is special behavior for `'static`.
+
+We also have to be careful with relying on equality of regions in the trait solver.
+This is fine for codegen, as we treat all erased regions as equal. We can however
+lose equality information from HIR to MIR typeck.
+
+The new solver "uniquifies regions" during canonicalization, canonicalizing `u32: Trait<'x, 'x>`
+as `exists<'0, '1> u32: Trait<'0, '1>`, to make it harder to rely on this property.
+
+### Removing ambiguity makes strictly more things compile ❌
+
+Ideally we *should* not rely on ambiguity for things to compile.
+Not doing that will cause future improvements to be breaking changes.
+
+Due to *incompleteness* this is not the case and improving inference can result in inference
+changes, breaking existing projects.
+
+### Semantic equality implies structural equality ✅
+
+Two types being equal in the type system must mean that they have the
+same `TypeId` after instantiating their generic parameters with concrete
+arguments. This currently does not hold: [#97156].
+
+[#57893]: https://github.com/rust-lang/rust/issues/57893
+[#97156]: https://github.com/rust-lang/rust/issues/97156
+[#114936]: https://github.com/rust-lang/rust/issues/114936
diff --git a/src/solve/the-solver.md b/src/solve/the-solver.md
@@ -6,12 +6,71 @@ approach.
 
 [chalk]: https://rust-lang.github.io/chalk/book/recursive.html
 
-The basic structure of the solver is a pure function
-`fn evaluate_goal(goal: Goal<'tcx>) -> Response`.
-While the actual solver is not fully pure to deal with overflow and cycles, we are
-going to defer that for now.
+## A rough walkthrough
 
-To deal with inference variables and to improve caching, we use
-[canonicalization](./canonicalization.md).
+The entry-point of the solver is `InferCtxtEvalExt::evaluate_root_goal`. This
+function sets up the root `EvalCtxt` and then calls `EvalCtxt::evaluate_goal`,
+to actually enter the trait solver.
 
-TODO: write the remaining code for this as well.
+`EvalCtxt::evaluate_goal` handles [canonicalization](./canonicalization.md), caching,
+overflow, and solver cycles. Once that is done, it creates a nested `EvalCtxt` with a
+separate local `InferCtxt` and calls `EvalCtxt::compute_goal`, which is responsible for the
+'actual solver behavior'. We match on the `PredicateKind`, delegating to a separate function
+for each one.
+
+For trait goals, such a `Vec<T>: Clone`, `EvalCtxt::compute_trait_goal` has
+to collect all the possible ways this goal can be proven via
+`EvalCtxt::assemble_and_evaluate_candidates`. Each candidate is handled in
+a separate "probe", to not leak inference constraints to the other candidates.
+We then try to merge the assembled candidates via `EvalCtxt::merge_candidates`.
+
+
+## Important concepts and design pattern
+
+### `EvalCtxt::add_goal`
+
+To prove nested goals, we don't directly call `EvalCtxt::compute_goal`, but instead
+add the goal to the `EvalCtxt` with `EvalCtxt::all_goal`. We then prove all nested
+goals together in either `EvalCtxt::try_evaluate_added_goals` or
+`EvalCtxt::evaluate_added_goals_and_make_canonical_response`. This allows us to handle
+inference constraints from later goals.
+
+E.g. if we have both `?x: Debug` and `(): ConstrainToU8<?x>` as nested goals,
+then proving `?x: Debug` is initially ambiguous, but after proving `(): ConstrainToU8<?x>`
+we constrain `?x` to `u8` and proving `u8: Debug` succeeds.
+
+### Matching on `TyKind`
+
+We lazily normalize types in the solver, so we always have to assume that any types
+and constants are potentially unnormalized. This means that matching on `TyKind` can easily
+be incorrect.
+
+We handle normalization in two different ways. When proving `Trait` goals when normalizing
+associated types, we separately assemble candidates depending on whether they structurally
+match the self type. Candidates which match on the self type are handled in
+`EvalCtxt::assemble_candidates_via_self_ty` which recurses via
+`EvalCtxt::assemble_candidates_after_normalizing_self_ty`, which normalizes the self type
+by one level. In all other cases we have to match on a `TyKind` we first use
+`EvalCtxt::try_normalize_ty` to normalize the type as much as possible.
+
+### Higher ranked goals
+
+In case the goal is higher-ranked, e.g. `for<'a> F: FnOnce(&'a ())`, `EvalCtxt::compute_goal`
+eagerly instantiates `'a` with a placeholder and then recursively proves
+`F: FnOnce(&'!a ())` as a nested goal.
+
+### Dealing with choice
+
+Some goals can be proven in multiple ways. In these cases we try each option in
+a separate "probe" and then attempt to merge the resulting responses by using
+`EvalCtxt::try_merge_responses`. If merging the responses fails, we use
+`EvalCtxt::flounder` instead, returning ambiguity. For some goals, we try
+incompletely prefer some choices over others in case `EvalCtxt::try_merge_responses`
+fails.
+
+## Learning more
+
+The solver should be fairly self-contained. I hope that the above information provides a
+good foundation when looking at the code itself. Please reach out on zulip if you get stuck
+while doing so or there are some quirks and design decisions which were unclear and deserve
+better comments or should be mentioned here.
diff --git a/src/solve/trait-solving.md b/src/solve/trait-solving.md
@@ -39,77 +39,6 @@ which does not have any nested goals. Therefore `Vec<T>: Clone` holds.
 The trait solver can either return success, ambiguity or an error as a [`CanonicalResponse`].
 For success and ambiguity it also returns constraints inference and region constraints.
 
-## Requirements
-
-Before we dive into the new solver lets first take the time to go through all of our requirements
-on the trait system. We can then use these to guide our design later on.
-
-TODO: elaborate on these rules and get more precise about their meaning.
-Also add issues where each of these rules have been broken in the past
-(or still are).
-
-### 1. The trait solver has to be *sound*
-
-This means that we must never return *success* for goals for which no `impl` exists. That would
-simply be unsound by assuming a trait is implemented even though it is not. When using predicates
-from the `where`-bounds, the `impl` will be proved by the user of the item.
-
-### 2. If type checker solves generic goal concrete instantiations of that goal have the same result
-
-Pretty much: If we successfully typecheck a generic function concrete instantiations
-of that function should also typeck. We should not get errors post-monomorphization.
-We can however get overflow as in the following snippet:
-
-```rust
-fn foo<T: Trait>(x: )
-```
-
-### 3. Trait goals in empty environments are proven by a unique impl
-
-If a trait goal holds with an empty environment, there is a unique `impl`,
-either user-defined or builtin, which is used to prove that goal.
-
-This is necessary for codegen to select a unique method.
-An exception here are *marker traits* which are allowed to overlap.
-
-### 4. Normalization in empty environments results in a unique type
-
-Normalization for alias types/consts has a unique result. Otherwise we can easily implement
-transmute in safe code. Given the following function, we have to make sure that the input and
-output types always get normalized to the same concrete type.
-```rust
-fn foo<T: Trait>(
-    x: <T as Trait>::Assoc
-) -> <T as Trait>::Assoc {
-    x
-}
-```
-
-### 5. During coherence trait solving has to be complete
-
-During coherence we never return *error* for goals which can be proven. This allows overlapping
-impls which would break rule 3.
-
-### 6. Trait solving must be (free) lifetime agnostic
-
-Trait solving during codegen should have the same result as during typeck. As we erase
-all free regions during codegen we must not rely on them during typeck. A noteworthy example
-is special behavior for `'static`.
-
-We also have to be careful with relying on equality of regions in the trait solver.
-This is fine for codegen, as we treat all erased regions as equal. We can however
-lose equality information from HIR to MIR typeck.
-
-### 7. Removing ambiguity makes strictly more things compile
-
-We *should* not rely on ambiguity for things to compile.
-Not doing that will cause future improvements to be breaking changes.
-
-### 8. semantic equality implies structural equality
-
-Two types being equal in the type system must mean that they have the same `TypeId`.
-
-
 [solve]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_trait_selection/solve/index.html
 [`Goal`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_trait_selection/traits/solve/struct.Goal.html
 [`Predicate`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.Predicate.html