From 29f412a6abfb4c19116cb836cd135240bb602fc1 Mon Sep 17 00:00:00 2001
From: Niko Matsakis
Date: Wed, 6 Aug 2014 17:01:19 -0400
Subject: [PATCH 1/2] Initial commit

---
 ...0000-bounds-on-object-and-generic-types.md | 424 ++++++++++++++++++
 1 file changed, 424 insertions(+)
 create mode 100644 active/0000-bounds-on-object-and-generic-types.md

diff --git a/active/0000-bounds-on-object-and-generic-types.md b/active/0000-bounds-on-object-and-generic-types.md
new file mode 100644
index 00000000000..c10bc6dab13
--- /dev/null
+++ b/active/0000-bounds-on-object-and-generic-types.md
@@ -0,0 +1,424 @@
+- Start Date: 2014-08-06
+- RFC PR: (leave this empty)
+- Rust Issue: (leave this empty)
+
+# Summary
+
+- Remove the special-case bound `'static` and replace with a generalized
+  *lifetime bound* that can be used on objects and type parameters.
+- Remove the rules that aim to prevent references from being stored
+  into objects and replace with a simple lifetime check.
+- Tighten up type rules pertaining to reference lifetimes and
+  well-formed types containing references.
+- Introduce explicit lifetime bounds (`'a:'b`), with the meaning that
+  the lifetime `'a` outlives the lifetime `'b`. These exist today but
+  are always inferred; this RFC adds the ability to specify them
+  explicitly, which is sometimes needed in more complex cases.
+
+# Motivation
+
+Currently, the type system is not supposed to allow references to
+escape into object types. However, there are various bugs where it
+fails to prevent this from hapenning. Moreover, it is very useful (and
+frequently necessary) to store a reference into an object. Moreover,
+the current treatment of generic types is in some cases naive and not
+obviously sound.
+
+# Detailed design
+
+## Lifetime bounds on parameters
+
+The heart of the new design is the concept of a *lifetime bound*. In fact,
+this (sort of) exists today in the form of the `'static` bound:
+
+    fn foo<A:'static>(x: A) { ... }
+
+Here, the notation `'static` means "all borrowed content within `A`
+outlives the lifetime `'static`". (Note that when we say that
+something outlives a lifetime, we mean that it lives *at least that
+long*. In other words, for any lifetime `'a`, `'a` outlives `'a`. This
+is similar to how we say that every type `T` is a subtype of itself.)
+
+In the newer design, it is possible to use an arbitrary lifetime as a
+bound, and not just `'static`:
+
+    fn foo<'a, A:'a>(x: A) { ... }
+
+Explicit lifetime bounds like this are in fact only rarely necessary,
+for two reasons:
+
+1. The compiler is often able to infer this relationship from the argument
+   and return types. More on this below.
+2. It is only important to bound the lifetime of a generic type like
+   `A` when one of two things is happening (and both of these are
+   cases where the inference generally is sufficient):
+   - A borrowed pointer to an `A` instance (i.e., value of type `&A`)
+     is being consumed or returned.
+   - A value of type `A` is being closed over into an object reference
+     (or closure, which per the unboxed closures RFC is really the
+     same thing).
+
+Note that, per RFC 11, these lifetime bounds may appear in types as
+well (this is important later on). For example, an iterator might be
+declared:
+
+    struct Items<'a, T:'a> {
+        v: &'a Collection<T>
+    }
+
+Here, the constraint `T:'a` indicates that the data being iterated
+over must live at least as long as the collection (logically enough).
+
+### At most one explicit lifetime bound is permitted
+
+For simplicity, we permit at most one *explicit* lifetime bound on any
+given parameter type. That means that the following function is illegal:
+
+    fn foo<'a,'b,A:'a+'b>() { ... }
+
+Remember that if there are multiple lifetime bounds, it implies that
+all of them must hold. That means that if, in fact, `A` outlives both
+`'a` and `'b` then either one of them is shorter than the other, the
+two are the same, or there is a third lifetime that outlives them
+both. Therefore, the function above can be rewritten as follows (using
+explicit lifetime bounds, specified below):
+
+    fn foo<'a,'b,'c:'a+'b,A:'c>() { ... }
+
+As far as I know, this situation has not arisen once in the codebase.
+
+## Lifetime bounds on object types
+
+Like parameters, all object types have a lifetime bound. Unlike
+parameter types, however, object types are *required* to have exactly
+one bound. This bound can be either specified explicitly or derived
+from the traits that appear in the object type. In general, the rule is
+as follows:
+
+- If an explicit bound is specified, use that.
+- Otherwise, let S be the set of lifetime bounds we can derive.
+- Otherwise, if S contains 'static, use 'static.
+- Otherwise, if S is a singleton set, use that.
+- Otherwise, error.
+
+Here are some examples:
+
+    trait IsStatic : 'static { }
+    trait Is<'a> : 'a { }
+    trait IsNothing { }
+
+    // Type               Bounds
+    // IsStatic           'static
+    // Is<'a>             'a
+    // IsStatic+Is<'a>    'static+'a
+    // IsStatic+'a        'static+'a
+    // IsStatic+Is<'a>+'b 'static,'a,'b
+
+In general no object type is permitted to have zero bounds. Therefore,
+if an object type with no derivable bounds appears, we will supply a
+default lifetime using the normal rules:
+
+    trait Writer { /* no derivable bounds */ }
+    struct Foo<'a> {
+        Box<Writer>,      // Error: try Box<Writer+'static> or Box<Writer+'a>
+        Box<Writer+Send>, // OK: Send implies 'static
+        &'a Writer,       // Error: try &'a (Writer+'a)
+    }
+    
+    fn foo(a: Box<Writer>, // OK: Sugar for Box<Writer+'a> where 'a fresh
+           b: &Writer)     // OK: Sugar for &'a (Writer+'b) where 'a, 'b fresh
+    { ... }
+    
+This kind of annotation can seem a bit tedious when using object types
+extensively, though type aliases can help quite a bit:
+
+    type WriterObj = Box<Writer+'static>;
+    type WriterRef<'a> = &'a (Writer+'a);
+
+The unresolved questions section discussed possibles ways to lighten
+the burden.
+
+## Specifying relations between lifetimes
+
+Currently, when a type or fn has multiple lifetime parameters, there
+is no facility to explicitly specify a relationship between them. For
+example, in a function like this:
+
+    fn foo<'a, 'b>(...) { ... }
+
+the lifetimes `'a` and `'b` are declared as independent. In some
+cases, though, it can be important that there be a relation between
+them. In most cases, these relationships can be inferred (and in fact
+are inferred today, see below), but it is useful to be able to state
+them explicitly (and necessary in some cases, see below).
+
+A *lifetime bound* is written `'a:'b` and it means that "`'a` outlives
+`'b`". For example, if `foo` were declared like so:
+
+    fn foo<'a, 'b:'a>(...) { ... }
+
+that would indicate that the lifetime '`a` was shorter than (or equal
+to) `'b`.
+
+## The "type must outlive" and well-formedness relation
+
+Many of the rules to come make use of a "type must outlive" relation,
+written `T outlives 'a`. This relation means primarily that all
+borrowed data in `T` is known to have a lifetime of at least '`a`
+(hence the name). However, the relation also guarantees various basic
+lifetime constraints are met. For example, for every reference type
+`&'b U` that is found within `T`, it would be required that `U
+outlives 'b` (and that `'b` outlives `'a`).
+
+In fact, `T outlives 'a` is defined on another function `WF(T:'a)`,
+which yields up a list of lifetime relations that must hold for `T` to
+be well-formed and to outlive `'a`. It is not necessary to understand
+the details of this relation in order to follow the rest of the RFC, I
+will defer its precise specification to an appendix below.
+
+For this section, it suffices to give some examples:
+
+    // int always outlives any region
+    WF(int : 'a) = []
+
+    // a reference with lifetime 'a outlives 'b if 'a outlives 'b
+    WF(&'a int : 'b) = ['a : 'b]
+
+    // the outer reference must outlive 'c, and the inner reference
+    // must outlive the outer reference
+    WF(&'a &'b int : 'c) = ['a : 'c, 'b : 'a]
+
+    // Object type with bound 'static
+    WF(SomeTrait+'static : 'a) = ['static : 'a]
+
+    // Object type with bound 'a
+    WF(SomeTrait+'a : 'b) = ['a : 'b]
+
+## Rules for when object closure is legal
+
+Whenever data of type `T` is closed over to form an object, the type
+checker will require that `T outlives 'a` where `'a` is the primary
+lifetime bound of the object type.
+
+## Rules for types to be well-formed
+
+Currently we do not apply any tests to the types that appear in type
+declarations. Per RFC 11, however, this should change, as we intend to
+enforce trait bounds on types, wherever those types appear. Similarly,
+we should be requiring that types are well-formed with respect to the
+`WF` function. This means that a type like the following would be
+illegal without a lifetime bound on the type parameter `T`:
+
+    struct Ref<'a, T> { c: &'a T }
+
+This is illegal because the field `c` has type `&'a T`, which is only
+well-formed if `T:'a`. Per usual practice, this RFC does not propose
+any form of inference on struct declarations and instead requires all
+conditions to be spelled out (this is in contrast to fns and methods,
+see below).
+
+## Rules for expression type validity
+
+We should add the condition that for every expression with lifetime
+`'e` and type `T`, then `T outlives 'e`. We already enforce this in
+many special cases but not uniformly.
+
+## Inference
+
+The compiler will infer lifetime bounds on both type parameters and
+region parameters as follows. Within a function or method, we apply
+the wellformedness function `WF` to each function or parameter type.
+This yields up a set of relations that must hold. The idea here is
+that the caller could have type checked unless the types of the
+arguments were well-formed, so that implies that the callee can assume
+that those well-formedness constraints hold.
+
+As an example, in the following function:
+
+    fn foo<'a, A>(x: &'a A) { ... }
+
+the callee here can assume that the type parameter `A` outlives the
+lifetime `'a`, even though that was not explicitly declared.
+
+Note that the inference also pulls in constraints that were declared
+on the types of arguments. So, for example, if there is a type `Items`
+declared as follows:
+
+    struct Items<'a, T:'a> { ... }
+
+And a function that takes an argument of type `Items`:
+
+    fn foo<'a, T>(x: Items<'a, T>) { ... }
+
+The inference rules will conclude that `T:'a` because the `Items` type
+was declared with that bound.
+
+In practice, these inference rules largely remove the need to manually
+declare lifetime relations on types. When porting the existing library
+and rustc over to these rules, I had to add explicit lifetime bounds
+to exactly one function (but several types, almost exclusively
+iterators).
+
+Note that this sort of inference is already done. This RFC simply
+proposes a more extensive version that also includes bounds of the
+form `X:'a`, where `X` is a type parameter.
+
+# What does all this mean in practice?
+
+This RFC has a lot of details. The main implications for end users are:
+
+1. Object types must specify a lifetime bound when they appear in a type.
+   This most commonly means changing `Box<Trait>` to `Box<Trait+'static>`
+   and `&'a Trait` to `&'a Trait+'a`.
+2. For types that contain references to generic types, lifetime bounds
+   are needed in the type definition. This comes up most often in iterators:
+
+        struct Items<'a, T:'a> {
+            x: &'a [T]
+        }
+
+   Here, the presence of `&'a [T]` within the type definition requires
+   that the type checker can show that `T outlives 'a` which in turn
+   requires the bound `T:'a` on the type definition. These bounds are
+   rarely outside of type definitions, because they are almost always
+   implied by the types of the arguments.
+3. It is sometimes, but rarely, necessary to use lifetime bounds,
+   specifically around double indirections (references to references,
+   often the second reference is contained within a struct). For
+   example:
+
+        struct GlobalContext<'global> {
+            arena: &'global Arena
+        }
+
+        struct LocalConenxt<'local, 'global:'local> {
+            x: &'local mut Context<'global>
+        }
+
+   Here, we must know that the lifetime `'global` outlives `'local` in
+   order for this type to be well-formed.
+
+# Phasing
+
+Some parts of this RFC require new syntax and thus must be phased in.
+The current plan is to divide the implementation three parts:
+
+1. Implement support for everything in this RFC except for region bounds
+   and requiring that every expression type be well-formed. Enforcing
+   the latter constraint leads to type errors that require lifetime
+   bounds to resolve.
+2. Implement support for `'a:'b` notation to be parsed under a feature
+   gate `issue_5723_bootstrap`.
+3. Implement the final bits of the RFC:
+   - Bounds on lifetime parameters
+   - Wellformedness checks on every expression
+   - Wellformedness checks in type definitions
+
+Parts 1 and 2 can be landed simultaneously, but part 3 requires a
+snapshot. Parts 1 and 2 have largely been written. Depending on
+precisely how the timing works out, it might make sense to just merge
+parts 1 and 3.
+
+# Drawbacks / Alternatives
+
+If we do not implement some solution, we could continue with the
+current approach (but patched to be sound) of banning references from
+being closed over in object types. I consider this a non-starter.
+
+# Unresolved questions
+
+## Inferring wellformedness bounds
+
+Under this RFC, it is required to write bounds on struct types which are
+in principle inferable from their contents. For example, iterators
+tend to follow a pattern like:
+
+    struct Items<'a, T:'a> {
+        x: &'a [T]
+    }
+
+Note that `T` is bounded by `'a`. It would be possible to infer these
+bounds, but I've stuck to our current principle that type definitions
+are always fully spelled out. The danger of inference is that it
+becomes unclear *why* a particular constraint exists if one must
+traverse the type hierarchy deeply to find its origin. This could
+potentially be addressed with better error messages, though our track
+record for lifetime error messages is not very good so far.
+
+## Default trait bounds
+
+When referencing a trait object, it is almost *always* the case that one follows
+certain fixed patterns:
+
+- `Box<Trait+'static>`
+- `Rc<Trait+'static>` (once DST works)
+- `&'a (Trait+'a)`
+- and so on.
+
+You might think that we should simply provide some kind of defaults
+that are sensitive to where the `Trait` appears. The same is probably
+true of struct type parameters (in other words, `&'a SomeStruct<'a>`
+is a very comon pattern).
+
+However, there are complications:
+
+- What about a type like `struct Ref<'a, T> { x: &'a T }`? `Ref<'a, Trait>`
+  should really work the same way as `&'a Trait`.
+- There *are* reasons to want a type like `Box<Trait+'a>`. For example,
+  the macro parser includes a function like:
+
+      fn make_macro_ext<'cx>(cx: &'cx Context, ...) -> Box<MacroExt+'cx>
+
+  In other words, this function returns an object that closes over the
+  macro context. In such a case, if `Box<MacroExt>` implies a static
+  bound, then taking ownership of this macro object would require a signature
+  like:
+
+      fn take_macro_ext<'cx>(b: Box<MacroExt+'cx>) { }
+
+  Note that the `'cx` variable is only used in one place. It's purpose
+  is just to disable the `'static` default that would otherwise be
+  inserted.
+
+# Appendix: Definition of the outlives relation and well-formedness
+
+To make this more specific, we can "formally" model the Rust type
+system as:
+
+    T = scalar (int, uint, fn(...))   // Boring stuff
+      | *const T                      // Unsafe pointer
+      | *mut T                        // Unsafe pointer
+      | Id<P>                         // Nominal type (struct, enum)
+      | &'x T                         // Reference
+      | &'x mut T                     // Mutable reference
+      | {TraitReference<P>}+'x        // Object type
+      | X                             // Type variable
+    P = {'x} + {T}
+
+We can define a function `WF(T : 'a)` which, given a type `T` and
+lifetime `'a` yields a list of `'b:'c` or `X:'d` pairs. For each pair
+`'b:'c`, the lifetime `'b` must outlive the lifetime `'c` for the type
+`T` to be well-formed in a location with lifetime `'a`. For each pair
+`X:'d`, the type parameter `X` must outlive the lifetime `'d`.
+
+- `WF(int : 'a)` yields an empty list
+- `WF(X:'a)` where `X` is a type parameter yields `(X:'a)`.
+- `WF(Foo<P>:'a)` where `Foo<P>` is an enum or struct type yields:
+  - For each lifetime parameter `'b` that is contravariant or invariant,
+    `'b : 'a`.
+  - For each type parameter `T` that is covariant or invariant, the
+    results of `WF(T : 'a)`.
+  - The lifetime bounds declared on `Foo`'s lifetime or type parameters.
+  - The reasoning here is that if we can reach borrowed data with
+    lifetime `'a` through `Foo<'a>`, then `'a` must be contra- or
+    invariant. Covariant lifetimes only occur in "setter"
+    situations. Analogous reasoning applies to the type case.
+- `WF(T:'a)` where `T` is an object type:
+  - For the primary bound `'b`, `'b : 'a`.
+  - For each derived bound `'c` of `T`, `'b : 'c`
+  - Motivation: The primary bound of an object type implies that all
+    other bounds are met. This simplifies some of the other
+    formulations and does not represent a loss of expressiveness.
+
+We can then say that `T outlives 'a` if all lifetime relations
+returned by `WF(T:'a)` hold.

From 041534c2682ede69af5056f00183d88749806b53 Mon Sep 17 00:00:00 2001
From: Niko Matsakis
Date: Thu, 7 Aug 2014 07:08:06 -0400
Subject: [PATCH 2/2] Fix various typos and add an appdenix motivating the restriction to exactly one lifetime.

---
 ...0000-bounds-on-object-and-generic-types.md | 72 +++++++++++++++----
 1 file changed, 59 insertions(+), 13 deletions(-)

diff --git a/active/0000-bounds-on-object-and-generic-types.md b/active/0000-bounds-on-object-and-generic-types.md
index c10bc6dab13..0fa673bf396 100644
--- a/active/0000-bounds-on-object-and-generic-types.md
+++ b/active/0000-bounds-on-object-and-generic-types.md
@@ -19,7 +19,7 @@
 
 Currently, the type system is not supposed to allow references to
 escape into object types. However, there are various bugs where it
-fails to prevent this from hapenning. Moreover, it is very useful (and
+fails to prevent this from happening. Moreover, it is very useful (and
 frequently necessary) to store a reference into an object. Moreover,
 the current treatment of generic types is in some cases naive and not
 obviously sound.
@@ -105,7 +105,6 @@ Here are some examples:
 
     trait IsStatic : 'static { }
     trait Is<'a> : 'a { }
-    trait IsNothing { }
 
     // Type               Bounds
     // IsStatic           'static
@@ -124,11 +123,11 @@ default lifetime using the normal rules:
         Box<Writer+Send>, // OK: Send implies 'static
         &'a Writer,       // Error: try &'a (Writer+'a)
     }
-    
+
     fn foo(a: Box<Writer>, // OK: Sugar for Box<Writer+'a> where 'a fresh
-           b: &Writer)     // OK: Sugar for &'a (Writer+'b) where 'a, 'b fresh
+           b: &Writer)     // OK: Sugar for &'b (Writer+'c) where 'b, 'c fresh
     { ... }
-    
+
 This kind of annotation can seem a bit tedious when using object types
 extensively, though type aliases can help quite a bit:
 
@@ -138,6 +137,9 @@ extensively, though type aliases can help quite a bit:
 The unresolved questions section discussed possibles ways to lighten
 the burden.
 
+See Appendix B for the motivation on why object types are permitted to
+have exactly one lifetime bound.
+
 ## Specifying relations between lifetimes
 
 Currently, when a type or fn has multiple lifetime parameters, there
@@ -155,10 +157,10 @@ them explicitly (and necessary in some cases, see below).
 A *lifetime bound* is written `'a:'b` and it means that "`'a` outlives
 `'b`". For example, if `foo` were declared like so:
 
-    fn foo<'a, 'b:'a>(...) { ... }
+    fn foo<'x, 'y:'x>(...) { ... }
 
-that would indicate that the lifetime '`a` was shorter than (or equal
-to) `'b`.
+that would indicate that the lifetime '`x` was shorter than (or equal
+to) `'y`.
 
 ## The "type must outlive" and well-formedness relation
 
@@ -229,7 +231,7 @@ The compiler will infer lifetime bounds on both type parameters and
 region parameters as follows. Within a function or method, we apply
 the wellformedness function `WF` to each function or parameter type.
 This yields up a set of relations that must hold. The idea here is
-that the caller could have type checked unless the types of the
+that the caller could not have type checked unless the types of the
 arguments were well-formed, so that implies that the callee can assume
 that those well-formedness constraints hold.
 
@@ -291,7 +293,7 @@ This RFC has a lot of details. The main implications for end users are:
             arena: &'global Arena
         }
 
-        struct LocalConenxt<'local, 'global:'local> {
+        struct LocalContext<'local, 'global:'local> {
             x: &'local mut Context<'global>
         }
 
@@ -345,6 +347,9 @@ traverse the type hierarchy deeply to find its origin. This could
 potentially be addressed with better error messages, though our track
 record for lifetime error messages is not very good so far.
 
+Also, there is a potential interaction between this sort of inference
+and the description of default trait bounds below.
+
 ## Default trait bounds
 
 When referencing a trait object, it is almost *always* the case that one follows
@@ -358,12 +363,17 @@ certain fixed patterns:
 You might think that we should simply provide some kind of defaults
 that are sensitive to where the `Trait` appears. The same is probably
 true of struct type parameters (in other words, `&'a SomeStruct<'a>`
-is a very comon pattern).
+is a very common pattern).
 
 However, there are complications:
 
-- What about a type like `struct Ref<'a, T> { x: &'a T }`? `Ref<'a, Trait>`
-  should really work the same way as `&'a Trait`.
+- What about a type like `struct Ref<'a, T:'a> { x: &'a T }`? `Ref<'a,
+  Trait>` should really work the same way as `&'a Trait`. One way that
+  I can see to do this is to drive the defaulting based on the default
+  trait bounds of the `T` type parameter -- but if we do that, it is
+  both a non-local default (you have to consult the definition of
+  `Ref`) and interacts with the potential inference described in the
+  previous section.
 - There *are* reasons to want a type like `Box<Trait+'a>`. For example,
   the macro parser includes a function like:
 
@@ -422,3 +432,39 @@ lifetime `'a` yields a list of `'b:'c` or `X:'d` pairs. For each pair
 
 We can then say that `T outlives 'a` if all lifetime relations
 returned by `WF(T:'a)` hold.
+
+# Appendix B: Why object types must have exactly one bound
+
+The motivation is that handling multiple bounds is overwhelmingly
+complicated to reason about and implement. In various places,
+constraints arise of the form `all i. exists j. R[i] <= R[j]`, where
+`R` is a list of lifetimes. This is challenging for lifetime
+inference, since there are many options for it to choose from, and
+thus inference is no longer a fixed-point iteration. Moreover, it
+doesn't seem to add any particular expressiveness.
+
+The places where this becomes important are:
+
+- Checking lifetime bounds when data is closed over into an object type
+- Subtyping between object types, which would most naturally be
+  contravariant in the lifetime bound
+
+Similarly, requiring that the "master" bound on object lifetimes outlives
+all other bounds also aids inference. Now, given a type like the
+following:
+
+    trait Foo<'a> : 'a { }
+    trait Bar<'b> : 'b { }
+
+    ...
+
+    let x: Box<Foo<'a>+Bar<'b>>
+
+the inference engine can create a fresh lifetime variable `'0` for the
+master bound and then say that `'0:'a` and `'0:'b`. Without the
+requirement that `'0` be a master bound, it would be somewhat unclear
+how `'0` relates to `'a` and `'b` (in fact, there would be no
+necessary relation). But if there is no necessary relation, then when
+closing over data, one would have to ensure that the closed over data
+outlives *all* derivable lifetime bounds, which again creates a
+constraint of the form `all i. exists j.`.
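
Illustration only, not part of either commit. The three kinds of bound discussed in the RFC text (a lifetime bound on a type parameter, a lifetime bound on an object type, and an explicit relation between two lifetimes) can be sketched in present-day Rust roughly as follows. This assumes modern syntax (`dyn Trait` for object types, which postdates the RFC), and the names `Describe`, `Named`, `boxed`, `Items::first`, and `arena_len` are hypothetical, chosen only for the example.

```rust
// Sketch under the assumptions stated above; every name here is illustrative.

/// A struct holding a reference to generic data. The RFC requires the
/// bound `T: 'a` to be written out on the type definition.
struct Items<'a, T: 'a> {
    slice: &'a [T],
}

impl<'a, T: 'a> Items<'a, T> {
    /// Returns a reference that lives as long as the borrowed slice.
    fn first(&self) -> Option<&'a T> {
        self.slice.first()
    }
}

trait Describe {
    fn describe(&self) -> String;
}

/// A type that closes over borrowed data.
struct Named<'a> {
    name: &'a str,
}

impl<'a> Describe for Named<'a> {
    fn describe(&self) -> String {
        format!("name={}", self.name)
    }
}

/// An object type with an explicit lifetime bound: the boxed object may
/// hold references as long as they outlive `'a`.
fn boxed<'a>(name: &'a str) -> Box<dyn Describe + 'a> {
    Box::new(Named { name })
}

struct GlobalContext<'global> {
    arena: &'global [u32],
}

/// `'global: 'local` states that `'global` outlives `'local`, which the
/// double indirection in the field type needs in order to be well-formed.
struct LocalContext<'local, 'global: 'local> {
    cx: &'local mut GlobalContext<'global>,
}

fn arena_len<'local, 'global: 'local>(local: &LocalContext<'local, 'global>) -> usize {
    local.cx.arena.len()
}

fn main() {
    let data = vec![10, 20, 30];
    let items = Items { slice: &data };
    assert_eq!(items.first(), Some(&10));

    let owner = String::from("writer");
    let obj = boxed(&owner);
    assert_eq!(obj.describe(), "name=writer");

    let arena = [1u32, 2, 3];
    let mut global = GlobalContext { arena: &arena };
    let local = LocalContext { cx: &mut global };
    assert_eq!(arena_len(&local), 3);
}
```

Current compilers infer some of these bounds automatically, so writing them out as above mirrors the RFC's rules rather than what today's compiler strictly requires.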