Anonymous/placeholder lifetime "'_". #1177

Closed

eddyb
Member

@eddyb eddyb commented Jun 26, 2015

Initial implementation available at rust-lang/rust#26598.

@erickt

erickt commented Jun 26, 2015

Rendered.

@arielb1
Contributor

arielb1 commented Jun 26, 2015

I think we could make '_ as a lifetime-name a lint for 1.3 (+ maybe also 1.2) - you can always α-rename it away.


# Detailed design

In `resolve_lifetime`: if the lifetime to be resolved matches "'_", store

"'_" should be in backticks so GitHub doesn’t render the text after it italicised.

@bluss
Member

bluss commented Jun 27, 2015

Removing existing '_ sounds like an OK bugfix. It could be an error to declare a lifetime named _.

I've wanted this feature, tried to use it only to find it didn't exist.

@nikomatsakis added the T-lang label Jun 29, 2015
@nikomatsakis
Contributor

I'm in favor of the general idea of this RFC, but I think I'd prefer a slightly more expanded version to make things more uniform. Basically _ should be the way to signal "same behavior as elision", whether for lifetimes or types. (Also, I DO consider the fact that '_ is legal right now to be a bug -- I wasn't aware of that, but _ is not generally an identifier and I think that named lifetimes should be "' IDENTIFIER", more or less.)

So what this means is that:

In a fn signature:

  • _ in type position means "fresh type parameter"
  • '_ in a lifetime position means "same as eliding it"
    • this could be a fresh lifetime parameter, if in argument position
    • but in return position, it obeys the lifetime elision rules

In fn body:

  • _ means fresh variable (as today)
  • '_ also means fresh variable

In a type definition or other context:

  • _ and '_ are both errors (as today)

Besides being more uniform, this also offers a nice compromise with respect to lifetime elision. I sometimes find if I have a signature like this:

struct Iter<'collection> { ... }
fn iter(&self) -> Iter { ... }

it is very pretty but not as explicit as I might like, as there is no signal (outside of the type definition) that the borrow will be extended when the fn returns to cover the return value. Still, naming the lifetime seems like overkill:

fn iter<'a>(&'a self) -> Iter<'a> { ... }

I'd be happy with a version that uses '_:

fn iter(&self) -> Iter<'_> { ... }

This tells me that the borrow will be extended, but avoids the need to give a name.
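
For readers who want to try this signature, here is a minimal, self-contained sketch. It assumes a toolchain that accepts `'_` in this position (later Rust releases do). The `Numbers` type and its `items` field are hypothetical and exist only to give `iter` something to borrow from.

```rust
/// Hypothetical collection, used only for illustration.
struct Numbers {
    items: Vec<u32>,
}

/// Iterator that borrows from a `Numbers`; note the lifetime parameter.
struct Iter<'collection> {
    inner: std::slice::Iter<'collection, u32>,
}

impl<'collection> Iterator for Iter<'collection> {
    type Item = &'collection u32;

    fn next(&mut self) -> Option<Self::Item> {
        self.inner.next()
    }
}

impl Numbers {
    // `'_` marks that the return value borrows from `self`
    // without forcing us to invent a name for the lifetime.
    fn iter(&self) -> Iter<'_> {
        Iter { inner: self.items.iter() }
    }
}
```

Writing `Iter` instead of `Iter<'_>` in the return type would mean the same thing here; the only difference is that the borrow is visible in the signature.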

@nikomatsakis
Contributor

Independent from this RFC, we should fix the bug about '_ ASAP, but we should also evaluate the impact (just in case).

@nikomatsakis
Contributor

Marking this as T-lang: cc @rust-lang/lang

@petrochenkov
Contributor

I think I'd prefer a slightly more expanded version to make things more uniform. Basically _ should be the way to signal "same behavior as elision", whether for lifetimes or types.

Speaking of extensions, _ as a placeholder for array sizes would be nice (although not as nice if impossible to use in consts/statics).

let a: &[u8; _] = b"abcdefghijklmn";

@nrc
Member

nrc commented Jun 30, 2015

@nikomatsakis the problem with making '_ correspond with elision is that it means it has odd semantics where there are multiple uses. E.g.,

fn foo(&'_ Foo, &'_ Foo);
fn bar(&'_ Foo) -> &'_ Foo;

expand to

fn foo<'a, 'b>(&'a Foo, &'b Foo);
fn bar<'a>(&'a Foo) -> &'a Foo;

I.e., it is not clear from the syntax if multiple uses represent one fresh variable or many fresh variables. I guess we have this problem with elision already, but somehow representing the elided lifetimes with '_ makes it feel much worse. Cf. Java wildcards, where a similar problem exists and is widely hated.
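
A compilable sketch of the two signatures under discussion, assuming a later toolchain in which `'_` is accepted and behaves like an elided lifetime. The parameter names, the return type of `foo`, and the `Foo` definition are added for illustration and are not part of the original example.

```rust
struct Foo(u32);

// Each `'_` in argument position stands for its own lifetime, as if we had
// written: fn foo<'a, 'b>(x: &'a Foo, y: &'b Foo) -> u32
fn foo(x: &'_ Foo, y: &'_ Foo) -> u32 {
    x.0 + y.0
}

// In return position `'_` follows the elision rules, as if we had
// written: fn bar<'a>(x: &'a Foo) -> &'a Foo
fn bar(x: &'_ Foo) -> &'_ Foo {
    x
}
```

So the same token means "a new lifetime" in one place and "the lifetime picked by elision" in another, which is exactly the ambiguity raised above.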

@eddyb
Member Author

eddyb commented Jun 30, 2015

@nrc AFAICT, elision is not necessary for any of the planned uses.
It would also be slightly easier to implement '_ with no elision capabilities, but the difference isn't really significant.

@pnkfelix
Member

pnkfelix commented Jul 2, 2015

@eddyb well, it seems like the hypothetical confusion described by @nrc would arise under the planned uses... I'm thinking in particular of a case like:

struct Context<'a, 'left: 'a, 'right: 'a, 'd> {
    left: &'a Inner<'left>,
    right: &'a Inner<'right>,
    data: &'d str,
}

fn neither<'data>(cx: Context<'_, '_, '_, 'data>) -> &'data str {
    cx.data
}

where it sounds like @nrc is worried that people will interpret the above as injecting a single fresh lifetime and then constraining the first three lifetime parameters to Context to all be assigned that single lifetime.


(Having said that, I do not think we should worry too much about such misunderstandings.)

@nikomatsakis
Contributor

@nrc I see your point about potential confusion, but I think that having '_ uniformly conform to "elided" feels ultimately more predictable to me than having it either be an error or always a fresh lifetime. As you say, it's already true that the interpretation of &Foo depends on where it appears.

@eddyb
Member Author

eddyb commented Jul 6, 2015

Did a quick grep across crates.io (thanks, @brson 😻), found these:

sxd-xpath-0.1.1/src/function.rs:    fn pop_value_or_context_node<'_>(&mut self, context: &EvaluationContext<'_, 'd>) -> Value<'d> {
sxd-xpath-0.1.1/src/function.rs:    fn pop_nodeset_or_context_node<'_>(&mut self, context: &EvaluationContext<'_, 'd>)
term_grid-0.1.1/src/lib.rs:impl<'_> convert::From<&'_ str> for Cell {
term_grid-0.1.1/src/lib.rs:    fn from(string: &'_ str) -> Self {

I honestly didn't expect anyone to use '_ as a named lifetime, and I was almost right, but it did happen in two crates.

Now, the way this RFC is phrased right now (and the initial implementation) would let those cases compile, with the same semantics as before, regardless of the feature gate.
I'm guessing that means we probably shouldn't drop the existing behaviour when '_ is in scope, but we definitely can lint against '_ lifetime definitions.

@petrochenkov
Contributor

I honestly didn't expect anyone to use '_ as a named lifetime, and I was almost right, but it did happen in two crates.

The reason why at least some people use it is actually quite interesting: '_ lifetimes appear in the compiler's error messages when explicit lifetimes are required but not provided, and people fix the errors by copying them into their source code verbatim. Here's an example of this mistake - Simple OpenGL in Rust, Part 1 [in Russian]

@nikomatsakis
Contributor

Seems like conversation has stalled here. My current feeling is unchanged: I am in favor if we make '_ mean basically the "same thing as leaving out the lifetime". This means in fn arguments, it would be a fresh lifetime; in fn return type, it would use the elision rules; in a fn body, it would be a region variable; in a struct or type definition, it would be illegal.
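
As a small illustration of the fn body case, here is a sketch assuming a later toolchain that accepts `'_` in type annotations inside function bodies. The function `longest_word` is a made-up example, not something from the RFC.

```rust
// Hypothetical helper: returns the longest whitespace-separated word.
fn longest_word(text: &str) -> &str {
    // Inside a fn body, `'_` is just an inference (region) variable;
    // the compiler picks the lifetime, exactly as for `Vec<&str>`.
    let words: Vec<&'_ str> = text.split_whitespace().collect();
    words.into_iter().max_by_key(|w| w.len()).unwrap_or("")
}
```

Using `'_` inside a struct or type definition, by contrast, would be rejected under this proposal, since there is nothing to infer the lifetime from.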

There is one interesting corner case: I think that Trait+'_ should NOT be equivalent to Trait, but rather use the rules above. This is because the default for Trait types is to derive the region bound from the context, and it seems useful to be able to write Box<Trait+'_> as a shorthand for fn foo<'a>(x: Box<Trait+'a>) (since Box<Trait> alone would get you Box<Trait+'static>).

That last part would mean that '_ is not PRECISELY the same as leaving out the lifetime in all cases: it's the same as leaving out a lifetime from a &T type. My intuition here is that '_ is helpful as a way of making the lifetime explicit, so Foo<'_> and Trait+'_ let us acknowledge that region data may exist, without having to give unneeded names. In this same way, &T makes it obvious that region data exists without forcing us to write names.

@nikomatsakis nikomatsakis self-assigned this Jul 16, 2015
@nikomatsakis
Contributor

So we just discussed this RFC in a recent lang team meeting. The consensus was that we are not ready to move it to FCP. The current text (which assigns fresh lifetimes everywhere) isn't optimal. There was some discussion about what semantics to use. The primary contenders are:

  1. The same as not writing anything, including in a trait object lifetime bound.
  2. The same as not writing anything, but error in a trait object lifetime bound.
  3. The same as leaving out a region from a &, which means:
    • in fn arguments, fresh lifetime
    • in fn return, elision semantics
    • in fn body, fresh variable.

The only real point where these proposals differ is on the semantics of Box<Trait+'_>. When found in argument position, the respective proposals yield:

  1. Box<Trait+'static>
  2. Error.
  3. Box<Trait+'x> where 'x is fresh

I still favor the third proposal, because, well, because it does what I expect in all cases. It means that '_ is always used to signal the presence of a lifetime for cases where giving an explicit name feels superfluous. In that sense, it feels simpler to me than option 1 -- the idea that Box<Trait+'_> would expand to Box<Trait+'static> is very counterintuitive to me. As @nrc put it, the idea that '_ would expand to 'static feels confusing.
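
To make the difference concrete, here is a sketch of the third proposal's behaviour for trait objects in argument position. It assumes a later toolchain where `'_` is accepted in an object bound, and it uses the newer `dyn Trait` spelling rather than the bare `Trait` used in this thread. The `Greet` trait and `Borrowed` type are hypothetical.

```rust
trait Greet {
    fn greet(&self) -> String;
}

// A type that holds borrowed data, so it is not `'static` in general.
struct Borrowed<'a>(&'a str);

impl<'a> Greet for Borrowed<'a> {
    fn greet(&self) -> String {
        format!("hello, {}", self.0)
    }
}

// `'_` in the bound: accepts Box<dyn Greet + 'x> for some fresh 'x,
// so objects that close over borrowed data are allowed.
fn call_any(g: Box<dyn Greet + '_>) -> String {
    g.greet()
}

// No bound written: this is Box<dyn Greet + 'static>, so only objects
// without short-lived borrows are allowed.
fn call_static(g: Box<dyn Greet>) -> String {
    g.greet()
}
```

Passing a `Borrowed` that points at a local `String` works with `call_any` but would be rejected by `call_static`, which only accepts data that lives for `'static`.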

There was agreement that using '_ in a return type specification as a way to signal that a borrow is there (e.g., fn iter(&self) -> Iterator<'_>) is potentially useful, but concern that the convention would not be adopted without a lint. Personally, I would favor a lint. For that matter, I might favor a lint that just favors using '_ in nominal types, whether in argument or return position.

@nikomatsakis
Contributor

Note: it's worth pointing out that if we did adopt this RFC --- particularly with a lint --- we would probably want to wait until support for '_ has landed in a stable release before switching on the lint, so that projects which wish to span Nightly and stable are only encouraged to migrate once the feature is universally available.

@nikomatsakis
Contributor

Further thoughts: I think we should move the question of a lint to a separate RFC, since it's a major stylistic adjustment, particularly my (now) preferred version, which only allows eliding lifetimes from &, and otherwise requires the lifetime to be at least "acknowledged" with '_. (Except for in value references, like fn calls.)

@nikomatsakis
Contributor

(Note though that if we adopted this version of the lint, then the third alternative ('_ equivalent to what you would get if you elided the lifetime in a &) becomes very similar to the first, presuming you follow the lint strictly.)

@nikomatsakis
Contributor

Talking with @wycats we were thinking that '_ is kind of hard to type and google. I'm not sure what would be better. We could use &, particularly if we adopted the "same as eliding a lifetime from a reference" semantics I favor, but it's sort of discontinuous with the explicit name model:

  • &Foo<'_, 'a> vs
  • &Foo<&, 'a> vs
  • &Foo<ref, 'a> (another suggestion)

It does seem like finding something less RSI-inducing than '_ could be important to advocating that people use this more frequently. (Of course, we can also tweak this before stabilizing.) OTOH, consistency with _ is a plus.

@wycats
Contributor

wycats commented Jul 17, 2015

I really like the idea of a mechanism for expressing "there is a ref here" without the need to juggle the exact lifetimes. The lack of such a mechanism was part of the reason that we couldn't "go all the way" with lifetime elision (on structs), which was unfortunate in my opinion.

I worry that '_ is too much like noise for a person encountering it for the first time to even tokenize it in their brain, and then there's the googling problem. The other options have their problems too though :/

@aidancully

This probably requires more thought, but... any reason not to just use _? It doesn't really look like a lifetime, but it seems that general type elision could potentially use the same syntax.

@eddyb
Member Author

eddyb commented Jul 18, 2015

@aidancully Wouldn't that conflict with the same syntax used for types?

@nikomatsakis
Contributor

On Sat, Jul 18, 2015 at 09:22:26AM -0700, Felix S Klock II wrote:

wait, so what does this version imply regarding Box<Trait> ? Is that then illegal? Or do you not regard that as the same as an "elided" lifetime?

I do not regard that as the same as an elided lifetime, no, and I would want Box<Trait> to continue meaning Box<Trait+'static>

@pnkfelix
Member

@nikomatsakis maybe this gets at the heart of my problem; if the absent lifetime in Box<Trait> is not elided, what is it? I guess you call it a default bound, and (lifetime) bounds are fundamentally different in some way than other lifetime occurrences?

@nikomatsakis
Contributor

@pnkfelix

if the absent lifetime in Box<Trait> is not elided, what is it? I guess you call it a default bound, and (lifetime) bounds are fundamentally different in some way than other lifetime occurrences?

I agree this is a key question. The way I see it, in the language today, you can break down lifetimes into two categories: "explicit" lifetimes and "defaulted" lifetimes (perhaps we need better names):

  • Explicit lifetimes are like 'a in the following types: &'a Foo, Foo<'a>.
  • Defaulted lifetimes are like 'a in Foo+'a.

The behavior when these lifetimes are elided is already different, and I think the intuition for the difference is that, with explicit lifetimes, the normal thing is for there to be borrowed data present, whereas for defaulted lifetimes, we want to generally say that there is no borrowed data (i.e., a 'static bound) unless it is connected to an explicit lifetime.

So, for example, in type signatures:

  • If you omit an explicit lifetime, you get an error: we expect you to propagate the region to the struct header. Therefore, struct Foo { x: &u32 } is illegal, but either struct Foo { x: &'static u32 } or struct Foo<'a> { x: &'a u32 } is ok.
  • If you omit a defaulted lifetime, we typically default to 'static:
    • struct Foo { x: Box<Trait> } => struct Foo { x: Box<Trait+'static> }
  • Unless it's directly within an explicit lifetime:
    • struct Foo<'a> { x: &'a Trait } => struct Foo<'a> { x: &'a (Trait+'a) }
    • struct Foo<'a> { x: Ref<'a, Trait> } => struct Foo<'a> { x: Ref<'a, Trait+'a> }
      • where struct Ref<'a, T:'a> { ... }

In fn arguments, the behavior is similar. For explicit lifetimes, we default to a fresh lifetime, but for defaulted lifetimes, we prefer 'static (unless bounded by an explicit lifetime). Similarly in fn return types, explicit lifetimes use the elision rules, which means they try to find a region from the input, but defaulted lifetimes prefer 'static.

All of this seems consistent to me with us saying that defaulted lifetimes are not expected to have region data -- we expect objects not to close over region data. If you want otherwise, you have to ask for it, either by having a &Object or by saying Box<Object+'a> explicitly.
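
The defaulting rules for type definitions described above can be checked with a short sketch. It uses the newer `dyn Trait` spelling, and the `Shape`, `Square`, `Owned`, `Borrowed`, and `BorrowedExplicit` names are hypothetical, chosen only for this illustration.

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

// Defaulted bound with no enclosing reference: Box<dyn Shape + 'static>.
struct Owned {
    shape: Box<dyn Shape>,
}

// Defaulted bound directly inside a reference: &'a (dyn Shape + 'a).
struct Borrowed<'a> {
    shape: &'a dyn Shape,
}

// The same type with the bound written out.
struct BorrowedExplicit<'a> {
    shape: &'a (dyn Shape + 'a),
}

// Compiles because the two field types are the same type.
fn to_explicit<'a>(b: Borrowed<'a>) -> BorrowedExplicit<'a> {
    BorrowedExplicit { shape: b.shape }
}
```

Omitting the explicit lifetime instead, as in `struct Bad { x: &u32 }`, is an error, which matches the first bullet above.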

Now this is where '_ comes in. '_ seems like a lightweight way to acknowledge the existence of region data. In the case of explicit lifetimes, we already default towards region data being present, so it doesn't really affect the behavior there, but for defaulted lifetimes, we expect no region data, and supplying an explicit '_ switches the bias.

Now, as for the lint I've been tossing about: my concern is that, in practice, people may forget (or never have known) that structs were defined with lifetime parameters, and so they may not realize where region data is hiding. For example, I think that when you read a type like Foo, you do not necessarily have the definition in your head, and therefore you do not necessarily know that it was defined struct Foo<'a> and thus carries region data. Therefore, the errors that result can be somewhat surprising. I've found that the code which suggests adding explicit lifetimes (which is in dire need of updating, but that's a separate story) is often most helpful simply because it tells me which arguments actually have an explicit lifetime parameter that I've forgotten about. This is most important in return types, since the fact that a return type carries a region means that the loan gets extended to cover the scope of the return value.

As a concrete example, I sometimes write functions in the compiler like:

fn foo<'tcx>(tcx: &ty::ctxt<'tcx>, t: Ty) { ... }

which looks fine, unless you remember that Ty really wants to be Ty<'tcx>. OTOH, the reason we chose to allow lifetime arguments to be elided in the first place is that requiring an explicit lifetime name is often pretty tedious -- as evidenced by the desire for this RFC in the first place! '_ could be a nice compromise, I don't know. (But I'll say again I think we should separate out thoughts of lints from the core feature, which I think is useful enough on its own.)

UPDATE: Lightly edited for clarity.

@pnkfelix
Member

the intuition for the difference is that, with explicit lifetimes, the normal thing is for there to be borrowed data present, whereas for defaulted lifetimes, we want to generally say that there is no borrowed data (i.e., a 'static bound) unless it is connected to an explicit lifetime.

This was important for me to understand your POV.

I think making this intuition part of a user's mental model is important, so anything we can do to pick names that encourage such an intuition will make things better.

So, yes, we definitely need better names than "elided" and "defaulted". :)

@pnkfelix
Member

having said that, now that I better understand the reasoning behind your POV, I retract my (here unstated, I think) objection to the suggested semantics for '_

@aturon
Member

aturon commented Aug 13, 2015

(Note, current status is: the lang team has basic consensus around this RFC, but is waiting for the RFC itself to be updated accordingly.)

@nikomatsakis
Contributor

So @eddyb and I touched base on this, and we also discussed in @rust-lang/lang meeting. I am thinking of closing this RFC for the time being. For me, the TL;DR is that this RFC doesn't go far enough -- it can get us a small improvement over the status quo for complex lifetime scenarios (such as the compiler), but I want an "order of magnitude" improvement. And I think we can get one.

Moreover, the secondary goal of the RFC -- or at least an effect of the RFC, if not an original goal -- was to help make elision more explicit, and in particular to help make the case where a lifetime is "hidden" in the return type more explicit, without requiring an explicit name. The RFC achieves that too, but it does it with a syntax that nobody is thrilled about, and may not go far enough in this direction. (e.g., @wycats has had some thoughts about having some notation that indicates "elided lifetimes here", but doesn't require you to indicate how many)

For the time being, the hope is that we can keep progress in the compiler by moving away from free functions and towards methods, which tend to reduce "lifetime clutter" by moving the declarations up to the impl. @eddyb has been experimenting here.

Now, in terms of getting a 10x win, I don't want to claim a complete plan. But I think we should explore some more advanced ideas. One thing that I've been kicking around -- and hope to produce a blog post or something about -- is adding lifetime-parameterized modules (essentially, a limited and specialized form of ML functors).

The rough idea is that one would be able to add lifetime parameters to modules:

mod with_tcx<'tcx> { ... /* most of the compiler goes here */ ... }

The intuition for this is "all code in this module runs in the context of some lifetime 'tcx". As a side effect, all code within the module can freely refer to 'tcx without needing to declare it. Moreover, types within the module can reference one another without needing to refer to 'tcx. In effect, so long as you are within the module, instead of writing Ty<'tcx>, one would just write Ty, and the 'tcx is assumed to be the same one that you yourself have in scope.

When you refer to types in the module with_tcx from the outside, you would specify the lifetime by writing with_tcx<'some_lifetime>::Foo (of course, most of the time you wouldn't have to write anything, as the compiler would infer an appropriate value). (Similarly, if some type within with_tcx wanted to refer to a type that had some lifetime other than the 'tcx that is in scope, they too would have to use a more explicit path to make that clear.)

Obviously this is not even a pre-RFC. It's just a sketch of an idea that I think is worth pursuing. I feel though that there is a "min-max" proposal waiting to emerge here that might not be that hard to implement and could mean a massive ergonomic improvement. (Of course, any such effort would be very much an active experiment, with the compiler as the laboratory. It's clear though that there are "scalability" issues that arise with trying to use arenas at scale, and the compiler is a big consumer, so it's a good place to do such experiments.)

Another possibility for improvements is leveraging associated lifetimes, which have never been implemented (and indeed, probably a new RFC would be needed for some parts of them, since I think the original associated items RFC had some gaps). For example, it did not discuss projection syntax like &L::'tcx Foo, which would presumably be necessary.

The rough idea here would be that one could define a trait that "bundles up" a number of lifetimes and their relationships:

trait Tcx {
    lifetime 'tcx;
}

struct TcxArenas<'tcx> {
    ty_arena: &'tcx TypedArena<Ty<'tcx>>,
    ...
}

impl<'tcx> Tcx for TcxArenas<'tcx> {
    lifetime 'tcx = 'tcx;
}

struct Ty<T: Tcx> { ... }
struct Substs<T: Tcx> { ... }

thus one would write Ty<T> instead of Ty<'tcx>. This becomes a win when you want to add new lifetimes and do other refactorings. However, it has scalability issues and doesn't (I think) present the transformational effect that module parameterization could offer.

@nikomatsakis
Contributor

That said, while writing that, I started to wonder if it wouldn't be worth adding '_ as a kind of stop-gap, to enable us to make progress with some of our compiler goals, with the explicit and declared intention that this syntax will not be stabilized.

@eddyb
Member Author

eddyb commented Mar 25, 2016

@nikomatsakis Using methods instead of free functions is not a panacea, but it goes a long way, enough to make this RFC quite underwhelming.

@eddyb eddyb closed this Mar 25, 2016
@eddyb eddyb deleted the lt-anon branch March 25, 2016 20:25
@petrochenkov
Contributor

Still makes sense to issue a future compatibility warning for lifetimes named '_.

@eddyb
Member Author

eddyb commented Mar 25, 2016

As an example of what can be done right now with impls:

struct TyCtxt<'a, 'gcx: 'a+'tcx, 'tcx: 'a>(&'a &'tcx &'gcx ());

#[allow(bad_style)]
struct module;
impl<'a, 'gcx, 'tcx> module {
    fn foo(_: TyCtxt<'a, 'gcx, 'tcx>) {}
}

fn main() {
    // The call must live inside a function body.
    module::foo(TyCtxt(&&&()));
}

Someone could turn this into a macro if they wanted to, and I haven't really abused it yet, but this flexibility of impls will come in handy when dealing with existing methods where a new lifetime needs to be added to one of the argument types.
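
A slightly larger sketch of the same trick, showing several functions sharing lifetimes declared once on the impl. The `Ctx` type and the `first` and `count` functions are hypothetical. It relies on the compiler accepting impl lifetime parameters that do not appear in the self type, as in the example above.

```rust
// Hypothetical context type carrying two lifetimes.
struct Ctx<'a, 'tcx: 'a> {
    names: &'a [&'tcx str],
}

// Unit struct used purely as a namespace, like `module` above.
#[allow(non_camel_case_types)]
struct module;

// The lifetimes are declared once here and shared by every function.
impl<'a, 'tcx> module {
    fn first(cx: Ctx<'a, 'tcx>) -> Option<&'tcx str> {
        cx.names.first().copied()
    }

    fn count(cx: Ctx<'a, 'tcx>) -> usize {
        cx.names.len()
    }
}
```

Adding another lifetime to `Ctx` then means touching the impl header once rather than every function signature.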

@nrc
Member

nrc commented May 2, 2016

@nikomatsakis re lifetime-parametric modules - I would love to have this, I've wanted it for ages. See #424, although there is not much there in the way of details. We should discuss...

@nrc
Member

nrc commented May 2, 2016

@petrochenkov we really should deprecate '_ as a lifetime name.

@nrc nrc added the I-nominated label May 2, 2016
@nrc
Member

nrc commented May 2, 2016

Nominated for discussion at the lang meeting - I want to talk about salvaging some bits of this RFC

bors added a commit to rust-lang/rust that referenced this pull request May 26, 2016
Add AST validation pass and move some checks to it

The purpose of this pass is to catch constructions that fit into AST data structures, but are not permitted by the language. As an example, `impl`s don't have visibilities, but for convenience and uniformity with other items they are represented with a structure `Item` which has a `Visibility` field.

This pass is intended to run after expansion of macros and syntax extensions (and before lowering to HIR), so it can catch erroneous constructions that were generated by them. This pass makes it possible to remove ad hoc semantic checks from the parser, which can be overruled by syntax extensions and occasionally macros.

The checks can be put here if they are simple, local, don't require results of any complex analysis like name resolution or type checking and maybe don't logically fall into other passes. I expect most of the errors generated by this pass to be non-fatal and to allow the compilation to proceed.

I intend to move some more checks to this pass later and maybe extend it with new checks, like, for example, identifier validity. Given that syntax extensions are going to be stabilized in the measurable future, it's important that they would not be able to subvert usual language rules.

In this patch I've added two new checks - a check for labels named `'static` and a check for lifetimes and labels named `'_`. The first one gives a hard error, the second one - a future compatibility warning.
Fixes #33059 ([breaking-change])
cc rust-lang/rfcs#1177

r? @nrc
bors added a commit to rust-lang/rust that referenced this pull request Jun 1, 2016
ogham added a commit to ogham/exa that referenced this pull request Jun 11, 2016