How do we expect required context types to interact with `await`? #3575
If we are completely consistent, then an expression inside I'm not absolutely sure we need to be consistent in this particular case. Because If we had a suffix await (#25) then (But if we get @chloestefantsova's selector-chain inference, then we'd probably end up giving the receiver By writing it as All this to say that if we want But the consistent behavior is to downcast to So that can work too. TL;DR: Either way works for me. (Short term. Long term, I'd love to do something more to I do worry that we will have some occurrences of code that will break with this change, and they'll be hard to find precisely because the code is typed as |
You could say that the newly introduced breakage in

```dart
void main() async {
  dynamic x = Future<int?>.value(3);
  int y = await x; // Would throw if it means `await (x as FutureOr<int>)`.
}
```

is consistent with another part of the language:

```dart
Future<int> f(dynamic d) async {
  // Consider an invocation where `d` is a `Future<int?>`.
  if (../*just an excuse for having two return statements*/..) {
    return d; // Throws.
  } else {
    return await d; // Succeeds today;
    // .. would throw if it means `return await (d as FutureOr<int>);`
  }
}

void main() async {
  int y = await f(Future<int?>.value(3));
}
```

The breakage is real, but it might be "good breakage" because it makes the current (implicit) await behave more like an explicit await (which would be needed if we adopt #870). Also, we already decided (with return statements) that a In other words, "we will not wait for an object unless it is guaranteed to have the required type". The semantics for |
I wouldn't say that we decided that return should behave differently from other awaits. It's more that the person specifying the return was not the same as the person specifying the special case for nested |
@leafpetersen said:
Yes, I think so.
It's certainly not something I'd fully understood when writing up #3471, so in that sense, no, it's not something I expected. Thanks for bringing it up! In fact, thinking about it more, this problem doesn't just apply to

All of these work today because the implicit downcast happens after unwrapping:

```dart
dynamic x = <int?>[3];
for (var (int y) in x) print(y); // prints `3` (the list element is downcast, not the list)
```

```dart
dynamic x = <int?>[3];
print(<int>[...x]); // prints `[3]` (again, the list elements are downcast)
```

And of course your example:

```dart
dynamic x = Future<int?>.value(3);
int y = await x; // sets `y` to `3` (the awaited result is downcast, not the future)
```

I believe all of these will stop working if we go ahead with #3471. In a sense, it's a consequence of that proposal pushing all coercions down, not just the coercions associated with conditionals and switch expressions. I can't think of any other constructs with the same problem, but I might be missing something. Incidentally, the user can create generic functions with the same kind of "unwrapping" behavior, but that leads to a runtime error even today. For example:

```dart
T getFirstListElement<T>(List<T> list) => list.first;
dynamic x = <int?>[3];
int y = getFirstListElement(x); // Runtime error because `x` is downcast to `List<int>`
```

So I suppose you could argue that #3471 is a good change, because by pushing the downcast down into Counterpoint: with generic functions, if you want the downcast to happen after the unwrapping, you can make that happen by introducing an explicit type argument:

```dart
dynamic x = <int?>[3];
int y = getFirstListElement<dynamic>(x); // Works
```

But there's no corresponding place to shove a dynamic argument into I certainly think that if we do wind up going ahead with #3471, these potential breakages argue for doing it as a language-versioned change, so that people have more control over when the breakage happens to them. I'll spend some time investigating these "unwrapping" constructs to see how much breakage we would be talking about here. |
One other thought: In some other issues recently I've been talking about the idea of extending type schemas with a

```dart
void main() async {
  dynamic x = Future<int?>.value(3);
  int y = await x;
}
```

Since

```dart
int y = (await x) as int;
```
|
I've done some additional prototyping along the lines suggested in this discussion, and I'm less certain about whether required context types are necessarily a good idea anymore. If they are a good idea, they're certainly not the "slam dunk" I once thought they were. The main issue I've run into with my additional prototyping is

```dart
f(dynamic d) {
  for (int x in d) {
    print(x);
  }
}
```

Today, this works at runtime as long as I have one other idea I would like to try before I give up, which is to see what would happen if we say that every expression is required to satisfy its context or have type

```dart
f(dynamic d) {
  for (var tmp in d as Iterable<Object?>) {
    var x = tmp as int;
    print(x);
  }
}
```
|
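To make the for-in problem concrete, here is a runnable contrast (a sketch under today's Dart semantics; the list values are illustrative) between today's per-element downcast and the whole-collection downcast that a required context type would force:

```dart
void f(dynamic d) {
  for (int x in d) {
    print(x); // Today: each element is downcast to `int` individually.
  }
}

void main() {
  f(<int?>[1, 2, 3]); // Works today: every element happens to be an int.
  // Under the proposed change, `d` itself would first be downcast to
  // `Iterable<int>`, and `<int?>[1, 2, 3] as Iterable<int>` throws even
  // though every element is an int.
  try {
    (<int?>[1, 2, 3] as Iterable<int>).forEach(print);
  } on TypeError {
    print('iterable cast failed');
  }
}
```

The point is that the breakage is invisible in the source: the call site `f(<int?>[1, 2, 3])` type-checks either way, and only the runtime behavior changes.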
That would mean that the other coercions ( I'll refer to #3363 and say always downcast before any use of UP, before any join-point, because UP( |
@stereotype441 wrote:
@lrhn wrote:
I think this is very promising! To me it looks like #3471 is maturing, not falling apart. However, I think we might be able to adjust #3471 such that it behaves in a useful manner with these examples: Coercions should not be pushed down into the operand of an Similarly, we might (very well!) want to avoid pushing coercions into cascades (see 'Coercions are pushed down into cascade targets' in #3471). Finally, I suspect that we want to avoid pushing coercions into a member access if and when we introduce receiver type inference. When we have 3 kinds of expressions where coercions are applied directly (rather than being pushed into subexpressions), it's a significant part of the design, not a weird exception. Conceptually, it might work out to claim that propagation occurs into constructs involving a choice, otherwise there's no propagation (just like today). Also, I'd expect the behavior where coercions aren't pushed into these specific kinds of expressions to be useful (more useful than the alternative where the coercions are pushed in). @stereotype441, is there a technical reason why it would be difficult (like hard to implement, unsound, or in some other way not working) to omit the propagation of coercions into subexpressions in these particular cases? |
It all depends on what we mean by omitting the propagation of coercions into subexpressions. As I see it, whether or not coercions are "propagated" from one place to another is an emergent property of the type analysis rules we choose. If we want different behavior we have to pick different rules, but it's not always clear to me what rules we're implying when we discuss alternatives like this. Currently we have a set of rules that says:
These rules have the emergent property that when there's a "choice" of when to apply a coercion (e.g. whether to apply a coercion to the target of a cascade or to the entire cascade expression), it tends to happen nearer to the top of the expression tree. But there's no code anywhere that's explicitly "making" that choice; it just emerges as a result of the rules. In #3471, I've imagined replacing rule 4 with a rule that says that coercions are performed whenever the single-pass depth-first traversal of the expression tree encounters an expression whose static type would otherwise not be a subtype of its context. That's not too difficult to do, since there is a single place in the codebase where the new rule would need to be added, and removal of the old rule could be spread out over time as a clean-up activity. But as we've discussed, it looks like it's a pretty seriously breaking change. In #3575 (comment), I mentioned the idea of requiring every expression to satisfy its context or have type Another possible rule change we've talked about would be to split contexts into a required part and an aspirational part. Concretely, we would change rule 2 to say that "The input to each traversal step is a type schema, which we call the 'aspirational context', and a required type context". Then we would change rule 4 to say that coercions are performed when the static type of an expression would not otherwise be a subtype of the required type. I explored this idea a few months ago and I found it to be a lot of work, both in terms of implementation (because it required changing all expression visitor methods to accept this additional input) and in terms of specification (because it required deciding what the required type should be for every recursive invocation).
Also, it gets tricky because when inferring generic type parameters on method and function calls, sometimes we don't know precisely what the type of a subexpression will be required to be until after we've visited it (because we haven't fully committed to what types we're going to infer yet). So getting back to your question, to determine whether it will be difficult to push coercions down into some kinds of expressions but not others, we need to first figure out what rules will lead to that emergent property. Of the rule changes I've considered, only the idea of splitting contexts into a required part and an aspirational part would let us push some coercions down but not others. And that one looks like a lot of work. But I'm open to more brainstorming. |
I think another way of saying this is that there is no choice to be made about where to insert the coercions. Or at least, the coercions from dynamic. That choice follows naturally from the choice of what type to infer for expressions. That is, type inference is the process of assigning to every sub-expression a type. In certain places, we have a choice about how to distribute that assignment. So for example, given The point is that coercions are intended to mediate soundness, and soundness is the property that an expression of type So I believe that the correct way to think about this is to think directly about it in terms of what types we infer for the various sub-expressions. The coercions then follow naturally from that. Am I missing anything there? |
I'd say we have one more option here:

```
int x = (b::bool ? i::int : d::dynamic) :: int;
```

That one isn't mentioned, but we can define an inference that gives it:
Inference is about assigning static types to expressions, and about deriving the static type of an expression from the static types of its subexpressions. If we have a general rule of "If you would infer static type The way it's currently implemented for dynamic downcasts, it's more like coercions are part of the semantics of assignments, function calls, and any other operations where we add them. We don't need to change the static type of the subexpression, but we coerce the value before we use it, as part of the semantics of the operation that has that subexpression. I think that can be easier to specify precisely, because coercion happens at specific points in the semantics, where we specify that they do, rather than at some implicit point between the inference of the subexpression, and the containing expression seeing the result of that inference. That's not really a point that exists, and likely why it hasn't been implemented that way. ("The static type of the subexpression" has two different answers at different points in the algorithm, and we really need to say something like that we do "pre-inference" on the sub-expression, which is the inference we have currently specified, and actual "inference" of the subexpression takes the result of the pre-inference and possibly coerces it, so we can swap back-and-forth, interleaving the two algorithms. Just so we can give different names to the two types assigned to the same expression.) The cost of putting the coercion into the operation semantics is that we have to be explicit about every coercion. If we forget to mention it in one place, no coercion happens there. The "coercions happen all the time, between any two expressions" has it easier, but ... well ... we didn't actually implement it. I'm wagering that there's a reason for that. |
I don't know where you're going with this. As written, this is an unsound typing, because it gives the conditional expression the type So first, we need a way to say what the type of a conditional expression is. Let's add that as a missing type parameter on the Second, we need a way to say what the "inferred type" of an expression is, separate from its underlying "inherent" type. So for an expression Finally, to reduce pointless ambiguity, let's say that we're going to require as well that at assignments to variables, the expression being assigned must have a type which is a subtype of the variable being assigned to. This reduces the pointless distinction between Ok, given that, let's try again. Given
I don't think there are any other ways to infer this that are not either unsound, or just pointlessly redundant in a similar way to what I've tried to forbid above. Am I missing something? I couldn't really follow where you were going with the rest of your comment, so maybe if there's something I'm missing the syntax above will give us a common language to express it in? |
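As a concrete runtime illustration of the typing being discussed (a sketch under today's Dart semantics, reusing the `b`, `i`, `d` metavariable names from the annotated expression above):

```dart
void main() {
  bool b = false;
  int i = 1;
  dynamic d = 2;
  // Today the conditional `b ? i : d` is typed as UP(int, dynamic) = dynamic,
  // so the downcast to `int` is inserted outside, at the assignment:
  int x = b ? i : d; // behaves like `int x = (b ? i : d) as int;`
  print(x); // 2
}
```

If `d` instead held a value that is not an `int` (say, a `String`), the assignment-site cast would throw at runtime even though both branches type-check statically.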
TL;DR: I agree that what you present matches the intended currently specified behavior. I’m not sure that behavior is well specified, and we already know it’s not what’s implemented. I’m suggesting that there can be a model where coercions happen during the evaluation of operations, not between them, and that this is actually what we have currently implemented. That model is internally consistent, so I think it’s too early to say that the behavior it describes, or could describe with small tweaks to specific operations, cannot happen. Your description matches our current attempt at specifying coercions. Let's start at the root:
The quotes are there for a reason, because this is the distinction that I think our specification fails to make, which again may be a contributing factor to not actually implementing the specification. Because when the specification is ambiguous, it's not clear whether an implementation follows it or not. The The specification is ambiguous because it calls both of these "the static type of I have a hard time pointing to the precise places where the current semantics are specified, because it’s combined from a number of different documents, some maybe not committed yet, but for the static typing of
That is, the input types to the static semantics of the conditional expression, the types of the subexpressions, have the same kind as the output type. There is no distinction between "inherent" and "inferred" type. So, as a first step, let’s try to make the current specified behavior (or what I understand as its intent) explicit. Let’s use completely new names, to avoid implicit assumptions.
Coercions are then the (implicit) step that goes from initial inferred type to final inferred type, based on the context type scheme. The step is implicitly introduced between any expression and its subexpressions, because the inference on the subexpression only finds the initial inferred type, and the parent expression needs and uses the final inferred type. Using meta-syntax like above, the inference of an expression

```
int x = ((b):bool::bool ? (i):int::int : (d):dynamic::int) :: int
```

It’s nigh impossible to specify coercions this way, and not get eager coercion, because the coercion of a subexpression is specified independently of the expression it’s part of. We implicitly say that an expression always refers to the final inferred type of its subexpressions. That implies, to ensure soundness, that the expression must produce a value of that final inferred type, which means doing the coercion that produces the final inferred type. Now to my point, which is that we didn’t actually implement that, and maybe the implementations were right. I don’t know why implementations didn’t do eager coercion. A contributing factor was likely that the specification was very implicit about it, so it was hard to point to a place that said “do eager coercions”. Also that the implementations could keep doing what they already did in Dart 1, which was to do type checks when type checks were needed, and it still worked well for almost all reasonable programs. And we never actually wrote tests for the edge cases of the specific behavior. Maybe the people doing the specification weren’t entirely clear on precisely what was specified, and whether it matched what they wanted (I know I wasn’t). A way to describe the model that is actually implemented is that coercions happen internally in the semantics of specific syntactic constructs, not in the gaps between an expression and its subexpressions.
Or stated differently, the semantics of an expression refers only to the initial inferred types of its subexpressions, and it has to explicitly perform static and runtime coercions as part of its semantics, if it wants or needs those values. The (currently implemented, late-coerced) semantics of an expression like
There is a coercion at the branch expression, because it needs to have type Similarly
(We can, and would, choose to abstract over these coercion steps and just write “Evaluate That is, coercions are not something that happens between the inference and evaluation of expressions, its something that happens as part of the static and runtime semantics of an operation if it chooses to do so. Currently, the implementations choose to apply coercions at specific operations:
which we were mostly aware of, but also, implicitly, inside “unwrapping” operations (as Paul named it) like:
If we want to actually specify these operations properly (not as the current desugaring which effectively introduces new syntax after inference), then I think we’ll need something like this to do so. The multiple discussions related to #3471 suggest, at least a little, that just maybe the implementations were on the right track. Eagerly downcasting So, when you say about `int x = (b::bool ? i::int : d::dynamic) :: int;` that it gives the conditional expression a type But as things are currently implemented, where the conditional expression doesn’t coerce its branches at all, we can give
This forces coercion of the taken branch, which means that it’s a compile-time error if the final inferred type (initial inferred type coerced to context type) of each branch is not a subtype (or the more complicated version) of the context type. That will make it much easier to change the use of UP to be at most When coercion is built into the semantics of individual expressions, we only need to assign one type to each expression (the initial inferred type); then each operation can find the final inferred type from that if it needs it, and that operation must also introduce a coercion of the runtime value to that final inferred type. And we have to be explicit about every coercion; if we forget one, coercion won’t happen. At least, as long as the static and dynamic semantics agree on whether coercion happens, the result should be sound. |
I'm confused. There is no spec. There is an implementation, and I believe it matches one of the two options I presented. The discussion we've been having is about changing it from one to the other. What am I missing? What is the other option that you see?
I don't know where you're going here. I'm highly confident that I can specify coercion insertion to produce either of the two coercion reifications that I presented above. If you feel this is impossible, I'll need more explanation as to why.
No, this is where you're going wrong, at least as I see it. The point I am making is that there is a choice of how to produce what you are calling the initial inferred type, and once you have made that choice, the insertion of coercions is fully determined. It is true that the current choice of how to assign inferred types forces eager coercions. My point is just that there are two ways to assign initial inferred types (in your terminology) and one of them naturally leads to eager coercions and one naturally leads to late coercions.
I implemented this. In the first implementation of strong mode, I implemented type inference in the analyzer, and then implemented coercion insertion as a separate pass (I called it "coercion reification"). I wasn't smart enough about the internals of the analyzer to realize that it was practical/possible to attach a type to conditional expressions, so I by default made the choice to make the type of conditional expressions implicit as the LUB of the types of their arms. The coercion reifier simply worked as I describe above: it looked at every term, checked whether its type was a subtype of the destination, and if not, checked whether a coercion could be inserted to satisfy it. The composition of those two design choices gives you the current implementation choice. All I'm trying to say here is that if I had simply made the choice to annotate conditional expressions with a type based on the context (where present), the coercion reifier would then naturally have inserted the coercions inside of the branches, instead of outside. No magic there - the types drive the coercions, so all of the decisions are about how you distribute the types across composite terms. |
I am possibly confused. We have an implementation, which is what our compilers currently do. We're talking about whether and how to change that. We have (a dream of) a specification. I am mainly talking about specification here, which relates to implementation in that some specification strategies may have problems specifying certain actual behaviors. I have been under the impression that we generally consider implicit downcast, and implicit coercions in general, to be (intended to be) specified by something like:
That's how we have generally talked about it in discussion, and we have discussed how the implementations don't actually match that. It isn't actually written out (possibly anywhere, maybe just not where I can find it), so it's quite possible that that's just, like, my opinion, and everybody else has had some other idea about how things are intended to work, and it's been vague enough that we haven't actually noticed the discrepancy. What I am saying is that "something like" the above, where the choice of whether to do coercion is based only on static type and context type of the expression, will unavoidably specify eager coercion. If coercion can happen for that expression type and context type anywhere, it will happen for the same expression type and context type everywhere, because it's defined entirely in terms of those. You are then saying that we can have a specification like this, and get late coercion. That's where I think I'm missing something, maybe something unstated, because there needs to be some extra information to trigger the coercion only in some positions.
I think the missing extra information might be the "destination", though I'm not sure what that is. It's not just the context type, because then coercion would be eager. If a "destination" is something that only assignment, argument or test positions have, the positions that really require coercion to have happened, then that explains why that's where we coerce. But that also means that coercion depends on the expression that the coerced expression is a subexpression of. Doing So for my example of a conditional expression which forces coercions to its context type, If we instead use the view that I have a hunch that the "parent expression semantics explicitly specifies coercions" approach is necessary for formally specifying something like the current If we want to have a formal specification like that, it means that coercion cannot solely happen at the boundaries between expressions of the program. It can be invoked for a context type and a value that arose in the middle of a sequence of semantic steps, which was never the value of any expression with that context type in the source program. We don't have a specification, so we don't have established terminology (or we do, and I just don't know it). That makes it hard to figure out if we agree, or are even talking about the same things. I think we do mostly agree on the behavior, but not really how to talk about it. |
Ok, this is helpful. I agree that that description leads naturally to eager coercions. But I don't consider something like the above a specification, or at least the only way to do a specification. It's a declarative statement that tries to capture "how things work", but it doesn't really specify anything. What is a context? Where are they introduced? What does it mean for an expression to have a static type? etc. Additionally, it's not by any means the only way to do either an informal or formal specification. I think (but am not sure) that a similar "specification" (which as I say above is not a specification) of lazy coercions would be something like this: When we introduce a new context C to infer a type for an expression E, and the static type inferred for E is T, and T is not a subtype of C, then we insert an implied coercion on E to convert it to a value of type S where S is a subtype of C (if such a coercion exists). I believe that this is essentially a declarative description of lazy coercions as we implement them, but I'm not completely sure. As I say, whatever it is, it's not a specification, since, as with your description above, it doesn't actually say what contexts are, where they are introduced, what the static types of terms are, etc. My claim is simply that once you do all of that, the choices you make about what the static types of terms are will essentially (within the degrees of freedom granted by redundant coercions, etc) have fully defined the coercion strategy. But really, that claim is kind of irrelevant to the main point, which is that when we do fully specify the inference process, we can specify coercion insertion however we want, again, so long as it is compatible with the typing.
Generally speaking, for an actual specification here, what I would propose (and certainly the way that I think about this) is that we simply specify (via syntax directed inference rules, or algorithm, or whatever) exactly how we mean to assign types and coercions to transform a piece of raw un-annotated syntax into fully typed and annotated AST. To make this concrete, I'll try to give an example of how we might specify this. So first off, what is it that we are specifying? Well, in Dart post 2.0, we allow programmers to elide some of the types, and some of the coercions, that would otherwise be necessary to fully specify a typed Dart 2.0+ program. We do type reconstruction to rewrite terms into a form which is fully annotated with types and coercions. Note that not all of the types needed for a fully typed Dart program can even be written in Dart syntax. Two examples: Dart syntax has no way to write the return type on a lambda, and yet a fully type inferred Dart program must specify what the return types of lambdas are; and Dart syntax has no way to write the type on a conditional expression (or a switch expression) and yet a fully type inferred Dart program must specify what the type of the conditional expression is. I'm being loose with what I mean by "fully type inferred" here, but it basically means that I can produce the full type of an expression simply by looking at the outermost node of the AST for that expression (more or less, constant factor), without doing any significant computation. So I consider the term Ok, so how might we specify this? If we're using markdown friendly algorithmic rules, we might say something like this. (Note: this isn't an actual attempt at a formal specification here - I'm banging this out in a comment thread, so please don't get stuck on typos, or minor details that I'm forgetting to handle - I'm just trying to give the general flavor here).
**INFER**

We define a type reconstruction algorithm

**Option 1: eager coercions (sample of some rules).**

Commentary: it should be an invariant of this option that if

Declarations:

Given a variable declaration with initializer of the form

*More declaration forms here*

Expressions:

We define

If

If

*More expression cases here*

**Option 2: lazy coercions (sample of some rules).**

Commentary: it should be an invariant of this option that if

Declarations:

Given a variable declaration with initializer of the form

*More declaration forms here*

Expressions:

We start by defining a helper relation

We define

If

If

*More expression cases here*

**Option 3: lazy coercions with eager conditional expression typing (sample of some rules).**

I think this is basically how we would specify the change that I'm asking @stereotype441 to make in the short term if we are not able to move directly to eager coercions (which it seems like will be hard, if even possible/desirable).

Commentary: it should be an invariant of this option that if

Declarations:

As in option 2

Expressions:

We start by defining a helper relation

We define

If

If

Commentary: I'm doing this from memory, but I think this is basically what @stereotype441 proposed. Basically, if the two branches of the conditional are each subtypes of the context (and we have a non-trivial context) we use the context as the type of the conditional. Otherwise, we fall back to the upper bound.

*More expression cases here*

So that's how I would propose we think about this, and when we sit down to specify it, how we actually specify it. Again - apologies if I've made typos, missed some cases, or am not getting things quite right. This is just a quick comment to try to explain where I'm coming from here. And I'm open to other approaches! So for example, as I said earlier, we could probably do this in two stages where we first infer types (allowing for type mismatches) and then specify a second relation which looks for type mismatches and inserts coercions where required. Or... something else? But this is how I would go about it, and where I'm coming from when I say that we can specify lazy or eager coercions equally well simply by specifying how the types are inferred. |
That does look like the approach I'd also suggest, with the caveat that I'd avoid creating a new expression if at all possible. If we have something like I'd suggest that instead of patching the syntax, we say that type inference annotates every expression which uses type arguments, whether some were given explicitly in the syntax or not, with the semantic type arguments that will be used later. That means that if you write The static type would also be such an extra semantic slot/assigned metadata on the abstract syntax expression. Just because I want to, let me try to specify
and STATIC_COERCE(
and COERCE(v,
Something like that. (Forced coercion before UP, use UP if it's better than context, context if not.) Going back to
Then we will need to explain how we know this is sound. We only require Let's try
I believe this should match the current behavior. It allows both |
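Purely as a toy illustration of the "coerce to the context first, then consider UP" idea (all names and the miniature subtype table below are made up for this sketch; this is not spec or compiler code), the downcast-from-`dynamic` part of a STATIC_COERCE-style step could be modeled as:

```dart
// Toy model: types are plain strings, and `isSubtypeOf` encodes a tiny
// hypothetical sample of the subtype relation.
bool isSubtypeOf(String s, String t) {
  if (s == t || t == 'Object?') return true;
  const supers = {
    'int': {'num'},
    'double': {'num'},
  };
  return supers[s]?.contains(t) ?? false;
}

/// staticCoerce(T, C): the static type after coercing an expression of
/// type T to context C, or null if no coercion applies. Only the
/// downcast-from-`dynamic` coercion is modeled here.
String? staticCoerce(String t, String context) {
  if (isSubtypeOf(t, context)) return t; // already fits: no coercion
  if (t == 'dynamic') return context; // implicit downcast from dynamic
  return null; // no applicable coercion: a compile-time error
}

void main() {
  // The branches of `b ? i : d` against context `int`:
  print(staticCoerce('int', 'int')); // int
  print(staticCoerce('dynamic', 'int')); // int (a downcast is inserted)
  print(staticCoerce('num', 'int')); // null (would be an error)
}
```

In this model, a conditional expression's rule would call `staticCoerce` on each branch before taking UP of the results, matching the "forced coercion before UP" phrasing above.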
In #3471, there is a proposal to require that all expressions be a subtype of their context type. This issue is to clarify the intended treatment of a specific corner of the language under this proposal.

Consider the following program:

```dart
void main() async {
  dynamic x = Future<int?>.value(3);
  int y = await x;
}
```

This program passes static analysis currently. I believe that the current treatment chooses `FutureOr<int>` for the context type when doing inference on `x`. This has no effect; the static type of `x` remains `dynamic`. The static type of `await x` is therefore `dynamic` as well, and there is an implicit downcast to `int` inserted at the assignment to `y`. Semantically, it is as if we had written `int y = (await x) as int;`.

This program also runs to completion currently. I believe that the semantic interpretation of `await x` in this case is essentially to check whether `x is Future<Object?>`, and if so to unwrap the value from the `Future`, and otherwise to simply return the value. In this case, the instance check returns true, the value is unwrapped, `await x` evaluates to `3`, and the implicit cast succeeds.

My natural interpretation of the proposal in #3471 is that we would change the above as follows. Statically, we would instead say that since `x` does not satisfy its context type, we should insert a cast to the context type (here `FutureOr<int>`) inside of the `await`. The result would be that semantically, it is as if we had written `await (x as FutureOr<int>)`. The static type of `await x` would then be `int`, and no implicit downcast outside of the `await x` would be inserted.

Assuming the above, then the runtime behavior of the above program will change. The cast inserted on `x` will fail, and the program above will terminate with a cast failure.

Is my interpretation of the intended semantics above correct? Is this change expected/desired? If so, do we have any data on how breaking it would be?
cc @dart-lang/language-team