Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[views, extension types] Lightweight type conformance #2150

Closed
eernstg opened this issue Mar 9, 2022 · 12 comments
Closed

[views, extension types] Lightweight type conformance #2150

eernstg opened this issue Mar 9, 2022 · 12 comments
Labels
extension-types feature Proposed language feature that solves one or more problems inline-classes Cf. language/accepted/future-releases/inline-classes/feature-specification.md

Comments

@eernstg
Copy link
Member

eernstg commented Mar 9, 2022

Status: We've had several discussions about this issue, some of it in comments on this issue, and other parts in email and elsewhere. I think it's fair to say that we've concluded the following:

The generalization of implements on views as proposed is possible, but not sufficiently general to be worthwhile as an explicit language mechanism. The generalizations that come to mind all seem to involve widespread implicit boxing (and a lot of complexity), which would create similarly widespread issues with object identity and space consumption.


This issue contains a proposal for how to generalize the implements clause of a view slightly, thus adding support for a lightweight type conformance mechanism.

Current Rules about implements on a View

The views proposal (cf. also the very similar, but older extension types proposal) include a feature with the syntax implements <typeList>, which is used to specify that the given view has members that are correct overrides of the members of the types in the implements clause.

In short, if we want to ensure that a view V<X> behaves like an Iterable<X> then we can declare it as

view V<X> on ... implements Iterable<X> ... {...}

With this declaration, it is a compile time error unless V declares a member Iterator<X> get iterator. In general, the view must satisfy the same requirements that any concrete class C<X> must satisfy when it has an implements Iterable<X> clause.

An implements clause on a view does not introduce a subtype relationship. So we don't get the abstraction. The compiler will know that we're operating on an instance of the on-type of V with the view defined by V on it, so we have no mechanism which is similar to object-oriented subtyping. This is crucial because it ensures that view method invocations can be resolved statically (so they might be inlined, etc).

However, the implements clause is supported with the full, normal, object-oriented semantics with a "boxed" view.

A view may support a box operation that creates an instance of a compiler-generated class V.class whose instances work as wrappers for the on-type instance of the given view V. V.class declares exactly the same methods as V, with the same signatures. So if myView is a variable of a view type V whose value is an instance o of the on-type of V, myView.box yields an instance oBox of V.class that wraps o. oBox is a full-fledged Dart object, and V.class has the same implements clause as V does, and this means that oBox is actually an object which can be passed around using any of the implemented types.

Possible Generalization of implements on a View

We have had some discussions about the treatment of views when they are used with for-in statements. The main observation is that it is unnecessarily costly to use the standard semantics.

As an example, let's introduce a singleton collection and support the use of that type of object as an iterable that repeats the single object that it holds.

class Singleton<X> {
  final X x;
  Singleton(this.x);
}

view SingletonAsIterable<X> on Singleton<X> implements Iterable<X> {
  Iterator<X> get iterator {
    SingletonAsIterator<X> it = this;
    return it.box;
  }
  ... // Other `Iterable` members.
}

view SingletonAsIterator<X> on Singleton<X> implements Iterator<X> {
  bool moveNext() => true;
  X get current => this.x;
}

void main() {
  SingletonAsIterable<int> it = Singleton(1);
  int count = 0;
  for (var i in it.box) {
    print(i);
    if (++count > 100) return;
  }
}

We have to box the singleton collection using it.box in order to obtain an object whose type implements Iterable<int> which wraps the given singleton. This is required for it to be used in a for-in statement. Next, the method iterator also needs to box the singleton object, such that we obtain an object whose type implements Iterator<int>.

If we generalize the implements clause of a view as specified below then we can use the following:

view SingletonAsIterable<X> on Singleton<X> implements Iterable<X> {
  SingletonAsIterator<X> get iterator => this;
  ... // Other `Iterable` members.
}

view SingletonAsIterator<X> on Singleton<X> implements Iterator<X> {
  bool moveNext() => true;
  X get current => this.x;
}

void main() {
  SingletonAsIterable<int> it = Singleton(1);
  int count = 0;
  for (var i in it) {
    print(i);
    if (++count > 100) return;
  }
}

The main idea behind the generalization is that a for-in statement will require that the collection has a static type that conforms to Iterable<T> for some T, and the conforms relation is a bit more flexible than the normal subtype relationship.

In particular, SingletonAsIterable<T> conforms to Iterable<T>, and SingletonAsIterator<T> conforms to Iterator<T>, but there is no corresponding subtype relationship between those types.

The resulting code avoids both boxing operations, and the code in main will use statically resolved functions rather than object-oriented methods that are dispatched at run time.

If we make the choice to compile for-in statements where the collection is a view using the specified desugaring (compilers currently do something else for a normal for-in statement, but we can do this for that particular kind of for-in statement), then we'd get the following, after inlining:

void main() {
  SingletonAsIterable<int> it = Singleton(1);
  int count = 0;
  {
    // The cast is guaranteed to succeed, it will not be executed at run time.
    SingletonAsIterator<int> id1 = (it as Singleton);
    var id2 = (id1 as Singleton); // Again, the cast has no cost at run time.
    while (true) {
      var i = id2.x;
      print(i);
      if (++count > 100) return;
    }
  }
}

In general, the notion of type conformance will allow code using views to have methods that return view types, and hence avoid boxing, while still maintaining the property that a boxed view will implement "normal" classes. This allows a larger set of method invocations to remain statically resolved.

For example, we may use code generation to obtain a set of views that represent safe navigation of specific JSON values. The underlying representation would be statically typed as Object?, and it would actually consist of instances of Map<String, Object?>, List<Object?>, String, bool, and numeric types. The views would ensure that this dynamic object graph would be navigated safely, as long as we start off with a view that represents access to a JSON value with a specific schema S, and the actual representation object is a JSON value with schema S.

We would then be able to navigate the JSON value safely without ever boxing any objects, we would just use the views to ensure that the "raw" operations (like casting Object? to Map<String, Object?> and then looking up the value at key "someKey", etc.) are performed in a way that corresponds to the schema S.

However, we would also be able to use box to obtain a boxed representation of the JSON value, and we could then use regular object-oriented coding to navigate the JSON value, because the boxed representation is a normal object with normal OO methods. The point is that the recursion is handled automatically: If we have a given boxed JSON value and extract a (smaller) JSON value from it, then the latter will also be boxed, and we do not have to have two near-identical copies of the code.

So how is this possible? Details below!

A Type may 'Conform to' another Type

This is a new relationship between two Dart types. It is very simple:

  • A view type V conforms to a non-view type T if V is declared with an implements clause that includes a type T0 such that T0 <: T.
  • A non-view type S conforms to a non-view type T if S <: T.
  • It is never the case that a type conforms to a view type.

At first, this concept is only used with for-in statements: We require that the
collection has a type that conforms to Iterable<T> for some T, or dynamic (we always allowed dynamic here, because it is a built-in part of assignability). We continue to have the run-time semantics where a non-view type is statically known to be an iterable, or it is checked at run time (in the case where the collection has static type dynamic), so everything is working in the same way as today, except for view types. This gives rise to the behavior with for-in statements that was described in the previous section.

However, we can make type conformance explicit in the language, and we can use it to enable customized variants of a given abstraction:

void foo<X conforms Iterable<int>>(X x) {
  for (var i in x) print(i);
}

void main() {
  SingletonAsIterable<int> iter = Singleton(2);
  foo(iter);
  foo([1, 2, 3]);
}

The main idea is that when a function is generic, and one or more type parameters are specified with conforms rather than extends, it is given special treatment during compilation:

  • A "general" copy of the function is compiled, and it will be used in the case where no actual type argument is a view type.
  • For each call site of this function where one or more actual type arguments is a view type, a customized version of the function is compiled where those type arguments are eliminated by substituting them into the body of the function, and the resulting function is compiled. This may give rise to a substantially different set of optimization opportunities.
  • When this compilation process gives rise to identical compiled code, one is kept, and all others are redirected to reuse it.

This is similar to template function instantiation in C++, and it allows developers to write a piece of reusable code and still obtain the optimization that could be expected if they had written several variants of the function that only differ by having committed to specific values of specific type arguments.

Of course, this could give rise to serious code bloat problems, so it will have to be used with caution.

On the other hand, the notion of type conformance ensures that there will never be a compile-time error in the body of one of these "template functions": When the type arguments satisfy the requirements, the body will be type correct.

Rules about the Implements Clause of a View

As mentioned, the implements clause of a view differs from the implements clause of a class or mixin, because there is no subtype relationship between the view and the types in the implements clause (but the boxed form has these subtype relationships, as expected for a class).

Correspondingly, the rule that determines whether an instance member declaration in a view is a "correct override" of the member signatures of the types in the implements clause is different.

The return erased signature se corresponding to a signature s is such that se has return type void, and se is otherwise identical to s.

Assume that V is a view with an implements clause containing the types T1 .. Tk. Assume that V has a declaration of a member named m with signature sV, and Tj has a member signature sT named m.

We say that sV is a conforming override of sT iff:

  • sV and sT are both getters, and the return type of sV conforms to the return type of sT.
  • sV and sT are both setters, and sV correctly overrides sT (using the normal rules for correct overrides).
  • sV and sT are both methods, and the return type of sV conforms to the return type of sT, and the return-erased signatures sVe and sTe corresponding to sV respectively sT are such that sVe is a correct override of sTe (using the normal override rules again).

In short, a conforming override is just like a normal override, except that the return type can use type conformance rather than subtyping.

The class V.class that defines boxed instances corresponding to a view type V is adjusted correspondingly: The member signatures of V.class are computed by taking the signatures of the member declarations of V and adjusting return types such that there is a correct (normal, not conforming) override relationship from each member in V.class to the corresponding member in each of the implemented types T1..Tk. (If that relationship is violated then V.class is a compile-time error.)

The implicitly induced implementations of the methods in the class V.class will still forward the call to the view method, but it will add .box to this forwarding invocation as needed, such that the method body is type correct.

This means that in the case where the implements clause makes the promise that a certain non-view type T is returned, and the view method returns a view type V, the returned value is auto-boxed. As a consequence, we get the effect that we can work on the underlying on-type including the case where view methods return view types, and we can also work on boxed representations, and in that case the V.class methods will return the corresponding boxed types. This is as claimed earlier in connection with JSON values.

Discussion

The main idea here is that we can propagate the performance benefits associated with views a bit further with this generalization than we could previously, and we can allow for a certain (encapsulated, and type safe) level of C++-template-function-ish customization, which could yield significant performance improvements, but which also carries a danger in terms of code bloat.

@leafpetersen, @lrhn, @natebosch, @stereotype441, @jakemac53, @munificent, WDYT?

@eernstg eernstg added small-feature A small feature which is relatively cheap to implement. extension-types inline-classes Cf. language/accepted/future-releases/inline-classes/feature-specification.md feature Proposed language feature that solves one or more problems and removed small-feature A small feature which is relatively cheap to implement. labels Mar 9, 2022
@eernstg
Copy link
Member Author

eernstg commented Mar 9, 2022

@chloestefantsova, do you think this looks like a scary 'feature' or more like a manageable 'small-feature'? ;-)

@lrhn
Copy link
Member

lrhn commented Mar 10, 2022

As usual I'll point to a Rust traits-like functionality as another implementation strategy for structural polymorphism.
Instead of specializing the code for each different structural "conformation" of a interface, we pass the base object and the strategy object "implementing" that interface on the base object as separate parameters (or as a "fat pointer", object + V-table, or as an "unboxed wrapper object", or whichever way you choose to look at it.)

If we use conforms_to instead of extends (or maybe static extends!) in the places where we want to be able to pass objects that implements an interface without subtyping it, or have some other way to mark them, then we won't need to use fat pointers for every interface typed variable.

@leafpetersen
Copy link
Member

The main idea is that when a function is generic, and one or more type parameters are specified with conforms rather than extends, it is given special treatment during compilation:

  • A "general" copy of the function is compiled, and it will be used in the case where no actual type argument is a view type.
  • For each call site of this function where one or more actual type arguments is a view type, a customized version of the function is compiled where those type arguments are eliminated by substituting them into the body of the function, and the resulting function is compiled. This may give rise to a substantially different set of optimization opportunities.
  • When this compilation process gives rise to identical compiled code, one is kept, and all others are redirected to reuse it.

This is an implementation strategy, possibly, but not one that I would recommend, and I think it overly conflates things.

As usual I'll point to a Rust traits-like functionality as another implementation strategy for structural polymorphism.
Instead of specializing the code for each different structural "conformation" of a interface, we pass the base object and the strategy object "implementing" that interface on the base object as separate parameters (or as a "fat pointer", object + V-table, or as an "unboxed wrapper object", or whichever way you choose to look at it.)

This would be my recommended implementation strategy for this feature, where the strategy object is just the type parameter itself. That is, we don't compile two versions of any functions, we just compile a protocol-polymorphic function to expect the type parameter to provide the methods for the protocol(s) to which it conforms.

@lrhn
Copy link
Member

lrhn commented Mar 10, 2022

I like the idea of linking the strategy to the type variable (the wonders of reified type arguments), but I think we'd also want to be able to specify the "conforms to" type constraint directly on parameters.

You can write foo<R, T conforms_to Iterable<R>>(T something) { for (R r in something) ... }, but it can easily get cumbersome to call compared to foo<R>(conforms_to Iterable<R> something) { ... }.
Some functions cannot be generic at all (operators and setters are the main pain points, where we have already seen people ask for generic variants just for existential typing, and this is like existential typing with a proof object.)

(I also worry that if we allow T conforms_to B to just work better than T extends B, by allowing strictly more types, people will just migrate all code to the former, and then only go back to extends if it turns out to be a performance problem, and only if they realize that that is where the performence issue comes from.)

@a14n
Copy link

a14n commented Mar 10, 2022

You can write foo<R, T conforms_to Iterable<R>>(T something) { for (R r in something) ... }, but it can easily get cumbersome to call compared to foo<R>(conforms_to Iterable<R> something) { ... }.

The notion of "conforms to" on parameters looks similar to structural typing. In #1612 I proposed the syntax dynamic<Iterable<R>>.

@eernstg
Copy link
Member Author

eernstg commented Mar 10, 2022

I probably did not put enough emphasis on the main purpose of this proposal: It is intended to enlarge the set of situations in code where the zero-cost nature of the view abstraction is preserved.

(The starting point was a discussion via email about how we could improve the performance of for-in statements in the case where we have a view on an object that allows it to behave as an Iterable<T> for some T, and we don't want to create an actual object whose type is a subtype of Iterable<T>, we just want to use the view members directly.)

I consider performance to be a crucial feature for views, so it does matter whether we're staying within the boundaries of static resolution, or we're adding run-time dispatch mechanisms and other indirections.

@lrhn wrote:

I'll point to a Rust traits-like functionality ... for structural polymorphism

I don't agree that structural polymorphism is at the core of this proposal.

The constraint X conforms T allows X (among other things) to be a view type V such that T is among the types that V implements. For each of those view types V1 .. Vk there is a corresponding on-type T1 .. Tk, and there is no relationship whatsoever between Ti and Tj for any pair i != j.

So if we're considering an invocation of a function that declares a type parameter like X conforms T (or conforms_to) then an actual argument of type X is not known to have any properties (other than being an Object?).

So we do not have the situation where the given object is known to have a specific set of members with a specific signature (which would be structural polymorphism), we just don't know anything other than Object?.

We could easily turn this into a matter of normal subtype polymorphism: Just introduce a rule which says "If an expression e has a view type V and a context type T which is not a supertype of Object, and V implements T0 such that T0 <: T, then e is implicitly transformed into e.box". That is, just add auto-boxing, and then we don't need to have a notion of 'conforms (to)' at all.

However, that defeats the whole purpose of this mechanism, which is to preserve the zero-cost nature of the views mechanism in a slightly broader set of situations.

So that's the reason why I'm proposing to preserve the static resolution of the view methods, rather than adding dispatcher objects, of any kind.

@leafpetersen wrote:

This is an implementation strategy, possibly, but not one that
I would recommend, and I think it overly conflates things.

[citing @lrhn's description of using a strategy object, cf. Rust fat pointers]

This would be my recommended implementation strategy for this
feature, where the strategy object is just the type parameter itself

We can definitely do some very interesting and useful things with Haskell-typeclass-ish strategy objects (which could certainly be handled using the run time type representation), but that is again an approach that is based on a new and more complex/powerful kind of run-time dispatch.

More expressive power at run time is always cool, but I don't think we should ignore performance.

@chloestefantsova
Copy link
Contributor

@chloestefantsova, do you think this looks like a scary 'feature' or more like a manageable 'small-feature'? ;-)

The rules for the for-in loop and the rules for conforming member overrides in views seem to be relatively easy to implement. The rules for the general and the customized versions of the copy can be a problem, especially for the modular compilation when we don't know all of the call sites. In first approximation the problem seems similar to lowering of the named optional parameters by generating function versions for each combination of the actual arguments, which proved to be unpractical AFAIK.

@eernstg
Copy link
Member Author

eernstg commented Mar 10, 2022

@chloestefantsova wrote:

The rules for the general and the customized versions of the
copy can be a problem, especially for the modular compilation
when we don't know all of the call sites.

Right, that's exactly the 'code bloat' issue that I mentioned, which is well-known from C++ templates. We could end up generating a large amount of code with a lot of redundancy. So we'd definitely want to do the same thing that they've done in C++ for many years: Eliminate duplicates. With modular compilation this would be done at link time.

But the other side of that coin is that we may obtain significant performance benefits from having a small number of carefully selected customized versions of a given performance critical piece of code.

Part of the picture here is that views are not expected to be the new class mechanism, used everywhere for all purposes. They are expected to be a much more specialized device, offering improved performance based on a statically specified commitment to a specific underlying representation. So we're giving up on object-oriented encapsulation when we use a view type, and we'd only do that in specialized situations where we are willing to pay for that loss of abstraction because we get something in return that we definitely must have (and that would typically be improved performance).

@leafpetersen
Copy link
Member

@eernstg Discussed this in person. I think we are in agreement that the question of whether protocol polymorphism is implemented via code duplication or via dictionary passing is largely irrelevant (though it might observable to users in that certain polymorphic recursion patterns would be rejected in the code duplication strategy since there is no finite monomorphism).

The main outcome of the discussion though was that protocol polymorphism is largely irrelevant to the main thrust of this issue, which is really intended to be about supporting auto-generating a "boxing" class for views, with auto-generated forwarding stubs which translate between the boxed and unboxed versions. That is, I believe the main goal here was to propose that if we have a view V which implements an interface which specifies that it requires a C get foo; and it is implemented via V get foo => ..., then 1) such an implementation is allowed (that is the specialization of the return type from C to V), and 2) the auto-generated boxing class V.box will have a method forwarding stub generated that is morally equivalent to V.class get foo => V.box(this.unbox.foo)

That is, the auto-generated box class will add forwards which translate from the unboxed view type to the boxed view type.

This cannot be done for composite types (e.g. List<V>), and can't be done for method parameters.

@eernstg
Copy link
Member Author

eernstg commented Mar 14, 2022

@leafpetersen wrote:

the auto-generated box class will add forwards which translate
from the unboxed view type to the boxed view type

Right, the point is that we often have an interface involving a set of interconnected classes (in the main example it's Iterable and Iterator), and it might be helpful to be able to write a set of views with corresponding interconnections, with a declared relationship (something that we might call 'conforms to'). The simplest case is a class which is connected to itself.

abstract class C {
  C get next;
}

view V on T implements C boxed as B { // `T` could be anything, doesn't matter here.
  V get next => ...;
  // The compiler implicitly generates the following getter:
  B get box => B(this);
}

// The compiler implicitly generates the following class.
class B implements C {
  final V _this;
  B(this._this);
  C get next => _this.next.box;
}

This means that we can work with instances of type T using the view V, and we can navigate from one of these to the "next" one using the next getter. But we can also box such an instance, obtaining an object of type B, which is assignable to C, and then we can work on that object using the static type C, still navigating to the "next" one using next.

However, this kind of parallel type hierarchies can only be supported with very limited generality, and in particular it doesn't carry over to higher order constructs. Just consider an allNext getter as follows:

abstract class C {
  Set<C> get allNext;
}

view V on T implements C boxed as B {
  Set<V> get allNext => ...;
  // The compiler implicitly generates the following getter:
  B get box => B(this);
}

// The compiler implicitly generates the following class.
class B implements C {
  final V _this;
  B(this._this);
  // We would need to do something like the following:
  Set<C> get allNext => _this.allNext.map((v) => v.box);
}

It is not realistic to even start going down this path (which is basically the path of performing implicit auto-boxing in arbitrary object structures, and wrapping functions in a similar manner, etc). So that's the reason why I proposed to do it for return types only.

It can be done with formal parameter types, but it's hardly useful. We can let the boxing class method box an actual argument when needed:

abstract class E {
  void m(V v);
}

view V on ... implements E boxed as F {
  void m(E e) {}
}

class F implements E {
  final V _this;
  F(this._this);
  void m(V v) => _this.m(v.box);
}

This is the implementable relationship (we can use box to align the types), but it probably won't occur in real life, because a class like E is unlikely to depend on a view like V.

On the other hand, I don't think we can assume that there is an unbox operation at all. In general, a class like E could very well have a formal parameter of type E, but it is again very unlikely to have a formal parameter of type F, because a class like E is most likely written by developers who do not know that V even exists (and even then it's weird to mention the implicitly generated boxing class of V in the interface of E). So we'd end up trying to unbox an instance of static type E (or some other type that the current view implements), and there is no particular reason to believe that said instance is actually obtained by boxing an expression of type V.

The conclusion is probably that we can establish support for "lifting" in this sense, but it is going to be a highly limited mechanism, and developers are likely to encounter the limitations. For example, there's nothing particularly surprising or unusual about having a getter with a return type like Set<C> get allNext, but this is already beyond reach for the proposed lifting mechanism.

So we're probably going to have to live with the fact that a boxed view will typically have an interface that returns some view types (or some types that contains view types, like List<V1> or void Function(V2)), or it accepts parameters whose type is a view type (or a type that contains it).

Alternatively, the view could be written to never mention view types in its member signatures, but in this case it is likely to need to perform a lot of boxing operations, which is most likely going to cause a performance hit.

@jodinathan
Copy link

is this planned?

@eernstg
Copy link
Member Author

eernstg commented Mar 16, 2023

This proposal has been changed many times since this issue was created. It is known as 'inline classes' today, and it is being implemented. Spec here. The release date has not been decided.

I'll close this issue because discussions today should be concerned with the actual feature and specification, using the current name.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
extension-types feature Proposed language feature that solves one or more problems inline-classes Cf. language/accepted/future-releases/inline-classes/feature-specification.md
Projects
None yet
Development

No branches or pull requests

6 participants