classes with structural identity #2246
Comments
The concept exists, and we keep coming up with types which deserve the treatment, so it's definitely worthy of consideration. Having deep structural identity should preclude cycles, which likely means requiring the object to be immutable. Requiring no super-class or super-interface prevents an object with structural identity from flowing into a context where we need to check first before doing the identical check (other than It's not sufficient because it still doesn't help with It's not realistic because we can't stop I don't think we need to seal the type (no classes implementing it, no classes extending it, no classes mixing it in - which might be possible given the The classes do not need to have The next request we'll get is to let those objects have automatic deep equality and
@mraleph, I think we need a primitive notion of value classes: They would be special in that instances do not have identity (so a compiler can copy them freely, allocate them in registers, etc), and the standard notion of So they would require immutability, and they'd need to have a known size given their type, so value classes can't have subtypes of any kind (other than We might give We could then use this very low-level Would that address the concerns you had in mind?
I think we can choose to package things like immutability and structural identity together, but strictly speaking it is not required. Things like immutability, excluding cycles and not having subtypes are required for reliable unboxing, but are not related to how we define I also want to highlight that naive unboxing of arbitrarily large structural types is likely going to be detrimental to performance.
My suggestion does not come out of nowhere - it is based on the experience with optimising
That already does not work for
@mraleph wrote (about preserving the low-level semantics of
I don't think this deviation from the "same object, physically" semantics would interfere with any of the applications that we've discussed here:
The point is that a structural equality operation will embody a policy, not just a mechanism. That is, the implementation will reflect semantic decisions that aren't unquestionable. So it should not be language defined, it should be customizable. Hence the proposal to do it using macros to introduce an implementation of We will need the low-level semantics ("same object, physically") in order to be able to write an implementation of those structural equality operations with good performance. Of course, we could just introduce yet another built-in function (
I don't see a problem in letting It's actually the very same situation as that of a class that defines About the JS records/tuples: If we translate
But You can optimize
As a side note, objects with structural identity should probably not work with If we do deep identity (an alternative is "no identity", always say
I don't see introducing "value types" (whatever that covers) as a necessity to introduce structurally identical(/unboxable) types. The proposal here says "deep structural identity". That's one choice which allows unboxing. That still prevents you from recognizing that an object changes memory location and representation over time. Deep identity does so by ignoring location and only looking at field value (recursive) identity. Either can work with value types too. Value types would likely get structural equality, so they may not need identity.
The kind of type that I'm proposing is intended to allow copying to take place freely. That's the point, that's the primitive semantics that we can't express in the current language, so we need a new language feature in order to do it. Here's the reason why that gives rise to "values": Given that such objects have no conceptual identity (because a compiler/runtime is allowed to break the physical identity anytime it wants, so any assumption about a conceptual identity would be misguided), I consider shallow immutability to be a derived requirement. It makes no sense to allow such objects to have mutable state. Deep immutability is not a requirement, there's no problem (conceptually or technically) in considering a reference to a (potentially) mutable object as a value (so the reference itself is a value, but the referenced object may have identity, i.e., it may be a non-value). Deep immutability could certainly be useful in some cases (e.g., "passing" immutable object structures from one isolate to another by sharing a reference), but that's easy to express as an extra constraint that any given value type does or does not satisfy. In any case, I'm proposing a low-level mechanism, because we need to express the basic primitives as simply and cleanly as possible, and then we can combine the low-level mechanisms to form a bunch of different policies. Those policies would include a definition of The point I made was that we need primitives. If we change
Shallow immutability (unmodifiability) is definitely a requirement for structural identity, and deep is not. If we choose deep structural identity (stopping at references to objects without structural identity, and doing reference identity on those), then it will allow unboxing, reboxing and inlining of the structural-identity based types. I'm not sure I see the leap to also enforcing structural
```dart
bool operator ==(Object other) => other is MyOwnType && field1 == other.field1 && field2 == other.field2;
int get hashCode => Object.hash(field1, field2);
```
Providing an automatic That does not hold for records, because they are structural types, so you can't declare methods on them, and we'd give them deep structural equality for free.
I do not get the point. If we make (And again, the goal of allowing implicit unboxing can also be served by not having identity at all. It's a little more dangerous, because the language currently does not have a value where
Right, which is exactly the reason why I characterize that as 'policy' and recommend that we handle it using user-chosen code (possibly generated by a macro, but maintaining that other choices are still available).
(I don't see the connection, the interface of a structural type can certainly include signatures of methods, and it can be enforced that every actual entity with that type has an implementation. I tend to think that the syntax of records makes it natural to say "they can't have methods", and this also helps simplifying the subtype graph which is already pretty large when we have both positional and named components, and depth subtyping.)
Perhaps we'd want the "semi-deep" equality here, too, stopping at references where the referenced object isn't statically known to be a value? It seems error-prone if a record is unequal to itself, just because some tiny part of the program state was mutated in the graph of objects reachable from that record.
```dart
int _i = 0;
class C {
  // Each read mutates the top-level counter `_i`.
  int get i => _i++;
}
void main() {
  var r = (1, C());
  print(r == r); // 'false'.
}
```
That could be your goal, but I'd like to support performant implementations of multiple policies based on simple, fast, low-level mechanisms. And I don't think delineation of object graphs ("where to stop") is sufficiently simple to be a mechanism. Here's another example of a situation where we'd presumably want a well-defined primitive, rather than a higher-level policy that we can't change: We could obtain a reified representation of the run-time type of an object by a primitive, say, a static method on The problem is that if we don't provide access to the primitive then we may be unable to ensure that we get the right policy. E.g., some classes could override My take on primitives is that they are important, and they should be well-defined and available, even in the case where they are only used in one or two locations in the code. That's the reason why I recommend that we do not change
When I say "structural type", I expect that two separate declarations with the same structure (whatever that means) define the same type. Because of that, user-added members on structural type declarations do not make sense to me. Well, not unless they effectively work like extension methods, and apply to anything of that structural type, and that still has to be scoped somehow (just like extension members, which is why extension members can apply to structural types like function types).
My take is that I'm fine with introducing values without identity into the language, ones where
I think the terms aren't very well standardized, so we might be able to define 'structural type' to mean anything we want. Parenthesis-begin. As an aside, my definition of a structural type was based on having a structural (rather than nominal) subtype relation. This would make
```dart
class Point2d {
  final int x, y;
  Point2d(this.x, this.y);
}
class Point3d {
  final int x, y, z;
  Point3d(this.x, this.y, this.z);
}
```
Nothing prevents the addition of methods to those classes, and we can still check whether their interfaces satisfy the superset-and-override relationship (just like the tests that we would perform if Function types are structural according to this perspective because they have a subtype relationship which is computed based on their parts (return type, parameter types, names of named parameters), not based on an explicitly declared subtype relationship. The fact that function types use a syntax where there is no space for method declarations is just a matter of syntax, we could allow for methods on those as well. Two structural types are then the same type if and only if they are mutual subtypes. (Note that this implies that the implementation isn't part of the type, because nothing requires those mutual subtypes to have the same implementation of any given member. We could allow nominal types to declare that they are equal to each other, too, and get the same kind of implementation abstraction, but we usually get that by using a subtype of some "public" interface.) Parenthesis-end.
That's a beautiful and principled rule. However, I think it's naive to assume that properties that potentially involve the entire set of objects reachable from a given object are going to be unquestionably defined in one particular way: That kind of feature is a policy, not a mechanism. So we should not bake it into the language. As an example, if we have two values So we need to be able to implement "sameness" properties in user-written code. This can only be done with good performance if we have access to the primitive which allows for a fast path based on being the same representation in memory, and
There is nothing new in having multiple distinct objects representing the same conceptual entity. For instance, a list containing a sequence of objects So the fact that values would leak exactly the same kind of information (that is: these two values have distinct representations in memory) is not new, and won't go away. So we may as well allow for user-specified "sameness" with good performance based on
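The fast path described above, where a user-written "sameness" check first asks whether both operands are the same representation in memory and only then applies its own policy, can be sketched roughly as follows. The class `Version` and its fields are hypothetical names chosen for illustration; the only library features assumed are `identical` and `Object.hash` from `dart:core`.

```dart
// Hypothetical class, used only to illustrate the idiom.
class Version {
  final int major;
  final int minor;
  const Version(this.major, this.minor);

  @override
  bool operator ==(Object other) =>
      // Fast path (mechanism): the same object in memory is trivially equal.
      identical(this, other) ||
      // Slow path (policy): here, plain field-wise equality.
      (other is Version && major == other.major && minor == other.minor);

  @override
  int get hashCode => Object.hash(major, minor);
}

void main() {
  final a = Version(1, 2);
  final b = Version(1, 2);
  print(a == a); // true, decided by the fast path
  print(a == b); // true, decided by the field-wise comparison
  print(identical(a, b)); // false: two distinct objects
}
```

The policy part (which fields count, how deep to go) stays in user code and could be swapped for another one, while the `identical` check stays a cheap primitive.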
Can we make the type system prevent these uses? That would be nicer than a runtime check, or worse, undefined behaviour.
'not anything smart' does not apply to the web. Currently, we go to great lengths to avoid any kind of hash code on Strings because it has to be recomputed every time - there is no reasonable way to cache it. One silver lining is that now we no longer support IE11, we can lean on ES6 Maps, which pretty much do exactly what we want for
We can't rely on a proposal, even when accepted, for several years. We only recently removed support for IE11. A major customer requires compatibility with Safari 12. We would need a compelling intermediate strategy.
@rakudrama responded to @lrhn:
Surely only in the case where the code that creates the connection is typed sufficiently tightly, so we can't rely on that. If an But isn't this essentially the same problem as using an
```dart
import 'dart:math';

// This class can't be a value class, because it caches the result of a computation.
class IntBox {
  final int x;
  IntBox(this.x);
  late double squareRoot = sqrt(x); // Assume that `sqrt` is really expensive, so we cache it.
  @override
  operator ==(other) => other is IntBox ? x == other.x : false;
  @override
  get hashCode => x.hashCode;
}

void main() {
  var map = Map<IntBox, int>.identity();
  var b1 = IntBox(0), b2 = IntBox(0);
  map[b1] = 1;
  map[b2] = 2;
  // ...
}
```
Except for the fact that this class uses a cache to avoid repeating an expensive computation, this class is a perfect candidate to be a value class. So we probably want to allow it to behave very much like a value class, and in particular we should be able to create a new object with the same state as an existing one, and they should be treated as "the same thing" even though they are two distinct physical representations. For such objects, it's a bug to use the physical identity, they should be tested for "sameness" using So we have the same kind of problem with value objects and with these objects where "sameness" has been specified using It is worse in one way, though: With the regular class the developer could "simply" canonicalize each conceptual object manually, such that In both cases the cure is to stop using that kind of object with an An identity map would give rise to very similar considerations, and so does
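The difference between an identity-based map and an equality-based map for objects that define their own "sameness" can be made concrete with a small sketch. `Key` is a hypothetical class standing in for `IntBox` without the cache; `Map.identity` and the map literal are standard `dart:core` features.

```dart
// Hypothetical class: equality is defined by state, not by physical identity.
class Key {
  final int x;
  Key(this.x);

  @override
  bool operator ==(Object other) => other is Key && x == other.x;

  @override
  int get hashCode => x.hashCode;
}

void main() {
  final k1 = Key(0), k2 = Key(0); // equal state, distinct objects

  // Uses identical() and identityHashCode: k1 and k2 are different keys.
  final byIdentity = Map<Key, int>.identity();
  byIdentity[k1] = 1;
  byIdentity[k2] = 2;

  // Uses operator == and hashCode: k1 and k2 are the same key.
  final byEquality = <Key, int>{};
  byEquality[k1] = 1;
  byEquality[k2] = 2;

  print(byIdentity.length); // 2
  print(byEquality.length); // 1
  print(byEquality[k1]); // 2
}
```

If copies of such an object could be made freely by the compiler or runtime, the identity map's behaviour would depend on when copies happen, which is the hazard discussed above.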
I would like to suggest splitting changes to `identity()` rules from the records proposal into a separate proposal and allow defining classes with deep structural identity.
My proposal would be as follows: any class can be marked as having structural identity. For the purposes of this proposal we can think that this can be done by annotating a class with `@pragma('value-type')`, though a more user-friendly syntax can also be considered.
If `identical(x, y)` is called with `x` and `y` being instances of the same class `C` and this class uses structural identity then `identical(x, y)` is defined recursively as
Motivation
I would like to make this change to:
Current state
Currently `int` and `double` values have no discernible identity in Dart (neither natively nor when compiled to JS). `String` has distinct and preserved identity in native Dart, but no discernible identity when compiled to JS.
Dart VM does not preserve identity of SIMD values (in violation of the spec), but dart2js does.
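A small program can make the `int` part of this observable. This is only a sketch: the helper names `freshInt` and `freshString` are hypothetical, and the values are built at run time so that compile-time canonicalization of constants does not hide the effect.

```dart
// Hypothetical helpers that build values at run time rather than as constants.
int freshInt() => int.parse('1000000');
String freshString() => String.fromCharCodes([100, 97, 114, 116]); // "dart"

void main() {
  // ints carry no discernible identity: equal values are identical.
  print(identical(freshInt(), freshInt())); // true

  // Equal strings are always ==, but whether they are identical depends on
  // the platform, as described above, so no output is claimed for that line.
  print(freshString() == freshString()); // true
  print(identical(freshString(), freshString()));
}
```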
Implementation Concerns
There is a concern that introducing special treatment into `identical` would degrade its performance too much. Potential mitigation could be to restrict which classes can be marked as having structural identity, e.g. requiring such classes to `extend Object` and not have any superinterfaces would allow optimising all `identical(x,y)` invocations where either `x` or `y` is known to have a non-`Object`, non-structural-identity static type.
Capitalising on unboxing opportunities might require some form of derived pointers, consider
/cc @leafpetersen @lrhn @munificent
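The recursive rule for `identical` is not spelled out in the text as it survives here. One plausible reading, consistent with the "deep structural identity" discussed in the comments, is a field-by-field comparison that recurses into fields which themselves use structural identity and falls back to ordinary `identical` everywhere else. The sketch below emulates that reading in user code; it is an assumption, not the proposal's definition. `StructuralIdentity`, `identityFields`, `structurallyIdentical`, `Point` and `Segment` are all hypothetical names, and the marker interface is only a stand-in for the pragma (the proposal itself suggests such classes would have no superinterfaces).

```dart
// Hypothetical marker standing in for @pragma('value-type').
abstract class StructuralIdentity {
  // The fields that take part in the identity check, in declaration order.
  List<Object?> get identityFields;
}

// Assumed semantics: same class, and all fields pairwise (recursively) identical.
bool structurallyIdentical(Object? a, Object? b) {
  if (identical(a, b)) return true; // covers ints, same object, null
  if (a is StructuralIdentity &&
      b is StructuralIdentity &&
      a.runtimeType == b.runtimeType) {
    final fa = a.identityFields;
    final fb = b.identityFields;
    if (fa.length != fb.length) return false;
    for (var i = 0; i < fa.length; i++) {
      if (!structurallyIdentical(fa[i], fb[i])) return false;
    }
    return true;
  }
  // Objects without structural identity keep plain reference identity.
  return false;
}

class Point implements StructuralIdentity {
  final int x, y;
  Point(this.x, this.y);
  @override
  List<Object?> get identityFields => [x, y];
}

class Segment implements StructuralIdentity {
  final Point from, to;
  Segment(this.from, this.to);
  @override
  List<Object?> get identityFields => [from, to];
}

void main() {
  final s1 = Segment(Point(0, 0), Point(1, 1));
  final s2 = Segment(Point(0, 0), Point(1, 1));
  print(identical(s1, s2)); // false today: two distinct objects
  print(structurallyIdentical(s1, s2)); // true under the assumed rule
  print(structurallyIdentical(s1, Segment(Point(0, 0), Point(2, 2)))); // false
}
```

Because the fields are final and the recursion only follows structural-identity types, the check terminates; this mirrors the point made in the comments that deep structural identity should preclude cycles.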