Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Records #205

Closed
wants to merge 13 commits into from
Closed

RFC: Records #205

wants to merge 13 commits into from

Conversation

zeux
Copy link
Collaborator

@zeux zeux commented Nov 16, 2021

@zeux zeux added the rfc Language change proposal label Nov 16, 2021
rfcs/records.md Show resolved Hide resolved
rfcs/records.md Outdated Show resolved Hide resolved
rfcs/records.md Outdated Show resolved Hide resolved
Small tweaks, added stub for generic records
Copy link
Contributor

@tiffany352 tiffany352 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had some thoughts on this but I'm not sure if they're particularly valuable insights. I like the idea of adding records vs traditional classes though, and I'd love to start using these instead of the old metatable-based OOP pattern.

Might be worth noting in the RFC that that's effectively the current status quo. I think records will help a lot with readability.

Before:

local Vector2 = {}
Vector2.__index = Vector2 -- magic (and easy to forget, which results in hard to debug problems with your code)

function Vector2.new(x, y)
    return setmetatable({ x = x, y = y }, Vector2) -- more magic
end

function Vector2.__add(left, right)
    return Vector2.new(left.x + right.x, left.y + right.y)
end

return Vector2

After:

record Vector2(x, y)

function Vector2.__add(left, right)
    return Vector2(left.x + right.x, left.y + right.y)
end

return Vector2

rfcs/records.md Outdated Show resolved Hide resolved
rfcs/records.md Outdated Show resolved Hide resolved
rfcs/records.md Outdated Show resolved Hide resolved
@zeux
Copy link
Collaborator Author

zeux commented Nov 17, 2021

Thinking through syntactic options more it occurred to me that the new operator isn't backwards compatible syntactically. It uses the "two identifiers in a row" rule that we've used successfully in statement context, but it doesn't actually work in expression context as this code is valid today, with or without the newline:

local X = new
Y { a = 42 }

The Point { x = x, y = y } syntax without the new operator isn't too difficult to make efficient either. The codegen today is something like

NEWTABLE
SETTABLEKS
SETTABLEKS
GETLOCAL
CALL

To allow for records, the codegen would be something like

NEWRECORD
SETTABLEKS
SETTABLEKS
FINRECORD
CALL

NEWRECORD will create the record if the source local is a record shape, and create a table otherwise; FINRECORD will skip the next call instruction if the object is a record, and do nothing otherwise. This should preserve the DSL like syntax dynamically, I think.

The variant with unnamed arguments is still easier to codegen / make efficient, it requires no new opcodes and is just a small tweak in the CALL dispatch which should be almost entirely free as we already have a path for checking if the called object is a table (which handles __call).

I'm going to edit the RFC to remove two of the four variants and more explicitly outline the tradeoffs between those.

Copy link
Collaborator

@asajeffrey asajeffrey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some suggestions plus lots of brain-dump.

rfcs/records.md Show resolved Hide resolved
rfcs/records.md Show resolved Hide resolved
rfcs/records.md Show resolved Hide resolved
rfcs/records.md Outdated Show resolved Hide resolved
rfcs/records.md Show resolved Hide resolved
rfcs/records.md Outdated Show resolved Hide resolved
rfcs/records.md Show resolved Hide resolved
rfcs/records.md Show resolved Hide resolved
rfcs/records.md Outdated Show resolved Hide resolved
rfcs/records.md Show resolved Hide resolved
Co-authored-by: Alan Jeffrey <403333+asajeffrey@users.noreply.github.com>
@zeux
Copy link
Collaborator Author

zeux commented Nov 18, 2021

I've added lots of edits with better examples and cleaned up syntax and wording in a few places, as well as specifying two options for record subtyping (that roughly boil down to, do we treat records as TableTypeVar or ClassTypeVar on type level?). I've also resolved all comment chains to clean up the discussion threads as it was getting difficult to manage.

Invoking methods with `:` desugars into `getmetatable(obj).method(obj, args)` instead of the usual `obj.method(obj, args)`. This is important because it allows to
keep the method calls as efficient as possible, as they don't need to check whether the record has a given method as a field.

> TODO: Should we use `__namecall` instead of raw MT access? It seems more consistent, but at the same time `__namecall` today expects a function so it might be best
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making __namecall an actual metamethod would be nice for other datatypes too. I do however have two concerns:

  • There is no way to obtain a reference to a method accessed through __namecall. Without this they always need wrapping when used with pcall or coroutine.create and can never be called using member[<var>](member, ...)
  • __namecall doesn't really reflect the operation, would __membercall be more accurate?

The first is the most important to resolve in my opinion. If we can do that then I think it also makes sense to use __namecall for records.

present in the table, the error is raised. This is in constrast with tables where `nil` is returned for unknown keys upon read; records are meant to be stricted than
tables and as such returning `nil` will mask valuable errors, and make it more difficult to be strict about the types of the result.

The field lookup does not use `__index` or `__newindex` metamethods (or the metatable in general).

This comment was marked as resolved.

end
```

Note that `Point` is simply a table and as such it can be used to store static methods as well; as it also serves as a metatable, metafields defined on this table
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is nice if __namecall is used as it will let you access member methods by indexing on the definition table.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that in the proposal as it exists, you can use pcall et al by doing pcall(getmetatable(r).Method, r, args)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming __metatable can't be overridden then this seems fine to me.

Comment on lines +137 to +147
Syntax A:

```lua
local person = Person { name = "Bob", age = 42 }
```

Syntax B:

```lua
local person = Person("Bob", 42)
```
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if __call is set on the metatable of Person?

Comment on lines +154 to +156
In variant A, it would make sense to allow omission of any field, in which case it gets replaced with the default of `nil`. A future extension (not part of this RFC) could be made to
allow specification of default values at record definition time. Type checker would fail to type check record construction if fields that have non-optional types
have the values omitted.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To create a record with all fields omitted, would I write Person {}?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah.

allow specification of default values at record definition time. Type checker would fail to type check record construction if fields that have non-optional types
have the values omitted.

In variant B, it would probably make sense to require exact number of values to be specified, or follow the usual function call syntax rules.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we ever supported named parameters? If so, I like this syntax more

rfcs/records.md Outdated Show resolved Hide resolved
Co-authored-by: vegorov-rbx <75688451+vegorov-rbx@users.noreply.github.com>

Records don't provide a facility for implementation inheritance: adding fields or methods to a record requires defining a new record. This is something that is possible to implement in the future, by extending the syntax to be able to
provide the parent record when defining a new record shape, and requiring all fields to be specified. However, doing so is not only outside of the scope of this RFC, but also the author would like to note that implementation inheritance
is often considered an anti-pattern and composition or interface inheritance are preferred instead.
Copy link

@KrissStyle KrissStyle Nov 26, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to have record ntersections

record A = { a: number }

record B = { b: number }

record C = A & B -- { a: number, b: number }

This is because tables have a large header, the hash portion is rounded to a power of two, and every entry has key and value despite the fixed object structure.
2. Type system compatibility. While Luau type checker can type tables, including ones with rigid structure, it falls short in OOP scenarios because it's very difficult
to associate methods with table structure in idiomatic OOP in Luau with tables. Today the situation is especially dire because each method gets its own inferred self type
(something that is likely to change), and it's impossible to specify a table-with-metatable type via obvious type syntax.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Separate subject from records entirely, but the lack of metatable syntax is something we ought to fix, and fairly trivial. Maybe we should just add this in?

is beautifully concise, but is a bit more difficult to migrate away from tables, and it's easy to mix up the names. Variant A is more verbose and requires a bit
more magic at compile time to keep construction efficient.

In variant A, it would make sense to allow omission of any field, in which case it gets replaced with the default of `nil`. A future extension (not part of this RFC) could be made to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if I pass excessive amount of properties? Are they just discarded entirely? Or is it an error?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is equivalent to assigning a field with a non-existent name and as such it'd be an error. I'll clarify that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. With width subtyping incoming, we no longer have a way to say that A must have the exact same set of properties as B. @asajeffrey you're probably a few steps ahead of me here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsealed tables are precise. If #276 lands, table literals will be unsealed, and so precise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That assumes Record { x = x } is the only way to build it, which would prevent you from being able to map some array of tables and apply the record constructor on them. It also means that f { x = x } is not going to infer Callable<<a, b...>({x: a}) -> b...>. It has to be something else because if you can't write Record(t), then passing Record as f is an error if it's in the form f(t).

I think the ability to map them is important. One use case might be that you have some data source (for example, parsing some JSON document into a table) and you want to transform this document into an optimized data structure. If you can't apply records like a function call, you have to write things like:

local function into_city(city: { name: string, population: number}): City
  return City { name = city.name, population = city.population }
end

and more for every unique data structure in your JSON document, versus map(cities, City) or City(city).

Syntax A:

```lua
local person = Person { name = "Bob", age = 42 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the following also work just because of how the API is designed?

record Person = { name: string, age: number }

local function mkPerson(name: string, age: number)
  return { name = name, age = age }
end

local person = Person(mkPerson("Steve", 37))

If so, how is it done? Is the table retained or shallow copied during the construction of Person? It's an important thing to call out because of alias analysis.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's optionality here - we could go either way.

One option is that Person { fields } is special magic syntax (that's backwards compatible with function call), and Person isn't callable otherwise.
Another option is that Person { fields } semantically behaves just as it does today, and the fields are copied out of the source table into the record - and compiler does some magic to optimize this to avoid a temporary table object.

In either case you'd get a fresh object.

Copy link
Contributor

@alexmccord alexmccord Dec 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, a temporary borrow.

I think the latter option is going to be better. It's going to be able to cooperate with the rest of the existing ecosystem. If map applies the function f to each element in the array a in the form f(v), and we have some record City, then calling map(cities, City) is an error. It doesn't quite make much sense for a feature to be partially exclusive to other language features. This also decreases the complexity of type inference around f { x = x } too.

Records defined via a `record` statement can be used in type annotations.

> TODO: How do you export a record type? `export record` would be straightforward but potentially conflicts with future export statements for functions/values.
> Alternatively, is `export type Record = Record` too awkward?
Copy link
Contributor

@alexmccord alexmccord Jan 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too awkward, painful, and error prone. If your records are generic, you will have to explicitly spell out the type parameters and then correctly pass these parameters into the type arguments, which is an opportunity for them to be out of order by accident.

While plans around bounded quantification is not being formed at this time, it's possible that you need to duplicate your generic bounds twice, once in your record definition and once again for the same corresponding type parameter in the exported type alias. That's if we decide due to some cases that Luau shouldn't infer generic bounds for explicit generics.

Then there's the matter of scoping, type Foo = Foo will circularly reference itself, not the pre-existing type definition.

There's two ways I can see out of this:

  1. Go with export record syntax. It does leave a question of what to do about the record value. We don't have precedent yet about exporting values. Perhaps we should define that first in a separate RFC and have that RFC be a prerequisite for this?
  2. Come up with a re-export syntax dedicated for type definitions for now, which will partially solve this case without needing to figure out runtime semantics for export.

@zeux zeux mentioned this pull request Feb 9, 2022
@alexmccord
Copy link
Contributor

Posting this here so it doesn't get lost. Consider whether we want to support keys that aren't normally valid identifiers, e.g. record MyRec = { ["$$typeof"]: "MyRec" }.

@data-man
Copy link

data-man commented May 8, 2022

Why the name record?
Why not struct or tuple?

@BenMactavsin
Copy link
Contributor

Why the name record?
Why not struct or tuple?

It doesn't make sense to name it tuple since tuples are just immutable lists in practice.
As for struct, I don't know. I guess they really wanted to go with the record syntax.

@data-man
Copy link

data-man commented May 8, 2022

It doesn't make sense to name it tuple since tuples are just immutable lists in practice.

Usually != always. :)

How to use tuples with named fields in C# by example
Namedtuple in Python

@zeux
Copy link
Collaborator Author

zeux commented May 9, 2022

Tuple usually refers to a fixed length collection of values that doesn't carry names.
Struct can carry the connotation that would imply the object has value type semantics, which is not the case here.

Hence we chose the name record as the one that's least likely to give people the wrong idea about the new construct behavior.

Syntax A:

```lua
local person = Person { name = "Bob", age = 42 }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a potential drawback here from an external tooling perspective.
There is no way to statically know whether this is a record constructor, or a function call passing 1 table literal as the first argument.

This becomes problematic for tools such as stylua. It would attempt to put parentheses around the call, as it wouldn't be able to differentiate the two. To be able to tell the difference, you would need to now roll your own analysis/inference. If e.g. selene wanted to add lints specific to records, it could not longer do so.

A potential solution here would be to add syntax different to table literals, such as using colons instead of equals. But from a user perspective, this might be an annoying confusion...

Copy link
Contributor

@Kampfkarren Kampfkarren Jul 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suggestion I put for this in OSS is:

Suggested change
local person = Person { name = "Bob", age = 42 }
local person = Person { name: "Bob", age: 42 }

Along with being able to be parsed especially, this would have the added benefit of being able to force key names to be defined statically, rather than letting:

local person = Person { [weirdThing()] = "Bob" }

...pass through until it gets linted later (assuming they need to be declared statically right now, that is)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly above, forcing key names to be static can most likely simplify type checking too - no longer need to worry about ensuring weirdThing() has to return keyof Person etc. And prevention of duplicate keys being specified.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the key syntax is used to differentiate records from function calls, all records need to have at least one field in the call. Empty constructor Person {} would still be ambiguous. If default values are allowed at some point, this might be rather common

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Person(existingTable) would also have to be disallowed for external tools to have a chance on the special : syntax -- all keys must always be explicit, which is not very lua-like and would be too strict in my opinion

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to reply here but we definitely aren't going to overload : for record construction - it needs to be = for consistency. That said, there might still be alternative syntax options to more clearly mark the record construction as such.

@zeux
Copy link
Collaborator Author

zeux commented Oct 30, 2023

This PR is closed as part of RFC migration; please see #1074 (comment) for context.

Additional notes: I still think this represents a significant need, but since this RFC has been submitted there were new developments that require somehow revising this RFC and I'm not sure how to do that quite yet. Specifically,

  • We would need to equalize this design with new design for shared self (which hasn't been merged or finalized, but we know we want to do that). It might mean that we should find a better way to split type handling so that records and tables with shared self are more similar
  • My initial designs for record implementation required dynamic bytecode patching so that records can be addressed as tables with no performance penalty. This still works, but it predates native codegen; in native codegen we currently don't have a great way to implement records unless we know, at codegen time, that a given variable is a record. This knowledge requires cross-module type analysis to be general, so we need to figure out if we can somehow change our native codegen approach. It would be bad if records are somehow slower than tables in native codegen because of extra conditional checks and fallbacks.

@zeux zeux closed this Oct 30, 2023
@zeux zeux deleted the zeux-records branch October 30, 2023 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
rfc Language change proposal
Development

Successfully merging this pull request may close these issues.