diff --git a/docs/design/expressions/indexing.md b/docs/design/expressions/indexing.md
index d9f8fe8671049..c0327ecceed5d 100644
--- a/docs/design/expressions/indexing.md
+++ b/docs/design/expressions/indexing.md
@@ -13,8 +13,6 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 - [Overview](#overview)
 - [Details](#details)
 - [Examples](#examples)
-- [Open questions](#open-questions)
-    - [Tuple indexing](#tuple-indexing)
 - [Alternatives considered](#alternatives-considered)
 - [References](#references)

@@ -133,15 +131,6 @@ class Span(T:! type) {
 }
 ```

-## Open questions
-
-### Tuple indexing
-
-It is not clear how tuple indexing will be modeled. When indexing a tuple, the
-index value must be a constant, and the type of the expression can depend on
-that value, but we don't yet have the tools to express those properties in a
-Carbon API.
-
 ## Alternatives considered

 - [Different subscripting syntaxes](/proposals/p2274.md#different-subscripting-syntaxes)
diff --git a/docs/design/expressions/member_access.md b/docs/design/expressions/member_access.md
index a55aab4fd8924..2c1044860015f 100644
--- a/docs/design/expressions/member_access.md
+++ b/docs/design/expressions/member_access.md
@@ -24,6 +24,7 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 - [Instance binding](#instance-binding)
 - [Non-instance members](#non-instance-members)
 - [Non-vacuous member access restriction](#non-vacuous-member-access-restriction)
+- [Tuple indexing](#tuple-indexing)
 - [Precedence and associativity](#precedence-and-associativity)
 - [Alternatives considered](#alternatives-considered)
 - [References](#references)
@@ -841,6 +842,56 @@ alias X3 = (i32 as Factory).Make;
 alias X4 = i32.((i32 as Factory).Make);
 ```

+## Tuple indexing
+
+A tuple indexing expression is of the form:
+
+- _expression_ `.` _integer-literal_
+- _expression_ `->` _integer-literal_
+
+The _expression_ is required to be of tuple type.
+
+Each positional element of the tuple is considered to have a name that is the
+corresponding decimal integer: `0`, `1`, and so on. The spelling of the
+_integer-literal_ is required to exactly match one of those names, and the
+result is the corresponding element of the tuple.
+
+```
+// ✅ `a == 42`.
+let a: i32 = (41, 42, 43).1;
+// ❌ Error: no tuple element named `0x1`.
+let b: i32 = (1, 2, 3).0x1;
+// ❌ Error: no tuple element named `2`.
+let c: i32 = (1, 2).2;
+
+var t: (i32, i32, i32) = (1, 2, 3);
+let p: (i32, i32, i32)* = &t;
+// ✅ `m == 3`.
+let m: i32 = p->2;
+```
+
+In a compound member access of the form:
+
+- _expression_ `.` `(` _expression_ `)`
+- _expression_ `->` `(` _expression_ `)`
+
+in which the first _expression_ is a tuple and the second _expression_ is of
+integer or integer literal type, the second _expression_ is required to be a
+non-negative template constant that is less than the number of tuple elements,
+and the result is the corresponding positional element of the tuple.
+
+```
+// ✅ `d == 43`.
+let d: i32 = (41, 42, 43).(1 + 1);
+// ✅ `e == 2`.
+let template e:! i32 = (1, 2, 3).(0x1);
+// ❌ Error: no tuple element with index 4.
+let f: i32 = (1, 2).(2 * 2);
+
+// ✅ `n == 3`.
+let n: i32 = p->(e);
+```
+
 ## Precedence and associativity

 Member access expressions associate left-to-right:
diff --git a/docs/design/lexical_conventions/README.md b/docs/design/lexical_conventions/README.md
index 32d660b6c83c9..6bfac20c1d07b 100644
--- a/docs/design/lexical_conventions/README.md
+++ b/docs/design/lexical_conventions/README.md
@@ -33,4 +33,11 @@ A _lexical element_ is one of the following:
 - a [symbolic token](symbolic_tokens.md)

 The sequence of lexical elements is formed by repeatedly removing the longest
-initial sequence of characters that forms a valid lexical element.
+initial sequence of characters that forms a valid lexical element, with the
+following exception:
+
+- When a numeric literal immediately follows a `.` or `->` token, with no
+  intervening whitespace, a real literal is never formed. Instead, the token
+  will end no later than the next `.` character. For example, `tuple.1.2` is
+  five tokens, `tuple` `.` `1` `.` `2`, not three tokens, `tuple` `.` `1.2`.
+  However, `tuple . 1.2` is lexed as three tokens.
diff --git a/docs/design/lexical_conventions/numeric_literals.md b/docs/design/lexical_conventions/numeric_literals.md
index f3279d42e9c4f..0932de751b67d 100644
--- a/docs/design/lexical_conventions/numeric_literals.md
+++ b/docs/design/lexical_conventions/numeric_literals.md
@@ -70,7 +70,12 @@ or C++ octal literal (other than `0`) is invalid in Carbon.
 Real numbers are written as a decimal or hexadecimal integer followed by a
 period (`.`) followed by a sequence of one or more decimal or hexadecimal
 digits, respectively. A digit is required on each side of the period. `0.` and
-`.3` are both invalid.
+`.3` are both lexed as two separate tokens: `0.(Util.Abs)()` and `tuple.3` both
+treat the period as member or element access, not as a radix point.
+
+To support tuple indexing, a real number literal is never formed immediately
+following a `.` or `->` token with no intervening whitespace. Instead, the
+result is an integer literal.
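
The lexing exception above can be sketched as a toy tokenizer. This is only an illustration in Python, not Carbon's lexer: the function name `lex`, the regular expressions, and the restriction to decimal literals, identifiers, `.` and `->` are all assumptions made for the sketch.

```python
import re

# Hypothetical, simplified token patterns. A real Carbon lexer also handles
# hex literals, digit separators, exponents, and many more symbols.
_REAL = re.compile(r"[0-9]+\.[0-9]+")
_INT = re.compile(r"[0-9]+")
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_SYMBOLS = ("->", ".")


def lex(source: str) -> list[str]:
    """Tokenize `source`, never forming a real literal directly after a
    `.` or `->` token when there is no intervening whitespace."""
    tokens: list[str] = []
    pos = 0
    after_access = False  # True only immediately after `.` or `->`.
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            after_access = False  # Whitespace cancels the exception.
            continue
        symbol = next((s for s in _SYMBOLS if source.startswith(s, pos)), None)
        if symbol is not None:
            tokens.append(symbol)
            pos += len(symbol)
            after_access = True
            continue
        # Prefer the longest form, but skip real literals after `.` or `->`.
        patterns = (_INT, _IDENT) if after_access else (_REAL, _INT, _IDENT)
        match = None
        for pattern in patterns:
            match = pattern.match(source, pos)
            if match is not None:
                break
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at {pos}")
        tokens.append(match.group())
        pos = match.end()
        after_access = False
    return tokens


print(lex("tuple.1.2"))    # ['tuple', '.', '1', '.', '2']
print(lex("tuple . 1.2"))  # ['tuple', '.', '1.2']
```

The only state carried between tokens is a single flag, which reflects how local the rule is: it depends on nothing but the immediately preceding token and whether whitespace followed it.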
 A real number can be followed by an exponent character, an optional `+` or `-`
 (defaulting to `+` if absent), and a character sequence matching the grammar of
diff --git a/docs/design/tuples.md b/docs/design/tuples.md
index a46b082774995..746896e982df1 100644
--- a/docs/design/tuples.md
+++ b/docs/design/tuples.md
@@ -10,30 +10,25 @@ SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

 ## Table of contents

-- [TODO](#todo)
 - [Overview](#overview)
+- [Element access](#element-access)
     - [Empty tuples](#empty-tuples)
-    - [Indices as compile-time constants](#indices-as-compile-time-constants)
+    - [Trailing commas and single-element tuples](#trailing-commas-and-single-element-tuples)
+    - [Tuple of types and tuple types](#tuple-of-types-and-tuple-types)
     - [Operations performed field-wise](#operations-performed-field-wise)
+    - [Pattern matching](#pattern-matching)
 - [Open questions](#open-questions)
+    - [Tuple slicing](#tuple-slicing)
     - [Slicing ranges](#slicing-ranges)
-    - [Single-value tuples](#single-value-tuples)
-    - [Function pattern match](#function-pattern-match)
-    - [Type vs tuple of types](#type-vs-tuple-of-types)
+- [Alternatives considered](#alternatives-considered)
+- [References](#references)

-## TODO
-
-This is a skeletal design, added to support [the overview](README.md). It should
-not be treated as accepted by the core team; rather, it is a placeholder until
-we have more time to examine this detail. Please feel welcome to rewrite and
-update as appropriate.
-
 ## Overview

-The primary composite type involves simple aggregation of other types as a tuple
-(called a "product type" in formal type theory):
+The primary composite type involves simple aggregation of other types as a
+tuple, called a "product type" in formal type theory:

 ```
 fn DoubleBoth(x: i32, y: i32) -> (i32, i32) {
@@ -49,39 +44,55 @@ and second elements are expressions referring to the `i32` type. The only
 difference is the type of these expressions. Both are tuples, but one is a tuple
 of types.

-Element access uses subscript syntax:
+## Element access
+
+Element access uses a syntax similar to field access, with an element index
+instead of a field name:

 ```
-fn Bar(x: i32, y: i32) -> i32 {
+fn Sum(x: i32, y: i32) -> i32 {
   var t: (i32, i32) = (x, y);
-  return t[0] + t[1];
+  return t.0 + t.1;
 }
 ```

-Tuples also support multiple indices and slicing to restructure tuple elements:
+A parenthesized template constant expression can also be used to index a tuple:

 ```
-fn Baz(x: i32, y: i32, z: i32) -> (i32, i32) {
-  var t1: (i32, i32, i32) = (x, y, z);
-  var t2: (i32, i32, i32) = t1[(2, 1, 0)];
-  return t2[0 .. 2];
+fn Choose(template N:! i32) -> i32 {
+  return (1, 2, 3).(N % 3);
 }
 ```

-This code first reverses the tuple, and then extracts a slice using a half-open
-range of indices.
-
 ### Empty tuples

 `()` is the empty tuple. This is used in other parts of the design, such as
-[functions](functions.md).
+[functions](functions.md), where a type with a single value is needed.
+
+### Trailing commas and single-element tuples
+
+The final element in a tuple literal may be followed by a trailing comma, such
+as `(1, 2,)`. This trailing comma is optional in tuples with two or more
+elements, and mandatory in a tuple with a single element: `(x,)` is a one-tuple,
+whereas `(x)` is a parenthesized single expression.

-### Indices as compile-time constants
+### Tuple of types and tuple types

-In the example `t1[(2, 1, 0)]`, we will likely want to restrict these indices to
-compile-time constants. Without that, run-time indexing would need to suddenly
-switch to a variant-style return type to handle heterogeneous tuples. This would
-both be surprising and complex for little or no value.
+A tuple of types can be used in contexts where a type is needed. This is made
+possible by a built-in implicit conversion: a tuple can be implicitly converted
+to type `type` if all of its elements can be converted to type `type`, and the
+result of the conversion is the corresponding tuple type.
+
+For example, `(i32, i32)` is a value of type `(type, type)`, which is not a type
+but can be implicitly converted to a type. `(i32, i32) as type` can be used to
+explicitly refer to the corresponding tuple type, which is the type of
+expressions such as `(1 as i32, 2 as i32)`. However, this is rarely necessary,
+as contexts requiring a type will implicitly convert their operand to a type:
+
+```carbon
+// OK, both (i32, i32) values are implicitly converted to `type`.
+fn F(x: (i32, i32)) -> (i32, i32);
+```

 ### Operations performed field-wise

@@ -100,12 +111,39 @@ For binary operations, the two tuples must have the same number of components
 and the operation must be defined for the corresponding component types of the
 two tuples.

-**References:** The rules for assignment, comparison, and implicit conversion
-for argument passing were decided in
-[question-for-leads issue #710](https://github.com/carbon-language/carbon-lang/issues/710).
+### Pattern matching
+
+Tuple values can be matched using a
+[tuple pattern](/docs/design/pattern_matching.md#tuple-patterns), which is
+written as a tuple of element patterns:
+
+```carbon
+let tup: (i32, i32, i32) = (1, 2, 3);
+match (tup) {
+  case (a: i32, 2, var c: i32) => {
+    c = a;
+    return c + 1;
+  }
+}
+```

 ## Open questions

+### Tuple slicing
+
+Tuples could support multiple indices and slicing to restructure tuple elements:
+
+```
+fn Baz(x: i32, y: i32, z: i32) -> (i32, i32) {
+  var t1: (i32, i32, i32) = (x, y, z);
+  var t2: (i32, i32, i32) = t1.((2, 1, 0));
+  return t2.(0 .. 2);
+}
+```
+
+This code would first reverse the tuple, and then extract a slice using a
+half-open range of indices.
+
 ### Slicing ranges

 The intent of `0 .. 2` is to be syntax for forming a sequence of indices based
@@ -125,28 +163,23 @@ answer here:
 - Do we want to require the `..` to be surrounded by whitespace to minimize
   that collision?

-### Single-value tuples
-
-This remains an area of active investigation. There are serious problems with
-all approaches here. Without the collapse of one-tuples to scalars we need to
-distinguish between a parenthesized single expression (`(42)`) and a one-tuple
-(in Python or Rust, `(42,)`), and if we distinguish them then we cannot model a
-function call as simply a function name followed by a tuple of arguments; one of
-`f(0)` and `f(0,)` becomes a special case. With the collapse, we either break
-genericity by forbidding `(42)[0]` from working, or it isn't clear what it means
-to access a nested tuple's first element from a parenthesized single expression:
-`((1, 2))[0]`.
-
-### Function pattern match
-
-There are some interesting corner cases we need to expand on to fully and more
-precisely talk about the exact semantic model of function calls and their
-pattern match here, especially to handle variadic patterns and forwarding of
-tuples as arguments. We are hoping for a purely type system answer here without
-needing templates to be directly involved outside the type system as happens in
-C++ variadics.
-
-### Type vs tuple of types
-
-Is `(i32, i32)` a type, a tuple of types, or is there even a difference between
-the two? Is different syntax needed for these cases?
+## Alternatives considered
+
+- [Indexing with square brackets](/proposals/p3646.md#square-bracket-notation)
+- [Indexing from the end of a tuple](/proposals/p3646.md#negative-indexing-from-the-end-of-the-tuple)
+- [Restrict indexes to decimal integers](/proposals/p3646.md#decimal-indexing-restriction)
+- [Alternatives to trailing commas](/proposals/p3646.md#trailing-commas)
+
+## References
+
+- Proposal
+  [#2188: Pattern matching syntax and semantics](https://github.com/carbon-language/carbon-lang/pull/2188)
+- Proposal
+  [#2360: Types are values of type `type`](https://github.com/carbon-language/carbon-lang/pull/2360)
+- Proposal
+  [#3646: Tuples and tuple indexing](https://github.com/carbon-language/carbon-lang/pull/3646)
+- Leads issue
+  [#710](https://github.com/carbon-language/carbon-lang/issues/710)
+  established rules for assignment, comparison, and implicit conversion
+- Leads issue
+  [#2191: one-tuples and one-tuple syntax](https://github.com/carbon-language/carbon-lang/issues/2191)
diff --git a/proposals/p3646.md b/proposals/p3646.md
new file mode 100644
index 0000000000000..c0ac27b7f6e07
--- /dev/null
+++ b/proposals/p3646.md
@@ -0,0 +1,295 @@
+# Tuples and tuple indexing
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/3646)
+
+## Table of contents
+
+- [Abstract](#abstract)
+- [Problem](#problem)
+- [Background](#background)
+- [Proposal](#proposal)
+- [Details](#details)
+    - [Lexing](#lexing)
+    - [Indexes as names](#indexes-as-names)
+    - [Precedence](#precedence)
+    - [Expression operand](#expression-operand)
+    - [Bounds](#bounds)
+    - [Tuple slicing](#tuple-slicing)
+- [Rationale](#rationale)
+- [Alternatives considered](#alternatives-considered)
+    - [Alternative lexing rule](#alternative-lexing-rule)
+    - [Decimal indexing restriction](#decimal-indexing-restriction)
+    - [Square bracket notation](#square-bracket-notation)
+    - [Negative indexing from the end of the tuple](#negative-indexing-from-the-end-of-the-tuple)
+    - [Trailing commas](#trailing-commas)
+
+## Abstract
+
+Add support for extracting elements of a tuple by their numerical index.
+
+Also formally add the well-established basic syntactic and semantic rules for
+tuples, for which we have had leads issues but no proposal, into the design.
+
+## Problem
+
+Currently, the only way to access the elements of a tuple is through pattern
+matching. While this handles many cases well, it is sometimes desirable to
+access an element of a tuple more succinctly, especially in cases where only a
+single element's value is needed.
+
+## Background
+
+In Python, tuple indexing is performed using square brackets:
+
+```python
+tup = (1, 2, 3)
+# Prints 2.
+print(tup[1])
+```
+
+In C++, `std::pair` is indexed using `.first` and `.second`, and `std::tuple` is
+indexed using `std::get`.
+
+In Rust and Swift, a tuple is indexed using `.N`, where `N` is a decimal integer
+literal.
+
+- Rust disallows digit separators and base prefixes in `N`, but allows certain
+  literal suffixes
+  [for historical reasons](https://github.com/rust-lang/rust/issues/60210).
+- Swift disallows digit separators and base prefixes in `N`. `swiftc` allows
+  leading `0` digits, although this appears to be an unintentional consequence
+  of `llvm::StringRef::getAsInteger` allowing them.
+
+The current Carbon documentation suggests using `tuple[i]` for tuple indexing,
+but this has not been the subject of an approved proposal.
+
+## Proposal
+
+Formally, we have not yet approved a proposal that says that Carbon has tuple
+types, although we have approved several proposals that explicitly include
+support for tuples. So, this proposal does that: tuples exist in Carbon, and are
+product types with unnamed positional elements.
+
+This proposal also updates the design to match other decisions that have been
+made in leads issues but not captured by a proposal, specifically:
+
+- Leads issue #2191 (one-tuples and one-tuple syntax), despite being focused
+  on one-tuples, established the syntax for tuples of all arities.
+- Leads issue #710 established rules for assignment, comparison, and implicit
+  conversion of tuples. These operations are performed elementwise, with
+  relational comparisons being performed lexicographically.
+
+Finally, the main intent of this proposal is to add support for indexing tuples,
+using the following syntaxes:
+
+- `.` _N_, where _N_ is an integer literal, and
+- `.` `(` _expr_ `)`, where _expr_ is a template constant of integer type.
+
+For pointers to tuples, `->` _N_ and `->` `(` _expr_ `)` are also supported.
+
+## Details
+
+### Lexing
+
+Multi-level tuple indexing will result in constructs such as
+`tuple_of_tuples.1.2`. It's important that these are lexed as two tuple indexing
+operations, not as `tuple_of_tuples` `.` `1.2`, as they would be under the
+current lexical rules, so a new rule is introduced:
+
+- When a `.` or `->` token is followed immediately by a digit, it is lexed as
+  a `.` or `->` token followed by an integer literal, never a real literal.
+
+Note that this results in lexing being slightly contextual: the rule to lex a
+token after a `.` or `->` is different from the rule to lex a token in any other
+context. However, there is an alternative equivalent formulation of the rule
+that is not context-sensitive: that `.integer` is treated as a single lexeme
+that produces two tokens, and likewise for `->integer`.
+
+### Indexes as names
+
+The elements of a tuple are treated as if they had decimal integers as their
+names: `.0`, `.1`, and so on. It is an error to use a different spelling of that
+integer in a simple member access, because that spelling would not match the
+element name. For example, `(1, 2).0x0` is invalid, as is `large_tuple.1_2`.
+These spellings can be used as an [expression operand](#expression-operand) as
+described below: `(1, 2).(0x0)` and `large_tuple.(1_2)` are both valid.
+
+### Precedence
+
+The `.` _N_ syntax has the same precedence as postfix member access syntax, `.`
+_name_, and can be combined in the same expression: `a.0.x.1` is valid.
+
+The `.` `(` _expr_ `)` syntax is not new in this proposal, and continues to have
+the same precedence as `.` _name_.
+
+### Expression operand
+
+In the `.` `(` _expr_ `)` syntax, if the first operand is a tuple and the second
+operand is a constant of any integer type, the result is the corresponding tuple
+element, as if specified by a decimal integer literal. This rule is built into
+the language; the `.` `(` ... `)` notation is not currently overloadable.
+
+### Bounds
+
+If the tuple index is not between 0 and one less than the number of elements in
+the tuple, inclusive, the indexing is invalid.
+
+### Tuple slicing
+
+The current skeleton design suggests using `tuple[a .. b]` to slice tuples. For
+example, `tuple[0 .. 2]` could be used to extract the first two elements of a
+tuple. Tuple slicing support is not covered by this proposal, but could be added
+in the future with syntax such as `tuple.(0 .. 2)`. However, note that there is
+a risk that this syntax may lead to an incorrect theory about how Carbon works:
+namely, that `tuple.__` gives an element whereas `tuple.(__)` gives a tuple.
+
+## Rationale
+
+Goals:
+
+- [Language tools and ecosystem](/docs/project/goals.md#language-tools-and-ecosystem)
+    - The lexing rule is relatively simple to implement. Tools such as syntax
+      highlighters can treat `.i` as a distinct kind of token rather than
+      implementing any kind of context-sensitive lexing.
+- [Software and language evolution](/docs/project/goals.md#software-and-language-evolution)
+    - Consistent use of tuple field indexes can be used to support code that
+      adds new tuple elements over time.
+- [Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write)
+    - This feature allows tuple access to be written more concisely than
+      pattern matching would allow.
+    - Lexing `.1.2` as four tokens rather than two avoids a surprise that
+      would make chained member access hard to write.
+    - For simple member access, requiring a decimal integer with no digit
+      separators allows the member access to be treated as an element name,
+      making the indexing easier to understand.
+- [Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
+    - This feature provides a migration syntax for existing use of `.first`,
+      `.second`, and `std::get`. The permission to use expressions rather
+      than only literals supports migration of `std::get`.
+
+Principles:
+
+- [Low context sensitivity](/docs/project/principles/low_context_sensitivity.md).
+    - We look only at the character immediately before a numeric literal to
+      determine whether it is lexed as a tuple index that stops before the
+      next `.` or as a general numeric literal.
+
+## Alternatives considered
+
+### Alternative lexing rule
+
+We could lex `.0`, `.1`, ... as a single token rather than as separate `.` and
+`0`, `1`, ... tokens. This would somewhat simplify the lexing rules, because
+they would no longer be contextual. We choose to not do this because:
+
+- This would be inconsistent with our handling of `struct.fieldname`.
+- Either `tuple . 0` would be invalid, unlike `struct.fieldname`, or it would
+  need to use a distinct grammar production from `tuple.0`.
+
+We could lex an integer literal when the previous token is `.`, regardless of
+whether the literal follows the `.` immediately. For example, we could treat
+
+```carbon
+let n: i32 = ((1, 2, 3), 4) . 0.1;
+```
+
+as tuple indexing, rather than as a tuple followed by a `.` and a real literal.
+This is what Swift does. We choose to not do this because:
+
+- The `0.1` literal in this case looks like a real literal, not tuple
+  indexing, so this would likely cause surprise for readers.
+- This would make the context-sensitive lexing non-local. The chosen rule
+  can be interpreted as lexing `.[0-9]*` as a single lexeme, but forming two
+  tokens from it, whereas this alternative rule would be much more firmly a
+  context-sensitive lexing rule.
+
+We could get a similar result in other ways:
+
+- We could allow a real literal after a `.`, and split it into a pair of
+  member accesses when needed. This is
+  [what `rustc` does](https://github.com/rust-lang/rust/pull/71322).
+- We could lex a real literal as three tokens: an integer token, a `.` token,
+  and a suffix token, and merge them back together in the parser. This is
+  [what `intellij` does](https://github.com/intellij-rust/intellij-rust/commit/f82f6cd68567e574bf1e30f5e0d263ee15d1d36e)
+  when parsing Rust.
+
+Note that these approaches are not entirely equivalent to each other. In Rust,
+for example, the difference is observable in proc macros. Also, using any kind
+of token merging or splitting approach would result in the token stream not
+matching the interpretation of the program, which is problematic for tooling.
+For example, many common Rust syntax highlighters do not properly highlight
+chained tuple indexing.
+
+### Decimal indexing restriction
+
+Carbon follows Rust and Swift in restricting tuple indexes to being decimal
+integers:
+
+```carbon
+// OK
+let a: i32 = (1, 2, 3).0;
+
+// Error, invalid index for tuple element.
+let b: i32 = (1, 2, 3).0x0;
+```
+
+This restriction introduces an inconsistency between `.0x0` and `.(0x0)`, and we
+could easily drop it. However, the restriction allows us to consider `.0`, `.1`,
+and so on to simply be the names of the tuple elements, analogous to struct
+field names, and there isn't a clear utility for permitting a base prefix or a
+digit separator in a tuple index.
+
+### Square bracket notation
+
+Instead of `tuple.0` and `tuple.(IndexConstant)`, we could use `tuple[0]` and
+`tuple[IndexConstant]`. This would result in more consistent syntax for indexing
+with a constant versus with an expression, but would make accessing an element
+of a tuple less consistent with accessing an element of a struct. We expect
+tuple access with a non-literal index to be a rare operation, so the consistency
+with that syntax seems to have lower value.
+
+Also, the use of `.` notation aims to convey the intent of the developer better:
+we intend `x[n]` notation to be used primarily for _homogeneous_ indexing,
+whereas `.` notation is used for _heterogeneous_ access. This also reflects the
+difference in phase: tuple indexing requires a constant index in the same way
+that struct member access requires a constant name, whereas array or container
+indexing would typically be expected to permit a runtime index.
+
+The `.N` notation can also be extended to perform member indexing into a struct
+or class, at least the latter of which would not be reasonable to support with
+`[]` notation. However, such support is not part of this proposal.
+
+Use of `[]` notation has the advantage of reducing visual ambiguity for cases
+such as `O.0`, `l.0`, and `Z.0`, which might be visually confused with `0.0`,
+`1.0`, and `2.0`, respectively. However, we're not aware of this being a problem
+in practice in Rust or Swift, which use this notation, and the same problem
+exists even without the `.0` suffix: `F(O, l, Z)` may resemble `F(0, 1, 2)`.
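
The contrast between `.0x0` and `.(0x0)` drawn in the last two sections can be modeled with a small Python sketch. It assumes that a simple access compares the literal's spelling against the element names, while a compound access looks only at the constant's value. The helper names `simple_index` and `compound_index` are hypothetical, and the error messages merely mirror the comments in the design examples.

```python
def simple_index(elements: tuple, spelling: str):
    """Model `tuple.N`: the literal's spelling must equal an element name."""
    names = [str(i) for i in range(len(elements))]  # "0", "1", and so on.
    if spelling not in names:
        raise LookupError(f"no tuple element named `{spelling}`")
    return elements[names.index(spelling)]


def compound_index(elements: tuple, value: int):
    """Model `tuple.(expr)`: only the value of the constant matters."""
    if not 0 <= value < len(elements):
        raise LookupError(f"no tuple element with index {value}")
    return elements[value]


print(simple_index((41, 42, 43), "1"))      # 42
print(compound_index((41, 42, 43), 1 + 1))  # 43
print(compound_index((1, 2, 3), 0x1))       # 2
try:
    simple_index((1, 2, 3), "0x1")
except LookupError as error:
    print(error)  # no tuple element named `0x1`
```

Treating the index as a name keeps the simple form analogous to struct field access, which is the motivation the proposal gives for accepting the inconsistency.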
+
+### Negative indexing from the end of the tuple
+
+We could support `tuple.-1`, or perhaps `tuple.(-1)`, as a notation for "the
+last element of the tuple", as used for example in Python. We choose not to
+support this at this time because such notation can be confusing and has awkward
+edge cases. An off-by-one error, or an attempt to access a one-past-the-start
+element, will sometimes be accepted and silently do the wrong thing.
+
+If a future proposal introduces tuple slicing, it should revisit this question,
+because this kind of indexing from the end is often desirable when forming a
+slice. The possibility of using a different notation for this operation should
+be considered, such as `tuple.(.size - 1)`.
+
+### Trailing commas
+
+Carbon permits optional trailing commas in tuples, with mandatory trailing
+commas for one-tuples. Alternatives to this choice were considered in
+[leads issue #2191](https://github.com/carbon-language/carbon-lang/issues/2191).
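
Python, which the Background section cites, exhibits both behaviors discussed in these last two sections, so a short Python sketch can illustrate them. It shows Python's semantics only, not Carbon's, and `element_before` is a hypothetical helper introduced for the example.

```python
tup = (10, 20, 30)


def element_before(elements: tuple, i: int):
    """Return the element just before position `i` (hypothetical helper)."""
    return elements[i - 1]


print(element_before(tup, 2))  # 20
# Off-by-one at the start: index -1 is accepted and wraps to the last element.
print(element_before(tup, 0))  # 30

one_tuple = (42,)  # Trailing comma: a tuple with a single element.
just_int = (42)    # No comma: a parenthesized single expression.
print(type(one_tuple).__name__, type(just_int).__name__)  # tuple int
```

The second call is the kind of one-past-the-start access that the proposal notes can be silently accepted when negative indexes are allowed.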