diff --git a/proposals/README.md b/proposals/README.md
index f55b1238f335b..1cdb103584cef 100644
--- a/proposals/README.md
+++ b/proposals/README.md
@@ -48,6 +48,7 @@ request:
 - [0301 - Principle: Errors are values](p0301.md)
 - [0339 - `var` statement](p0339.md)
 - [0340 - while loops](p0340.md)
+- [0353 - `for` loops](p0353.md)
 - [0415 - Syntax: `return`](p0415.md)
 - [0426 - Governance & evolution revamp](p0426.md)
 - [0444 - GitHub Discussions](p0444.md)
diff --git a/proposals/p0353.md b/proposals/p0353.md
new file mode 100644
index 0000000000000..5b9e86cf77a7c
--- /dev/null
+++ b/proposals/p0353.md
@@ -0,0 +1,357 @@
+# `for` loops
+
+
+
+[Pull request](https://github.com/carbon-language/carbon-lang/pull/353)
+
+
+
+## Table of contents
+
+- [Problem](#problem)
+- [Background](#background)
+    - [C++](#c)
+    - [Java](#java)
+    - [TypeScript and JavaScript](#typescript-and-javascript)
+    - [Python, Swift, and Rust](#python-swift-and-rust)
+    - [Go](#go)
+- [Proposal](#proposal)
+- [Details](#details)
+    - [Range inputs](#range-inputs)
+    - [Executable semantics form](#executable-semantics-form)
+- [Caveats](#caveats)
+    - [C++ as baseline](#c-as-baseline)
+    - [Semisemi support](#semisemi-support)
+    - [Range literals](#range-literals)
+    - [Enumerating containers](#enumerating-containers)
+- [Rationale based on Carbon's goals](#rationale-based-on-carbons-goals)
+- [Alternatives considered](#alternatives-considered)
+    - [Include semisemi `for` loops](#include-semisemi-for-loops)
+    - [Writing `in` instead of `:`](#writing-in-instead-of-)
+    - [Multi-variable bindings](#multi-variable-bindings)
+
+
+
+## Problem
+
+Control flow is documented in the
+[language overview](/docs/design/README.md#control-flow). `for` loops are common
+in C++, and Carbon should consider providing some form of them.
+
+## Background
+
+### C++
+
+There are two forms of `for` loops in C++:
+
+- **Semisemi** (semicolon, semicolon): `for (int i = 0; i < list.size(); ++i)`
+- **Range-based**: `for (auto x : list)`
+
+Semisemi `for` loops have been around for a long time, and are in C. Range-based
+`for` loops were added in C++11.
+
+For example, here is a basic semisemi:
+
+```cc
+for (int i = 0; i < list.size(); ++i) {
+  printf("List at %d: %s\n", i, list[i].name);
+}
+```
+
+An equivalent semisemi using iterators and the comma operator may look like:
+
+```cc
+int i = 0;
+for (auto it = list.begin(); it != list.end(); ++it, ++i) {
+  printf("List at %d: %s\n", i, it->name);
+}
+```
+
+Range-based syntax can be simpler, but it can also make a loop harder to write
+when several pieces of information are needed, such as the index alongside the
+element:
+
+```cc
+int i = 0;
+for (const auto& x : list) {
+  printf("List at %d: %s\n", i, x.name);
+  ++i;
+}
+```
+
+### Java
+
+Java provides equivalent syntax to C++. Although Java doesn't have a comma
+operator, it does allow comma-separated statements in the first and third
+sections of semisemi `for` loops.
+
+### TypeScript and JavaScript
+
+Both TypeScript and JavaScript offer three kinds of `for` loops:
+
+- Semisemi, mirroring C++.
+- `for (x of list)`, mirroring range-based `for` loops.
+- `for (x in list)`, returning indices.
+
+For example, here is an `in` loop:
+
+```javascript
+for (i in list) {
+  console.log('List at ' + i + ': ' + list[i].name);
+}
+```
+
+### Python, Swift, and Rust
+
+Python, Swift, and Rust all only support range-based `for` loops, using
+`for x in list` syntax.
+
+### Go
+
+Go uses `for` as its primary looping construct. It has:
+
+- Semisemi, mirroring C++.
+- `for i < list.size()` condition-only loops, mirroring C++ `while` loops.
+- `for {` infinite loops.
+
+## Proposal
+
+Carbon should adopt C++-style range-based `for` loop syntax. Semisemi `for`
+loops should be addressed through a different mechanism.
+
+Related keywords are:
+
+- `for`
+- `continue`: continues with the next loop iteration.
+- `break`: breaks out of the loop.
+
+## Details
+
+For loop syntax looks like: `for (` `var` _type_ _variable_ `:` _expression_
+`) {` _statements_ `}`
+
+Similar to the
+[if/else proposal](https://github.com/carbon-language/carbon-lang/pull/285), the
+braces are optional and must be paired (`{ ... }`) if present. When there are no
+braces, only one statement is allowed.
+
+`continue` will continue with the next loop iteration directly, skipping any
+other statements in the loop body.
+
+`break` exits the loop immediately.
+
+All of this is consistent with C/C++ behavior.
+
+### Range inputs
+
+The syntax for inputs is not being defined in this proposal. However, we can
+still establish critical things to support:
+
+- Interoperable C++ objects that work with C++'s range-based `for` loops, such
+  as containers with iterators.
+- Carbon arrays and other containers.
+- Range literals. These are not proposed, but for an example seen in other
+  languages, `0..2` may indicate the set of integers [0, 2).
+
+### Executable semantics form
+
+```bison
+%token FOR
+
+statement:
+  FOR "(" pattern ":" expression ")" statement
+| /* pre-existing statements elided */
+;
+```
+
+The `continue` and `break` statements are intended to be added as part of the
+[while proposal](https://github.com/carbon-language/carbon-lang/pull/340).
+
+## Caveats
+
+### C++ as baseline
+
+This baseline syntax is based on C++, following the migration sub-goal
+[Familiarity for experienced C++ developers with a gentle learning curve](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code).
+To the extent that this proposal anchors on a particular approach, it aims to
+anchor on C++'s existing syntax, consistent with that sub-goal.
+
+Alternatives will generally reflect breaking consistency with C++ syntax.
+While most proposals may consider alternatives more thoroughly, this proposal
+suggests a threshold of only accepting alternatives that diverge from C++
+syntax if they are clearly better; the priority in this proposal is to
+_avoid debate_ and produce a trivial proposal. Where an alternative would
+trigger debate, it should be examined by an advocate in a separate proposal.
+
+### Semisemi support
+
+Carbon will not provide semisemi support. This decision is contingent upon
+finding a better alternative loop structure, which neither `while` nor `for`
+syntax currently provides. If Carbon doesn't evolve a better solution, semisemi
+support will be added later.
+
+For details, see [the alternative](#include-semisemi-for-loops).
+
+### Range literals
+
+Range literals are important to the ergonomics of range-based `for` loops, and
+should be added. However, they should be examined separately as part of limiting
+the scope of this proposal.
+
+### Enumerating containers
+
+Several languages have the concept of providing an index with the object in a
+range-based `for` loop:
+
+- Python does `for i, item in enumerate(items)`, with a global function.
+- Go does `for i, item := range items`, with a keyword.
+- Swift does `for (i, item) in items.enumerated()`, having removed an
+  `enumerate()` global function.
+- Rust does `for (i, item) in items.enumerate()`.
+
+An equivalent pattern for Carbon should be examined separately as part of
+limiting the scope of this proposal.
+
+## Rationale based on Carbon's goals
+
+Relevant goals are:
+
+- [3. Code that is easy to read, understand, and write](/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write):
+
+    - Range-based `for` loops are easy to read and very helpful.
+    - Semisemi `for` syntax is complex and can be error prone for cases where
+      range-based loops work. Avoiding it, even by providing equivalent syntax
+      with a different loop structure, should discourage its use and direct
+      engineers towards better options.
+      The alternative syntax should also be easier to understand than semisemi
+      syntax; otherwise, we should just keep semisemi syntax.
+
+- [7. Interoperability with and migration from existing C++ code](/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code):
+
+    - Keeping syntax close to C++ will make it easier for developers to
+      transition.
+
+## Alternatives considered
+
+Both alternatives from the
+[`if`/`else` proposal](https://github.com/carbon-language/carbon-lang/pull/285)
+apply to `for` as well: we could remove parentheses, require braces, or both.
+The conclusions there are mirrored here in order to avoid a divergence in
+syntax.
+
+Additional alternatives follow.
+
+### Include semisemi `for` loops
+
+We could include semisemi `for` loops for greater consistency with C++.
+
+This is in part important because switching from a semisemi `for` loop to a
+`while` loop is not always straightforward due to how `for` evaluates the third
+section of the semisemi. The inter-loop evaluation of the third section is
+important given how it interacts with `continue`. In particular, consider the
+loops:
+
+```cc
+for (int i = 0; i < 3; ++i) {
+  if (i == 1) continue;
+  printf("%d\n", i);
+}
+
+int j = 0;
+while (j < 3) {
+  if (j == 1) continue;
+  printf("%d\n", j);
+  ++j;
+}
+
+int k = 0;
+while (k < 3) {
+  ++k;
+  if (k == 1) continue;
+  printf("%d\n", k);
+}
+
+int l = 0;
+while (l < 3) {
+  if (l == 1) {
+    ++l;
+    continue;
+  }
+  printf("%d\n", l);
+  ++l;
+}
+```
+
+To explain the differences between these loops:
+
+- The first loop will print 0 and 2.
+- The second loop will print 0, then loop infinitely because the increment is
+  never reached.
+- The third loop will print 2 and 3, never printing 0, because the increment
+  happens too early.
+- Only the fourth loop is equivalent to the first loop, and it duplicates the
+  increment.
+
+There is no easy place to put the increment in a `while` loop.
+
+Advantages:
+
+- We need a plan for
+  [migrating both developers and code from C++](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/project/goals.md#interoperability-with-and-migration-from-existing-c-code)
+  semisemi `for` loops, and providing them in Carbon is the easiest solution.
+    - Semisemis remain common in C++ code.
+- Semisemis are much more flexible than range-based `for` loops.
+    - `while` loops do not offer a sufficient alternative.
+
+Disadvantages:
+
+- Semisemi loops can be error prone, such as `for (int i = 0; i < 3; --i)`.
+    - Syntax such as `for (int x : range(0, 3))` leaves less room for
+      developer mistakes.
+    - Removing semisemi syntax will likely improve
+      [understandability of Carbon code](https://github.com/carbon-language/carbon-lang/blob/trunk/docs/project/goals.md#code-that-is-easy-to-read-understand-and-write),
+      a language goal.
+- If we add semisemi loops, it would be very difficult to get rid of them.
+    - Code using them should be expected to accumulate quickly, from both
+      migrated code and developers familiar with C++ idioms.
+
+If we want to remove semisemi `for` loops, we should avoid adding them in the
+first place. We do need to ensure that developers are _happy_ with the
+replacement, although that should be achievable through providing strong range
+support, including range literals.
+
+A story for migrating developers and code is still required. For developers, it
+would be ideal if we could have a compiler error that detects semisemi loops and
+advises the preferred Carbon constructs. For both developers and code, we need a
+suitable loop syntax that is easy to use in cases that remain hard to write in
+`while` or range-based `for` loops. This will depend on a separate proposal, but
+there is at least some current interest in this direction.
+
+### Writing `in` instead of `:`
+
+Range-based `for` loops could write `in` instead of `:`, such as:
+
+```carbon
+for (x in list) {
+  ...
+}
+```
+
+An argument for switching _now_, instead of using
+[C++ as a baseline](#c-as-baseline), would be that `var` syntax has been
+discussed as using a `:`, and avoiding `:` in range-based `for` loops may reduce
+syntax ambiguity risks. However, the
+[current `var` proposal](https://github.com/carbon-language/carbon-lang/pull/339)
+does not use a `:`, and so this risk is only a potential future concern: it's
+too early to require further evaluation.
+
+Because the benefits of this alternative are debatable and it would diverge
+from C++, adopting `in` would run contrary to
+[using C++ as a baseline](#c-as-baseline). Any divergence should be justified
+and reviewed as a separate proposal.
+
+### Multi-variable bindings
+
+C++ allows `for (auto [x, y] : range_of_pairs)`, which is not explicitly part of
+the syntax here. Carbon is likely to support this through tuples, so adding
+special `for` syntax for this would likely be redundant.
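
For reference, the C++17 structured-binding form mentioned under multi-variable bindings can be sketched as below. This is only an illustration of existing C++ behavior, not Carbon syntax and not part of the proposal text; the helper `FormatPairs`, the element type, and the output format are hypothetical choices made for the example.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the proposal): renders each pair as "x=y;".
// The range-based for loop destructures every element into x and y using a
// C++17 structured binding.
std::string FormatPairs(
    const std::vector<std::pair<int, std::string>>& range_of_pairs) {
  std::string out;
  for (const auto& [x, y] : range_of_pairs) {
    out += std::to_string(x) + "=" + y + ";";
  }
  return out;
}
```

Calling `FormatPairs({{1, "one"}, {2, "two"}})` would yield `1=one;2=two;`. The binding lives in the declaration part of the loop, which is consistent with the proposal's view that a general tuple or pattern facility could cover this case without dedicated `for` syntax.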