Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tuple cons cell syntax #1582

Closed
wants to merge 4 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
262 changes: 262 additions & 0 deletions text/0000-tuple-cons-cell.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,262 @@
- Feature Name: tuple_cons_cells
- Start Date: 2016-05-17
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Add a syntax for expressing tuples as a head and tail pair, similar to a Lisp
cons cell.

# Motivation
[motivation]: #motivation

Currently, rust doesn't give the user any way to talk about generically-sized
tuples. This means that it's not possible to, for example, implement a trait
for all tuples. Traits like `fmt::Debug` are only implemented for tuples of up
to some arbitrary number of elements. These impls are generated using some
hacky macro magic and they end up polluting the rust-doc documentation. This
RFC would alleviate these problems.

This RFC is also a step towards more general variadic generics as proposed in
draft RFC #376. While that RFC discusses using variadic lists of types in
positions such as function argument lists, this RFC only covers the specific
case of tuples of generic arity. Whatever "full" variadic generics eventually
look like, when and if they get implemented, it is unlikely we would want the
design for tuples to work any differently to what's proposed here.

# Detailed design
[design]: #detailed-design

This RFC proposes introducing two new syntactic forms: one for tuple types and
one for tuple terms. With this new syntax a tuple can be expressed as `(head_0,
head_1, ... head_n; tail)` where `head_x` are the first `n + 1` elements of the
tuple and `tail` is a tuple containing the remainder of the tuple. The table
below shows some of the different ways of expressing the same tuple combining
current rust syntax and the new syntax.

0 elements
()
(; ())
(; (; ()))

1 element

(a,)
(a; ())
(a,; ())
(; (a,))
(; (a; ()))
(; (a,; ()))

2 elements

(a, b)
(a, b,)
(a, b; ())
(a, b,; ())
(a; (b,))
(a; (b; ()))
(a; (b,; ()))

3 elements

(a, b, c)
(a, b, c,)
(a, b, c; ())
(a, b, c,; ())
(a, b; (c,))
(a, b,; (c,))
(a, b; (c; ()))
(a, b; (c,; ()))
(a, b,; (c; ()))
(a, b,; (c,; ()))
(a; (b, c))
(a; (b, c,))
(a,; (b, c))
(a,; (b, c,))
(a; (b, c; ()))
(a; (b, c,; ()))
(a,; (b, c; ()))
(a; (b, c,; ()))
(a,; (b, c,; ()))
(a; (b; (c,)))
(a; (b,; (c,)))
(a,; (b; (c,)))
(a,; (b,; (c,)))
(a; (b; (c; ())))
(a; (b; (c,; ())))
(a; (b,; (c; ())))
(a; (b,; (c,; ())))
(a,; (b; (c; ())))
(a,; (b; (c,; ())))
(a,; (b,; (c; ())))
(a,; (b,; (c,; ())))

and so forth...

This RFC proposes equivalent syntax for tuple types. Formally, the syntax for
tuple types could be described with the following grammar fragment:

```
ty_tuple
: "(" ")"
| "(" ty "," ty_tuple_inner ")"
| "(" ty_tuple_inner ";" ty ")"

ty_tuple_inner
: %empty
| ty
| ty "," ty_tuple_inner
```

With this syntax, any tuple can be expressed as either `()` or `(head; tail)`.
This makes it possible for generic code to handle all tuples by covering just
these two cases.

In addition to syntax for expressions and types, this RFC also proposes syntax
for destructuring tuples into a head and tail. Here, the obvious syntax is
used, ie. `let (head; tail) = (a; b);` results in `head == a` and
`tail == b`.

This RFC also proposes a new marker trait be added to the language. `Tuple` is
a trait which is implemented for `()` and `(H; T) where T: Tuple`. In general,
the type `(H; T)` is only valid when `T: Tuple`.

### Representation

The main problem with implementing this RFC is the question of representation.
At the memory level, in current Rust, there is no guarantee that the
representation of an `(a; b)` contains the representation of a `b`. The
solution proposed here is two-fold. First, we allow types to have separate
stride and size as per RFC issue #1397. Secondly, we layout tuples in reverse
order. Under this scheme, the tuple `(A, B, C) : (u16, u16, u32)` would be
represented as

```
------------------------------------------------
| Byte | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|------|-------------------|---------|---------|
| Data | C (u32) | B (u16) | A (u16) |
------------------------------------------------
```

And it's tail `(B, C) : (u16, u32)` would be represented as

```
------------------------------------------------
| Byte | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|------|-------------------|---------|---------|
| Data | C (u32) | B (u16) | padding |
------------------------------------------------
```

Crucially, the stride of this type is only 6 bytes. This means that a
`&mut (u16, u32)` can not be used to modify the tuple's two trailing "padding"
bytes as any tuple (accessed through a reference) may be the tail of a larger
tuple.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As enthusiastic as I am about this RFC, there's two potential issues I don't see addressed (or I'm not aware of their resolution):

  • Would this representation permit the last element of a tuple to be unsized? E.g. &(bool, char, Display). (cc all but the last field of a tuple must be Sized #1592)
  • Would this be backwards compatible with existing unsafe code which assumes that all of size_of::<T>() can be overwritten?

(Personally, while the alternative of laying out tuples less efficiently is shaky on "don't sacrifice potential performance ever ever" grounds, I do think it could pass the "don't pay for what you don't use" test. Heretofore, tuples and structs have been semantically equivalent, so it makes perfect sense that they would use the same representation. With this change, tuples would now be more powerful than structs: they would not merely be "anonymous structs", as they had been before, but heterogenously-typed lists which can be iterated over. Furthermore, struct fields are unordered, while tuple fields would have a strict ordering not just syntactically but now semantically as well. Just as unordered collections often get to use more efficient representations than ordered ones, it also makes sense that unordered structs would get to use more efficient representations than ordered tuples. But we should collect statistics and benchmarks about the impact on existing real-world Rust code if we choose to pursue this option. My suspicion is that the impact would be negligible - tuples with more than two elements and of different types are already an uncommon case, I think.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this representation permit the last element of a tuple to be unsized? E.g. &(bool, char, Display).

Good point, that's a pretty strong argument for changing the syntax to something like (head..., tail) where the subtuple is collected into the head rather than the tail. I'll see what comes of the conversation on #1592 before updating the RFC.

Would this be backwards compatible with existing unsafe code which assumes that all of size_of::<T>() can be overwritten?

Well, in general having, a size/stride distinction would break code like that. Unless we make size_of actually return the stride for the sake of backwards-compatibility.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having to iterate backwards feels unnatural to me - everything else (of course) goes forwards. Just my personal opinion though...

## Example

This code shows how we could use the proposed syntax to `impl Debug` for all
tuples.

```
// libcore/fmt/mod.rs

// Private trait used to help impl Debug.
trait TupleExt: Tuple + Debug {
fn debug_tail(&self, f: &mut Formatter) -> Result;
}

impl TupleExt for () {
fn debug_tail(&self, f: &mut Formatter) -> Result {
write!(f, ")")
}
}

impl<H: Debug, T: TupleExt> TupleExt for (H; T) {
fn debug_tail(&self, f: &mut Formatter) -> Result {
let (ref head; ref tail) = *self;
try!(write!(f, ", {:?}", *head));
tail.debug_tail(f)
}
}

// impl Debug for 0 elements
impl Debug for () {
fn fmt(&self, f: &mut Formatter) -> Result {
write!(f, "()")
}
}

// impl Debug for 1 element
impl<H: Debug> Debug for (H,) {
fn fmt(&self, f: &mut Formatter) -> Result {
let (ref head,) = *self;
write!(f, "({:?},)", *head)
}
}

// impl Debug for 2 or more elements
impl<H0: Debug, H1: Debug, T: TupleExt> Debug for (H0, H1; T) {
fn fmt(&self, f: &mut Formatter) -> Result {
let (ref head; ref tail) = *self;
try!(write!(f, "({:?}", *head));
tail.debug_tail(f)
}
}
```

# Drawbacks
[drawbacks]: #drawbacks

* Adds more syntax and another concept that people will need to learn.
* Code that modifies the head element of a tuple through a mutable reference -
where the compiler cannot guarantee that the tuple is not part of a larger
tuple - will sometimes be less efficient as the compiler will no longer be
able to assume that it's safe to overwrite trailing padding bytes.

# Alternatives
[alternatives]: #alternatives

* Not do this.
* Consider using a different syntax. The `(H; T)` syntax was chosen to resemble
the `[T; N]` syntax for arrays (on the theory that a tuple type is defined by
the type of its element and the type of its tail whereas an array type is
defined by the type of its elements and its length). Another possible syntax
would be `(a, b, ...more_elems)` although this would conflict with the
inclusive ranges RFC. Another is `(a, b, more_elems...)` although this looks
very similar to range syntax and may be confusing.
* Consider a different layout. By packing tuples less efficiently we could
obviate the need for the stride/size distinction and make updating the head
elements of tuples more efficient. Overall though I'm not sure this
would be a win. The efficiency hit associated with the proposed design only
happens when modifying a tuple through a mutable reference. Also the reference
must be to the tuple itself, not to an element in the tuple like what one
would obtain by writing `(mut ref a, ...) = some_tuple`. Also, the update
must happen to the head element of the tuple and the head element must be
small. Conversely, packing tuples less efficiently would often result in
significantly less efficient layout (eg. `(u16, u16, u32)` taking 12 bytes
instead of 8). [More knowledgeable people than me disagree though](https://github.com/rust-lang/rfcs/issues/1397#issuecomment-213311508),
so it would be worth discussing this further and trying to obtain data to
inform a decision with.
* Sidestep the representation issue by disallowing references to the tail of a
tuple. This would largely defeat the purpose of the RFC as, for example, the
`Debug` implementation above would be impossible to write.
* Sidestep the representation issue by making references to the tail of a tuple
expand into a tuple of references. Here, `let (; ref x) = (0, 1, 2);` would
yield `x == (&0, &1, &2)`. This would get extremely messy. Consider the
`Debug` implementation above which recursively formats its tuple argument. On
the first iteration it would be handling a tuple of values. On the second, a
tuple of references-to-values. On the third, a tuple of
references-to-references-to-values. And so forth. It would also be surprising
and unintuitive that `let (; x) = (0, 1, 2)` gives `x == (0, 1, 2)` but
`let (; ref x) = (0, 1, 2)` doesn't give `x == &(0, 1, 2)`.

# Unresolved questions
[unresolved]: #unresolved-questions

The representation issue warrants further discussion.