add syntax to destructure array initialization lists #498

Closed
andrewrk opened this issue Sep 26, 2017 · 25 comments
Labels
accepted This proposal is planned. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@andrewrk
Member

andrewrk commented Sep 26, 2017

Latest Proposal


This proposal is an alternative to the rejected multiple expression values proposal (#83). It affects inline assembly improvements (#215). It depends on or at least is related to my comment in #346.

  • Add ability for functions to have multiple return values.
fn div(numerator: i32, denominator: i32) -> i32, i32 {
    return numerator / denominator, numerator % denominator;
}
  • If you want an error, it's recommended to use a struct:
error DivByZero;
const DivResult = struct {quotient: i32, remainder: i32 };
fn div(numerator: i32, denominator: i32) -> %DivResult {
    if (denominator == 0) return error.DivByZero;
    return DivResult {
        .quotient = numerator / denominator,
        .remainder = numerator % denominator,
    };
}
  • return statements can have multiple return values:
fn foo(condition: bool) {
    const x, const y = div(3, 1);
    const a, const b = if (condition) {
        return :this false, 1234;
    } else {
        return :this true, 5678;
    };
}

This is not general-purpose tuples. This is multiple assignment and multiple return values.

Real Actual Use Case: https://github.com/zig-lang/zig/blob/cba4a9ad4a149766c650e3f3d71435cef14867a3/std/os/child_process.zig#L237-L246

@andrewrk andrewrk added the enhancement and proposal labels Sep 26, 2017
@andrewrk andrewrk added this to the 0.2.0 milestone Sep 26, 2017
@PavelVozenilek

  1. It would make sense to have named return values:
fn foo() -> int_val : i32, float_val : f32 {
   int_val = 0;
   ...
   if (...) return 10, 3.14;
   ...
   if (...) {
      float_val = 1.0;
      // implicitly converted to return int_val, float_val, compiler makes sure both were set
      return; 
   }
  ...
  if (...) {
     // compiler verifies float_val was set
     return 20, float_val;
  }
  int_val += 1;
  ...
  return int_val, float_val; // equivalent of return;
}

The more return values the more this would help.

Nim does this.

It is similar to proposal:
#83 (comment)


  2. Will there be a chance to use undefined as an "I do not care" return value?
fn foo() -> i32, f32 {
   if (...) return 1, undefined;
   ...
   return 1, 2.2;
}

  3. If a structure is used as the return type, it would be really handy to define it "inline". Otherwise people may place this type somewhere far away from the function definition, increasing confusion and potential for misuse.
fn my_func() -> const my_result_type = struct {foo: i32, bar: i32 } {
    return my_result_type { ...  };
}
 
var x : my_func.my_result_type  = my_func();

Unnamed variant:

fn my_func() -> struct {foo: i32, bar: i32 } {
    return { ...  }; // compiler knows it returns unnamed struct type
}
 
// result_type is contextual keyword understood by the compiler
var x : my_func.result_type  = my_func(); 

Here the function acts like a namespace for the return type.

The risk that someone mistakenly uses the type in an inappropriate context is lower.


  4. This should be possible:
fn foo() -> i32, f32 { ... }

var bar, baz = foo(); // type inference 

  5. It may be handy to ignore some return value:
fn foo() -> i32, f32 { ... }

var bar, undefined = foo(); // type inference for bar

@hasenj

hasenj commented Sep 27, 2017

Is it just me, or does the struct + error thing violate the maxim:

Only one obvious way to do things.

@andrewrk
Member Author

Can you give an example where it's not obvious which thing to do?

@hasenj

hasenj commented Sep 28, 2017

The obvious thing (previously, I suppose) if you want to return multiple values is to use a struct.

Now you have two options: struct or multiple returns.

Suppose you have a function that returns two values, and later it evolves to also support errors. Now you have to rewrite the function and all calls to it so that it uses a struct. Maybe after a few times you decide to always use a struct and never use multiple return values.

Suppose the opposite: you use a struct because the function can return errors, but later it gets simplified and there are no errors anymore. Should you refactor it to return multiple values or leave it as-is?

@PavelVozenilek

@hasenj: adding a struct increases the number of "high level things" in the system.

One may be tempted to reuse the return structure in different contexts, e.g. as a member of some other structure. This discourages later changes.

Struct definition could be placed far away from its function. (Project rules may require such structuring - first define all constants, then the structures, last the functions.) It gets even better when the struct gives no hint of its intended purpose.

Having a struct also requires one to invent a new name (could be solved by allowing function_name.return_type).


On the other hand, multiple return values are a very local thing. They have no chance to affect unrelated code. They are always present where they are needed: at the function definition and function invocations, and nowhere else. One is not tempted to extend or reuse them for other purposes.

IMHO it should be preferred to structs/tuples.

@hasenj

hasenj commented Sep 29, 2017

Project rules may require such structuring

The problem here is with arbitrary project rules.

I can also see another problem with multiple return values: it's not clear what is what (Just like with a regular tuple).

fn div(numerator: i32, denominator: i32) -> i32, i32

Without looking at the code, which value is the div and which is the mod?

I shall invoke other items from the zen:

Reduce the amount one must remember.

Favor reading code over writing code.

When you get a struct, the field name will clearly denote which item is which.

Avoid local maximums.

It might be easier to write the function once and use it once or twice. But can you imagine a project full of such functions?

It can be tempting to litter the code with multiple-value returning functions instead of properly defining the data structures that represent the problem and solution one is trying to build.

@PavelVozenilek

@hasenj:

Project rules may require such structuring [of source file sections]

The problem here is with arbitrary project rules.

Yes, but this happens and the negative impact could be reduced a bit.

I can also see another problem with multiple return values: it's not clear what is what (Just like with a regular tuple).

Above I proposed optional named return values (adding the ability to manipulate individual values).

At the call site, each returned value is assigned to a named variable. If one uses the wrong name or the wrong order of names ... well, that's a mistake like any other.

It might be easier to write the function once and use it once or twice. But can you imagine a project full of such functions?

Yes, I imagine that. Formal project rules kick in with full force:

// Mandatory project header template

//=== constants ===
...
// === types ===
...
// === functions ===
...

More seriously: "too many functions" is a problem that should be solved at a different level, by proper modularity, hiding the details as much as possible.


Multiple named values have their place: if there are only a few of them (a hard limit could be used, or some style guide or compiler check, per project) and when they make intuitive sense ( fn date() -> year : u32, month : u32, day : u32 ).

Structures are good if there's reuse, or if the data gets too complex.

In C people often prefer multiple return values: all those out parameters by pointer, instead of defining a return structure. Projects invent rules for where to place these out parameters, and tools are created to catch common bugs. This could happen to Zig too.

@hasenj

hasenj commented Sep 29, 2017

What's the difference between a named tuple and a struct?

@PavelVozenilek

@hasenj: no, I do not mean named tuple (which can be freely used in other places). I mean:

fn foo() -> ret_val1 : i32, ret_val2 : f64 { ... }

var x, y = foo();

The point is that the ret_val1 : i32, ret_val2 : f64 is tied to this function only, is predictably always in the right place, and does not require a unique name.

@PavelVozenilek

There is yet another use case for multiple return values: comptime expressions.

Setting a value using comptime is tricky (perhaps I didn't learn enough).

This works:

const x = comptime {
  var i : i32 = 99;
  i += 1;
  i
};

It is a bit clumsy (avoid ; after the last expression, don't forget ; after the closing bracket) and, mainly, it does not allow returning more than one value. Yes, one can define a struct, but this makes the design more complicated than it needs to be.

I imagine something like:

const x, y, z = comptime {
  ...
  i, j , k
};

@hasenj

hasenj commented Oct 20, 2017

Some further questions to consider:

  1. Can the multiple return include an error as one of the items?

     error DivByZero;
     fn div(numerator: i32, denominator: i32) -> (i32, i32, error) {
         if (denominator == 0) return (0, 0, error.DivByZero);
         return (numerator / denominator, numerator % denominator, null);
     }
    
  2. Why can't a multiple return value also be wrapped/union-ed with an error value?

     error DivByZero;
     fn div(numerator: i32, denominator: i32) -> %(i32, i32) {
         if (denominator == 0) return error.DivByZero;
         return (numerator / denominator, numerator % denominator);
     }
    

I think the main issue I'm trying to raise is, why can't "tuples" be used outside the context of a function return? It seems like an asymmetry that can cause problems or confusion. One of which is the inability to union the return value with an error.

@andrewrk
Member Author

andrewrk commented Dec 8, 2017

The questions brought up by @hasenj are resolved with #632, and since that's now accepted, I'm going to accept this proposal as well.

@andrewrk andrewrk added the accepted label Dec 8, 2017
@andrewrk andrewrk modified the milestones: 0.2.0, 0.3.0 Feb 28, 2018
@andrewrk andrewrk removed the accepted label Jun 1, 2018
@andrewrk
Member Author

andrewrk commented Jun 1, 2018

Removing accepted label as it conflicts with #208.

@andrewrk andrewrk modified the milestones: 0.3.0, 0.4.0 Jul 18, 2018
@andrewrk andrewrk removed the enhancement label Nov 21, 2018
@andrewrk andrewrk changed the title from "proposal: multiple block return values" to "add syntax to destructure array initialization lists" Nov 21, 2018
@andrewrk
Member Author

This proposal depends on #208 and #287 and would allow something like this:

const S = struct {field: i32};
var s: S = undefined;
var x, const y, s.field = blk: {
    break :blk .{foo(), bar(), baz() + 1};
};
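For readers who want to try this shape of code, here is a minimal sketch written against the destructuring syntax described in the commit message referenced later in this thread. It assumes a Zig version that includes that change (0.12 or later); `foo`, `bar`, `baz`, and the `demo` wrapper are hypothetical stand-ins, not part of the proposal.

```zig
const std = @import("std");

// Hypothetical stand-ins for foo(), bar(), and baz() from the example above.
fn foo() i32 {
    return 1;
}
fn bar() i32 {
    return 2;
}
fn baz() i32 {
    return 3;
}

const S = struct { field: i32 };

// Hypothetical wrapper: a destructure must sit inside a block,
// so the example is placed in a function body.
fn demo() [3]i32 {
    var s: S = .{ .field = 0 };
    // Declares mutable x and constant y, and assigns to the existing lvalue s.field.
    var x, const y, s.field = blk: {
        break :blk .{ foo(), bar(), baz() + 1 };
    };
    x += 10; // x is mutable; y is not
    return .{ x, y, s.field };
}
```

The mix of `var`, `const`, and a plain lvalue on the left-hand side is the part that a struct-returning function alone cannot express in one statement.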

@andrewrk andrewrk added the accepted label Feb 15, 2019
@andrewrk andrewrk modified the milestones: 0.4.0, 0.5.0 Feb 15, 2019
@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 Aug 28, 2019
@InKryption
Contributor

@billzez I'd just like to point out, let(.{ &a, &b }, .{b, a}); wouldn't allow for destructuring into const variables.

@moosichu
Contributor

moosichu commented Jul 4, 2021

Another option could be out parameters (like C# has), which does have some nice benefits as it can allow for APIs that can scope variable initialisation conditionally as well. Eg:

if(queue.tryDequeue(out const someVar)) {...}

@jamii

jamii commented Oct 5, 2022

(EDIT moved to #3805 (comment))

@andrewrk andrewrk modified the milestones: 0.11.0, 0.12.0 Apr 9, 2023
@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Jul 9, 2023
mlugg added a commit to mlugg/zig that referenced this issue Sep 14, 2023
This change implements the following syntax into the compiler:

```zig
const x: u32, var y, foo.bar = .{ 1, 2, 3 };
```

A destructure expression may only appear within a block (i.e. not at
container scope). The LHS consists of a sequence of comma-separated var
decls and/or lvalue expressions. The RHS is a normal expression.

A new result location type, `destructure`, is used, which contains
result pointers for each component of the destructure. This means that
when the RHS is a more complicated expression, peer type resolution is
not used: each result value is individually destructured and written to
the result pointers. RLS is always used for destructure expressions,
meaning every `const` on the LHS of such an expression creates a true
stack allocation.

Aside from anonymous array literals, Sema is capable of destructuring
the following types:
* Tuples
* Arrays
* Vectors

A destructure may be prefixed with the `comptime` keyword, in which case
the entire destructure is evaluated at comptime: this means all `var`s
in the LHS are `comptime var`s, every lvalue expression is evaluated at
comptime, and the RHS is evaluated at comptime. If every LHS is a
`const`, this is not allowed: as with single declarations, the user
should instead mark the RHS as `comptime`.

There are a few subtleties in the grammar changes here. For one thing,
if every LHS is an lvalue expression (rather than a var decl), a
destructure is considered an expression. This makes, for instance,
`if (cond) x, y = .{ 1, 2 };` valid Zig code. A destructure is allowed
in almost every context where a standard assignment expression is
permitted. The exception is `switch` prongs, which cannot be
destructures as the comma is ambiguous with the end of the prong.

A follow-up commit will begin utilizing this syntax in the Zig compiler.

Resolves: ziglang#498
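
The commit message above lists tuples, arrays, and vectors as destructurable, and says an all-lvalue destructure counts as an expression. A minimal sketch of those cases follows, assuming a Zig version that includes this change (0.12 or later). The helper names `divmod`, `pick`, `splitArray`, and `splitVector` are hypothetical and chosen only for illustration.

```zig
const std = @import("std");

// Hypothetical helper: returns quotient and remainder as a tuple.
fn divmod(numerator: i32, denominator: i32) struct { i32, i32 } {
    return .{ @divTrunc(numerator, denominator), @rem(numerator, denominator) };
}

// Hypothetical helper: every LHS is an lvalue, so the destructure
// is an expression and can be the body of an `if`.
fn pick(cond: bool) [2]u32 {
    var x: u32 = 0;
    var y: u32 = 0;
    if (cond) x, y = .{ 1, 2 };
    return .{ x, y };
}

// Hypothetical helper: destructure an array and sum its elements.
fn splitArray(arr: [3]u8) u32 {
    const a, const b, const c = arr;
    return @as(u32, a) + b + c;
}

// Hypothetical helper: destructure a vector and sum its lanes.
fn splitVector(v: @Vector(2, f32)) f32 {
    const vx, const vy = v;
    return vx + vy;
}

test "destructure a tuple returned from a function" {
    const q, const r = divmod(7, 2);
    try std.testing.expectEqual(@as(i32, 3), q);
    try std.testing.expectEqual(@as(i32, 1), r);
}
```

Returning a tuple and destructuring it at the call site covers the `div` use case from the original proposal without any dedicated multiple-return-value syntax.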
mlugg added a commit to mlugg/zig that referenced this issue Sep 15, 2023
@andrewrk andrewrk modified the milestones: 0.13.0, 0.12.0 Sep 15, 2023
@andrewrk andrewrk added the accepted label Sep 16, 2023
@nektro
Contributor

nektro commented Sep 16, 2023

is this going to also support destructuring named structs by field name? i worry that if the answer is not eventually yes it might slightly pressure apis to use tuples more often and hurt readability in the long term

mk12 added a commit to mk12/blog that referenced this issue Sep 16, 2023
Implemented yesterday:
ziglang/zig#498
TUSF pushed a commit to TUSF/zig that referenced this issue May 9, 2024