multiple expression return values, error type redesign, introduction of copyable property of types #83

andrewrk · 2016-01-25T02:23:20Z

(see below comment for up to date details of this issue)

struct Foo {
    x: i32,
    y: i32,
}
fn make_foo(x: i32, y: i32) -> Foo {
    var f: Foo = undefined;
    f.x = x;
    f.y = y;
    return f;
}
fn f() {
    var foo = make_foo(1234, 5678);
}

When make_foo is generated, we should notice that f is always returned, which means instead of allocating stack space for f, we use the secret first argument pointer value and directly put the values there.

Also, in the code generated for function f, we notice the simple assignment, and instead of doing a memcpy, simply allocate the stack variable for foo, and then call make_foo passing the stack variable address as the secret first parameter.

Once these two optimizations are implemented, idiomatic zig code for initializing structs can be an assignment.
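
As a rough illustration of what the two optimizations amount to, here is a C sketch of the lowered form. The names make_foo_lowered and caller are hypothetical, and the shape is an assumption about how the generated code could look, not what the compiler actually emits:

```c
#include <stdint.h>

typedef struct {
    int32_t x;
    int32_t y;
} Foo;

/* Hypothetical lowered form of make_foo: the caller passes the address of
 * the result as a hidden first argument, and the fields are written there
 * directly instead of into a local that is copied out on return. */
void make_foo_lowered(Foo *result, int32_t x, int32_t y) {
    result->x = x;
    result->y = y;
}

/* Hypothetical lowered form of `var foo = make_foo(1234, 5678);`:
 * the caller's own stack slot is the result location, so no memcpy is
 * needed. The sum is returned only so the effect can be observed. */
int32_t caller(void) {
    Foo foo;
    make_foo_lowered(&foo, 1234, 5678);
    return foo.x + foo.y;
}
```

The point of the sketch is that both halves cooperate: the callee never owns a temporary Foo, and the caller never copies one.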

andrewrk · 2016-08-12T03:48:16Z

This may be more than just an optimization problem. Consider:

pub struct List(T: type) {
    items: []T,
    len: usize,
    prealloc_items: [STATIC_SIZE]T,

    pub fn init() -> List(T) {
        var l: Self = undefined;
        l.items = l.prealloc_items[0...];
        l.len = 0;
        return l;
    }
}

fn basic_list_test() {
    var list = List(i32).init();
    defer list.deinit();
}

Here, we assign a pointer to l.items. The pointer is to the address of prealloc_items, which is on the stack. When we return, this pointer becomes invalid.
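
The same hazard can be reproduced in C. The sketch below is a hypothetical translation of the List above (STATIC_SIZE is given an assumed value, and the helper names are made up for illustration). It copies the struct while the original is still alive, so the stale pointer can be inspected safely; returning by value makes the same copy, but the original is gone afterwards:

```c
#include <stddef.h>
#include <stdint.h>

#define STATIC_SIZE 4 /* assumed value; the original does not give one */

typedef struct {
    int32_t *items;                      /* points at the active storage */
    size_t len;
    int32_t prealloc_items[STATIC_SIZE]; /* inline storage */
} List;

/* Initialize in place, so items points into this very instance. */
void list_init(List *l) {
    l->items = l->prealloc_items;
    l->len = 0;
}

/* Returns 1 if the list's items pointer refers to its own inline storage. */
int list_points_at_own_storage(const List *l) {
    return l->items == l->prealloc_items;
}

/* Copying the struct copies the pointer value verbatim: the copy's items
 * still refers to the original's prealloc_items, not to its own. */
int copy_points_at_own_storage(void) {
    List original;
    list_init(&original);
    List copy = original;
    return list_points_at_own_storage(&copy);
}
```

Here copy_points_at_own_storage reports that the copy does not point at its own storage, which is exactly the state the returned value of init() ends up in.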

Note: I think this is the second time I accidentally did this in the zig standard library. It caused a runtime memory corruption error that I had to troubleshoot. This is the kind of code we don't want people to write by accident.

If we wanted this to work we would need to make this "return value is on caller's stack" thing more explicit, which I'm not sure we want to do. But it would look something like this:

fn make_foo(x: i32, y: i32) -> (foo: &Foo) {
    foo.x = x;
    foo.y = y;
}

But now this function assumes that there is memory available to write to. If you called the function with no assignment, we'd need a hidden stack allocation (a concept we already have) to provide the memory:

fn f() {
    make_foo(1, 2);
}

Another point is that if we had this different return value syntax, it implies that we should support tuples which is another can of worms. Maybe we should open it.

Let's say we went with this though, and we wanted the instance on the heap:

fn f() {
    const list = alloc_one(List(i32), 1);
    *list = List(i32).init();
}

Kind of awkward. Compare to:

fn heap() {
    const list = alloc_one(List(i32), 1);
    list.init();
}
fn stack() {
    var list: List(i32) = undefined;
    list.init();
}

The latter seems more uniform. Maybe? Now I'm not so sure.

But the point of this comment is that this optimization needs to be explicitly part of the syntax rather than something that sometimes works and is hidden when it doesn't work.

Another option that I am considering is that we could make byvalue struct parameters and return values a compile error. The user would be forced by the compiler to do the undefined/init method, which notably is always available even if we want the other method to be idiomatic. Which means that "only one way to do it" gently suggests disallowing structs as byvalue params and return types.

fsaintjacques · 2016-11-03T20:37:26Z

var list: List(i32) = undefined;
list.init();

I'd parse this as a null dereference.

kiljacken · 2016-11-03T20:44:08Z

Well, there's no null dereferencing there. It's just calling a function on an uninitialized stack-allocated struct (which probably passes the struct as a pointer, but it won't be null).
Looks very reasonable to me.

fsaintjacques · 2016-11-04T13:07:06Z

I am aware there's no null dereferencing. What I'm trying to say is that undefined is a really poor keyword name. Maybe add a constructor or initializer, e.g.

var list: List(i32) = List(i32) { .len = 0, ... };

kiljacken · 2016-11-04T13:36:32Z

Yeah, undefined seems kind of weird. As a plain keyword, uninitialized would probably convey the programmer's intent more clearly. I'm not so sure about a constructor/initializer style syntax, as it would require ways to say both "the rest of the fields are zeroed" and "the rest of the fields are uninitialized".

fsaintjacques · 2016-11-04T13:39:42Z

Oh no, this is just me being lazy, the initializer list would be forced to put all fields.

fsaintjacques · 2016-11-04T16:48:55Z

So it does already support the initializer list construct.

andrewrk · 2016-11-08T19:31:24Z

Equivalent C (ignoring the generic stuff):

List list;
list_init(&list);

Translated to zig:

var list: List = undefined;
list.init();

This looks like a reasonable translation to me.

If it looks bad it's because we're playing with fire by using uninitialized memory. I would argue that your alarm is well deserved. Ideally we would construct the language design so that using undefined is an edge case.

As you mentioned on IRC, something that will help this a bunch is named return value. Then it could look like:

const list = List.init();

Let's look at the proposed initializer syntax:

var list = List(i32) { .len = 0, ... };

Status quo zig says you should write it this way instead:

var list: List(i32) = undefined; // you can also put `zeroes` instead of `undefined`
list.len = 0;

If you want a compile error if a field is added/removed to the struct, then use the initializer list which requires populating all fields (and you can specify some of the fields as undefined or zeroes).

But status quo I think correctly represents the dangers of undefined values.

As for undefined vs uninitialized, it's technically allowed to set something to undefined that was previously defined. This can have implications for optimizations as well as protecting against some mistakes by detecting undefined usage at compile time. Given this, it makes more sense to use undefined rather than uninitialized. It makes sense because the compiler requires that variables are initialized to a value, and we tell the compiler that the value is undefined. It's a slightly higher level of abstraction than uninitialized. The compiler is allowed to initialize the value if it wants to, and in fact, in debug mode it does exactly that: all undefined bytes are set to 0xaa. Side note: we might want to allow disabling this for the purpose of tools such as valgrind, which can emit an error when uninitialized memory is used. Ideally, though, we would somehow be able to flag this memset in debug info as uninitialized so that valgrind and other tools could track the value as undefined. That would require cooperation/standardization between zig and these other tools.
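
To make the debug-mode behavior concrete, here is a small C sketch that mimics it. The helper debug_fill_undefined is hypothetical (it is not a zig or libc API); it only shows what "all undefined bytes are set to 0xaa" means for a value read back afterwards:

```c
#include <stdint.h>
#include <string.h>

/* Fill an object with the 0xaa pattern described above.
 * Hypothetical helper written for this sketch. */
void debug_fill_undefined(void *ptr, size_t size) {
    memset(ptr, 0xaa, size);
}

/* With every byte set to 0xaa, a 32-bit unsigned read yields 0xaaaaaaaa,
 * a value that stands out when it shows up in a debugger. */
uint32_t read_undefined_u32(void) {
    uint32_t value;
    debug_fill_undefined(&value, sizeof value);
    return value;
}
```

A recognizable fill pattern turns "used an undefined value" from silent garbage into something a person can spot, at the cost of hiding the problem from tools that track uninitialized memory.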

fsaintjacques · 2016-11-08T21:21:27Z

zeroes is a keyword?

andrewrk · 2016-11-08T21:52:27Z

https://github.com/andrewrk/zig/blob/master/test/cases/zeroes.zig

thejoshwolfe · 2016-11-08T21:56:42Z

Here's a proposal for named return values:

Functions can have one of two different return styles:

1. Unnamed return value. In this case the return type must meet some criteria, such as being primitive types or being small enough to fit into the ISA's return value registers or something. A void return type, including an implicit void return type, fits into this category. This is status quo.
2. Named return values. Any number of return values can be declared, and each must have a unique name just like parameters. In fact, these return types are effectively implemented as pointer parameters.

In both cases, any (non-void) return value must be accepted into variables rather than ignored. This is a change from status quo. The special _ variable name is available for storing values that are intended to be ignored.

The semantics of unnamed return values are unchanged. Every exit point from a function with a non-void unnamed return value must provide a value to return.

Analogously, all named return values must be fully initialized at every exit point from a function.

Static analysis for a function with named return values assumes the initial values of the return variables are undefined. The aliasing assumptions for named return values should work as though the named return values are pointer parameters.

Examples:

fn init1() -> (result: Foo) {
    result = Foo{
        .field1 = value1,
        .field2 = value2,
    };
}
fn init2() -> (result: Foo) {
    result.field1 = value1;
    result.field2 = value2;
}
fn init_equivalent(result: &Foo) {
    result.field1 = value1;
    result.field2 = value2;
}
fn init4() -> (result: Foo) {
    result.field1 = value1;
    // ERROR: field2 is not initialized
}
fn init5() -> (result: Foo) {
    return Foo{ // ERROR: attempt to return unnamed return value in function with named return values
        .field1 = value1,
        .field2 = value2,
    };
}
fn init6() -> Foo { // ERROR: struct Foo cannot be used as an unnamed return value
}

fn div(numerator: i32, denominator: i32) -> (quotient: i32, remainder: i32) {
    quotient = numerator / denominator;
    remainder = numerator % denominator;
}

fn main() {
    var foo = init1(); // type is inferred.
    init_equivalent(&foo); // this is status quo
    const foo2 = init1(); // const works too, which you can't do with status quo.
    _ = init1(); // _ can be any type
    init1(); // ERROR: cannot ignore return value

    var x: i32;
    var y: i32;
    x, y = div(3,1); // this is not general-purpose tuples. this is just named return values.
    var x2, var y2 = div(3,2);
    const x2, const y2 = div(3,2);
    var x3, y = div(3,2);
    x, var y3 = div(3,2);
    x, _ = div(3,2);
    _, y = div(3,2);
    _, _ = div(3,2);
    var broken = div(3,2); // ERROR: wrong number of return values
    div(3,2); // ERROR: cannot ignore return values
}
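
Since the proposal says named return values are effectively implemented as pointer parameters, the div example could lower to something like the following C. This is a sketch under that assumption; div_lowered and quotient_only are hypothetical names:

```c
#include <stdint.h>

/* Hypothetical lowering of
 *   fn div(numerator, denominator) -> (quotient: i32, remainder: i32)
 * Each named return value becomes an out-pointer supplied by the caller. */
void div_lowered(int32_t numerator, int32_t denominator,
                 int32_t *quotient, int32_t *remainder) {
    *quotient = numerator / denominator;
    *remainder = numerator % denominator;
}

/* Hypothetical lowering of `x, _ = div(n, d);`: the ignored result still
 * needs somewhere to be written, so the caller supplies a scratch slot. */
int32_t quotient_only(int32_t numerator, int32_t denominator) {
    int32_t x;
    int32_t ignored;
    div_lowered(numerator, denominator, &x, &ignored);
    return x;
}
```

Seen this way, `_` is not a special calling convention, just a throwaway destination chosen by the caller.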

In practice, here's a case where the error for ignoring return values will matter:

fn add_to_set(set: &Set, x: u32) -> bool {
    // return true if x was actually added, and false if x was already in the set.
}

fn main() {
    var set: Set = something();
    add_to_set(&set, 1); // ERROR: cannot ignore return value
    add_to_set(&set, 2); // ERROR: cannot ignore return value
    add_to_set(&set, 3); // ERROR: cannot ignore return value
    _ = add_to_set(&set, 1);
    _ = add_to_set(&set, 2);
    _ = add_to_set(&set, 3);
}

Even though it's "annoying" to have to type those _ ='s, I think this is a good idea. It makes the caller document the presence of a non-void return value, and explicitly ignore it.
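
C compilers offer a weaker, opt-in version of this rule. The sketch below uses the GCC/Clang warn_unused_result attribute, which produces a warning rather than an error. The MUST_USE macro and the bitmask representation of Set are assumptions made for the example, since the original leaves Set unspecified:

```c
#include <stdbool.h>
#include <stdint.h>

/* warn_unused_result is a GCC/Clang extension; on other compilers the
 * macro expands to nothing and the sketch still compiles. */
#if defined(__GNUC__)
#define MUST_USE __attribute__((warn_unused_result))
#else
#define MUST_USE
#endif

/* A toy set of values 0..31 stored as a bitmask. Larger values are
 * reduced modulo 32 to keep the sketch short. */
typedef struct {
    uint32_t bits;
} Set;

/* Returns true if x was actually added, false if it was already present. */
MUST_USE bool add_to_set(Set *set, uint32_t x) {
    uint32_t mask = UINT32_C(1) << (x & 31u);
    bool added = (set->bits & mask) == 0;
    set->bits |= mask;
    return added;
}
```

A caller that writes `add_to_set(&set, 1);` gets a diagnostic on those compilers, while `bool added = add_to_set(&set, 1);` does not. The proposal differs in making this the default for every non-void function.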

andrewrk · 2016-11-08T22:03:42Z

I like the proposal. How would it account for error union return types?

andrewrk · 2016-11-08T22:36:29Z

Proposal:

error DivByZero;

fn div(numerator: i32, denominator: i32) -> %(quotient: i32, remainder: i32) {
    if (denominator == 0) return error.DivByZero;
    quotient = numerator / denominator;
    remainder = numerator % denominator;
}

fn main() -> %void {
    var x2, var y2 = div(3,2) %% ([]type{i32, i32}){0, 0};

    const x3, const y3 = %return div(3, 2);
}

Maybe multiple named return types are in fact just tuples?

andrewrk · 2016-11-09T16:20:11Z

andrewrk · 2016-12-26T21:48:19Z

Here's a test I'm deleting. Reminding myself to port it to the new error syntax when that's done.

fn switchOnErrorUnion() {
    @setFnTest(this);

    const x = switch (returnsTen()) {
        Ok => |val| val + 1,
        ItBroke, NoMem => 1,
        CrappedOut => 2,
    };
    assert(x == 11);
}
error ItBroke;
error NoMem;
error CrappedOut;
fn returnsTen() -> %i32 {
    10
}

See #83

thejoshwolfe · 2017-02-07T21:43:26Z

Inspired by #250, here's a usecase where this proposal gets awkward. This usecase does some very fancy first-class function stuff, which most languages are ill-equipped to handle:

fn workerThreadMain() {
    // we're not in the UI thread here.
    try (const width, const height = runInGuiThread(delegate)) {
        setDimensions(width, height);
    }
}
fn delegate() %-> (s32, s32) {
    // now we're in the UI thread
    if (comboBox.selectedIndex() == 1) {
        return widthSpinner.value(), heightSpinner.value();
    } else {
        return error.NotApplicable;
    }
}

// this is the overloading you might find in Java or C#
fn runInGuiThread(f: fn() -> var) -> f.resultType { }
fn runInGuiThread(f: fn() -> (var, var)) -> (f.resultTypes[0], f.resultTypes[1]) { }
fn runInGuiThread(f: fn() %-> var) %-> f.resultType { }
fn runInGuiThread(f: fn() %-> (var, var)) %-> (f.resultTypes[0], f.resultTypes[1]) { }

// maybe something like this would be better:
fn runInGuiThread(f: fn() -> ...) -> ...f.resultTypes { }
// or maybe this:
fn runInGuiThread(f: fn() %-> ...) %-> ...f.resultTypes { }

I haven't figured out how a generic function runner like that would be implemented, but #229 would probably help.

andrewrk · 2017-02-07T23:08:52Z

These examples are starting to get outside the range of syntax complexity I'm comfortable having. If it gets too weird I'd rather stick with single return values.

andrewrk · 2017-02-08T05:16:42Z

Did we ever consider this for multiple return values?

fn div(numerator: i32, denominator: i32) -> struct {quotient: i32, remainder: i32} {
    return this.ReturnType {
        .quotient = numerator / denominator,
        .remainder = numerator % denominator,
    };
}
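
For comparison, C's standard library already takes the struct approach for this exact operation: div in <stdlib.h> returns a div_t with quot and rem fields. A minimal sketch (the wrapper name quotient_plus_remainder is hypothetical):

```c
#include <stdlib.h>

/* div() returns both results in one struct: div_t { int quot; int rem; }. */
int quotient_plus_remainder(int numerator, int denominator) {
    div_t result = div(numerator, denominator);
    return result.quot + result.rem;
}
```

The caller names the fields it wants and ignores the rest, with no extra syntax for multiple return values.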

To account for named return values:

fn div(numerator: i32, denominator: i32) -> (result: struct {quotient: i32, remainder: i32}) {
    result.quotient = numerator / denominator;
    result.remainder = numerator % denominator;
}

Now with an error:

error DivByZero;

fn div(numerator: i32, denominator: i32) -> %struct {quotient: i32, remainder: i32} {
    if (denominator == 0) return error.DivByZero;
    return this.ReturnType {
        .quotient = numerator / denominator,
        .remainder = numerator % denominator,
    };
}

Named return value with an error:

fn div(numerator: i32, denominator: i32) -> (result: %struct {quotient: i32, remainder: i32}) {
    if (denominator == 0) {
        result = error.DivByZero;
        return;
    }
    result = @typeOf(result).ChildType {
        .quotient = numerator / denominator,
        .remainder = numerator % denominator,
    };
}

That last one is pretty awkward. I'm not satisfied with it.

As a reminder, one of the main driving use cases of this issue is to change the pattern of struct initialization. Instead of (status quo):

var list: List(i32) = undefined;
list.init();

We want instead:

var list = List(i32).init();

It's more than just aesthetics; we're trying to reduce the difference between compile-time code and run-time code, and the former cannot be used to initialize a global variable, while the latter can. So we're trying to make the common pattern work for both.

andrewrk · 2017-02-08T05:27:28Z

Another idea for that last one. We introduce another syntax for error unions and nullable types. The syntax lets you set the value to non-null/non-error, and gives you a pointer to the payload (presumably to write through it). The contents of the memory the pointer points to when you use this operation are undefined.

I'm also going to throw in there, that you can still return error.DivByZero if the type of the return value (this proposal requires there is always exactly one return value) is an error union.

fn div(numerator: i32, denominator: i32) -> (result: %struct {quotient: i32, remainder: i32}) {
    if (denominator == 0) return error.DivByZero;
    const payload = &%result; // result becomes non-error now, with undefined value payload
    payload.quotient = numerator / denominator;
    payload.remainder = numerator % denominator;
}

Similarly this would introduce the &? operation for nullable types.

This syntax seems to make sense because it's something you might want to do with a nullable type or an error union type anyway.
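
One way to picture the proposed operation is with a C model of an error union as an error code plus a payload. Everything named here (DivResult, result_payload_ptr, the error codes) is hypothetical and only meant to show the "mark as non-error, then write through the payload pointer" sequence:

```c
#include <stdint.h>

typedef struct {
    int32_t quotient;
    int32_t remainder;
} DivPayload;

enum { ERR_NONE = 0, ERR_DIV_BY_ZERO = 1 }; /* hypothetical error codes */

/* Hypothetical C model of an error union: an error code plus a payload
 * that is only meaningful when err == ERR_NONE. */
typedef struct {
    int err;
    DivPayload payload;
} DivResult;

/* Models the proposed `&%result`: mark the union as non-error and hand
 * back a pointer to the payload, whose contents are not yet set. */
DivPayload *result_payload_ptr(DivResult *result) {
    result->err = ERR_NONE;
    return &result->payload;
}

/* Models the div example: the result lives in caller-provided memory. */
void div_checked(int32_t numerator, int32_t denominator, DivResult *result) {
    if (denominator == 0) {
        result->err = ERR_DIV_BY_ZERO;
        return;
    }
    DivPayload *payload = result_payload_ptr(result);
    payload->quotient = numerator / denominator;
    payload->remainder = numerator % denominator;
}
```

The payload is written in place, so no temporary struct is built and copied into the error union.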

Downside is that it introduces another sigil, which seems to be a common complaint from people newly exposed to zig.

Also, even if we don't have multiple function return values, I think it still makes sense that you have to explicitly throw away void, and expressions can result in 0 or 1 values, instead of expressions using void to result in nothing.

andrewrk · 2017-03-27T21:29:44Z

This proposal is too big and gnarly. Some of the issues it brought up are solved, some are no longer valid, and the rest are split up into other issues.

andrewrk added the enhancement and optimization labels Jan 25, 2016

andrewrk added this to the debut milestone Jan 25, 2016

andrewrk removed the optimization label Aug 12, 2016

andrewrk changed the title ~~return value optimization for structs~~ figure out how by value return types should work Aug 12, 2016

andrewrk mentioned this issue Sep 1, 2016

use let keyword instead of const for immutable variable declarations #181

Closed

andrewrk changed the title ~~figure out how by value return types should work~~ multiple expression return values, error type redesign, introduction of copyable property of types Nov 9, 2016

thejoshwolfe mentioned this issue Nov 9, 2016

remove var args and add anon list initialization syntax #208

Closed

andrewrk mentioned this issue Nov 18, 2016

inline assembly improvements #215

Open

andrewrk mentioned this issue Jan 16, 2017

codegen incorrect for some kinds of aliasing #103

Closed

andrewrk added a commit that referenced this issue Feb 2, 2017

add try expression

c0b37e8

See #83

andrewrk mentioned this issue Feb 6, 2017

use case: generalize the format string mechanism so code other than printf can use it #250

Closed

andrewrk mentioned this issue Mar 27, 2017

named return values and reference-assignment operators #286

Closed

andrewrk closed this as completed Mar 27, 2017

thejoshwolfe mentioned this issue Mar 28, 2017

result location mechanism (previously: well defined copy eliding semantics) #287

Closed

andrewrk mentioned this issue Sep 26, 2017

add syntax to destructure array initialization lists #498

Closed
