multiple expression return values, error type redesign, introduction of copyable property of types #83

Closed
andrewrk opened this issue Jan 25, 2016 · 20 comments
Labels
enhancement Solving this issue will likely involve adding new logic or components to the codebase.
Milestone

Comments

@andrewrk
Member

andrewrk commented Jan 25, 2016

(see below comment for up to date details of this issue)

struct Foo {
    x: i32,
    y: i32,
}
fn make_foo(x: i32, y: i32) -> Foo {
    var f: Foo = undefined;
    f.x = x;
    f.y = y;
    return f;
}
fn f() {
    var foo = make_foo(1234, 5678);
}

When make_foo is generated, we should notice that f is always returned, which means instead of allocating stack space for f, we use the secret first argument pointer value and directly put the values there.

Also, in the code generated for function f, we notice the simple assignment, and instead of doing a memcpy, simply allocate the stack variable for foo, and then call make_foo passing the stack variable address as the secret first parameter.

Once these two optimizations are implemented, idiomatic zig code for initializing structs can be an assignment.
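
For readers more used to C, here is a rough sketch of the lowering described above, assuming the "secret first argument" is an ordinary pointer to caller-owned storage. The names `make_foo_sret` and `caller` are hypothetical, and this is only an illustration of the idea, not what the compiler actually emits.

```c
#include <stdint.h>

typedef struct {
    int32_t x;
    int32_t y;
} Foo;

/* What `fn make_foo(x, y) -> Foo` could lower to: the result is written
 * straight through the hidden pointer instead of into a local copy. */
void make_foo_sret(Foo *result, int32_t x, int32_t y) {
    result->x = x;
    result->y = y;
}

/* What `var foo = make_foo(1234, 5678);` could lower to: the caller
 * reserves the stack slot and passes its address, so no memcpy is needed. */
int32_t caller(void) {
    Foo foo;
    make_foo_sret(&foo, 1234, 5678);
    return foo.x + foo.y;
}
```

With both halves in place, the assignment form costs the same as the explicit init-through-pointer form.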

@andrewrk andrewrk added the enhancement and optimization labels Jan 25, 2016
@andrewrk andrewrk added this to the debut milestone Jan 25, 2016
@andrewrk
Member Author

andrewrk commented Aug 12, 2016

This may be more than just an optimization problem. Consider:

pub struct List(T: type) {
    items: []T,
    len: usize,
    prealloc_items: [STATIC_SIZE]T,

    pub fn init() -> List(T) {
        var l: Self = undefined;
        l.items = l.prealloc_items[0...];
        l.len = 0;
        return l;
    }
}

fn basic_list_test() {
    var list = List(i32).init();
    defer list.deinit();
}

Here, we assign a pointer to l.items. The pointer is to the address of prealloc_items, which is on the stack. When we return, this pointer becomes invalid.

Note: I think this is the 2nd time I accidentally did this in the zig standard library, which caused a runtime memory corruption error that I had to troubleshoot. This is the kind of code we don't want people to write by accident.
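
The same hazard can be reproduced in C without touching freed stack memory, as a minimal sketch. It assumes a by-value return behaves like a plain byte copy of the struct; the helper names `list_is_self_consistent` and `copy_keeps_old_pointer` are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define STATIC_SIZE 4

typedef struct {
    int32_t *items;                      /* meant to point at prealloc_items */
    size_t len;
    int32_t prealloc_items[STATIC_SIZE];
} List;

/* In-place init: `items` refers to storage inside *self. */
void list_init(List *self) {
    self->items = self->prealloc_items;
    self->len = 0;
}

/* Returns 1 when `items` points into this list's own buffer. */
int list_is_self_consistent(const List *self) {
    return self->items == self->prealloc_items;
}

/* Copies a live list by value and checks where the copy's pointer ends up.
 * Returns 1 when the original is consistent and the copy is not. */
int copy_keeps_old_pointer(void) {
    List original;
    list_init(&original);
    List copy = original;                /* byte copy, like a by-value return */
    return list_is_self_consistent(&original) &&
           !list_is_self_consistent(&copy);
}
```

The copy still points at the original's buffer. In the `init()` example the original is a local of the returning function, so the pointer dangles as soon as the function returns.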

If we wanted this to work we would need to make this "return value is on caller's stack" thing more explicit, which I'm not sure we want to do. But it would look something like this:

fn make_foo(x: i32, y: i32) -> (foo: &Foo) {
    foo.x = x;
    foo.y = y;
}

But now this function assumes that there is memory available to write to. If you called the function with no assignment, we'd need a hidden stack allocation (a concept we already have) to provide the memory:

fn f() {
    make_foo(1, 2);
}

Another point is that if we had this different return value syntax, it implies that we should support tuples which is another can of worms. Maybe we should open it.

Let's say we went with this though, and we wanted the instance on the heap:

fn f() {
    const list = alloc_one(List(i32), 1);
    *list = List(i32).init();
}

Kind of awkward. Compare to:

fn heap() {
    const list = alloc_one(List(i32), 1);
    list.init();
}
fn stack() {
    var list: List(i32) = undefined;
    list.init();
}

The latter seems more uniform. Maybe? Now I'm not so sure.

But the point of this comment is that this optimization needs to be explicitly part of the syntax rather than something that sometimes works and is hidden when it doesn't work.

Another option that I am considering is that we could make byvalue struct parameters and return values a compile error. The user would be forced by the compiler to do the undefined/init method, which notably is always available even if we want the other method to be idiomatic. Which means that "only one way to do it" gently suggests disallowing structs as byvalue params and return types.

@andrewrk andrewrk changed the title from "return value optimization for structs" to "figure out how by value return types should work" Aug 12, 2016
@fsaintjacques
Contributor

fsaintjacques commented Nov 3, 2016

var list: List(i32) = undefined;
list.init();

I'd parse this as a null dereference.

@kiljacken

kiljacken commented Nov 3, 2016

Well, there's no null dereferencing there. It's just calling a function on an uninitialized stack-allocated struct (which probably passes the struct as a pointer, but it won't be null).
Looks very reasonable to me.

@fsaintjacques
Contributor

I am aware there's no null dereferencing. What I'm trying to say is that undefined is a really poor keyword name. Maybe add a constructor or initializer, e.g.

var list: List(i32) = List(i32) { .len = 0, ... };

@kiljacken

Yeah, undefined seems kind of weird. As a plain keyword, uninitialized would probably convey the programmer's intent more clearly. I'm not so sure about a constructor/initializer style syntax, as it would need a way to say both "the rest of the fields are zeroed" and "the rest of the fields are uninitialized".

@fsaintjacques
Contributor

Oh no, this is just me being lazy; the initializer list would be required to specify all fields.

@fsaintjacques
Contributor

So it does already support the initializer list construct.

@andrewrk
Member Author

andrewrk commented Nov 8, 2016

Equivalent C (ignoring the generic stuff):

List list;
list_init(&list);

Translated to zig:

var list: List = undefined;
list.init();

This looks like a reasonable translation to me.

If it looks bad it's because we're playing with fire by using uninitialized memory. I would argue that your alarm is well deserved. Ideally we would construct the language design so that using undefined is an edge case.

As you mentioned on IRC, something that will help this a bunch is named return value. Then it could look like:

const list = List.init();

Let's look at the proposed initializer syntax:

var list = List(i32) { .len = 0, ... };

Status quo zig says you should write it this way instead:

var list: List(i32) = undefined; // you can also put `zeroes` instead of `undefined`
list.len = 0;

If you want a compile error whenever a field is added to or removed from the struct, then use the initializer list, which requires populating all fields (and you can specify some of the fields as undefined or zeroes).

But status quo I think correctly represents the dangers of undefined values.

As for undefined vs uninitialized, it's technically allowed to set something to undefined that was previously defined. This can have implications for optimizations as well as protecting against some mistakes by detecting undefined usage at compile time. Given this, it makes more sense to use undefined rather than uninitialized. It makes sense because the compiler requires that variables are initialized to a value, and we tell the compiler that the value is undefined. It's a slightly higher level of abstraction than uninitialized. The compiler is allowed to initialize the value if it wants to, and in fact, in debug mode it does exactly that: all undefined bytes are set to 0xaa. Side note: we might want to allow disabling this for the purpose of tools such as valgrind, which can emit an error when uninitialized memory is used. Ideally, though, we would somehow be able to flag this memset in debug info as uninitialized so that valgrind and other tools could track the value as undefined. That would require cooperation/standardization between zig and these other tools.
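
A small C sketch of the debug-mode behavior mentioned above, assuming it amounts to a memset with the 0xaa pattern. The helpers `set_undefined` and `looks_undefined` are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

#define UNDEFINED_BYTE 0xaa

/* Debug-mode stand-in for `= undefined`: fill the object with a
 * recognizable pattern so stray reads are easier to spot. */
void set_undefined(void *ptr, size_t size) {
    memset(ptr, UNDEFINED_BYTE, size);
}

/* Returns 1 when every byte of the object still holds the pattern. */
int looks_undefined(const void *ptr, size_t size) {
    const unsigned char *bytes = ptr;
    for (size_t i = 0; i < size; i++) {
        if (bytes[i] != UNDEFINED_BYTE) return 0;
    }
    return 1;
}
```

Because the pattern is just a value, a previously defined variable can be set back to "undefined" the same way, which matches the reasoning for preferring that name.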

@fsaintjacques
Contributor

zeroes is a keyword?

@andrewrk
Member Author

andrewrk commented Nov 8, 2016

@thejoshwolfe
Contributor

Here's a proposal for named return values:

Functions can have one of two different return styles:

  • Unnamed return value. In this case the return type must meet some criteria, such as being primitive types or being small enough to fit into the ISA's return value registers or something. A void return type, including an implicit void return type, fits into this category. This is status quo.
  • Named return values. Any number of return values can be declared, and each must have a unique name just like parameters. In fact, these return types are effectively implemented as pointer parameters.

In both cases, any (non-void) return value must be accepted into variables rather than ignored. This is a change from status quo. The special _ variable name is available for storing values that are intended to be ignored.

The semantics of unnamed return values are unchanged. Every exit point from a function with a non-void unnamed return value must provide a value to return.

Analogously, all named return values must be fully initialized at every exit point from a function.

Static analysis for a function with named return values assumes the initial values of the return variables are undefined. The aliasing assumptions for named return values should work as though the named return values are pointer parameters.

Examples:

fn init1() -> (result: Foo) {
    result = Foo{
        .field1 = value1,
        .field2 = value2,
    };
}
fn init2() -> (result: Foo) {
    result.field1 = value1;
    result.field2 = value2;
}
fn init_equivalent(result: &Foo) {
    result.field1 = value1;
    result.field2 = value2;
}
fn init4() -> (result: Foo) {
    result.field1 = value1;
    // ERROR: field2 is not initialized
}
fn init5() -> (result: Foo) {
    return Foo{ // ERROR: attempt to return unnamed return value in function with named return values
        .field1 = value1,
        .field2 = value2,
    };
}
fn init6() -> Foo { // ERROR: struct Foo cannot be used as an unnamed return value
}

fn div(numerator: i32, denominator: i32) -> (quotient: i32, remainder: i32) {
    quotient = numerator / denominator;
    remainder = numerator % denominator;
}

fn main() {
    var foo = init1(); // type is inferred.
    init_equivalent(&foo); // this is status quo
    const foo2 = init1(); // const works too, which you can't do with status quo.
    _ = init1(); // _ can be any type
    init1(); // ERROR: cannot ignore return value

    var x: i32;
    var y: i32;
    x, y = div(3,1); // this is not general-purpose tuples. this is just named return values.
    var x2, var y2 = div(3,2);
    const x2, const y2 = div(3,2);
    var x3, y = div(3,2);
    x, var y3 = div(3,2);
    x, _ = div(3,2);
    _, y = div(3,2);
    _, _ = div(3,2);
    var broken = div(3,2); // ERROR: wrong number of return values
    div(3,2); // ERROR: cannot ignore return values
}

In practice, here's a case where the error for ignoring return values will matter:

fn add_to_set(set: &Set, x: u32) -> bool {
    // return true if x was actually added, and false if x was already in the set.
}

fn main() {
    var set: Set = something();
    add_to_set(&set, 1); // ERROR: cannot ignore return value
    add_to_set(&set, 2); // ERROR: cannot ignore return value
    add_to_set(&set, 3); // ERROR: cannot ignore return value
    _ = add_to_set(&set, 1);
    _ = add_to_set(&set, 2);
    _ = add_to_set(&set, 3);
}

Even though it's "annoying" to have to type those _ ='s, I think this is a good idea. It makes the caller document the presence of a non-void return value, and explicitly ignore it.
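
Since the proposal says named return values are effectively implemented as pointer parameters, the `div` example could be modeled in C roughly as follows. This is a sketch under that assumption; `div_named` is a hypothetical name, and C has no equivalent of the "every named return value must be initialized at every exit" check.

```c
#include <stdint.h>

/* `fn div(n, d) -> (quotient: i32, remainder: i32)` modeled with two
 * out-pointers. The caller must pass a nonzero denominator. */
void div_named(int32_t numerator, int32_t denominator,
               int32_t *quotient, int32_t *remainder) {
    *quotient = numerator / denominator;
    *remainder = numerator % denominator;
}
```

In this model, `x, _ = div(3,2);` corresponds to passing the address of a scratch variable for the ignored result.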

@andrewrk
Member Author

andrewrk commented Nov 8, 2016

I like the proposal. How would it account for error union return types?

@andrewrk
Member Author

andrewrk commented Nov 8, 2016

Proposal:

error DivByZero;

fn div(numerator: i32, denominator: i32) -> %(quotient: i32, remainder: i32) {
    if (denominator == 0) return error.DivByZero;
    quotient = numerator / denominator;
    remainder = numerator % denominator;
}

fn main() -> %void {
    var x2, var y2 = div(3,2) %% ([]type{i32, i32}){0, 0};

    const x3, const y3 = %return div(3, 2);
}

Maybe multiple named return types are in fact just tuples?

@andrewrk
Member Author

andrewrk commented Nov 9, 2016

New proposal from #212

  • add try construct
  • Remove error.Ok.
  • Types without the "copyable" property are not allowed to be passed by value as a parameter, and as a return type they must be named return values. We will have @setIsCopyable builtin available for container types to make them copyable. Non-copyable by default.
  • variable declarations are statements not expressions.
  • functions which have no return type are not void - they have a number of return values equal to 0.
  • error for unused return value, but you can throw away return value with _. Not even allowed to throw away void.
  • error for throwing away return value of function with no side-effects.
  • add %-> operator for functions which can return an error
  • functions can have multiple named return values
  • all expressions and return types have a number of expressions returned, possibly 0.
  • make assembly syntax return 0 or more expressions. Maybe remove the ability for assembly syntax to write to local variables and have the return values be the way to extract info.
  • Remove error union type.

New error function syntax, multiple return syntax, and try syntax:

error DivByZero;

fn div(numerator: i32, denominator: i32)
    %-> (quotient: i32, remainder: i32)
{
    if (denominator == 0) return error.DivByZero;
    quotient = numerator / denominator;
    remainder = numerator % denominator;
}

fn foo(c: i32, d: i32, condition: bool) {
    try (const num, const den = div(3, 2)) {
        // do something with num and den
    } else |err| {
        // do something with err
    }

    const a, const b = if (condition) {
        c, d
    } else {
        d, c
    };
}

Example of copyable concept:

struct Vec2 {
    x: f32,
    y: f32,
}
struct Vec3 {
    {@setIsCopyable(this, true);}

    x: f32,
    y: f32,
    z: f32,
}

// error: non-copyable type passed by value
fn thisIsBroken(v: Vec2) {}

// error: non-copyable types require named return values
fn thisIsAlsoBroken() -> Vec2 {}

fn thisIsOk(v: Vec3) {}
fn alsoOk() -> Vec3 {}
fn alsoOkWithVec2() -> (result: Vec2) {}

@andrewrk andrewrk changed the title from "figure out how by value return types should work" to "multiple expression return values, error type redesign, introduction of copyable property of types" Nov 9, 2016
@andrewrk
Member Author

Here's a test I'm deleting. Reminding myself to port it to the new error syntax when that's done.

fn switchOnErrorUnion() {
    @setFnTest(this);

    const x = switch (returnsTen()) {
        Ok => |val| val + 1,
        ItBroke, NoMem => 1,
        CrappedOut => 2,
    };
    assert(x == 11);
}
error ItBroke;
error NoMem;
error CrappedOut;
fn returnsTen() -> %i32 {
    10
}

@thejoshwolfe
Contributor

Inspired by #250, here's a usecase where this proposal gets awkward. This usecase does some very fancy first-class function stuff, which most languages are ill-equipped to handle:

fn workerThreadMain() {
    // we're not in the UI thread here.
    try (const width, const height = runInGuiThread(delegate)) {
        setDimensions(width, height);
    }
}
fn delegate() %-> (s32, s32) {
    // now we're in the UI thread
    if (comboBox.selectedIndex() == 1) {
        return widthSpinner.value(), heightSpinner.value();
    } else {
        return error.NotApplicable;
    }
}

// this is the overloading you might find in Java or C#
fn runInGuiThread(f: fn() -> var) -> f.resultType { }
fn runInGuiThread(f: fn() -> (var, var)) -> (f.resultTypes[0], f.resultTypes[1]) { }
fn runInGuiThread(f: fn() %-> var) %-> f.resultType { }
fn runInGuiThread(f: fn() %-> (var, var)) %-> (f.resultTypes[0], f.resultTypes[1]) { }

// maybe something like this would be better:
fn runInGuiThread(f: fn() -> ...) -> ...f.resultTypes { }
// or maybe this:
fn runInGuiThread(f: fn() %-> ...) %-> ...f.resultTypes { }

I haven't figured out how a generic function runner like that would be implemented, but #229 would probably help.

@andrewrk
Member Author

andrewrk commented Feb 7, 2017

These examples are starting to get outside the range of syntax complexity I'm comfortable having. If it gets too weird I'd rather stick with single return values.

@andrewrk
Member Author

andrewrk commented Feb 8, 2017

Did we ever consider this for multiple return values?

fn div(numerator: i32, denominator: i32) -> struct {quotient: i32, remainder: i32} {
    return this.ReturnType {
        .quotient = numerator / denominator,
        .remainder = numerator % denominator,
    };
}

To account for named return values:

fn div(numerator: i32, denominator: i32) -> (result: struct {quotient: i32, remainder: i32}) {
    result.quotient = numerator / denominator;
    result.remainder = numerator % denominator;
}

Now with an error:

error DivByZero;

fn div(numerator: i32, denominator: i32) -> %struct {quotient: i32, remainder: i32} {
    if (denominator == 0) return error.DivByZero;
    return this.ReturnType {
        .quotient = numerator / denominator,
        .remainder = numerator % denominator,
    };
}

Named return value with an error:

fn div(numerator: i32, denominator: i32) -> (result: %struct {quotient: i32, remainder: i32}) {
    if (denominator == 0) {
        result = error.DivByZero;
        return;
    }
    result = @typeOf(result).ChildType {
        .quotient = numerator / denominator,
        .remainder = numerator % denominator,
    };
}

That last one is pretty awkward. I'm not satisfied with it.

As a reminder, one of the main driving use cases of this issue is to change the pattern of struct initialization. Instead of (status quo):

var list: List(i32) = undefined;
list.init();

We want instead:

var list = List(i32).init();

It's more than just aesthetics; we're trying to reduce the difference between compile-time code and run-time code, and the former cannot be used to initialize a global variable, while the latter can. So we're trying to make the common pattern work for both.

@andrewrk
Member Author

andrewrk commented Feb 8, 2017

Another idea for that last one. We introduce another syntax for error unions and nullable types. The syntax lets you set the value to non-null/non-error, and a pointer to the payload (presumably to write through it). The contents of memory the pointer points to when you use this operation are undefined.

I'm also going to throw in there, that you can still return error.DivByZero if the type of the return value (this proposal requires there is always exactly one return value) is an error union.

fn div(numerator: i32, denominator: i32) -> (result: %struct {quotient: i32, remainder: i32}) {
    if (denominator == 0) return error.DivByZero;
    const payload = &%result; // result becomes non-error now, with undefined value payload
    payload.quotient = numerator / denominator;
    payload.remainder = numerator % denominator;
}

Similarly this would introduce the &? operation for nullable types.

This syntax seems to make sense because it's something you might want to do with a nullable type or an error union type anyway.

Downside is that it introduces another sigil, which seems to be a common complaint from people newly exposed to zig.

Also, even if we don't have multiple function return values, I think it still makes sense that you have to explicitly throw away void, and expressions can result in 0 or 1 values, instead of expressions using void to result in nothing.
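
To make the proposed `&%result` operation concrete, here is a C sketch that models an error union as a tag plus a payload. All names here (`DivResultOrError`, `payload_ptr`, `div_checked`, the `ERR_*` constants) are hypothetical, and the layout is an assumption for illustration, not how zig represents error unions.

```c
#include <stdint.h>

typedef struct {
    int32_t quotient;
    int32_t remainder;
} DivResult;

enum { ERR_NONE = 0, ERR_DIV_BY_ZERO = 1 };

typedef struct {
    int err;              /* ERR_NONE means the payload is valid */
    DivResult payload;
} DivResultOrError;

/* Stand-in for `&%result`: mark the union as non-error and hand back a
 * pointer to the payload, whose contents are undefined until written. */
DivResult *payload_ptr(DivResultOrError *result) {
    result->err = ERR_NONE;
    return &result->payload;
}

/* The named-return-value `div` from the comment above, with the result
 * passed as an out-pointer. */
void div_checked(int32_t numerator, int32_t denominator,
                 DivResultOrError *result) {
    if (denominator == 0) {
        result->err = ERR_DIV_BY_ZERO;
        return;
    }
    DivResult *payload = payload_ptr(result);
    payload->quotient = numerator / denominator;
    payload->remainder = numerator % denominator;
}
```

The point of the operation is that the payload is filled in place through the pointer, so no temporary struct has to be built and copied into the result.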

@andrewrk
Member Author

This proposal is too big and gnarly. Some of the issues it brought up are solved, some are no longer valid, and the rest are split up into other issues.


4 participants