Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

functions inside functions (closures) #229

Closed
andrewrk opened this issue Jan 31, 2017 · 29 comments
Closed

functions inside functions (closures) #229

andrewrk opened this issue Jan 31, 2017 · 29 comments
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Milestone

Comments

@andrewrk
Copy link
Member

It makes sense for there to be a function inside a function, as long as it doesn't require allocating any memory, and the function is only callable while the outer function is still running.

@andrewrk andrewrk added the enhancement Solving this issue will likely involve adding new logic or components to the codebase. label Jan 31, 2017
@phase
Copy link
Contributor

phase commented Jan 31, 2017

Would you generate this as a separate function in the IR or as a block?

@andrewrk
Copy link
Member Author

andrewrk commented Feb 1, 2017

Good question. Probably a separate function - that seems easier to do. Do you have any thoughts on the matter?

@phase
Copy link
Contributor

phase commented Feb 1, 2017

If the functions can access variables in the parent function's scope, then it would need to be implemented as a separate block. Function call overhead would also be removed (if there is any).

@andrewrk
Copy link
Member Author

andrewrk commented Feb 1, 2017

Consider that an inner function could call itself recursively.

If it were implemented as a function, then the compiler would create a struct for all the local variables in the outer function, and then pass that struct address to the inner function as a hidden parameter, and the inner function has access to the outer function's local variables.

If it were implemented as a block, the inner function's local variables would need to be a struct, and when you call the inner function, we allocate the stack space in the outer function's stack for the local variables of the inner function, and push the return address.

Both of these strategies would work fine, and I believe the function call overhead we would avoid by doing the block thing is exactly the same overhead we introduce by allocating space for the inner function's variables and pushing the return address.

I think the only reason to prefer the function strategy over the block strategy is that it might play more nicely with LLVM's optimizer. But that's just a hunch, and to be sure, an empirical test would be best.

@thejoshwolfe
Copy link
Contributor

I think it's important to put the word "closures" in the title of this issue, since this isn't just about functions with locally scoped names, but functions that have implicitly managed references to bound variables.

@AndreaOrru
Copy link
Contributor

AndreaOrru commented Apr 7, 2017

Proposal: convenient syntax for anonymous functions, or function literals. Useful and very common for callbacks and/or higher order functions.

const io = @import("std").io;

fn any(comptime T: type, array: []const T, pred: fn(T) -> bool) -> bool {
    for (array) |elem| {
        if (pred(elem)) return true;
    }
    return false;
}

pub fn main() -> %void {
    const array = []u64 {4096, 256, 16, 1};

    if (any(u64, array, x => x > 1000)) {
        %%io.stdout.printf("True\n");
    } else {
        %%io.stdout.printf("False\n");
    }
}

Returning functions should also be possible:

fn const(x: u64) -> (fn() -> u64) {
    return () => x;
}

Lambda functions can also capture and refer variables from the scope of the outer function. While non-capturing lambda functions can be thought as simple function pointers to global procedures, capturing requires also a pointer to the outer environment. We can think of this couple { funcptr, envptr } as a different type, say bound fn, analogous to delegate in D 2.0.

However, unlike D 2.0, and like D 1.0, I vote against the possibility to return delegates from functions:

fn foo() -> (bound fn() -> &u64) {
    var x: u64 = 10;

    return () => &x;
}

Implementing this behavior safely seems extremely difficult (impossible?) and even provably safe cases require the continuation to be saved somewhere in the heap. Having a core language feature depending on an allocator sounds like a bad idea, and introducing a potential source of unsafety even more so.

I argue that by only allowing fn and not bound fn to be returned from functions, we'd get most of the advantages and none of the disadvantages.

@thejoshwolfe
Copy link
Contributor

thejoshwolfe commented Apr 12, 2017

There are several topics here to discuss:

  • where can functions be declared? currently only at top-level-declaration scope, but we can add the syntax for functions declared somewhere somehow within the body of a function. at a minimum, this just means that you can cut+paste (and indent) a function from tld scope into the body of a function (or into any block) and the name would be scoped to that block. one question here is if forward declarations could be recognized, like javascript or Zig's tld, or if execution order defines the function, like python or vars in Zig's blocks.
  • can you declare anonymous functions? at a minimum, this is a simple syntactic change from fn foo() {} to const foo = <something>() {};. A change analogous to this happened to struct declarations in all containers anonymous #160. An anonymous function is called a lambda function. This could happen inside functions or at tld scope. This does not necessarily have anything to do with bound variables, which is called a closure.
  • what implicit parameters get passed to a function when you call it? this is a really complicated issue. we've already got some implicit parameter passing for "self" parameters. if we're going to support closures (which means bound variables), then that's more implicit parameter behavior. if there is some type like bound fn, what is it bound to? what is the actual data in the implementation? it's at least the function pointer, and then at a minimum a context pointer. if that's all, then the size is predictable, and APIs dealing with bound fn don't need comptime polymorphism (meaning different instantiations, like a C++ template). but if the context is just a pointer, where does it point?
  • do context objects get managed on the heap? this is "the possibility to return delegates from functions" mentioned above. This is contrary to Zig's design principal of no hidden allocations, so the answer to this question is definitive. no.
  • can a context pointer point to a stack frame, and should the compiler provide any help in assuring that the stack frame does not return while a context pointer is pointing to it? This is relevant to the "Returning functions should also be possible" example above. this would cater to the functional programming paradigm, the visitor design pattern, and other usecases.
  • can a bound fn be bound to more than just a context pointer? can it include a comptime variable amount of context? if the context is void, then it's effectively just an (unbound) fn. can we generalize all function pointers to have a comptime variable amount of context data bound to them?

Each one of the above bullets could be its own issue to discuss.

@tiehuis tiehuis added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Sep 15, 2017
@andrewrk andrewrk modified the milestones: 0.2.0, 0.3.0 Oct 19, 2017
@andrewrk andrewrk modified the milestones: 0.3.0, 0.4.0 Feb 28, 2018
@tgschultz
Copy link
Contributor

tgschultz commented Mar 27, 2018

I've been doing some thinking on this topic, so I thought I'd post some thoughts here for consideration.

Currently, the compiler lets you do this:

fn outer(comptime x: u32) (fn(u32) u32) {
    const InnerSt = struct {
        fn inner(y: u32) u32 {
            return y * x + 3;
        }
    };
    
    return InnerSt.inner;
}

Which I've used to create a sort of comptime closure in the past. Of course if I remove the 'comptime' restriction it crashes when called (even if called from within the outer fn). Compiler should probably stop me from doing that.

It'd be nice to skip the struct part, but personally I don't see much of a need for non-comptime closures or lambdas, but here's a way I think they could work:

Step 1: Anonymous functions, similar to how structs work:

const func = fn(x: u32) u32 { return x + 1; };
const FuncDef = fn(u32) u32;
const func2 = FuncDef |x| { return x + 1; };

filteredList = myList.filter( fn(x: u32) bool {return x > 5;} );

//alternatively
const List = struct {
    const Filter = fn(u32) bool;
    ...
    const filter = fn(self: &List, f: Filter) List {
        ...
        for(self.items) |i| {
            if(f(i)) newList.add(i);
        }
        ...
    };
};

filteredList = myList.filter( List.Filter |x| {return x > 5;} );

Step 2: Introduce a new kind of type: closure. A closure would internally be a struct with variables and a function pointer, but can be called like a function. When defining the closure, you specify which currently scoped variables get copied into the struct, they are given the same name. Some inference can make it less painful to write.

const outer = fn(x: u32) closure {
    return closure(x)->(y: u32) u32 {
        return y * x + 3;
    };
};

var func = outer(10);
var z = func(5);  //z == 53

This is functionally equivalent to:

const Closure_outer = struct {
    x: u32,
    f: fn(&Closure_outer, u32) u32,
};

const outer = fn(x: u32) Closure_outer {
   const f = fn(self: &Closure_outer, y: u32) u32 {
        return y * self.x +3;
    };
    return Closure_outer{.x = x, .f = f}; 
};

var func = outer(10);
var z = func.f(&func, 5);  //z == 53

You can allocate a closure on the heap (or wherever) by specifying an allocator:

const outer = fn(alloc: &std.mem.Allocator, x: u32) &closure {
    const MyClosure = closure(u32)->(u32) u32;
    var c = alloc.alloc(MyClosure, 1);
    c = MyClosure |x| -> |y| {
        return y * x +3;
    };
    return &c[0];
};

var func = outer(&myAllocator, 10);
var z = func(5);  //z == 53
myAllocator.free(func[0..1]);

@Hejsil
Copy link
Contributor

Hejsil commented Mar 27, 2018

@tgschultz
How would you handle the fact that different closures would need to be different sizes, and therefore different types? Here's a C++ example, since C++ does closures the same way your example descripes.

auto a(int i)  { return [=]() { return i; };      }
auto b(char c) { return [=]() { return int(c); }; }

int main() {
    auto       al  = a(5);
    typeof(al) al2 = a(5);
    typeof(al) bl  = b(5); // error: ... note: no known conversion for argument 1 from 'b(char)::<lambda()>' to 'const a(int)::<lambda()>&'
}

@Hejsil
Copy link
Contributor

Hejsil commented Mar 27, 2018

Arg! Played a little more around with C++ and my example is invalid, as all lambdas in C++ have different types (Idk why. Why doesn't two lambdas with the same closure have the same type?). The size problem is still there though:

#include <iostream>

auto a(int i)  { return [=]() { return i; };      }
auto b(char c) { return [=]() { return int(c); }; }

int main() {
    auto al = a(5);
    auto bl = b(5);

    std::cout << sizeof(al) << ' ' << sizeof(bl) << '\n';
}
// Output: 4 1

@tgschultz
Copy link
Contributor

tgschultz commented Mar 27, 2018

@Hejsil: closure is like struct in that way. Structs have different sizes depending on how they're defined. Here I define a closure type:

const MyClosure = closure(u32)->(u32) u32;

MyClosure is now basically the same as this struct:

const MyClosure = struct {
    _: u32,
    f: fn(&MyClosure, u32) u32,
};

later, it is initialized:

c = MyClosure |x| {
    return y * x +3;
};

I thought it'd make sense to use the payload operator here, since the type of the variable(s) the closure encloses is already defined and we're really just assigning a name for use in the code block.

When closure is used as the return type, that just means the specifics are inferred. I think this is how promise works, but maybe I'm misremembering.

@Hejsil
Copy link
Contributor

Hejsil commented Mar 27, 2018

@tgschultz Alright, so I would assume that all functions taking functions would need to be generic yes?

// This is just some made up syntax. Here we specify that we take any closure that has the
// function type `fn() void`. Behind the scene, it's generic like `var` arguments.
const a = fn(f: closure(?)->fn() void) void { f(); };

This also means that it's quite hard to store an array of closures. I guess one could store pointers to them, and when called, the functions would know internally how to interpret their closure.

var a: u8 = undefined;
var b: u16 = undefined;
var c: u32 = undefined;
const ca = closure(a)->fn() u32 { return a; };
const cb = closure(b)->fn() u32 { return b; };
const cc = closure(c)->fn() u32 { return c; };

// We can't store closures directly together in an array like this:
// const closures = [3]closure(?)->fn () u32 { ca, cb, cc };
// Because each closure are of a different size. Pointers should be fine though:
const closures = [3]&const closure(?)->fn () u32 { &ca, &cb, &cc };

@tgschultz
Copy link
Contributor

I'd say yes. If it takes closure, that's basically var. If it takes a specific closure type like closure(u32)->(u32) u32 then it must be that type. Passing a closure in place of an fn type would be right out.

And yeah, you wouldn't be able to have an array of closures of different types, as it'd be equivalent to having an array of different struct types. So you'd have to use a union or pointers.

@kyle-github
Copy link

I am trying to understand the proposed closure type. Is this used because you cannot tell the type of the closure itself? It almost sounds like an equivalent of C++ auto would be useful here.

@tgschultz if you could pass closures where functions were expected, you could do a lot of really interesting things. It is an open question whether that violates the Zen of Zig too much. Closures inherently hide things. It can enable some really clean idioms.

@tgschultz
Copy link
Contributor

tgschultz commented Mar 28, 2018

The closure type is not a type per-se, think of it like struct or promise, it's used to define a type. This is a closure type: closure(u32)->(u32) u32. It defines a closure that closes a u32 and a function that takes a u32 and returns a u32.

Where I use a naked closure as a return type or a function argument, it is basically the same as using var in this same context. It is comptime ducktyping and really I only included it in the proposal because it saves a lot of typing if you just infer the type. In that way, yeah, it's like auto.

Perhaps I have combined too many ideas here (there's also anonymous functions and initializing functions using a type alias), and I made a few mistakes in the example code, leading to some of the confusion.

The thing about passing a closure to something expecting a function is that a closure is actually a struct in this proposal. I suppose you can have the compiler transparently handle it via ducktyping, but I think that makes it difficult to reason about what's going on. And I can't think of a way it could work with extern, but maybe there is one.

If I'm honest, I'm not fond of the closure part of this proposal. I spent a lot of time thinking about how it could work and fit in with what I understand to be Zig's philosophy, and I just don't think the concept works well in that context. If you make it more obvious what's going on, you end up with my "functionally equivalent" code in the example, if you make it more transparent you make it difficult to reason about and potentially need to introduce hidden allocations.

I think allowing anonymous functions makes sense for consistency with how pretty much everything else in Zig works, and it would allow comptime closures without having to wrap it in an empty struct, and I think that's good enough, personally.

Maybe we should split this issue off into closures and anonymous functions?

@binary132
Copy link

binary132 commented May 6, 2018

@Hejsil, I don't know about C++, but I think that in Rust, the implicit Closure type actually generates a new struct type with its own name. So, even if the "method" signature is the same as another Closure, it is a different named type.

@haudan
Copy link

haudan commented Aug 8, 2018

Hi, someone new to Zig chiming in, with some ideas:

Coming from C, all I really would want from Zig are anonymous functions (not closures / no environment capturing), ideally as function-ptr expressions, ex.:

const double_i32 = fn(x: i32) i32 { return x * 2; };
comptime {
    assert(@typeOf(double_i32) == fn(i32)i32);
}

Anonymous function-ptr-expressions I think would be a great addition, they seem to harmonize with the zen and keep the code just as readable as if named functions were used. Possible use-case example:

fn inc(x: *i32) void { x.* += 1; }
addOperator('+', inc);
// vs
addOperator('+', fn(x: *i32) void { x.* += 1; });

Coming from C++ / Rust, I would also want comptime function parameters.
In the example below, func would uniquely identify a fn or a closure, it is not a pointer. Calling func would be a direct function call, without an additional pointer indirection.

fn doNTimes(comptime n: usize, comptime T: type, comptime func: fn(x: T) void, arg: T) {
    comptime var i: usize = 0;
    inline while (i < n) : (i += 1) {
        func(arg);
    }
}

// from doNTimes(3, i32, foo, 123)
fn doNTimes(arg: i32) {
    foo(arg);
    foo(arg);
    foo(arg);
}

Hower, this opens up a whole can questions:

  • Are we gonna support comptime-comptime-funcs, ie comptime func: comptime fn(x: T) void, which would make func only accept comptime functions?
  • What about variadic comptime functions?
  • Would adding traits to the language be benefitial for this kind of stuff?
  • Should we scrap the comptime fn parameter syntax in favour of type? If so, every functions needs an implicit type. How would we represent that in TypeInfo?

@thejoshwolfe
Copy link
Contributor

@Lisoph aside from anonymous functions and traits (which has its own issue), what you're asking about already exists. Notably comptime evaluation does everything you're describing and more (although we may remove variadic functions entirely, but that's a separate issue).

Someday we'll have better docs, examples, blog posts, video talks, etc. Github issues are pretty inefficient for teaching about the language.

@andrewrk andrewrk removed the enhancement Solving this issue will likely involve adding new logic or components to the codebase. label Nov 21, 2018
@andrewrk andrewrk modified the milestones: 0.4.0, 0.5.0 Nov 21, 2018
@adrusi
Copy link

adrusi commented Apr 28, 2019

I'm not completely sure how stack closures are implemented, but I don't think they need to be different sizes from one another. I think that a pointer to a stack frame (of the lexically containing function) and a pointer to a function should be sufficient.

fn main() void {
    // ...
    var a: f64 = 12.0;
    iter.map(x -> x * a);
}

becomes (and I'm making up some builtins that probably shouldn't exist):

fn _anonymous(frame: @FramePtr(main), x: f64) f64 {
    return x * @frameLookup(frame, "a");
}

fn main() void {
    // ...
    var a: f64 = 12.0;
    iter.map(Callback(f64, f64) {
        .context = @framePtr(),
        .function = _anonymous,
    });
}

I'd much prefer that the usage of closures tailor to only the most common uses; no syntax to specify by-value capturing, and definitely not heap-allocation. In order to accommodate less-common usage, the Callback(...) type could be made accessible to the programmer, with a documented simple calling convention. Non-closure functions should get automatically converted to closures — effectively a syntax transformation. Some examples:

// Custom callback structs
const CountClicksCtx = struct {
    nclicks: u32,
};
fn countClicks(ctx: *CountClicksCtx, event: *Event) void {
    if (event.kind != Event.Kind.Click) return;
    ctx.nclicks += 1;
}

fn main() !void {
    // ...
    const clickCounter = try arena.alloc(CountClicksCtx, 1);
    clickCounter.nclicks = 0;
    window.addListener(Callback(*Event, void) {
        .context = clickCounter,
        .function = countClicks,
    });
}
// Non-closure syntax sugar
fn double(x: f64) f64 {
    return x * 2;
}

fn main() void {
    // ...
    iter.map(double);
    // becomes:
    iter.map(x -> double(x));
}

I could be overlooking something, but I think Callback(...) could just be a generic struct with a "call" method in builtin. The varargs syntax might not be expressible in Zig source code, but it has to be a builtin regardless.

@andrewrk andrewrk modified the milestones: 0.5.0, 0.6.0 Jul 4, 2019
@distractedlambda
Copy link

distractedlambda commented Jul 13, 2019

Throwing in my thoughts (and apologies if I'm repeating anything already discussed):

  • I think having a proper syntax for comptime closures would be awesome, and on its own satisfy a lot of use cases. A comptime closure would be an anonymous function which implicitly captures all comptime things in scope by-value. Since all "captures" would be known at compile-time, a comptime closure would just be a function pointer.
  • I think the best way to do closures with runtime captures, would be to have all captures be implicit but by value. Nothing would stop you capturing the address of something you knew would live long enough, but you'd have to do so by explicitly taking the address of the object and then capturing the resulting pointer by value.
  • While being able to mutate a local var in a closure is useful, I'm not sure the use cases outweigh the potential complications. The only way I think implicit by-reference capture would be reasonable, is if the compiler could guarantee that you would never use such a capture outside its lifetime, and that starts to require nontrivial escape analysis.

Edit

After looking through @tgschultz's earlier post, I guess comptime closures basically exist in Zig today, just without a nice syntax. I think a proper syntax is a no-brainer addition to the language, but that would probably have to coincide with function-definitions-as-expressions.

@fengb
Copy link
Contributor

fengb commented Feb 25, 2020

I hacked together userland closures: https://github.com/fengb/zig-closure

No idea if this is actually usable in practice...

@adrusi
Copy link

adrusi commented Jun 15, 2020

I also hacked together userland closures, although it looks like with quite a different goal from fengb.

https://gist.github.com/adrusi/c32e8133914fab5800d0eea7e3d6be15

tl;dr

test "closures" {
    var x: i32 = 60;

    const foo = closure(struct {
        x: *i32,
        pub fn call(self: *const @This(), y: i32) i32 {
            self.x.* += y;
            return 420;
        }
    } { .x = &x });

    assert(foo.call(.{ @as(i32, 9) }) == 420);
    assert(x == 69);
}

@SpexGuy
Copy link
Contributor

SpexGuy commented Nov 3, 2020

After discussing with @andrewrk and @thejoshwolfe, we've decided that function literals (as accepted in #1717) should not be able to implicitly capture any state. It would be too easy to accidentally close over a pointer to stack memory and clobber your stack. It would also cause a problem where closures are often half-comptime (function pointer known) and half-runtime (runtime closed-over data), which the type system doesn't have a good way of handling.

Additionally, closures tend to encourage a highly functional programming style. While there are some merits to this style, it often obscures where memory is stored or what the associated lifetime is. The explicit capture syntax in C++ sidesteps this problem to some degree, but only works well because of C++'s use of constructors and destructors. A Zig equivalent would need to be more explicit, probably in ways that are dependent on the specific use case. It also forces the optimizer to choose between two terrible options: duplicate functions which accept closures (causing potentially large code bloat), or pass runtime function pointers (damaging cpu performance). Because of these drawbacks, we feel that the language should not take on additional complexity to support this style. However, note that #1717 alongside Zig's support for runtime function pointers and anonymous tuples make it possible to implement type safe abstractions that suffer less from these drawbacks.

For those reasons, we are closing this issue. The workaround for this use case is to use a stateless lambda alongside a state object, and pass the state object to the lambda when it is invoked.

@SpexGuy SpexGuy closed this as completed Nov 3, 2020
@andrewrk andrewrk modified the milestones: 0.8.0, 0.7.0 Nov 6, 2020
@kyle-github
Copy link

To revisit this a bit, I noted in issue #6965 that some parts of this (not general closures!) do solve the problem there.

The general idea:

Allow a limited form of nested function.

  • Nested functions can access variables in their enclosing scope (TBD whether that is just the enclosing scope or all the way up to their enclosing function).
  • This means three things that impact size and performance:
    • Any time a nested function is passed, the called function may need to be split and regenerated such that it takes the nested function pointer and an additional pointer (or a fat pointer) to the enclosing activation record at the time of the call. In general this would mean that there could be two (and I think only two) versions of functions that are called with nested functions, and only if the function is called both with and without a nested function.
    • Access of variables in the enclosing scope within a nested function will take a performance hit because they will need to reference through the passed activate record pointer.
    • Access of variable in the enclosing scope will likely not be through registers (unless the compiler gets smarter about this).
  • It will be illegal to explicitly take the address of a nested function (it is meaningless without the runtime activation record info).
  • It will be illegal to assign a nested function in any way outside of its declaring scope other than passing it as a function argument.
  • Interaction of nested functions and async functions is TBD. Splitting the async function may be a possibility but that seems complicated to manage.

These constraints are somewhat onerous, but GCC manages this just fine, so there is an existence proof. I have not done investigation with Godbolt or other to see what GCC does for optimization.

The key problem here is that without the restrictions on assignment etc. a nested function could escape to be used outside of its defining call tree. The prohibition on taking the address and on assignment aim to solve that. I have not thought through all the edge cases to see if it is still a problem with these restrictions.

@frmdstryr
Copy link
Contributor

closures tend to encourage a highly functional programming style. While there are some merits to this style, it often obscures where memory is stored or what the associated lifetime is.

I'm glad this was the decision because this would have been really bad news for anyone using zig on low cost/power embedded devices. Some projects completely disallow any heap allocations.

@htqx
Copy link

htqx commented Apr 11, 2024

I need anonymous functions and don't want to be wrapped in a struct. Closures are very complicated and can be omitted. I support zig to keep it simple and close to the implementation logic of c. For those who really need closures, you can turn all the variables you need to capture into explicit parameters, so just anonymous functions are needed.

const f = struct {
fn f()void{}
}.f;

// i want
const f = fn()void{};

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
Projects
None yet
Development

No branches or pull requests