Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

patterns desugaring can lead to overcapturing on VM #52805

Closed
mraleph opened this issue Jun 28, 2023 · 12 comments
Closed

patterns desugaring can lead to overcapturing on VM #52805

mraleph opened this issue Jun 28, 2023 · 12 comments
Assignees
Labels
area-front-end Use area-front-end for front end / CFE / kernel format related issues. cfe-encodings Encoding related CFE issues.

Comments

@mraleph
Copy link
Member

mraleph commented Jun 28, 2023

CFE uses late final variable with an initializer to achieve lazy evaluation and caching of temporary values. The initializer itself is represented by a closure (in accordance to late final representation which is currently used). This closure captures variables from the surrounding scope which can lead to potential over capturing if there are other closures in the same scope. Consider for example this code:

void Function() f(List<int> list) {
  return switch (list) {
    [final item] => () {  // (*)
      use(item);
    },
    [] => () {
    },
  };
}

The first closure will end up holding to list due to the late final field introduced by pattern desugaring.

It is in some sense duplicate of #36983, but we could consider if we want to change desugaring of late variables with initializers (e.g. simply duplicate the body of the initializer?)

/cc @alexmarkov @johnniwinther @chloestefantsova

@mraleph mraleph added the area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. label Jun 28, 2023
@rrousselGit
Copy link

I've seen folks do:

void Function() f(List<int> list) {
  return switch (list) {
    [final item] => () {
      use(item);
      use2(item);
    }(),
  };
}

To have a multi-line expression switch.

Would this suffer from the same issue?

@chloestefantsova
Copy link
Contributor

@mraleph, just to clarify, by duplication of the body of the initializer do you mean using the body of the initializing closure directly as the initializer of the intermediate lazy variable?

So that we'll replace the following code for your example (in Kernel)

    final synthesized core::List<core::int> #0#0 = list;
    function ##0#1#initializer() → core::int
      return #0#0.{core::List::length}{core::int};
    late final synthesized core::int #0#1 = ##0#1#initializer(){() → core::int};

with something like the following?

    final synthesized core::List<core::int> #0#0 = list;
    late final synthesized core::int #0#1 = #0#0.{core::List::length}{core::int};

@mraleph
Copy link
Member Author

mraleph commented Jun 29, 2023

@chloestefantsova

@mraleph, just to clarify, by duplication of the body of the initializer do you mean using the body of the initializing closure directly as the initializer of the intermediate lazy variable?

Sorry for the confusion I did not suggest that CFE should do it. I was suggesting that back-end does it (e.g. so it forcefully inlines initialization closure when the IL graph is constructed).

@rrousselGit

I've seen folks do:
Would this suffer from the same issue?

IIFL (immediately invoked function literals) don't live long (they become unreachable after being invoked) so they are not really affected by "overcapturing".

But IIFLs themselves introduce the same problem - if they capture variables that affects the context chain structure and can lead to surprising leaks.

@johnniwinther
Copy link
Member

johnniwinther commented Jul 10, 2023

@mraleph Would this be addressed if the CFE didn't use late final to encode this but instead put the "initializer" inline, like we do for cache variable with multiple initializers:

void Function() f(List<int> list) {
  return { // a block expression
    var #result;
    #outer: {
      var #0 = list;
      {
        final item;
        if (#0.length == 1 && 
            let #1 = item = #[0] in true) { // a let expression of the initializer
          #result = () {  // (*)
            use(item);
          };
          break #outer;
        }
      }
      {
        #result = () {
        };
        break #otuer;
      }
    }
  } => #result;
}

@mraleph
Copy link
Member Author

mraleph commented Jul 10, 2023

@johnniwinther yes, that would also solve it. If there are no closures involved then there is no problem.

@mraleph
Copy link
Member Author

mraleph commented Oct 19, 2023

Bump. Can we prioritize to fix this? As patterns get used in the wild this leads to sever code quality issues on the VM.

@johnniwinther
Copy link
Member

@mraleph Should we go for the suggested CFE fix?

@mraleph
Copy link
Member Author

mraleph commented Oct 19, 2023

I don't see any issues with that. @alexmarkov Alex do you have any opinion?

@alexmarkov
Copy link
Contributor

+1 to avoid using late variables in pattern desugaring, as @johnniwinther described in #52805 (comment).

@johnniwinther johnniwinther self-assigned this Nov 27, 2023
@mraleph
Copy link
Member Author

mraleph commented Nov 28, 2023

@rakudrama @alexmarkov I have chatted about this with @johnniwinther a bit and I have the following question: should we fix late variable implementation in the VM / dart2js instead to cover for the cases when people use late local variables themselves?

CFE can make sure they don't emit redundant closures, but they still could use late variables (instead of code duplication) to express caching. And then dart2js and VM could optimize late local variables.

@alexmarkov
Copy link
Contributor

@mraleph Always duplicating initializers of user-defined late variables could result in bloated code size as they can be arbitrarily complex. Maybe we should more aggressively inline closures and un-capture variables to deal with simple initializers (other code involving closures could also benefit from this optimization), or heuristically choose to duplicate simple (but not all) initializers on AST.

Patterns currently use late variables to cache simple expressions like property access a.b, so it makes sense to always duplicate cached pattern expressions instead of wrapping them into late variable initializers.

@a-siva a-siva added area-front-end Use area-front-end for front end / CFE / kernel format related issues. and removed area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. labels Nov 30, 2023
@a-siva
Copy link
Contributor

a-siva commented Nov 30, 2023

Since we are going with a CFE fix I have changed the area to front end

@johnniwinther johnniwinther added the cfe-encodings Encoding related CFE issues. label Dec 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-front-end Use area-front-end for front end / CFE / kernel format related issues. cfe-encodings Encoding related CFE issues.
Projects
None yet
Development

No branches or pull requests

6 participants