Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Untracked arguments in function scopes. #4232

Open
rzvxa opened this issue Jul 12, 2024 · 19 comments
Open

Untracked arguments in function scopes. #4232

rzvxa opened this issue Jul 12, 2024 · 19 comments
Assignees
Labels
C-bug Category - Bug

Comments

@rzvxa
Copy link
Contributor

rzvxa commented Jul 12, 2024

Related to this: #3374 (comment)

We do not keep track the arguments object in functions, It causes false positives in linters as of now and it can impose more serious limitations or errors in other crates(transformers and minifier come to mind as we might shadow the function arguments by mistake).

@rzvxa rzvxa added the C-bug Category - Bug label Jul 12, 2024
@rzvxa
Copy link
Contributor Author

rzvxa commented Jul 12, 2024

Since you are heavily invested in the scopes right now, @Dunqing What do you think?

@overlookmotel
Copy link
Contributor

Funny, I was wondering how we handle arguments just this morning. Sounds like we don't.

I can see 3 options:

  1. Add arguments binding to all function scopes eagerly.
  2. Add arguments binding to function scopes lazily, only once a reference to it is found in function body (or params).
  3. Special case arguments so IdentifierReference for it would have no SymbolId.

Personally, I would tend towards (2) - simple, and avoids perf hit of (1).

NB: arguments is almost like an invisible function param, but there is one oddity in sloppy mode:

function f1(x) {
  // SyntaxError: Identifier 'x' has already been declared
  let x;
}

function f2(arguments) {
  // SyntaxError: Identifier 'arguments' has already been declared
  let arguments;
}

function f3() {
  // No problem
  let arguments;
}

Also, you can have 2 bindings for arguments in a function:

function f(x, y = arguments) {
    let arguments = 456;
    console.log(y[0]); // Logs 123
    console.log(arguments); // Logs 456
}
f(123);

So we'd need an extra scope for function body.

@rzvxa
Copy link
Contributor Author

rzvxa commented Jul 12, 2024

I think the option 3 isn't a really good approach here. I'm torn between 1 and 2.

The latter seems like a good thing for performance but I'm afraid it would cause issues later on because javascript is whimsical at times:

function f() {
    eval("console.log(arguments)");
}

f("a", "b", "c")

With the second approach, we still have the same issue of shadowing arguments and generating the wrong code!

@overlookmotel
Copy link
Contributor

overlookmotel commented Jul 12, 2024

He had to go and bring eval() into it... 😄

But even then, I can't actually see how that can go wrong. eval() will (I assume) prevent mangling names of all bindings which it can "see". And we can just ban creating new bindings called arguments in the transformer (that's illegal in strict mode anyway).

@Boshen
Copy link
Member

Boshen commented Jul 12, 2024

@rzvxa
Copy link
Contributor Author

rzvxa commented Jul 12, 2024

He had to go and bring eval() into it... 😄

😆

But even then, I can't actually see how that can go wrong. eval() will (I assume) prevent mangling names of all bindings which it can "see". And we can just ban creating new bindings called arguments in the transformer (that's illegal in strict mode anyway).

You are right that we shouldn't ever mangle arguments if it isn't user-defined. I'm just saying since arguments are a context-sensitive identifier and javascript allows for weird ways to access variables through non-conventional ways it might cause issues further down the road.

Maybe no sane person uses new Function or eval in the production code, But javascript frameworks might generate such code and expect us to bundle it correctly for them.

I might be overthinking this.

@Dunqing
Copy link
Member

Dunqing commented Jul 13, 2024

We should add an argument symbol to the function scope. Both TypeScript and typescript-eslint do the same thing

@overlookmotel
Copy link
Contributor

What do you think of this example?

// Sloppy mode
function f(x, y = arguments) {
    let arguments = 456;
    console.log(y[0]); // Logs 123
    console.log(arguments); // Logs 456
}
f(123);

This demonstrates there can be 2 distinct bindings for arguments in a function. arguments in y = arguments references a different binding from let arguments = 456.

@rzvxa
Copy link
Contributor Author

rzvxa commented Jul 14, 2024

I was reading the eslint-scope tests today when saw an oddity I couldn't figure.

It contains arguments in the global scope. Your comment made me go and look it up, Turns out they always fake arguments in all non-function scopes. Here's my educated guess about how it is done there:

  • Step 1: It visits each node and starts adding their refs and IDs to the scope manager. After doing the same for the inner nodes it would then call close on the referencer.
  • Step 2: Close the scope(s) for the visited node.
  • Step 3: Try to resolve the identifier
  • Step 4: Do not resolve default parameters.
  • Step 5: If not resolved delegate it to the upper scope for resolution.
  • Step 6: Continue to hit the global scope.
  • Step 7: Resolve implicit variable declarations
  • Step 8: Call the vanilla super.__close at the end of global scope closing.
  • Step 9: Everything is valid at this stage so something like arguments would be registered as implicit/builtin/global/dangling identifier.

With these steps, your code would work similarly to this:

let arguments = [123];
function f(x, y = arguments) {
    let arguments = 456;
    console.log(y[0]); // Logs 123
    console.log(arguments); // Logs 456
}
f(123);

@overlookmotel
Copy link
Contributor

overlookmotel commented Jul 14, 2024

Treating arguments as external to the function just seems weird. And it can't handle the case where there's another arguments binding in scope above e.g.:

// Sloppy mode
let arguments;
function f(x, y = arguments) {
    let arguments = 456;
}

Here we have 3 bindings for arguments, and only 2 scopes to put them in. I know this is an edge case, and will be very rare in practice, but Oxc aims for absolute correctness.

Only solutions I can see are:

  1. Always add an extra scope for function body.
  2. Only add an extra scope for function body if there's a reference to arguments in function params.
  3. Treat implicit arguments as a special case and don't create a binding for it.

(1) is simpler and more conceptually correct, but imposes quite a large penalty. Every function generates 2 scopes, and functions are a common construct, so it will slow down symbol resolution significantly. (3) has been rejected. So I would lean towards (2).

@rzvxa
Copy link
Contributor Author

rzvxa commented Jul 14, 2024

I personally would prefer to find a way to avoid unnecessary scopes, Can't we declare arguments as a hoisted variable on the function scope? with that, we should be able to redeclare it in the function body.

But that's just my preference and I have nothing against your approach here, It just feels wrong to add scopes left and right as a means to simplify stuff. As you said correctness is more important. I'm not a huge fan of premature optimization.

I just have a quick question. We are wrapping the whole function in the scope right now. Is this going to change? Is the scope going to be entered after the parameters?

@overlookmotel
Copy link
Contributor

To clarify, what I'm saying about option (2) above is:

  • By default, only create 1 scope for the function (as now).
  • Add binding for arguments to that scope.
  • Only if arguments is referenced in the function's params, create a further scope for function body. This will be very rare.

i.e. Handle the weird uncommon edge case as a de-opt. No overhead from extra scope in common case, but maintain complete correctness.

@rzvxa
Copy link
Contributor Author

rzvxa commented Jul 14, 2024

To clarify, what I'm saying about option (2) above is:

* By default, only create 1 scope for the function (as now).

* Add binding for `arguments` to that scope.

* _Only if_ `arguments` is referenced in the function's params, create a further scope for function body. This will be very rare.

i.e. Handle the weird uncommon edge case as a de-opt. No overhead from extra scope in common case, but maintain complete correctness.

Actually, it's not as bad as I initially thought. I'm just afraid of something like this.

When you are accessing the Ancestors of a scope you have to check if that scope is a legit or a fake one to prevent issues in places when we want to get the "actual" parent with something like scope.ancestors(xxx).nth(1) or scope.get_parent_id(xxx).

It also means that there are now some really obscure edge cases with arguments that won't get tested, It is not just about parent scope, With this approach, you'd have some identifiers that are pointing to the parent of a function's body but aren't the actual scope of the enclosing node. I feel it would cause a lot of confusion and heisenbugs for both us(especially contributions that don't get reviews by people who are aware of this) and our crate users.

What are your thoughts on this? Even if we do all of these checks doesn't it mean we won't get much performance out of this in the long run(as we start to use scope more and more)?


Edit:

I've checked the spec and even there, They only create a new lexical environment if strict is false to prevent eval from creating a new binding. Otherwise, they would always use the lexical environment of the function's callee. I know these aren't apple to apple comparisons as environments are a runtime concept and aren't 1:1 with scopes. But it might help.

image

image

But maybe I'm missing something here.

https://tc39.es/ecma262/multipage/ordinary-and-exotic-objects-behaviours.html#sec-functiondeclarationinstantiation
https://tc39.es/ecma262/multipage/ordinary-and-exotic-objects-behaviours.html#sec-arguments-exotic-objects

@Dunqing
Copy link
Member

Dunqing commented Jul 19, 2024

I found out how TypeScript handles arguments. TypeScript adds an arguments symbol but does not add it in scopes. https://github.com/microsoft/TypeScript/blob/d8086f14b6b97c0df34a0cc2f56d4b5926a0c299/src/compiler/utilities.ts#L11361-L11380

@Boshen
Copy link
Member

Boshen commented Jul 19, 2024

I found out how TypeScript handles arguments. TypeScript adds an arguments symbol but does not add it in scopes. https://github.com/microsoft/TypeScript/blob/d8086f14b6b97c0df34a0cc2f56d4b5926a0c299/src/compiler/utilities.ts#L11361-L11380

Nice find. I also see tsc defines

    globals: SymbolTable;
    argumentsSymbol: Symbol;
    requireSymbol: Symbol;

@overlookmotel
Copy link
Contributor

TypeScript adds an arguments symbol but does not add it in scopes.

TS may do it, but it seems like a bad idea to me. Rather confusing, no?

@Boshen
Copy link
Member

Boshen commented Sep 23, 2024

For the record, I added this to the mangler

fn is_special_name(name: &str) -> bool {
matches!(name, "exports" | "arguments")
}

@overlookmotel
Copy link
Contributor

arguments can be renamed if it's a local const, let, or function declaration declared in function scope.

// Sloppy mode
function outer() {
    let arguments = 123;
    console.log(arguments);
}

function outer2() {
    function arguments() {}
    console.log(arguments);
}

But not if it's a var, or hoisted function declaration:

// Sloppy mode
function outer() {
    console.log(arguments); // Logs [1, 2, 3]
    var arguments = 456;
    console.log(arguments); // Logs 456
}
outer(1, 2, 3);

function outer2() {
    console.log(arguments); // Logs [1, 2, 3]
    {
        function arguments() {}
    }
    console.log(typeof arguments); // Logs 'function'
}
outer2(1, 2, 3);

@overlookmotel
Copy link
Contributor

Personally I'm in favour of adding scopes for function bodies anyway for other reasons (#5017), so that'd resolve some of the questions above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Category - Bug
Projects
None yet
Development

No branches or pull requests

4 participants