feat(mangler): reuse variable names #8562

Merged
merged 8 commits into from
Jan 25, 2025

Conversation

sapphi-red
Contributor

@sapphi-red sapphi-red commented Jan 17, 2025

Changed the mangler to reuse variable names where possible.

This reduces code size because shorter variable names can be used in more places. However, it requires global information, limits parallelism within a single file, and uses more memory.

@github-actions github-actions bot added A-semantic Area - Semantic A-minifier Area - Minifier C-enhancement Category - New feature or request labels Jan 17, 2025

@sapphi-red sapphi-red force-pushed the 01-17-feat_mangler_reuse_variable_names branch from dd44926 to 4d6c3ee Compare January 17, 2025 08:04

codspeed-hq bot commented Jan 17, 2025

CodSpeed Performance Report

Merging #8562 will degrade performances by 74.8%

Comparing 01-17-feat_mangler_reuse_variable_names (98a95b4) with main (8587965)

Summary

❌ 3 regressions
✅ 30 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| mangler[antd.js] | 6 ms | 14.2 ms | -57.43% |
| mangler[react.development.js] | 139.4 µs | 290.1 µs | -51.94% |
| mangler[typescript.js] | 9.6 ms | 38 ms | -74.8% |
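
The Change column appears to be measured relative to the HEAD time rather than the BASE time, which is why a roughly 4x slowdown is reported as about -75%. A minimal sketch, assuming the formula `base / head - 1` (inferred from the numbers above, not taken from CodSpeed's documentation; `relativeChange` is a hypothetical helper). Small differences from the reported values come from rounding in the displayed timings.

```javascript
// Assumed formula, inferred from the benchmark figures: change = base / head - 1.
function relativeChange(baseMs, headMs) {
  return (baseMs / headMs - 1) * 100;
}

// Timings copied from the report, all converted to milliseconds.
const rows = [
  ["mangler[antd.js]", 6, 14.2],
  ["mangler[react.development.js]", 0.1394, 0.2901],
  ["mangler[typescript.js]", 9.6, 38],
];

for (const [name, base, head] of rows) {
  console.log(`${name}: ${relativeChange(base, head).toFixed(1)}%`);
}
```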

@sapphi-red sapphi-red force-pushed the 01-17-feat_mangler_reuse_variable_names branch from 4d6c3ee to 1b9026f Compare January 17, 2025 08:11
@sapphi-red
Contributor Author

The perf regression looks really bad, but if you take a step back and look at the overall minifier timings, it's not that big a regression.

antd: 196.8ms (6.4ms is 3.3%)
react: 1.8ms (125.1µs is 7.0%)
typescript: 344.6ms (24.9ms is 7.2%)

@overlookmotel
Contributor

overlookmotel commented Jan 17, 2025

Just to raise one other concern:

The addition of scope_id field to Reference will cause difficulties in transformer. In transformer, we update semantic data as we mutate the AST, to keep AST + semantic in sync.

This is quite difficult already, and if Reference has a scope_id field, we'd need to update that field too whenever we move an IdentifierReference into a different scope, or e.g. wrap some code in an IIFE (which introduces a new scope). Both of these are fairly common operations in the transformer.

The changes to TraverseScoping::create_bound_reference and create_unbound_reference in this PR aren't correct - often we are creating a Reference in a different scope from the current one.
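
To illustrate the concern, here is a toy model (not oxc's actual data structures; `scopes`, `references`, and `wrapInIife` are hypothetical names) in which each reference stores the scope it appears in. Wrapping code in an IIFE creates a new scope, so every moved reference needs its stored scope updated, otherwise the semantic data goes stale.

```javascript
// Toy semantic data: scope ids index into `scopes`; each reference stores its scope.
const scopes = [{ parent: null }]; // scope 0 is the root
const references = [
  { name: "a", scopeId: 0 },
  { name: "b", scopeId: 0 },
];

// Wrap the code containing `movedRefs` in an IIFE under `parentScopeId`.
// With a per-reference scope id, this extra bookkeeping is needed on
// every transform that moves code into a different scope.
function wrapInIife(parentScopeId, movedRefs) {
  const iifeScopeId = scopes.length;
  scopes.push({ parent: parentScopeId });
  for (const ref of movedRefs) {
    ref.scopeId = iifeScopeId; // forgetting this leaves stale data
  }
  return iifeScopeId;
}

// Move only the reference to `b` into the new IIFE scope.
const iifeScope = wrapInIife(0, [references[1]]);
console.log(references, iifeScope);
```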

FYI: The bigger picture is: Currently we run semantic analysis multiple times during the pipeline. The hope (well certainly my hope) is that we can eventually make all stages of the pipeline keep semantic data in sync with AST changes. Then we'll be able to remove these extra semantic passes, which will be a large perf improvement.


The performance issue needs to be looked at first. But if we decide that it's worth it (very possibly it is), I would suggest the following:

  • Split the introduction of Reference::scope_id field into a separate PR.
  • Add checks for correctness of that field to the transformer conformance checker (which checks semantic data is correct after transform), and see how many errors we have due to incorrect Reference::scope_id.
  • Assess how much work is required to handle updating this extra state in the transformer.

It will of course be achievable with some effort, but it'd be useful to understand how much effort before we go ahead. Maybe it's easier than I think!

cc @Dunqing


Last thing: I'm not very familiar with the mangler. Could you possibly give a brief explanation of how this algorithm works? I'm wondering if there is any other way to achieve the effect without the Reference::scope_id field.

@Dunqing
Member

Dunqing commented Jan 20, 2025

This is a significant minification improvement. I've thought about this before, but I have the same concern as @overlookmotel.

I am guessing the smaller output is more important than performance here? If so, I think we can add an extra AST pass that collects which scope each identifier reference belongs to. That way, the change is confined to the mangler crate.
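
A minimal sketch of such a pass, using a toy AST rather than oxc's real one (the node shapes and `collectReferenceScopes` are assumptions for illustration). It walks the tree once, tracks the current scope, and records the scope in which each identifier reference occurs, so nothing has to be stored on `Reference` itself.

```javascript
// Toy AST: nodes are { type: "Program" | "Function", body: [...] }
// or { type: "Ref", name }. Program and Function nodes each open a scope.
function collectReferenceScopes(program) {
  // Keyed by name for brevity; a real pass would key by symbol or reference id.
  const refScopes = new Map(); // name -> Set of scope ids where it is referenced
  let nextScopeId = 0;

  function visit(node, scopeId) {
    if (node.type === "Ref") {
      if (!refScopes.has(node.name)) refScopes.set(node.name, new Set());
      refScopes.get(node.name).add(scopeId);
      return;
    }
    // Program or Function: allocate a fresh scope for the body.
    const bodyScope = nextScopeId++;
    for (const child of node.body) visit(child, bodyScope);
  }

  visit(program, 0);
  return refScopes;
}

const ast = {
  type: "Program",
  body: [
    { type: "Ref", name: "x" },
    { type: "Function", body: [{ type: "Ref", name: "x" }, { type: "Ref", name: "y" }] },
  ],
};
const result = collectReferenceScopes(ast);
console.log([...result].map(([name, ids]) => [name, [...ids]]));
```

The side table lives only for the duration of the mangler run, which is the trade-off suggested here: an extra traversal in exchange for leaving the shared semantic data untouched.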

@sapphi-red sapphi-red force-pushed the 01-17-feat_mangler_reuse_variable_names branch from 1b9026f to 510e592 Compare January 20, 2025 03:56
@sapphi-red
Contributor Author

> The addition of scope_id field to Reference will cause difficulties in transformer. In transformer, we update semantic data as we mutate the AST, to keep AST + semantic in sync.

Yeah, I had that concern while writing this code. I went this way at the time to see how much this change would improve the minification rate.
@Dunqing's idea of collecting the scope_ids of the references before running the mangler sounds worth exploring.

> Could you possibly give a brief explanation of how this algorithm works?

This PR introduces the concept of "liveness": the set of scopes in which a given variable (or slot) is used.
For example, given the following code:

var top_level_a = 0;
var top_level_b = 1;

function foo() {
  var foo_a = 1;
  console.log(top_level_b, foo_a);
}

function bar() {
  var bar_a = 1;
  console.log(top_level_b, bar_a);
}

console.log(top_level_a, foo(), bar())

The liveness of each variable will be:

  • top_level_a: {root}
  • top_level_b: {root, foo function, bar function}
  • foo_a: {foo function}
  • bar_a: {bar function}
  • foo: {root}
  • bar: {root}
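
The liveness sets above can be derived mechanically. A minimal sketch, not the PR's actual code: the `symbols` and `parentOf` tables are hand-written for this example, and the rule that scopes lying between a use and the declaring scope also count as live is an assumption added so that deeper nesting stays safe (it makes no difference here).

```javascript
// Scope tree for the example: each scope records its parent.
const parentOf = { root: null, foo: "root", bar: "root" };

// For each variable: where it is declared and the scopes it is referenced from.
const symbols = {
  top_level_a: { declared: "root", usedIn: ["root"] },
  top_level_b: { declared: "root", usedIn: ["foo", "bar"] },
  foo_a: { declared: "foo", usedIn: ["foo"] },
  bar_a: { declared: "bar", usedIn: ["bar"] },
  foo: { declared: "root", usedIn: ["root"] },
  bar: { declared: "root", usedIn: ["root"] },
};

// Liveness = declaring scope plus every scope on the path from each use
// up to (but not past) the declaring scope.
function liveness({ declared, usedIn }) {
  const live = new Set([declared]);
  for (const use of usedIn) {
    for (let s = use; s !== null && s !== declared; s = parentOf[s]) {
      live.add(s);
    }
  }
  return live;
}

for (const [name, info] of Object.entries(symbols)) {
  console.log(name, [...liveness(info)]);
}
```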

Since top_level_a is only used in the root scope, foo_a only in the foo function, and bar_a only in the bar function, foo_a and bar_a can shadow top_level_a.
In other words, because the liveness sets of top_level_a, foo_a, and bar_a do not overlap, all three can be assigned to the same slot. top_level_b, by contrast, is live in both functions, so it cannot share a slot with foo_a or bar_a.

For this example, the slots assigned to the variables will be:

  • slot 0: top_level_a, foo_a, bar_a
  • slot 1: top_level_b
  • slot 2: foo
  • slot 3: bar
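
A minimal sketch of greedy slot assignment under the non-overlap rule described above. This is an illustration, not the mangler's actual implementation: `assignSlots` is a hypothetical name, the liveness sets are written out by hand, and processing root-scope symbols before function-scope symbols is an assumption.

```javascript
// Liveness sets from the example (scope names are illustrative labels).
// Root-scope symbols are listed first, then symbols declared in functions.
const liveness = new Map([
  ["top_level_a", new Set(["root"])],
  ["top_level_b", new Set(["root", "foo", "bar"])],
  ["foo", new Set(["root"])],
  ["bar", new Set(["root"])],
  ["foo_a", new Set(["foo"])],
  ["bar_a", new Set(["bar"])],
]);

// Greedy assignment: put each symbol in the first slot whose accumulated
// liveness shares no scope with the symbol's own liveness.
function assignSlots(livenessBySymbol) {
  const slots = []; // each slot: { symbols: string[], live: Set<string> }
  for (const [name, live] of livenessBySymbol) {
    let slot = slots.find((s) => [...live].every((scope) => !s.live.has(scope)));
    if (!slot) {
      slot = { symbols: [], live: new Set() };
      slots.push(slot);
    }
    slot.symbols.push(name);
    for (const scope of live) slot.live.add(scope);
  }
  return slots.map((s) => s.symbols);
}

console.log(assignSlots(liveness));
```

Every symbol in a slot then receives the same mangled name, so fewer slots means more symbols get the shortest names.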

@Boshen Boshen force-pushed the 01-17-feat_mangler_reuse_variable_names branch from 510e592 to faaa1ee Compare January 24, 2025 10:42
@Boshen Boshen marked this pull request as ready for review January 24, 2025 10:43
@Boshen Boshen requested a review from Dunqing as a code owner January 24, 2025 10:43
@Boshen
Member

Boshen commented Jan 24, 2025

@sapphi-red To be honest, I spent the last hour on this and still don't comprehend any of it ... probably due to my lack of knowledge of JavaScript.

May I ask you a favor? Can you update the top-level documentation, reference all reading materials, and write a line-by-line explanation for your newly added code? Just like how I did it with the next one ... so the next person can understand what's going on.

I'll do some micro optimizations.

And also ... is the debug mode still useful or should it be adjusted?

@Boshen
Member

Boshen commented Jan 24, 2025

CodSpeed Performance Report

Merging #8562 will degrade performances by 75.24%

Comparing 01-17-feat_mangler_reuse_variable_names (faaa1ee) with main (b977678)

Summary

❌ 3 regressions ✅ 29 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
| --- | --- | --- | --- |
| mangler[antd.js] | 6 ms | 14.6 ms | -58.67% |
| mangler[react.development.js] | 139.2 µs | 295.6 µs | -52.91% |
| mangler[typescript.js] | 9.6 ms | 38.7 ms | -75.24% |

oxc  main ❯ hyperfine './target/release/examples/mangler --nospace --mangle ./target/typescript.js' './target/release/examples/minifier --nospace --mangle ./target/typescript.js'
Benchmark 1: ./target/release/examples/mangler --nospace --mangle ./target/typescript.js
  Time (mean ± σ):     142.2 ms ±   1.1 ms    [User: 129.1 ms, System: 11.4 ms]
  Range (min … max):   140.9 ms … 145.8 ms    20 runs

Benchmark 2: ./target/release/examples/minifier --nospace --mangle ./target/typescript.js
  Time (mean ± σ):     137.5 ms ±   0.5 ms    [User: 125.2 ms, System: 10.6 ms]
  Range (min … max):   136.4 ms … 138.3 ms    21 runs

Summary
  ./target/release/examples/minifier --nospace --mangle ./target/typescript.js ran
    1.03 ± 0.01 times faster than ./target/release/examples/mangler --nospace --mangle ./target/typescript.js

The difference is only about 5 ms for typescript.js.

@sapphi-red sapphi-red force-pushed the 01-17-feat_mangler_reuse_variable_names branch from 9c5f846 to db394c0 Compare January 25, 2025 05:00
@sapphi-red
Contributor Author

I added a comment that describes the algorithm and added some test cases. I hope my explanation makes sense, but let me know if there are any places that are unclear.

@Boshen Boshen merged commit 6589c3b into main Jan 25, 2025
28 checks passed
@Boshen Boshen deleted the 01-17-feat_mangler_reuse_variable_names branch January 25, 2025 06:00
Boshen added a commit that referenced this pull request Jan 26, 2025
## [0.48.1] - 2025-01-26

### Features

- b7f13e6 ast: Implement utf8 to utf16 span converter (#8687) (Boshen)
- 6589c3b mangler: Reuse variable names (#8562) (翠 / green)
- 29bd215 minifier: Minimize `Infinity.toString(radix)` to `'Infinity'`
(#8732) (Boshen)
- e0117db minifier: Replace `const` with `let` for non-exported
read-only variables (#8733) (sapphi-red)
- 9e32f55 minifier: Evaluate `Math.sqrt` and `Math.cbrt` (#8731)
(sapphi-red)
- 360d49e minifier: Replace `Math.pow` with `**` (#8730) (sapphi-red)
- 2e9a560 minifier: `NaN.toString(radix)` is always `NaN` (#8727)
(Boshen)
- cbe0e82 minifier: Minimize `foo(...[])` -> `foo()` (#8726) (Boshen)
- e9fb5fe minifier: Dce pure expressions such as `new Map()` (#8725)
(Boshen)

### Bug Fixes

- 0944758 codegen: Remove parens from `new (import(''), function() {})`
(#8707) (Boshen)
- 33de70a mangler: Handle cases where a var is declared in a block scope
(#8706) (翠 / green)
- d982cdb minifier: `Unknown.fromCharCode` should not be treated as
`String.fromCharCode` (#8709) (sapphi-red)
- e7ab96c transformer/jsx: Incorrect `isStaticChildren` argument for
`Fragment` with multiple children (#8713) (Dunqing)
- 3e509e1 transformer/typescript: Enum merging when same name declared
in outer scope (#8691) (branchseer)

### Performance

- dc0b0f2 manger: Remove useless `tmp_bindings` (#8735) (Dunqing)
- e472ced mangler: Optimize handling of collecting lived scope ids
(#8724) (Dunqing)
- 8587965 minifier: Normalize `undefined` to `void 0` before everything
else (#8699) (Boshen)

### Refactor

- 58002e2 ecmascript: Remove the lifetime annotation on
`MayHaveSideEffects` (#8717) (Boshen)
- 10e5920 linter: Move finishing default diagnostic message to
`GraphicalReporter` (#8683) (Sysix)
- 52a37d0 mangler: Simplify initialization of `slots` (#8734) (Dunqing)
- 6bc906c minifier: Allow mutating arguments in methods called from
`try_fold_known_string_methods` (#8729) (sapphi-red)
- bf8be23 minifier: Use `Ctx` (#8716) (Boshen)
- 0af0267 minifier: Side effect detection needs symbols resolution
(#8715) (Boshen)
- 32e0e47 minifier: Clean up `Normalize` (#8700) (Boshen)
- c792068 semantic: Simplify `ScopeTree::iter_bindings` (#8723)
(Dunqing)

### Testing

- 03229c5 minifier: Fix broken tests (#8722) (Boshen)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
Labels
A-minifier Area - Minifier A-semantic Area - Semantic C-enhancement Category - New feature or request
4 participants