Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preserve directive prologues fixes #5 #7 #6

Merged
merged 3 commits into from
Apr 27, 2022
Merged

Preserve directive prologues fixes #5 #7 #6

merged 3 commits into from
Apr 27, 2022

Conversation

blutorange
Copy link
Collaborator

@blutorange blutorange commented Apr 18, 2022

This fixes #5 and also includes the fix for #7 since I need it for this. While the relevance of strict directives may be doubtful for modules, correctness is still a goal and we should still preserve the original code and only order imports.

For references, this code

// below is a directive prologue
  'use custom' ; /* more directives... */ 'enable typecheck'
                 'forbid IE'

import './SetupEnvironment';
import type { Period } from './Period'

"this is not a directive prologue";

function foo() {
}

now becomes

// below is a directive prologue
"use custom";
/* more directives... */

"enable typecheck";
"forbid IE";

import "./SetupEnvironment";

import type { Period } from "./Period";

("this is not a directive prologue");

function foo() {}

and the following

; 'use strict'

import './SetupEnvironment';
import type { Period } from './Period'

"this is not a directive prologue";

function foo() {
}

still becomes

import "./SetupEnvironment";

import type { Period } from "./Period";

("use strict");

("this is not a directive prologue");

function foo() {}

since an empty statement is not a directive prologue according to the spec.

The most important goal is correctness. Directives were destroyed by placing
imports above it. Directive prologues are defined by the spec as

> A Directive Prologue is the longest sequence of ExpressionStatements occurring as
> the initial StatementListItems or ModuleItems of a FunctionBody, a ScriptBody, or
> a ModuleBody and where each ExpressionStatement in the sequence consists entirely
> of a StringLiteral token followed by a semicolon. The semicolon may appear explicitly
> or may be inserted by automatic semicolon insertion (12.9). A Directive Prologue may
> be an empty sequence.
Previously, a regex string replacement was used, and regex is an inadequate
tool for transforming entire JavaScript code files. Instead, this commit
changes that to a proper algorithm that removes the code at the source position
ranges of the nodes to be removed. This will not work properly if the Babel
parser ever did not return the correct source positions, but that is not a
regression since the former algortihm already relied on the positions
being correct.
@blutorange blutorange changed the title Draft: Preserve directive prologues, fixes #5 Draft: Preserve directive prologues fixes #5 #7 Apr 18, 2022
The comments are added back as part of the new directives placed at the top, so
we should remove them from the original code we add after the directives and the
imports.
@blutorange blutorange marked this pull request as ready for review April 18, 2022 19:25
@IanVS
Copy link
Owner

IanVS commented Apr 19, 2022

I can't give this a thorough review right now, but I wonder, is there a way to check if there is a performance impact to this change?

@blutorange
Copy link
Collaborator Author

We could of course do it manually, not sure if we can add performance tests. But regarding how it "should" affect the performance:

  • Preserving directive prologues should not affect it all all, since I'm only using the result from the parse that's already being done
  • Other than that, I only changed how the imports are removed from the code. Before, a Regex was created for each import, then run in the code. Now, I build a list of ranges to replace (which calls sort on the list of ranges, but that's around O(n log n) and the input size is only the number of imports present in the file), do a bunch of substrings to extract the parts to keep, then join those parts back together.

So I wouldn't expect this to have an effect on performance (if anything it could improve it due to less Regexes). But perhaps we should take a large JS file and compare the times it takes to format, just so we can rule out an unexpected bottleneck.

@blutorange blutorange changed the title Draft: Preserve directive prologues fixes #5 #7 Preserve directive prologues fixes #5 #7 Apr 26, 2022
@blutorange
Copy link
Collaborator Author

@IanVS Do you have time to take a look at this in the next few days? Otherwise do you mind if I do a quick test myself and then merge?

@IanVS
Copy link
Owner

IanVS commented Apr 26, 2022

I'll give it a quick glance through today and approve. Curious about perf, but I don't think that's a blocker.

@IanVS
Copy link
Owner

IanVS commented Apr 26, 2022

@blutorange In reading the spec, I see that it says:

A Directive Prologue may be an empty sequence.

I'm not so familiar with the spec, is an empty sequence different from an empty statement? I guess an empty sequence is ""?

@blutorange
Copy link
Collaborator Author

@IanVS As far as I understand it

A Directive Prologue is the longest sequence of ExpressionStatements [...] where each ExpressionStatement in the sequence consists entirely of a StringLiteral token followed by a semicolon.

An empty sequence is like the empty set, so that's just their way of saying that having the directive prologue is optional.

This is a source file with an empty directive prologue list, consisting of an empty statement and a function declaration statement.

;
function foo() {}

This is a source file with a single-element directive sequence, and the value of the directive is the empty string.

"";
function foo() {}

This is also source file with a single-element directive sequence. (The semicolon is inserted via the process of automatic semicolon insertion)

""
function foo() {}

It doesn't really matter though, since we aren't doing the parsing ourselves and just rely on the parser being implemented correctly. I just added some tests for these cases just in case somebody gets the idea to implement the parsing manually ; )

PS: You can use e.g. https://astexplorer.net/ to see how a piece of code parses.

Copy link
Owner

@IanVS IanVS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested this out in my own project, and it seems to make no difference in performance, prettier takes roughly 14.5 seconds to check the formatting either way, and I did not get any formatting problems when running this branch. Nicely done!

* @param directives All directive prologues from the original code (e.g.
* `"use strict";`).
* @param interpreter Optional interpreter directives, if present (e.g.
* `#!/bin/node`).
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These comments are helpful, thanks.

//
// |-----------xxxxx-----xxxx-----xxxx-----------|
// ^---------^
// This part
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I've come to the realization that otherwise even I myself won't understand what's going on when I look at code weeks or months later ; )

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'use strict' pushed below imports
2 participants