Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade to chevrotain 7 #423

Closed
clementdessoude opened this issue Aug 30, 2020 · 25 comments · Fixed by #616
Closed

Upgrade to chevrotain 7 #423

clementdessoude opened this issue Aug 30, 2020 · 25 comments · Fixed by #616

Comments

@clementdessoude
Copy link
Contributor

Why ?

We should try to have latest versions of our libs to include easily their improvements (bugs, enhancements, etc.)

What ?

I tried to upgrade, but ran into an error:

jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:184
            throw e;
            ^

TypeError: Cannot read property 'location' of undefined
    at JavaParser.TreeBuilder.cstPostNonTerminal (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/tree_builder.js:160:73)
    at JavaParser.cstPostNonTerminal (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/packages/java-parser/src/parser.js:81:11)
    at JavaParser.RecognizerEngine.subruleInternal (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:423:18)
    at JavaParser.RecognizerApi.SUBRULE (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_api.js:72:21)
    at JavaParser.<anonymous> (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/packages/java-parser/src/productions/interfaces.js:241:7)
    at JavaParser.invokeRuleWithTry (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:117:33)
    at JavaParser.wrappedGrammarRule [as annotation] (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:131:38)
    at JavaParser.RecognizerEngine.doSingleRepetition (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:385:16)
    at JavaParser.RecognizerEngine.manyInternalLogic (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:318:29)
    at JavaParser.RecognizerEngine.manyInternal (/Users/clementdessoude/Documents/dev/jhipster/prettier-java/node_modules/chevrotain/lib/src/parse/parser/traits/recognizer_engine.js:295:21)
@clementdessoude
Copy link
Contributor Author

@bd82 do you think you could have a look ? I have to admit that I am a bit lost on the origin of the issue.

Can we add a bounty on this @pascalgrimaud ?

@pascalgrimaud pascalgrimaud added $$ bug-bounty $$ https://www.jhipster.tech/bug-bounties/ $200 labels Jun 19, 2021
@pascalgrimaud
Copy link
Member

Starting with 200, which can be increased if it's a lot of work

@bd82
Copy link
Contributor

bd82 commented Jun 19, 2021

Chevrotain version 9.x is out

@bd82
Copy link
Contributor

bd82 commented Jun 19, 2021

There was a conditional inside the cstPostNonTerminal method.
It seems to have been removed in newer versions

Try putting the super call inside the conditional check in the override of the JavaParser

@bd82
Copy link
Contributor

bd82 commented Jun 19, 2021

Not sure a bounty is needed for this...

clementdessoude added a commit to clementdessoude/prettier-java that referenced this issue Jun 19, 2021
@clementdessoude
Copy link
Contributor Author

Thanks a lot @bd82, it looks indeed that your suggestion works !

Even if it was so much work, you deserve the bounty for all the great work you did on this lib, so if you want to take it, it's yours !!

@bd82
Copy link
Contributor

bd82 commented Jun 19, 2021

I don't have a paypal account :)

@clementdessoude
Copy link
Contributor Author

clementdessoude commented Jun 19, 2021

It seems that upgrading to chevrotain >=7 increase parser execution by 3 (for a file over 6000 lines) :/

image

@pascalgrimaud
Copy link
Member

If it reduces the execution time, it is a good news, isn't it? Don't understand your smiley @clementdessoude

@clementdessoude
Copy link
Contributor Author

If it reduces the execution time, it is a good news, isn't it? Don't understand your smiley @clementdessoude

Increase, sorry

@pascalgrimaud
Copy link
Member

Oh in this case, it is not a good news indeed

@bd82
Copy link
Contributor

bd82 commented Jun 19, 2021

Is that for most test files or just for that single one?

@clementdessoude
Copy link
Contributor Author

clementdessoude commented Jun 19, 2021

Comparing the logs between these two pipeline, it seems a common behaviour:

For instance ConfigFileApplicationListener was parsed in 56ms in the first pipeline, and 140ms in the second one

It is approximately the same on my computer

@bd82
Copy link
Contributor

bd82 commented Jun 19, 2021

There are not supposed to be any performance regressions, (definitively none so large).
The main performance hog with Java Parser is the backtracking done to keep the grammar strongly aligned with the specifications.
Perhaps somehow the way that "smart" backtracking has been implemented its not as efficient as with newer versions of Chevrotain?

@bd82
Copy link
Contributor

bd82 commented Jun 19, 2021

Possible things that can go wrong:

  • Maybe this.outputCst = false; does not disable as many things as it did in 6.x (what caused the execption)
    so more work is being done while backtracking?
  • Maybe isRecognitionException() behaves differently and the block does not return?

I'd think the first one is more likely

@bd82
Copy link
Contributor

bd82 commented Jun 19, 2021

Is the performance drop happening with the update to 7.x or to 9.x?

@clementdessoude
Copy link
Contributor Author

Is the performance drop happening with the update to 7.x or to 9.x?

It's in the update to 7.x.

@bd82
Copy link
Contributor

bd82 commented Jun 20, 2021

I will try to investigate it, but not sure when.

@bd82
Copy link
Contributor

bd82 commented Nov 4, 2021

I've finally have some time to start looking into it.

I've had trouble with the local dev env because of windows + node-gyp + python.

What I'm doing now is:

  • running the benchmark on the folder below:
    image

  • But without the latest released version, only against the master version. (because I could not get niv to work).

    • note this it sometimes seems to me that benchmarking two versions of the similar library will give more consistent results if each library is run in a different process,.
  • and on this branch: https://github.com/clementdessoude/prettier-java/tree/chore/upgrade-chevrotain

  • and changing the chevrotain version in the package.json and executing yarn between benchmarks.

I can detect a performance regression switching from 6.5 to 7.0, but not x3 slower, rather 25-30% reduction.
I am not sure why is that as with a quick look I can't spot any commit hat could have cause it between 6.5 and 7.0

Perhaps it is some shenanigans inside of V8 engine's optimization...

Perhaps I can try npm linking chevrotain and doing a "binary" search to detect the "offensive" commit.

@bd82
Copy link
Contributor

bd82 commented Nov 4, 2021

I think I was mistaken, the regression I noticed was likely between 6.5.0 and 7.12 (latest 7.x)
As I can not reproduce a performance regression between 6.5.0 -> 7.0.0

But I can now repeatedly reproduce a performance regression between 7.1.0 -> 7.1.1

@bd82
Copy link
Contributor

bd82 commented Nov 5, 2021

Okay this is the culprit: it was a refactor which made the exceptions use ES6 syntax and inheritance.

What is happening is that due to the "naive" backtracking nature of the parser (no memorization)

  • Backtracking because we wanted to stick as close to the spec as possible (at cost of performance).

A great many exceptions are thrown, when the parser tries and retries backtracking again and again.
This seems to be quite extreme.

For samples under: `/packages/java-parser/samples/spring-framework/spring-expression/src/main/java/org/springframework/expression/spel/

Which are not that much code:

github.com/AlDanial/cloc v 1.90  T=0.20 s (579.8 files/s, 91927.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Java                           118           2363           5101          11246
-------------------------------------------------------------------------------
SUM:                           118           2363           5101          11246
-------------------------------------------------------------------------------

The RecognitionException constructor gets called exactly: 263251 times.
Which is approx 23.4 per "real" line of code.

So when the exception classes became more complex:

  1. class inheritance
  2. additional logic in the constructor

It negatively impacted the performance.

  • Note that I only saw 30-40% performance r, regressions, but it could depend on the specific set of files
    and the node engine used.

@bd82
Copy link
Contributor

bd82 commented Nov 5, 2021

Possible Solutions

In Chevrotain revert the exceptions implementation to the old style

Problems:

  • Won't be available quickly, as Chevrotain's master branch currently contains breaking changes as work on a new major release is being done.
  • While I would like to simplify the exceptions (no real need for class inheritance), the scenario of throwing 24 exceptions per line is not exactly "common" so how much should it be optimized for on the library level?
  • Some of the logic added to the Exception constructor may have actually been needed to solve some issues with stack traces.

In java-parser implement a simple hack

Basically use something like https://github.com/ds300/patch-package to apply a patch on the exceptions_public.js file
in chevrotain. I am not sure if the java-parser needs any information of the state of the exceptions other than the .message property, so a very simple alternative implementation could be provided.

  • Maybe even just replace with the same file.
  • alternatively if you can prove you only need the message and no other state, perhaps an optimized version could be "injected" which would provide a speed boost.

This should be easy to perform and not too fragile as the exceptions file does not often change.

In java-parser attempt to implement memoization on the backtracking logic

This is basically trying to improve the parsing backtracking algorithm.
It is quite likely the parser could be several times or even an order of magnitude faster if it was smarter on how it backtracks.

  • Edit: the above statement may be too "strong", lets instead say I suspect there is substantial room for improvements.

But this will require more research to prove and time investment to implement.

I will try to find time to open a separate issue with more details on this tomorrow

@bd82
Copy link
Contributor

bd82 commented Jul 13, 2023

I suspect that using the LL(*) plugin for chevrotain could get rid of much of the backtracking used in the Java-Parser.

@msujew WDYT?

Example of backtracking production:

  • $.RULE("isLocalVariableDeclaration", () => {
    $.MANY(() => {
    $.SUBRULE($.variableModifier);
    });
    $.SUBRULE($.localVariableType);
    $.SUBRULE($.variableDeclaratorId);
    const nextTokenType = this.LA(1).tokenType;
    switch (nextTokenType) {
    // Int x;
    case t.Semicolon:
    // Int x, y, z;
    case t.Comma:
    // Int x = 5;
    case t.Equals:
    return true;
    default:
    return false;
    }
    });

Usage of the backtracking production:

  • $.RULE("blockStatement", () => {
    const isLocalVariableDeclaration = this.BACKTRACK_LOOKAHEAD(
    $.isLocalVariableDeclaration
    );
    const isClassDeclaration = this.BACKTRACK_LOOKAHEAD($.isClassDeclaration);
    $.OR({
    DEF: [
    {
    GATE: () => isLocalVariableDeclaration,
    ALT: () => $.SUBRULE($.localVariableDeclarationStatement)
    },
    {
    GATE: () => isClassDeclaration,
    ALT: () => $.SUBRULE($.classDeclaration)
    },
    {
    ALT: () => $.SUBRULE($.interfaceDeclaration)
    },
    { ALT: () => $.SUBRULE($.statement) }
    ],
    IGNORE_AMBIGUITIES: true
    });
    });

Backtracking wrapper logic

  • BACKTRACK_LOOKAHEAD(production, errValue = false) {
    return this.ACTION(() => {
    this.isBackTrackingStack.push(1);
    // TODO: "saveRecogState" does not handle the occurrence stack
    const orgState = this.saveRecogState();
    try {
    // hack to enable outputting none CST values from grammar rules.
    this.outputCst = false;
    return production.call(this);
    } catch (e) {
    if (isRecognitionException(e)) {
    return errValue;
    }
    throw e;
    } finally {
    this.outputCst = true;
    this.reloadRecogState(orgState);
    this.isBackTrackingStack.pop();
    }
    });
    }

@msujew
Copy link

msujew commented Jul 13, 2023

@bd82 I've used a modified version of the parser used in this project for my thesis. I could probably contribute a PR which takes care of this.

Edit: Tried it, but just removing all backtracking isn't possible. The grammar is ambiguous in a lot of places and a lot of tests are failing on me.

@bd82
Copy link
Contributor

bd82 commented Jul 14, 2023

Thanks @msujew

Do you have any estimate or guess regarding what percentage of these lookaheads can be solved with LL*
and what percentage cannot?

I am trying to understand if it may still be worth using the LL(*) plugin in this project.

jtkiesel added a commit to jtkiesel/prettier-java that referenced this issue Nov 6, 2023
jtkiesel added a commit to jtkiesel/prettier-java that referenced this issue Nov 6, 2023
jtkiesel added a commit to jtkiesel/prettier-java that referenced this issue Nov 6, 2023
jtkiesel added a commit to jtkiesel/prettier-java that referenced this issue Nov 11, 2023
jtkiesel added a commit to jtkiesel/prettier-java that referenced this issue Nov 12, 2023
jtkiesel added a commit to jtkiesel/prettier-java that referenced this issue Nov 26, 2023
jtkiesel added a commit to jtkiesel/prettier-java that referenced this issue Nov 26, 2023
jtkiesel added a commit to jtkiesel/prettier-java that referenced this issue Nov 26, 2023
jtkiesel added a commit to jtkiesel/prettier-java that referenced this issue Dec 30, 2023
clementdessoude pushed a commit that referenced this issue Jan 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants