
Guard against tokenizing strings > 20K #557

Merged
merged 6 commits into from
Feb 12, 2020
Merged

Conversation

PEZ
Collaborator

@PEZ PEZ commented Jan 31, 2020

Work in progress...

What has Changed?

I've added a guard in lexer.ts against tokenizing very long lines. Right now the limit is hardcoded at 20K characters, but it should probably use VS Code's max tokenization length setting instead.

The current approach is to recognize when a line is too long and represent it with a single too-long-line token.
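The guard described above could be sketched roughly like this. This is a toy stand-in, not Calva's actual lexer.ts: the `Token` shape, the `tooLongLine` type name, and the whitespace-splitting "lexer" are all illustrative.

```typescript
// Illustrative sketch of the too-long-line guard; not Calva's real lexer.
interface Token {
  type: string;
  raw: string;
  offset: number;
}

// Hardcoded limit, per the PR description.
const MAX_LINE_TOKENIZATION_LENGTH = 20000;

function tokenizeLine(line: string): Token[] {
  // Guard: represent an over-long line as a single opaque token instead of
  // running the full lexer over it (which can freeze the editor).
  if (line.length > MAX_LINE_TOKENIZATION_LENGTH) {
    return [{ type: "too-long-line", raw: line, offset: 0 }];
  }
  // Normal path: a toy lexer that splits on whitespace, standing in for the
  // real Clojure lexer.
  const tokens: Token[] = [];
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) {
    tokens.push({ type: "symbol", raw: m[0], offset: m.index });
  }
  return tokens;
}
```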

Fixes #556

My Calva PR Checklist

I have:

  • Read How to Contribute.
  • Made sure I am directing this pull request at the dev branch. (Or have specific reasons to target some other branch.)
  • Made sure I have changed the default PR base branch, so that it is not master. (Sorry for the nagging.)
  • Tested the VSIX built from the PR (well, if this is a PR that changes the source code). You'll find the artifacts by clicking Show all checks in the CI section of the PR page, and then Details on the ci/circleci: build test. (For now you'll need to opt in to the CircleCI New Experience UI to see the Artifacts tab, because of a bug.)
    • Tested the particular change
    • Figured if the change might have some side effects and tested those as well.
    • Smoke tested the extension as such.
  • Referenced the issue I am fixing/addressing in a commit message for the pull request.
  • Updated the [Unreleased] entry in CHANGELOG.md, linking the issue(s) that the PR is addressing.

The Calva Team PR Checklist:

Before merging we (at least one of us) have:

  • Made sure the PR is directed at the dev branch (unless reasons).
  • Read the source changes.
  • Given feedback and guidance on source changes, if needed. (Please consider noting extra nice stuff as well.)
  • Tested the VSIX built from the PR (well, if this is a PR that changes the source code.)
    • Tested the particular change
    • Figured if the change might have some side effects and tested those as well.
    • Smoke tested the extension as such.
  • If need be, had a chat within the team about particular changes.

Ping @PEZ, @kstehn, @cfehse, @bpringe

@PEZ PEZ added bug Something isn't working paredit Paredit and structural editing parsing labels Jan 31, 2020
@bpringe
Member

bpringe commented Jan 31, 2020

Sounds like a good approach. I agree that using VS Code's setting would probably be ideal.

@PEZ
Collaborator Author

PEZ commented Jan 31, 2020

I've now added a proper scanner object for tokenising the too-long strings.

I can't figure out how to get hold of the VSCode setting for tokenisation max-length though...

@PEZ
Collaborator Author

PEZ commented Feb 1, 2020

I've now added an initScanner function that needs to be called before lexing anything. It's a bit messy, but I think it's the right place to pass in VS Code's max tokenisation setting.

- import { ModelEdit, ModelEditSelection } from "./cursor-doc/model";
+ import { ModelEdit, ModelEditSelection, initScanner } from "./cursor-doc/model";
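The initScanner idea might look roughly like this. This is an illustrative sketch, not Calva's actual code: the module-level `maxLineLength` variable and `scanLine` function are made up so the sketch is self-contained, and a plain number stands in for the value that would come from VS Code's settings.

```typescript
// Illustrative sketch: configure the lexer's line limit once, before any
// lexing happens. The names here are hypothetical, not Calva's real API.
let maxLineLength = 20000; // default until initScanner is called
let initialized = false;

// Called once at extension startup, before anything is lexed.
function initScanner(maxTokenizationLineLength: number): void {
  maxLineLength = maxTokenizationLineLength;
  initialized = true;
}

function scanLine(line: string): string[] {
  if (!initialized) {
    throw new Error("initScanner must be called before lexing");
  }
  // Same guard as before: over-long lines become a single opaque token.
  if (line.length > maxLineLength) {
    return ["too-long-line"];
  }
  // Toy stand-in for the real tokenizer.
  return line.split(/\s+/).filter((s) => s.length > 0);
}
```

In the extension's activation code the limit would presumably be read from VS Code's `editor.maxTokenizationLineLength` setting, e.g. via `vscode.workspace.getConfiguration("editor").get("maxTokenizationLineLength")`, and passed to initScanner.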

const MAX_LINE_TOKENIZATION_LENGTH = 20000;
Member

Can we also use the vs code setting here? And do we want to?

Collaborator Author

It would be nice to do that, but it wasn't immediately obvious to me how to do it.

@PEZ PEZ merged commit 58fa91b into dev Feb 12, 2020
@PEZ
Copy link
Collaborator Author

PEZ commented Feb 12, 2020

We'll have to make it prettier later. This tokenization freeze is causing too much trouble out there.

@PEZ PEZ deleted the fix-556-skip-tokenize-long-lines branch April 6, 2021 06:38