
Guard against tokenizing strings > 20K #557

Merged
merged 6 commits into from
Feb 12, 2020
Merged

Conversation

PEZ
Collaborator

@PEZ PEZ commented Jan 31, 2020

Work in progress...

What has Changed?

I've added a guard in lexer.ts against tokenizing very long lines. Right now the limit is hardcoded at 20K characters, but it should probably use VS Code's max tokenization length setting instead.

The current approach is to recognize when a line is too long and represent it with a single too-long-line token.
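The guard described above could be sketched roughly like this. This is a toy stand-in, not Calva's actual lexer.ts: the `Token` shape, the `tooLongLine` type name, and the whitespace-splitting "lexer" are all illustrative.

```typescript
// Illustrative sketch of the too-long-line guard; not Calva's real lexer.
interface Token {
  type: string;
  raw: string;
  offset: number;
}

// Hardcoded limit, per the PR description.
const MAX_LINE_TOKENIZATION_LENGTH = 20000;

function tokenizeLine(line: string): Token[] {
  // Guard: represent an over-long line as a single opaque token instead of
  // running the full lexer over it (which can freeze the editor).
  if (line.length > MAX_LINE_TOKENIZATION_LENGTH) {
    return [{ type: "too-long-line", raw: line, offset: 0 }];
  }
  // Normal path: a toy lexer that splits on whitespace, standing in for the
  // real Clojure lexer.
  const tokens: Token[] = [];
  const re = /\S+/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) {
    tokens.push({ type: "symbol", raw: m[0], offset: m.index });
  }
  return tokens;
}
```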

Fixes #556

My Calva PR Checklist

I have:

  • Read How to Contribute.
  • Made sure I am directing this pull request at the dev branch. (Or have specific reasons to target some other branch.)
  • Made sure I have changed the default PR base branch, so that it is not master. (Sorry for the nagging.)
  • Tested the VSIX built from the PR (well, if this is a PR that changes the source code). You'll find the artifacts by clicking Show all checks in the CI section of the PR page, and then Details on the ci/circleci: build test. (For now you'll need to opt in to the CircleCI New Experience UI to see the Artifacts tab, because of a bug.)
    • Tested the particular change
    • Figured if the change might have some side effects and tested those as well.
    • Smoke tested the extension as such.
  • Referenced the issue I am fixing/addressing in a commit message for the pull request.
  • Updated the [Unreleased] entry in CHANGELOG.md, linking the issue(s) that the PR is addressing.

The Calva Team PR Checklist:

Before merging we (at least one of us) have:

  • Made sure the PR is directed at the dev branch (unless reasons).
  • Read the source changes.
  • Given feedback and guidance on source changes, if needed. (Please consider noting extra nice stuff as well.)
  • Tested the VSIX built from the PR (well, if this is a PR that changes the source code.)
    • Tested the particular change
    • Figured if the change might have some side effects and tested those as well.
    • Smoke tested the extension as such.
  • If need be, had a chat within the team about particular changes.

Ping @PEZ, @kstehn, @cfehse, @bpringe

@PEZ PEZ added bug Something isn't working paredit Paredit and structural editing parsing labels Jan 31, 2020
@bpringe
Member

bpringe commented Jan 31, 2020

Sounds like a good approach. I agree that using VS Code's setting would probably be ideal.

@PEZ
Collaborator Author

PEZ commented Jan 31, 2020

I've now added a proper scanner object for tokenising the too-long strings.

I can't figure out how to get hold of the VSCode setting for tokenisation max-length though...

@PEZ
Collaborator Author

PEZ commented Feb 1, 2020

I've now added an initScanner function that needs to be called before lexing anything. It's a bit messy, but I think it's the right place to pass in VS Code's max tokenisation setting.

- import { ModelEdit, ModelEditSelection } from "./cursor-doc/model";
+ import { ModelEdit, ModelEditSelection, initScanner } from "./cursor-doc/model";
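The initScanner idea might look roughly like this. This is an illustrative sketch, not Calva's actual code: the module-level `maxLineLength` variable and `scanLine` function are made up so the sketch is self-contained, and a plain number stands in for the value that would come from VS Code's settings.

```typescript
// Illustrative sketch: configure the lexer's line limit once, before any
// lexing happens. The names here are hypothetical, not Calva's real API.
let maxLineLength = 20000; // default until initScanner is called
let initialized = false;

// Called once at extension startup, before anything is lexed.
function initScanner(maxTokenizationLineLength: number): void {
  maxLineLength = maxTokenizationLineLength;
  initialized = true;
}

function scanLine(line: string): string[] {
  if (!initialized) {
    throw new Error("initScanner must be called before lexing");
  }
  // Same guard as before: over-long lines become a single opaque token.
  if (line.length > maxLineLength) {
    return ["too-long-line"];
  }
  // Toy stand-in for the real tokenizer.
  return line.split(/\s+/).filter((s) => s.length > 0);
}
```

In the extension's activation code the limit would presumably be read from VS Code's `editor.maxTokenizationLineLength` setting, e.g. via `vscode.workspace.getConfiguration("editor").get("maxTokenizationLineLength")`, and passed to initScanner.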

const MAX_LINE_TOKENIZATION_LENGTH = 20000;
Member

Can we also use the vs code setting here? And do we want to?

Collaborator Author

It would be nice to do that, but it wasn't immediately obvious to me how to do it.

@PEZ PEZ merged commit 58fa91b into dev Feb 12, 2020
@PEZ
Copy link
Collaborator Author

PEZ commented Feb 12, 2020

We'll have to make it prettier later. This tokenization freeze is causing too much trouble out there.

@PEZ PEZ deleted the fix-556-skip-tokenize-long-lines branch April 6, 2021 06:38