-
-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Guard against tokenizing strings > 20K #557
Conversation
Sounds like a good approach. I agree that using VS Code's setting would probably be ideal. |
Now I fixed a proper scanner object for tokenising the too-long-strings. I can't figure out how to get hold of the VSCode setting for tokenisation max-length though... |
Now added an |
import { ModelEdit, ModelEditSelection } from "./cursor-doc/model"; | ||
import { ModelEdit, ModelEditSelection, initScanner } from "./cursor-doc/model"; | ||
|
||
const MAX_LINE_TOKENIZATION_LENGTH = 20000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also use the vs code setting here? And do we want to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to do that, but it wasn't immediately obvious to me how to do it.
We will have to make it prettier later. This tokenization freeze is causing too much troubles out there. |
Work in progress...
What has Changed?
I've put a guard in
lexer.ts
to guard against tokenization of long lines. Right now using 20K length hardcoded, but probably should be using vscode's tokenization maxlength setting.The current approach is to recognize when a line is too long and only represent it as a
too-long-line
token.Fixes #556
My Calva PR Checklist
I have:
dev
branch. (Or have specific reasons to target some other branch.)master
. (Sorry for the nagging.)ci/circleci: build
test. (For now you'll need to opt in to the CircleCI New Experience UI to see the Artifacts tab, because bug.)[Unreleased]
entry inCHANGELOG.md
, linking the issue(s) that the PR is addressing.The Calva Team PR Checklist:
Before merging we (at least one of us) have:
dev
branch (unless reasons).Ping @PEZ, @kstehn, @cfehse, @bpringe