RFE: streaming input processing #501
Comments
To be honest, your use case seems very niche for a potentially large amount of work. Your workaround seems more practical.
Not sure I agree with very niche, but I guessed it might be a lot of work. FWIW, I can't immediately think of a case for which streaming/chunked wouldn't be the right thing to do. But never mind; I had already started implementing a workaround for this in the CLI only, finished it, and opened #502 in case you'd be interested in that. Unfortunately, as expected, it does add a bunch of lines just for this purpose. If you don't like that approach, do you think we should document the current behavior and perhaps add the above script as an example somewhere instead?
It is definitely niche within the context of what Chroma is used for, which is syntax highlighting source code. It's vanishingly rare for source code to be large enough to be an issue in terms of buffering. For example, the amalgamated sqlite3.c is 8MB, and it can be entirely loaded into RAM and lexed by Chroma without issue. The issue with buffering in chunks as you suggest is that it is slower in the common case, for a couple of reasons:
Another option would be to use an
I think the PR is a good compromise - I don't mind adding this complexity to the command line tool for this purpose.
Thanks for taking the time to explain.
What problem does this feature solve?
Currently chroma seems to require the entire source contents to be in memory before processing, which is not memory efficient and places limits on how large a file can be processed.
What feature do you propose?
Process input as a stream instead of requiring a string: use a reader and process it in, say, 1024- or 2048-byte chunks.
The CLI with `--fail` could exit if a lexer cannot be determined after examining just the first chunk, or whatever smallish amount is deemed enough.
CLI case/consideration
A particular problem for the CLI when used as a `less` preprocessor is that it's not that uncommon to pass e.g. large compressed archives to less, and `lesspipe` or the like will then output the archive's file listing as its output. Passing large compressed archives to the chroma CLI doesn't do anything good, and the newly added `--fail` flag doesn't help because the reading happens too early for it to kick in.
Incidentally, the current slurping behavior was why I didn't initially document the specifics of how to use the CLI as a `less` preprocessor when implementing `--fail`, but I failed to remember that when asked to document it. Currently I'm using this as my `~/.lessfilter` to avoid hitting the issue: