Frontend should break up multiline selections that get sent to Console (by expression or by \n) #1326
Comments
I have determined that the exact number of characters it fails on is, suspiciously, 4096 (or 2048 * 2, or 1024 * 4). Regardless, it is a suspicious number. Here is the exact text that fails; with 1 character fewer it does not fail:
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It |
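For anyone who wants to double check the 4096 figure, here is a small sketch that rebuilds the text quoted above and measures it. It assumes that every full line ends with exactly one "\n" and that nothing follows the final partial line; the variable names are mine and not from Positron or ark.

```typescript
// The repeated line from the failing input, and the partial last line.
const fullLine = "# This is a long comment that takes a lot of space. It will be duplicated many times.";
const partialLine = "# This is a long comment that takes a lot of space. It";

// Assumption: 47 full lines, each followed by a single "\n", then the partial line.
const failingInput = (fullLine + "\n").repeat(47) + partialLine;

console.log(fullLine.length);     // 85 characters per full line
console.log(partialLine.length);  // 54 characters in the partial line
console.log(failingInput.length); // 47 * 86 + 54 = 4096
```

All characters are ASCII, so the character count and the byte count are the same here.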
Oh hello there:
|
So it seems like we are actually refusing to send all of that input to R by not copying it over to R's read-console buffer, but then something freaks out later on since nothing got copied over. Ignoring the freakout for a minute, maybe we should look at what RStudio does to avoid this problem. @kevinushey am I right in understanding that RStudio takes all of that input and splits it on Might be a little work to implement on our side though. |
All of that code there is really a workaround for Python code insertion; I don't think it's related to the 4096 buffer size limit. In RStudio, it looks like we just silently drop characters over the buffer limit? E.g. if you try executing this:
you end up with a |
Ok maybe I didn't find the right place in the source, but it definitely seems like if you highlight a selection and send the selection to the console, then RStudio handles each line separately, i.e. sending
mtcars %>%
  mutate(x = mpg + 1)
gives me this in the debug logs
|
@kevinushey i do actually think that that
i.e. even before we had the python specific code in there, we had: which did the newline splitting. That has basically just been moved into the I have tracked it all the way back to this 2011 commit from Joe which does suggest that we split to avoid sending too much to R at once |
Some additional color here is that sending the lines one by one (instead of all at once) allows us to discard pending input after an error has occurred. This behavior is controversial but is currently the RStudio default. rstudio/rstudio#3014 |
In the ark meeting we determined that the idea of "breaking up the large selection into individual statements" should be done on the frontend (middleware? #1155) side of things instead of within ark itself.
It sounds like ideally we'd have some middleware that could use tree-sitter on the typescript side to split up the selection into individual R statements that would get stored in a queue and sent off to R one at a time. This is a more language agnostic solution (besides needing a tree-sitter language implementation), and would mean that ark would not need to manage this queue itself. It is also quite nice because, as Jonathan mentioned, it would give us the option to discard the queue if we encountered an error (which would be optional behavior since it is controversial).
The one action item that I do think we need to do for ark is to handle the case of a "too large" buffer a little better. "Doing nothing" as we do there seems to actually be the cause of the crash, as it seems that either R or ark is expecting that we did actually send something over on the buffer. RStudio handles this edge case by trimming the input to the size of the buffer, and just sending everything that it can. We could probably start there, as that would at least avoid the crash on the ark side, and then let the middleware changes add the rest of the improvements. |
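To make the queue idea a bit more concrete, here is a minimal sketch of "send statements one at a time, optionally discarding the rest after an error". Everything in it is hypothetical: `runQueue`, `execute`, and `discardOnError` are illustrative names, not Positron or ark APIs, and it is synchronous for brevity where a real console would await each execution.

```typescript
// Hypothetical sketch; none of these names exist in Positron or ark.
interface QueueResult {
  ran: string[];        // statements that executed without error
  discarded: string[];  // statements dropped after an error
}

function runQueue(
  statements: string[],
  execute: (statement: string) => void, // assumed to throw when the statement errors
  discardOnError: boolean,
): QueueResult {
  const pending = [...statements];
  const ran: string[] = [];
  while (pending.length > 0) {
    const next = pending.shift() as string;
    try {
      execute(next);
      ran.push(next);
    } catch {
      if (discardOnError) {
        // Drop everything still waiting in the queue.
        return { ran, discarded: pending };
      }
      // Otherwise keep going with the next statement.
    }
  }
  return { ran, discarded: [] };
}
```

Keeping `discardOnError` as a flag reflects that this behavior is controversial and would need to be optional.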
I'd just like to note that since tree-sitter is designed with the main goal of being an error-recovering parser, using it for this purpose might lead to unexpected behaviour in edge cases. Since that eval queue would be the heart of R interactions in positron, it should be as accurate and reliable as possible. A more robust alternative would be to build wasm parsers for each version of R based on trimmed-down versions of |
Came up again today on Bluesky: https://bsky.app/profile/tjmahr.com/post/3kwctblqx362o
|
Also here a couple days ago: https://x.com/meghansharris/status/1807022600251408552. It's a screenshot, so no text to capture here. But it's definitely this. |
I just got bit by this bug on Positron Version: 2024.07.0 build 125.
|
Also just reported on the discussion forum. #4262 |
@lionel- and I talked about this today, we have a few notes on how this should work. This expands on @jmcphers's comment here #4264 (comment)
Frontend
We think the frontend should be in charge of splitting large selections into expressions to send to the backend. The practical way to do this is to try and reuse some existing behavior we already have for pending input:
positron/src/vs/workbench/services/positronConsole/browser/positronConsoleService.ts Lines 1944 to 1954 in f1d6e84
While this is currently being done for "pending input", it is not yet being done for the case where you send a selection of code to R while R is already idle. In practice, this technically means we could send multiple expressions to R at once, like
1 + 1; 2 + 2
That's two expressions on 1 line. But RStudio has been doing this for years, and it really doesn't cause any issues, so I am completely fine with that behavior.
Input / Output grouping
One important bit that comes from this change is that it will positively impact how inputs and outputs are grouped together in the Positron console. Currently it looks like this when you send a whole selection to the console:
That's because Positron isn't splitting anything. It treats the whole selection as one "input" and everything we get back as one "output" that goes with that input. But if Positron splits by expression then we will be able to replicate this much nicer RStudio behavior
Expressions that are too darn long
In RStudio, the frontend actually splits by But this means that in Positron we can run into the following case:
{
  # a really big top level expression
}
where this 1 expression is longer than R's buffer, which brings me to....
Backend
To support the case of really really long single expressions (which is honestly pretty common!), Ark itself is going to gain the ability to split that long complete expression by We think that will also improve the experience when Ark is used in other frontends besides Positron. |
This does have the advantage of working w/o any additional support from language packs, but it's very chatty. In the pathological case like the The other thing worth noting is that I think the queue of statements to be run ultimately needs to be moved -- an in-memory array of pending inputs gets wiped if you close your browser, or reload the window, but I think users will expect them to continue to be consumed by the backend (especially if they are queued b/c some long-running thing is going on). Some notes on that in #1155. |
Yea @lionel- thought we might need that too. Running the whole selection through R's C level |
Could there be a possible stop-gap measure to do something like "take current selection, write to temporary R file, and source the file"? I'm finding myself hitting this bug a lot. My current solution is to delete the text that I don't want to run, source the file, and then undo the deletion (scary!) |
@kylebutts We're working on it! 😄 |
Because of the possibility that a single expression exceeds the input buffer size, we are going to prioritise fixing on the backend side: #4745. On the frontend side, the fix will mostly be a UX improvement, in particular to get correct interpolation of inputs and outputs with multiline selections.
To support splitting inputs into complete expressions, Ark now has a new LSP request of type Here are the types involved:
type ParseBoundaries = Array<ParseBoundary>;
interface LineRange {
  start: number;
  end: number;
}
interface ParseBoundary {
  kind: 'whitespace' | 'complete' | 'incomplete' | 'invalid';
  range: LineRange;
  data: ParseBoundaryData;
}
interface ParseBoundaryData {
  [k: string]: unknown;
}
interface ParseBoundaryDataInvalid {
  message: string;
}
The boundaries are a vector of sorted ranges. Invariants:
More notes:
|
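As a rough illustration of how a frontend might consume these boundaries, here is a sketch that pulls the complete expressions out of a selection. It restates the types so it is self-contained. The helper `completeExpressions` is hypothetical, and the range semantics (zero-based lines, `end` exclusive) are an assumption of mine, since the thread does not spell them out.

```typescript
interface LineRange {
  start: number;
  end: number;
}

interface ParseBoundary {
  kind: 'whitespace' | 'complete' | 'incomplete' | 'invalid';
  range: LineRange;
  data: { [k: string]: unknown };
}

// Hypothetical helper. Assumption: ranges are zero-based line indices
// and `end` is exclusive; adjust the slice if the real request differs.
function completeExpressions(code: string, boundaries: ParseBoundary[]): string[] {
  const lines = code.split("\n");
  return boundaries
    .filter((b) => b.kind === 'complete')
    .map((b) => lines.slice(b.range.start, b.range.end).join("\n"));
}

// Example with hand-written boundaries (not real ark output).
const selection = "x <- 1\n\nf <- function() {\n  x\n}";
const handWritten: ParseBoundary[] = [
  { kind: 'complete', range: { start: 0, end: 1 }, data: {} },
  { kind: 'whitespace', range: { start: 1, end: 2 }, data: {} },
  { kind: 'complete', range: { start: 2, end: 5 }, data: {} },
];
const expressions = completeExpressions(selection, handWritten);
console.log(expressions.length); // 2
```

Each returned string could then be sent to the console as its own input, which is what would give the per-expression input/output grouping discussed earlier in the thread.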
We now split by newlines on the backend side, which fixes the errors with overflowing inputs, but we still need to split by expressions on the frontend side. We're now tracking the latter in #5272. |
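For readers curious what "split by newlines" can look like, here is a minimal sketch that packs lines into chunks that each fit within a size limit. This is not ark's actual implementation (ark is not written in TypeScript); `chunkLines` and `maxLen` are hypothetical names, and the 4096 limit in the usage note is simply the figure reported in this thread.

```typescript
// Hypothetical helper: greedily packs whole lines into chunks of at most
// `maxLen` characters. A single line longer than `maxLen` is still emitted
// as its own (oversized) chunk, since this sketch never splits inside a line.
function chunkLines(input: string, maxLen: number): string[] {
  const chunks: string[] = [];
  let current: string | null = null;
  for (const line of input.split("\n")) {
    if (current === null) {
      current = line;
    } else if (current.length + 1 + line.length <= maxLen) {
      // The +1 accounts for the "\n" that rejoins the lines.
      current += "\n" + line;
    } else {
      chunks.push(current);
      current = line;
    }
  }
  if (current !== null) {
    chunks.push(current);
  }
  return chunks;
}
```

With the 50-line reproduction file from this issue and `maxLen` set to 4096, this yields two chunks (47 lines, then 3), and joining the chunks with "\n" gives back the original input.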
To reproduce, create a .R file that contains 50 lines, each of which has this text on it:
# This is a long comment that takes a lot of space. It will be duplicated many times.
Then, start (or restart) R, such that the R session hasn't run any code yet.
Highlight the entire contents of the .R file and press Cmd+Enter. The R kernel exits instead of running the code.
Interesting facts: