Frontend should break up multiline selections that get sent to Console (by expression or by `\n`) #1326

jmcphers · 2023-09-15T21:02:54Z

To reproduce, create a .R file that contains 50 lines, each of which has this text on it:

# This is a long comment that takes a lot of space. It will be duplicated many times.

Then, start (or restart) R, such that the R session hasn't run any code yet.

Highlight the entire contents of the .R file and press Cmd+Enter. The R kernel exits instead of running the code.

Interesting facts:

There is definitely a specific threshold at which this starts happening. For example, it won't reproduce if you only run 30 lines from the above test file.
The kernel isn't crashing! It's exiting, and even exiting normally.

Executable module set to "/Users/jmcphers/git/positron/extensions/positron-r/resources/ark/ark".
Architecture set to: arm64-apple-macosx-.
(lldb) c
Process 75332 resuming
Process 75332 exited with status = 0 (0x00000000)

The text was updated successfully, but these errors were encountered:

DavisVaughan · 2023-09-27T17:01:13Z

I have determined the exact number of characters it fails on is suspiciously 4096 (or 2048 * 2, or 1024 * 4). Regardless, it is a suspicious number. Here is exact text where 1 character less does not fail, but this does:

# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It

DavisVaughan · 2023-09-27T17:06:27Z

Oh hello there:

[R] 2023-09-27T16:54:10.946936000Z [ark-unknown] INFO crates/ark/src/interface.rs:728: Error: input too large for buffer.

https://github.com/posit-dev/amalthea/blob/b3177e04ed54829eca385a597c89f37581fd17dc/crates/ark/src/interface.rs#L725-L730

DavisVaughan · 2023-09-27T17:46:01Z

So it seems like we are actually refusing to send all of that input to R by not copying it over to R's read-console buffer, but then something freaks out later on since nothing got copied over. Ignoring the freakout for a minute, maybe we should look at what RStudio does to avoid this problem.

@kevinushey am I right in understanding that RStudio takes all of that input and splits it on \n and then sends the inputs one at a time to R? I guess this would help circumvent the 4096 buffer size limit.
https://github.com/rstudio/rstudio/blob/e32fb38f34af7cf3fd0fff630dba41cf79b69df5/src/cpp/session/SessionConsoleInput.cpp#L210

Might be a little work to implement on our side though.

kevinushey · 2023-09-27T18:18:17Z

All of that code there is really a workaround for Python code insertion; I don't think it's related to the 4096 buffer size limit.

In RStudio, it looks like we just silently drop characters over the buffer limit? E.g. if you try executing this:

"################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################"

you end up with a continue prompt because the trailing quote was lost.

kevinushey · 2023-09-27T18:19:21Z

https://github.com/rstudio/rstudio/blob/e32fb38f34af7cf3fd0fff630dba41cf79b69df5/src/cpp/r/session/RStdCallbacks.cpp#L355-L360

DavisVaughan · 2023-09-27T19:24:40Z

Ok maybe I didn't find the right place in the source, but it definitely seems like if you highlight a selection and send the selection to the console, then RStudio handles each line separately

i.e. sending

mtcars %>%
  mutate(x = mpg + 1)

gives me this in the debug logs

2023-09-27T19:22:58.446645Z [rsession-davis] DEBUG Handle console_input:     mtcars %>%

2023-09-27T19:22:58.449792Z [rsession-davis] DEBUG Handle console_input:       mutate(x = mpg + 1)

https://github.com/rstudio/rstudio/blob/e32fb38f34af7cf3fd0fff630dba41cf79b69df5/src/cpp/session/SessionConsoleInput.cpp#L351

DavisVaughan · 2023-09-27T19:38:01Z

@kevinushey i do actually think that that fixup*() function does two things:

splits on newlines
handles python fixups

i.e. even before we had the python specific code in there, we had:
rstudio/rstudio@2f49492#diff-38d04249284e40942a016b67b733d4a998fa6659e096636dd5893f2ebbe68584L250

which did the newline splitting. That has basically just been moved into the fixup*() helper.

I have tracked it all the way back to this 2011 commit from Joe
rstudio/rstudio@7f23dee

which does suggest that we split to avoid sending too much to R at once

DavisVaughan · 2023-09-27T19:44:33Z

Another bit of proof about R sending selections 1 line at a time is the difference in the console output between RStudio and Positron.

For Positron, sending a selection to the console looks like you typed in all of the lines in a multi-line prompt directly at the cursor. And all of the output is returned in bulk at the end.
For RStudio, you get a new prompt for each line of input, and the output for each line of input is displayed before the next line of input is processed, even though you sent them all at once.

jmcphers · 2023-09-27T20:06:30Z

Some additional color here is that sending the lines one by one (instead of all at once) allows us to discard pending input after an error has occurred. This behavior is controversial but is currently the RStudio default. rstudio/rstudio#3014

DavisVaughan · 2023-10-03T17:08:01Z

In the ark meeting we determined that the idea of "breaking up the large selection into individual statements" should be done on the frontend (middleware? #1155) side of things instead of within ark itself.

It sounds like ideally we'd have some middleware that could use tree-sitter on the typescript side to split up the selection into individual R statements that would get stored in a queue and sent off to R one at a time.

This is a more language agnostic solution (besides needing a tree-sitter language implementation), and would mean that ark would not need to manage this queue itself.

It is also quite nice because, as Jonathan mentioned, it would give us the option to discard the queue if we encountered an error (which would be optional behavior since it is controversial).

The one action item that I do think we need to do for ark is to handle the case of a "too large" buffer a little better
https://github.com/posit-dev/amalthea/blob/810d5325a4499bf8e4196071f63c04e2162e302b/crates/ark/src/interface.rs#L725-L730

"Doing nothing" as we do there seems to actually be the cause of the crash, as it seems that either R or ark is expecting that we did actually send something over on the buffer. RStudio handles this edge case by trimming the input to the size of the buffer, and just sending everything that it can. We could probably start there, as that would at least avoid the crash on the ark side, and then let the middleware changes add the rest of the improvements.

lionel- · 2023-10-04T06:15:40Z

It sounds like ideally we'd have some middleware that could use tree-sitter on the typescript side

I'd just like to note that since tree-sitter is designed with the main goal of being an error-recovering parser, using it for this purpose might lead to unexpected behaviour in edge cases. Since that eval queue would be the heart of R interactions in positron, it should be as accurate and reliable as possible.

A more robust alternative would be to build wasm parsers for each versions of R based on trimmed-down versions of gram.y / gram.c. These parsers would have limited functionality compared to the tree-sitter parser but would allow precise parsing in a way that's 100% compatible with the R runtime to ensure alignment (i.e. if we send an expression we know for sure there's not going to be a parser error or an incomplete parse on the runtime side).

jmcphers · 2024-07-03T15:07:33Z

Came up again today on Bluesky: https://bsky.app/profile/tjmahr.com/post/3kwctblqx362o

wait i think the block was just too big. if i use a 500-line list() for my targets, i get

Error:
! Can't pass console input on to R, it exceeds R's internal console buffer size.
This is a Positron limitation we plan to fix. In the meantime, you can:
[...break it up or use source()]

jennybc · 2024-07-03T15:10:00Z

Also here a couple days ago: https://x.com/meghansharris/status/1807022600251408552. It's a screenshot, so no text to capture here. But it's definitely this.

rmflight · 2024-08-06T15:00:52Z

I just got bit by this bug on Positron Version: 2024.07.0 build 125.

Error:
! Can't pass console input on to R, it exceeds R's internal console buffer size.
This is a Positron limitation we plan to fix. In the meantime, you can:
- Break the command you sent to the console into smaller chunks, if possible.
- Otherwise, send the whole script to the console using `source()`.

jmcphers · 2024-08-07T15:06:00Z

Also just reported on the discussion forum. #4262

DavisVaughan · 2024-08-07T16:14:46Z

@lionel- and I talked about this today, we have a few notes on how this should work.

This expands on @jmcphers's comment here #4264 (comment)

Frontend

We think the frontend should be in charge of splitting large selections into expressions to send to the backend.

The practical way to do this is to try and reuse some existing behavior we already have for pending input:

Split input by \n
For each line, check if that line is a complete expression with isCodeFragmentComplete()
If not complete, append the next line and try again
When you get a complete fragment, send it to the backend

positron/src/vs/workbench/services/positronConsole/browser/positronConsoleService.ts

Lines 1944 to 1954 in f1d6e84

    
           // Find a complete code fragment to execute. 
        
           const codeLines: string[] = []; 
        
           for (let i = 0; i < pendingInputLines.length; i++) { 
        
           	// Push the pending input line to the code lines. 
        
           	codeLines.push(pendingInputLines[i]); 
        
           	// Determine whether the code lines are a complete code fragment. If they are, execute 
        
           	// the code fragment. 
        
           	const codeFragment = codeLines.join('\n'); 
        
           	const codeFragmentStatus = await this.session.isCodeFragmentComplete(codeFragment); 
        
           	if (codeFragmentStatus === RuntimeCodeFragmentStatus.Complete) {

While this is currently being done for "pending input", it is not yet being done for the case where you send a selection of code to R while R is already idle.

In practice, this technically means we could send multiple expressions to R at once, like

1 + 1; 2 + 2

That's two expressions on 1 line. But RStudio has been doing this for years, and it really doesn't cause any issues, so I am completely fine with that behavior.

Input / Output grouping

One important bit that comes from this change is that it will positively impact how inputs and outputs are grouped together in the Positron console. Currently it looks like this when you send a whole selection to the console:

That's because Positron isn't splitting anything. It treats the whole selection as one "input" and everything we get back as one "output" that goes with that input.

But if Positron splits by expression then we will be able to replicate this much nicer RStudio behavior

Expressions that are too darn long

In RStudio, the frontend actually splits by \n, not by expression. This means that when RStudio and R talk to each other, R gets to say "hey, this is an incomplete expression, give me more input". In Positron, since Ark is a Jupyter kernel it is typically much better if we send it a known-to-be-complete expression rather than a single line at a time.

But this means that in Positron we can run into the following case:

{

 # a really big top level expression

}

where this 1 expression is longer than R's buffer, which brings me to....

Backend

To support the case of really really long single expressions (which is honestly pretty common!), Ark itself is going to gain the ability to split that long complete expression by \n. It is way way easier and faster for Ark to do this dance with R where Ark provides one line at a time to R, R gets a chance to say "this is incomplete, give me more", and Ark's read-console hook can provide the next line. The important part is that Positron gives Ark 1 complete expression - regardless of length, Ark should be able to handle it with this change.

We think that will also improve the experience when Ark is used in other frontends besides Positron.

jmcphers · 2024-08-07T21:28:38Z

If not complete, append the next line and try again
When you get a complete fragment, send it to the backend

This does have the advantage of working w/o any additional support from language packs, but it's very chatty. In the pathological case like the { ... really big top level expression ... }, we're going to be calling the backend 100 times! I think it would be worth adding a way for the frontend to ask the backend to break up the statement all at once.

The other thing worth noting is that I think the queue of statements to be run ultimately needs to be moved -- an in-memory array of pending inputs gets wiped if you close your browser, or reload the window, but I think users will expect them to continue to be consumed by the backend (especially if they are queued b/c some long-running thing is going on). Some notes on that in #1155.

DavisVaughan · 2024-08-07T21:48:50Z

I think it would be worth adding a way for the frontend to ask the backend to break up the statement all at once.

Yea @lionel- thought we might need that too. Running the whole selection through R's C level R_ParseVector() should work pretty well for this, as I think it would return a list of the individual expressions to run, or a signal if it is an incomplete statement or a parse error

kylebutts · 2024-08-22T20:35:53Z

Could there be a possible stop-gap measure to do something like "take current selection, write to temporary R file, and source the file"? I'm finding myself hitting this bug a lot. My current solution is to delete text that I don't want to run, sourcing the file, and then undoing the deletion (scary!)

lionel- · 2024-08-22T21:26:31Z

@kylebutts We're working on it! 😄

lionel- · 2024-09-19T12:09:48Z

Because of the possibility that a single expression exceeds the input buffer size, we are going to prioritise fixing on the backend side: #4745.

On the frontend side, the fix will mostly be a UX improvement, in particular to get correct interpolation of inputs and outputs with multiline selections.

To support splitting inputs into complete expressions, Ark now has a new LSP request of type positron/inputBoundaries posit-dev/ark#533.

Here are the types involved:

type ParseBoundaries = Array<ParseBoundary>

interface LineRange {
  start: number;
  end: number;
}

interface ParseBoundary {
  kind: 'whitespace' | 'complete' | 'incomplete' | 'invalid';
  range: LineRange;
  data: ParseBoundaryData;
}

interface ParseBoundaryData {
  [k: string]: unknown;
}

interface ParseBoundaryDataInvalid {
  message: string;
}

The boundaries are a vector of sorted ranges. Invariants:

The ranges are sorted, non-overlapping, and contiguous.
Empty lines and lines containing only whitespace or comments are complete inputs.
There is always at least one range since the empty string is a complete input.
Inputs are classified in complete, incomplete, and error sections. The sections are sorted in this order (there cannot be complete inputs after an incomplete or error one, there cannot be an incomplete input after an error one, and error inputs are always trailing).
There is at most one incomplete and/or one error input in a set of inputs.

More notes:

UI: We might not need any changes to the console UI, and instead enqueue complete expressions invisibly. Here is how it works in RStudio:

rstudio-input-queue.mov
API: @juliasilge suggested that the best place to expose the input boundaries functionality is probably in positron.d.ts. Then how an extension like positron-r implements it (LSP request, Jupyter request, a process invokation) is an implementation detail.
Fallback path: We will need to support a fallback path for runtimes that don't support an inputBoundaries request, in particular Python.
When an error occurs, the input queue should be flushed, for consistency with RStudio and the backend fix being implemented in Ark: Break up long inputs #4745.

lionel- · 2024-11-05T09:19:06Z

We now split by newlines on the backend side, which fixes the errors with overflowing inputs, but still need to split by expressions on the frontend side. We're now tracking the latter in #5272.

jmcphers added this to the Internal Preview milestone Sep 15, 2023

DavisVaughan self-assigned this Sep 27, 2023

DavisVaughan removed their assignment Oct 16, 2023

This was referenced Nov 2, 2023

How is multiline code executed? #1762

Open

ark: Cap console input that is "too large" for R's buffer #1767

Closed

DavisVaughan mentioned this issue Jan 10, 2024

Avoid crashing when given input longer than R's buffer posit-dev/ark#202

Merged

DavisVaughan changed the title ~~ark: Kernel exits if you send a large block of code as the first input~~ Frontend should break up multiline selections that get sent to Console (by expression or by \n) Jan 30, 2024

juliasilge modified the milestones: Private Alpha 2024 Q1, Release Candidate Feb 16, 2024

wesm added area: console Issues related to Console category. bug Something isn't working sharp-edge labels Feb 29, 2024

DavisVaughan mentioned this issue Apr 5, 2024

Ark: Truncated input causes weirdness if incomplete #2675

Closed

jmcphers mentioned this issue Aug 7, 2024

R commands submitted as multi-line selection report back from Console out of order #4264

Closed

lionel- mentioned this issue Aug 22, 2024

Add SrcRef type and create srcrefs with parse_status() posit-dev/ark#482

Merged

lionel- self-assigned this Aug 22, 2024

juliasilge modified the milestones: Release Candidate, 2024.10.0 Pre-Release Aug 26, 2024

This was referenced Sep 9, 2024

Add ParseData type posit-dev/ark#508

Merged

Detect boundaries of parse inputs posit-dev/ark#522

Merged

This was referenced Sep 18, 2024

Distinguish whitespace and include syntax error messages posit-dev/ark#532

Merged

Add LSP request for input boundaries posit-dev/ark#533

Merged

Ark: Break up long inputs #4745

Closed

petetronic mentioned this issue Oct 1, 2024

2024.11 Pre-Release Plan #4860

Closed

lionel- mentioned this issue Oct 2, 2024

Test multiple expressions in console and notebook modes posit-dev/ark#559

Merged

juliasilge modified the milestones: 2024.10.0 Pre-Release, 2024.11.0 Pre-Release Oct 3, 2024

lionel- mentioned this issue Oct 7, 2024

Some multiline code in R console not working a #4901

Closed

lionel- modified the milestones: 2024.11.0 Pre-Release, Release Candidate Oct 22, 2024

lionel- mentioned this issue Nov 5, 2024

Frontend should break up multiline selections that get sent to Console by complete expressions #5272

Open

lionel- closed this as completed Nov 5, 2024

github-actions bot locked as resolved and limited conversation to collaborators Nov 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Frontend should break up multiline selections that get sent to Console (by expression or by `\n`) #1326

Frontend should break up multiline selections that get sent to Console (by expression or by `\n`) #1326

jmcphers commented Sep 15, 2023

DavisVaughan commented Sep 27, 2023

DavisVaughan commented Sep 27, 2023 •

edited

Loading

DavisVaughan commented Sep 27, 2023 •

edited

Loading

kevinushey commented Sep 27, 2023

kevinushey commented Sep 27, 2023

DavisVaughan commented Sep 27, 2023 •

edited

Loading

DavisVaughan commented Sep 27, 2023

DavisVaughan commented Sep 27, 2023

jmcphers commented Sep 27, 2023

DavisVaughan commented Oct 3, 2023 •

edited

Loading

lionel- commented Oct 4, 2023

jmcphers commented Jul 3, 2024

jennybc commented Jul 3, 2024 •

edited

Loading

rmflight commented Aug 6, 2024

jmcphers commented Aug 7, 2024

DavisVaughan commented Aug 7, 2024 •

edited

Loading

jmcphers commented Aug 7, 2024

DavisVaughan commented Aug 7, 2024 •

edited

Loading

kylebutts commented Aug 22, 2024

lionel- commented Aug 22, 2024 •

edited

Loading

lionel- commented Sep 19, 2024

lionel- commented Nov 5, 2024

Frontend should break up multiline selections that get sent to Console (by expression or by \n) #1326

Frontend should break up multiline selections that get sent to Console (by expression or by \n) #1326

Comments

jmcphers commented Sep 15, 2023

DavisVaughan commented Sep 27, 2023

DavisVaughan commented Sep 27, 2023 • edited Loading

DavisVaughan commented Sep 27, 2023 • edited Loading

kevinushey commented Sep 27, 2023

kevinushey commented Sep 27, 2023

DavisVaughan commented Sep 27, 2023 • edited Loading

DavisVaughan commented Sep 27, 2023

DavisVaughan commented Sep 27, 2023

jmcphers commented Sep 27, 2023

DavisVaughan commented Oct 3, 2023 • edited Loading

lionel- commented Oct 4, 2023

jmcphers commented Jul 3, 2024

jennybc commented Jul 3, 2024 • edited Loading

rmflight commented Aug 6, 2024

jmcphers commented Aug 7, 2024

DavisVaughan commented Aug 7, 2024 • edited Loading

Frontend

Input / Output grouping

Expressions that are too darn long

Backend

jmcphers commented Aug 7, 2024

DavisVaughan commented Aug 7, 2024 • edited Loading

kylebutts commented Aug 22, 2024

lionel- commented Aug 22, 2024 • edited Loading

lionel- commented Sep 19, 2024

lionel- commented Nov 5, 2024

Frontend should break up multiline selections that get sent to Console (by expression or by `\n`) #1326

Frontend should break up multiline selections that get sent to Console (by expression or by `\n`) #1326

DavisVaughan commented Sep 27, 2023 •

edited

Loading

DavisVaughan commented Sep 27, 2023 •

edited

Loading

DavisVaughan commented Sep 27, 2023 •

edited

Loading

DavisVaughan commented Oct 3, 2023 •

edited

Loading

jennybc commented Jul 3, 2024 •

edited

Loading

DavisVaughan commented Aug 7, 2024 •

edited

Loading

DavisVaughan commented Aug 7, 2024 •

edited

Loading

lionel- commented Aug 22, 2024 •

edited

Loading