Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Frontend should break up multiline selections that get sent to Console (by expression or by \n) #1326

Closed
jmcphers opened this issue Sep 15, 2023 · 22 comments
Assignees
Labels
area: console Issues related to Console category. bug Something isn't working sharp-edge

Comments

@jmcphers
Copy link
Collaborator

To reproduce, create a .R file that contains 50 lines, each of which has this text on it:

# This is a long comment that takes a lot of space. It will be duplicated many times.

Then, start (or restart) R, such that the R session hasn't run any code yet.

Highlight the entire contents of the .R file and press Cmd+Enter. The R kernel exits instead of running the code.

image

Interesting facts:

  • There is definitely a specific threshold at which this starts happening. For example, it won't reproduce if you only run 30 lines from the above test file.
  • The kernel isn't crashing! It's exiting, and even exiting normally.
Executable module set to "/Users/jmcphers/git/positron/extensions/positron-r/resources/ark/ark".
Architecture set to: arm64-apple-macosx-.
(lldb) c
Process 75332 resuming
Process 75332 exited with status = 0 (0x00000000)
@jmcphers jmcphers added this to the Internal Preview milestone Sep 15, 2023
@DavisVaughan DavisVaughan self-assigned this Sep 27, 2023
@DavisVaughan
Copy link
Contributor

I have determined the exact number of characters it fails on is suspiciously 4096 (or 2048 * 2, or 1024 * 4). Regardless, it is a suspicious number. Here is exact text where 1 character less does not fail, but this does:

# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It will be duplicated many times.
# This is a long comment that takes a lot of space. It

@DavisVaughan
Copy link
Contributor

DavisVaughan commented Sep 27, 2023

Oh hello there:

[R] 2023-09-27T16:54:10.946936000Z [ark-unknown] INFO crates/ark/src/interface.rs:728: Error: input too large for buffer.

https://github.com/posit-dev/amalthea/blob/b3177e04ed54829eca385a597c89f37581fd17dc/crates/ark/src/interface.rs#L725-L730

@DavisVaughan
Copy link
Contributor

DavisVaughan commented Sep 27, 2023

So it seems like we are actually refusing to send all of that input to R by not copying it over to R's read-console buffer, but then something freaks out later on since nothing got copied over. Ignoring the freakout for a minute, maybe we should look at what RStudio does to avoid this problem.

@kevinushey am I right in understanding that RStudio takes all of that input and splits it on \n and then sends the inputs one at a time to R? I guess this would help circumvent the 4096 buffer size limit.
https://github.com/rstudio/rstudio/blob/e32fb38f34af7cf3fd0fff630dba41cf79b69df5/src/cpp/session/SessionConsoleInput.cpp#L210

Might be a little work to implement on our side though.

@kevinushey
Copy link
Contributor

All of that code there is really a workaround for Python code insertion; I don't think it's related to the 4096 buffer size limit.

In RStudio, it looks like we just silently drop characters over the buffer limit? E.g. if you try executing this:

"################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################################"

you end up with a continue prompt because the trailing quote was lost.

@DavisVaughan
Copy link
Contributor

DavisVaughan commented Sep 27, 2023

Ok maybe I didn't find the right place in the source, but it definitely seems like if you highlight a selection and send the selection to the console, then RStudio handles each line separately

i.e. sending

mtcars %>%
  mutate(x = mpg + 1)

gives me this in the debug logs

2023-09-27T19:22:58.446645Z [rsession-davis] DEBUG Handle console_input:     mtcars %>%

2023-09-27T19:22:58.449792Z [rsession-davis] DEBUG Handle console_input:       mutate(x = mpg + 1)

https://github.com/rstudio/rstudio/blob/e32fb38f34af7cf3fd0fff630dba41cf79b69df5/src/cpp/session/SessionConsoleInput.cpp#L351

@DavisVaughan
Copy link
Contributor

@kevinushey i do actually think that that fixup*() function does two things:

  • splits on newlines
  • handles python fixups

i.e. even before we had the python specific code in there, we had:
rstudio/rstudio@2f49492#diff-38d04249284e40942a016b67b733d4a998fa6659e096636dd5893f2ebbe68584L250

which did the newline splitting. That has basically just been moved into the fixup*() helper.

I have tracked it all the way back to this 2011 commit from Joe
rstudio/rstudio@7f23dee

which does suggest that we split to avoid sending too much to R at once

@DavisVaughan
Copy link
Contributor

Another bit of proof about R sending selections 1 line at a time is the difference in the console output between RStudio and Positron.

  • For Positron, sending a selection to the console looks like you typed in all of the lines in a multi-line prompt directly at the cursor. And all of the output is returned in bulk at the end.
  • For RStudio, you get a new prompt for each line of input, and the output for each line of input is displayed before the next line of input is processed, even though you sent them all at once.
Screenshot 2023-09-27 at 3 40 40 PM Screenshot 2023-09-27 at 3 40 58 PM

@jmcphers
Copy link
Collaborator Author

Some additional color here is that sending the lines one by one (instead of all at once) allows us to discard pending input after an error has occurred. This behavior is controversial but is currently the RStudio default. rstudio/rstudio#3014

@DavisVaughan
Copy link
Contributor

DavisVaughan commented Oct 3, 2023

In the ark meeting we determined that the idea of "breaking up the large selection into individual statements" should be done on the frontend (middleware? #1155) side of things instead of within ark itself.

It sounds like ideally we'd have some middleware that could use tree-sitter on the typescript side to split up the selection into individual R statements that would get stored in a queue and sent off to R one at a time.

This is a more language agnostic solution (besides needing a tree-sitter language implementation), and would mean that ark would not need to manage this queue itself.

It is also quite nice because, as Jonathan mentioned, it would give us the option to discard the queue if we encountered an error (which would be optional behavior since it is controversial).


The one action item that I do think we need to do for ark is to handle the case of a "too large" buffer a little better
https://github.com/posit-dev/amalthea/blob/810d5325a4499bf8e4196071f63c04e2162e302b/crates/ark/src/interface.rs#L725-L730

"Doing nothing" as we do there seems to actually be the cause of the crash, as it seems that either R or ark is expecting that we did actually send something over on the buffer. RStudio handles this edge case by trimming the input to the size of the buffer, and just sending everything that it can. We could probably start there, as that would at least avoid the crash on the ark side, and then let the middleware changes add the rest of the improvements.

@lionel-
Copy link
Contributor

lionel- commented Oct 4, 2023

It sounds like ideally we'd have some middleware that could use tree-sitter on the typescript side

I'd just like to note that since tree-sitter is designed with the main goal of being an error-recovering parser, using it for this purpose might lead to unexpected behaviour in edge cases. Since that eval queue would be the heart of R interactions in positron, it should be as accurate and reliable as possible.

A more robust alternative would be to build wasm parsers for each versions of R based on trimmed-down versions of gram.y / gram.c. These parsers would have limited functionality compared to the tree-sitter parser but would allow precise parsing in a way that's 100% compatible with the R runtime to ensure alignment (i.e. if we send an expression we know for sure there's not going to be a parser error or an incomplete parse on the runtime side).

@DavisVaughan DavisVaughan removed their assignment Oct 16, 2023
@DavisVaughan DavisVaughan changed the title ark: Kernel exits if you send a large block of code as the first input Frontend should break up multiline selections that get sent to Console (by expression or by \n) Jan 30, 2024
@wesm wesm added area: console Issues related to Console category. bug Something isn't working sharp-edge labels Feb 29, 2024
@jmcphers
Copy link
Collaborator Author

jmcphers commented Jul 3, 2024

Came up again today on Bluesky: https://bsky.app/profile/tjmahr.com/post/3kwctblqx362o

wait i think the block was just too big. if i use a 500-line list() for my targets, i get

Error:
! Can't pass console input on to R, it exceeds R's internal console buffer size.
This is a Positron limitation we plan to fix. In the meantime, you can:
[...break it up or use source()]

@jennybc
Copy link
Member

jennybc commented Jul 3, 2024

Also here a couple days ago: https://x.com/meghansharris/status/1807022600251408552. It's a screenshot, so no text to capture here. But it's definitely this.

@rmflight
Copy link

rmflight commented Aug 6, 2024

I just got bit by this bug on Positron Version: 2024.07.0 build 125.

Error:
! Can't pass console input on to R, it exceeds R's internal console buffer size.
This is a Positron limitation we plan to fix. In the meantime, you can:
- Break the command you sent to the console into smaller chunks, if possible.
- Otherwise, send the whole script to the console using `source()`.

@jmcphers
Copy link
Collaborator Author

jmcphers commented Aug 7, 2024

Also just reported on the discussion forum. #4262

@DavisVaughan
Copy link
Contributor

DavisVaughan commented Aug 7, 2024

@lionel- and I talked about this today, we have a few notes on how this should work.

This expands on @jmcphers's comment here #4264 (comment)

Frontend

We think the frontend should be in charge of splitting large selections into expressions to send to the backend.

The practical way to do this is to try and reuse some existing behavior we already have for pending input:

  • Split input by \n
  • For each line, check if that line is a complete expression with isCodeFragmentComplete()
  • If not complete, append the next line and try again
  • When you get a complete fragment, send it to the backend

// Find a complete code fragment to execute.
const codeLines: string[] = [];
for (let i = 0; i < pendingInputLines.length; i++) {
// Push the pending input line to the code lines.
codeLines.push(pendingInputLines[i]);
// Determine whether the code lines are a complete code fragment. If they are, execute
// the code fragment.
const codeFragment = codeLines.join('\n');
const codeFragmentStatus = await this.session.isCodeFragmentComplete(codeFragment);
if (codeFragmentStatus === RuntimeCodeFragmentStatus.Complete) {

While this is currently being done for "pending input", it is not yet being done for the case where you send a selection of code to R while R is already idle.

In practice, this technically means we could send multiple expressions to R at once, like

1 + 1; 2 + 2

That's two expressions on 1 line. But RStudio has been doing this for years, and it really doesn't cause any issues, so I am completely fine with that behavior.

Input / Output grouping

One important bit that comes from this change is that it will positively impact how inputs and outputs are grouped together in the Positron console. Currently it looks like this when you send a whole selection to the console:

Screenshot 2024-08-07 at 12 00 22 PM

That's because Positron isn't splitting anything. It treats the whole selection as one "input" and everything we get back as one "output" that goes with that input.

But if Positron splits by expression then we will be able to replicate this much nicer RStudio behavior

Screenshot 2024-08-07 at 12 02 03 PM

Expressions that are too darn long

In RStudio, the frontend actually splits by \n, not by expression. This means that when RStudio and R talk to each other, R gets to say "hey, this is an incomplete expression, give me more input". In Positron, since Ark is a Jupyter kernel it is typically much better if we send it a known-to-be-complete expression rather than a single line at a time.

But this means that in Positron we can run into the following case:

{

 # a really big top level expression

}

where this 1 expression is longer than R's buffer, which brings me to....

Backend

To support the case of really really long single expressions (which is honestly pretty common!), Ark itself is going to gain the ability to split that long complete expression by \n. It is way way easier and faster for Ark to do this dance with R where Ark provides one line at a time to R, R gets a chance to say "this is incomplete, give me more", and Ark's read-console hook can provide the next line. The important part is that Positron gives Ark 1 complete expression - regardless of length, Ark should be able to handle it with this change.

We think that will also improve the experience when Ark is used in other frontends besides Positron.

@jmcphers
Copy link
Collaborator Author

jmcphers commented Aug 7, 2024

If not complete, append the next line and try again
When you get a complete fragment, send it to the backend

This does have the advantage of working w/o any additional support from language packs, but it's very chatty. In the pathological case like the { ... really big top level expression ... }, we're going to be calling the backend 100 times! I think it would be worth adding a way for the frontend to ask the backend to break up the statement all at once.

The other thing worth noting is that I think the queue of statements to be run ultimately needs to be moved -- an in-memory array of pending inputs gets wiped if you close your browser, or reload the window, but I think users will expect them to continue to be consumed by the backend (especially if they are queued b/c some long-running thing is going on). Some notes on that in #1155.

@DavisVaughan
Copy link
Contributor

DavisVaughan commented Aug 7, 2024

I think it would be worth adding a way for the frontend to ask the backend to break up the statement all at once.

Yea @lionel- thought we might need that too. Running the whole selection through R's C level R_ParseVector() should work pretty well for this, as I think it would return a list of the individual expressions to run, or a signal if it is an incomplete statement or a parse error

@kylebutts
Copy link
Contributor

Could there be a possible stop-gap measure to do something like "take current selection, write to temporary R file, and source the file"? I'm finding myself hitting this bug a lot. My current solution is to delete text that I don't want to run, sourcing the file, and then undoing the deletion (scary!)

@lionel- lionel- self-assigned this Aug 22, 2024
@lionel-
Copy link
Contributor

lionel- commented Aug 22, 2024

@kylebutts We're working on it! 😄

@lionel-
Copy link
Contributor

lionel- commented Sep 19, 2024

Because of the possibility that a single expression exceeds the input buffer size, we are going to prioritise fixing on the backend side: #4745.

On the frontend side, the fix will mostly be a UX improvement, in particular to get correct interpolation of inputs and outputs with multiline selections.

To support splitting inputs into complete expressions, Ark now has a new LSP request of type positron/inputBoundaries posit-dev/ark#533.

Here are the types involved:

type ParseBoundaries = Array<ParseBoundary>

interface LineRange {
  start: number;
  end: number;
}

interface ParseBoundary {
  kind: 'whitespace' | 'complete' | 'incomplete' | 'invalid';
  range: LineRange;
  data: ParseBoundaryData;
}

interface ParseBoundaryData {
  [k: string]: unknown;
}

interface ParseBoundaryDataInvalid {
  message: string;
}

The boundaries are a vector of sorted ranges. Invariants:

  • The ranges are sorted, non-overlapping, and contiguous.
  • Empty lines and lines containing only whitespace or comments are complete inputs.
  • There is always at least one range since the empty string is a complete input.
  • Inputs are classified in complete, incomplete, and error sections. The sections are sorted in this order (there cannot be complete inputs after an incomplete or error one, there cannot be an incomplete input after an error one, and error inputs are always trailing).
  • There is at most one incomplete and/or one error input in a set of inputs.

More notes:

  • UI: We might not need any changes to the console UI, and instead enqueue complete expressions invisibly. Here is how it works in RStudio:

    rstudio-input-queue.mov
  • API: @juliasilge suggested that the best place to expose the input boundaries functionality is probably in positron.d.ts. Then how an extension like positron-r implements it (LSP request, Jupyter request, a process invokation) is an implementation detail.

  • Fallback path: We will need to support a fallback path for runtimes that don't support an inputBoundaries request, in particular Python.

  • When an error occurs, the input queue should be flushed, for consistency with RStudio and the backend fix being implemented in Ark: Break up long inputs #4745.

@lionel-
Copy link
Contributor

lionel- commented Nov 5, 2024

We now split by newlines on the backend side, which fixes the errors with overflowing inputs, but still need to split by expressions on the frontend side. We're now tracking the latter in #5272.

@lionel- lionel- closed this as completed Nov 5, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 20, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area: console Issues related to Console category. bug Something isn't working sharp-edge
Projects
None yet
Development

No branches or pull requests

9 participants