Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to how stream outputs are handled in Notebooks #161023

Closed
DonJayamanne opened this issue Sep 16, 2022 · 3 comments · Fixed by #160946
Closed

Changes to how stream outputs are handled in Notebooks #161023

DonJayamanne opened this issue Sep 16, 2022 · 3 comments · Fixed by #160946
Assignees
Labels
api-proposal insiders-released Patch has been released in VS Code Insiders notebook-execution Issues related to running cells in a notebook
Milestone

Comments

@DonJayamanne
Copy link
Contributor

DonJayamanne commented Sep 16, 2022

Related issues #146768, microsoft/vscode-jupyter#11031

Multiple Output Items per Output

  • Today we can have multiple output items per output
  • However all output items in an output must have a unique mime type.
  • Thus its not possible to have two stderr output items in the same output.
    This is check is done in extHostTypes.ensureUniqueMimeTypes
    I.e. if you were to call appendOutputItem with stdout multiple times, only the first stdout is stored in the model.
    E.g. if you were to call appendOutputItem with stdout twice with the values 1 and 2 respectively, then you'd end up with a single output item with the value 1 in the model. The second stdout output item is ignored.

Changes to the API

  • We will support calling appendOutputItem multiple times for the same stream mime types per output.
    However we'd merge the stream output items into one and we'd still end up with a single output item per mime type.
    E.g. if you were to call appendOutputItem with stdout twice with the values 1 and 2 respectively, then you'd end up with a single output item with the value 12 in the model.

    This change applies only to the stream mime types which include:

    • 'application/vnd.code.notebook.stdout',
    • 'application/x.notebook.stdout',
    • 'application/x.notebook.stream',
    • 'application/vnd.code.notebook.stderr',
    • 'application/x.notebook.stderr'

Current approach

Summary: Jupyter extension gets new output for each cell and compresses/merges the outpu items, and replaces the existing output with new output. E.g. if you get a new stream output, we replace the previous output items with a new one.

Draw backs:

  • Jupyter extension is always replacing existing output with new output
  • As Jupyter gets new output, the output grows and this large output is sent to Renderer process and form Renderer process back to Ext Host.
    • I.e. we send large strings back and forth (data sent back and forth grows over time)
    • Sending this ever growing large output back and forth has the potential to crash vs code.
sequenceDiagram
  participant Jupyter
  participant Extension Host
  participant VSCode Renderer Proc
  Jupyter->>Extension Host: Append Kernel Output `Line1` using `Task.appendOutput` 
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._spliceNotebookCellOutputs2` 
  Note right of VSCode Renderer Proc: Model updated to contain <br/>Output with OutputItem `Line1`
  VSCode Renderer Proc-->>Extension Host: Trigger change event for new Output `Line1`
  Note right of Extension Host: Model updated to contain <br/>Output with OutputItem `Line1`
  Jupyter->>Extension Host: Kernel output is `Line2`<br/>Merge previous outputs with new to get `Line1Line2`<br/>Replace output with new value `Line1Line2`<br/>Using `Task.replaceOutputItems`
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._replaceNotebookCellOutputItems`
  Note right of VSCode Renderer Proc: Model updated to replace output <br/>with new value `Line1Line2`.
  VSCode Renderer Proc-->>Extension Host: Trigger change event for replace with new output item `Line1Line2`
  Note right of Extension Host: Model updated to replace output <br/>with new value `Line1Line2`.
  Jupyter->>Extension Host: Kernel output is `Line3`<br/>Merge previous outputs with new to get `Line1Line2Line3`<br/>Replace output with new value `Line1Line2Line3`<br/>Using `Task.replaceOutputItems`
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._replaceNotebookCellOutputItems`
  Note right of VSCode Renderer Proc: Model updated to replace output <br/>with new value `Line1Line2Line3`.
  VSCode Renderer Proc-->>Extension Host: Trigger change event for replace with new output item `Line1Line2Line3`
  Note right of Extension Host: Model updated to replace output <br/>with new value `Line1Line2Line3`.
Loading

Why do we need to merge/compress outputs

Assume we get output from the kernel as follows:

  • First kernel output is Line1.
  • Second kernel output is ESC[ALine2
  • Third kernel output is ESC[ALine3

Note: Assume ESC[A is a special terminal escape sequence that tells the terminal to remove the previous line.

Thus when we get the output ESC[ALine3, what needs to be displayed is Line3 and same Line3 saved in *.ipynb file (the prevous lines need to be dropped).
This compression/merging of outputs needs to happen in realtime, so users can see updates from terminal progress bars & the like (very common in Jupyter, installing packages and running experiments).

Proposed Solution

Summary: We send the incremental updates to Renderer process. And instead of appending the new items we compress the new output items with the existing items and replace the existing items with a single output item. I.e. we always have one output item in the cell output.

Draw backs:

  • Changes to behaviour of existing API or need a new API.
    Current API method is appendOutputItems. Today this only supports having multiple output items with distinct mime types. If one were to call the API with the same mime type, the the new items get dropped.
    The change here allows us to call appendOutputItems for the same mime type again without losing this data and this applies to streams mime types.

Optionally we could have a new method appendStreamOutputItems to be more explicit about the fact that we have special handling for stream output items.

Benefits:

  • We only send the new output items (incremental updates) to Renderer process and Extension host.
  • No additional processing required when rendering output in notebook renderer (output items are already merged/compressed)
  • No additional processing required when serializing the output in Notebook Serializer (output items are already merged/compressed)
sequenceDiagram
  participant Jupyter
  participant Extension Host
  participant VSCode Renderer Proc
  Jupyter->>Extension Host: Append Kernel Output `Line1` using `Task.appendOutput` 
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._spliceNotebookCellOutputs2` 
  Note right of VSCode Renderer Proc: Model updated to contain <br/>Output with OutputItem `Line1`
  VSCode Renderer Proc-->>Extension Host: Trigger change event for new Output `Line1`
  Note right of Extension Host: Model updated to contain <br/>Output with OutputItem `Line1`
  Jupyter->>Extension Host: Kernel output is `Line2`<br/>Append new output item with value `Line2` using `Task.appendOutputItem`
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._appendNotebookCellOutputItems`
  Note right of VSCode Renderer Proc: We now have 2 output items in the output model. `Line`, and `Line2`
  Note right of VSCode Renderer Proc: When sending data to `Notebook Renderer` we need to loop through 2 output items to merge into one.
  VSCode Renderer Proc-->>Extension Host: Trigger change event for append new output item `Line2`
  Note right of Extension Host: Model contains two output items, `Line` and `Line2`.
  Note right of Extension Host: Notebook Serializer needs to loop through 2 output items to merge into one.
  Jupyter->>Extension Host: Kernel output is `Line3`<br/>Append new output item with value `Line3` using `Task.appendOutputItem`
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._appendNotebookCellOutputItems`
  Note right of VSCode Renderer Proc: We now have 3 output items in the output model. `Line`, `Line2`, and `Line3`
  Note right of VSCode Renderer Proc: When sending data to `Notebook Renderer` we need to loop through 3 output items to merge into one.
  VSCode Renderer Proc-->>Extension Host: Trigger change event for append new output item `Line3`
  Note right of Extension Host: Model contains two output items, `Line1`,`Line2` and `Line3`.
  Note right of Extension Host: Notebook Serializer needs to loop through 3 output items to merge into one.
Loading

Solution that we tried and found to not work well

Summary: We send the incremental updates to Renderer process. I.e. we always append output items. As we get new items this list of output items will grow.

Draw backs:

  • VS Code renderer process needs to merge/compress all output items into one for Notebook Output Renderer
  • IPynb Serializers Extension (in extension host) needs to merge/compress all output items to save to *.ipynb file.
  • Thus as the output items grow form 1 to n, we end up spending more and more time processing the same output items over & over in both places:
    • I.e. renderer process is processing a large list for Notebook Renderer and as the list grows, this process is busy
    • I.e. extension host is processing a large list for Serialization and as the list grows, this process is busy

Note: We have tried this approach and the results are not good. Basically VS Code is busy processing a large list of output items and this slows other parts of notebook.

Benefits:

  • We only send the new output items to Renderer process and Extension host.
    • i.e. only send incremental updates
sequenceDiagram
  participant Jupyter
  participant Extension Host
  participant VSCode Renderer Proc
  Jupyter->>Extension Host: Append Kernel Output `Line1` using `Task.appendOutput` 
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._spliceNotebookCellOutputs2` 
  Note right of VSCode Renderer Proc: Model updated to contain <br/>Output with OutputItem `Line1`
  VSCode Renderer Proc-->>Extension Host: Trigger change event for new Output `Line1`
  Note right of Extension Host: Model updated to contain <br/>Output with OutputItem `Line1`
  Jupyter->>Extension Host: Kernel output is `Line2`<br/>Append new output item with value `Line2`<br/>using `Task.appendOutputItem`
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._appendNotebookCellOutputItems`
  Note right of VSCode Renderer Proc: Merge previous output item with new item, <br/>and replace previous Output Item with `Line1Line2`.<br/>Now we have one output item `Line1Line2`
  VSCode Renderer Proc-->>Extension Host: Trigger change event for append new output item `Line2`
  Note right of Extension Host: Merge previous output item with new item, <br/>and replace previous Output Item with `Line1Line2`.<br/>Now we have one output item `Line1Line2`
  Jupyter->>Extension Host: Kernel output is `Line3`<br/>Append new output item with value `Line3`<br/>using `Task.appendOutputItem`
  Extension Host->>VSCode Renderer Proc: `NotebookTextModel._appendNotebookCellOutputItems`
  Note right of VSCode Renderer Proc: Merge previous output item with new item, <br/>and replace previous Output Item with `Line1Line2Line3`.<br/>Now we have one output item `Line1Line2Line3`
  VSCode Renderer Proc-->>Extension Host: Trigger change event for append new output item `Line3`
  Note right of Extension Host: Merge previous output item with new item, <br/>and replace previous Output Item with `Line1Line2Line3`.<br/>Now we have one output item `Line1Line2Line3`
Loading
@DonJayamanne DonJayamanne self-assigned this Sep 16, 2022
@DonJayamanne
Copy link
Contributor Author

@roblourens @rebornix /cc

@roblourens roblourens added api-proposal notebook-execution Issues related to running cells in a notebook labels Sep 20, 2022
@roblourens roblourens added this to the September 2022 milestone Sep 20, 2022
@jrieken
Copy link
Member

jrieken commented Sep 20, 2022

Thanks for the nice write-up.

Some follow questions/idea

  • I am confused by the application/x.notebook.XYZ mime-types. We don't spell them out anywhere and officially (by the docs) the only custom mime types are application/vnd.code.notebook.[stdout|stderr|error]
  • Merging repeated mime-types (of special types) makes sense. Do we recall why we/I made the restriction in the first place? Is that because of non-mergable types like images or audio files?

I see the ensureUniqueMimeTypes-call but it is also local to a single invocation of appendOutputItems. Did you try to simply call appendOutputItems-twice? Since the actual implementation is debounced merging should happen in the same debounced way, e.g here in update, so that calling appendOutputItems with repeated mime-types or repeated calls without repeated mimes should be any different

@DonJayamanne
Copy link
Contributor Author

confused by the application/x.notebook.XYZ mime-types. We don't spell them out anywhere and officially (by the docs) the only custom mime types are application/vnd.code.notebook.[stdout|stderr|error]

I only included those because we seemed to have a few of those streams. I agree the only one we need to merge are for application/vnd.code.notebook.[stdout|stderr] (the ones exposed via the shared methods on NotebookCellOutputItem)

Do we recall why we/I made the restriction in the first place? Is that because of non-mergable types like images or audio files?

Unfortunately I don't recall this.
Possible because, its just not possible to have multiple mime types repeated. For Jupyter we map each Jupyter output to a single NotebookCell output with single output items., and in Jupyter each Output is an Object and the keys are the mime types, which in turn map to an individual output item. Hence technically one cannot have 2 output items for the same mime type, as that would equate to two duplicate keys in a JS object, which isn't possible.

Did you try to simply call appendOutputItems-twice?

No didn't, i was primarily focused on merging the stream output items into single item.

@lszomoru lszomoru modified the milestones: September 2022, October 2022 Oct 5, 2022
@vscodenpa vscodenpa added unreleased Patch has not yet been released in VS Code Insiders insiders-released Patch has been released in VS Code Insiders and removed unreleased Patch has not yet been released in VS Code Insiders labels Oct 11, 2022
@github-actions github-actions bot locked and limited conversation to collaborators Nov 25, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-proposal insiders-released Patch has been released in VS Code Insiders notebook-execution Issues related to running cells in a notebook
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants