Explore using ArrayBuffer as backend for editor text contents #167719

Open · 3 tasks
bpasero opened this issue Nov 30, 2022 · 17 comments
Labels: debt (Code quality issues), file-io (File I/O)
@bpasero (Member) commented Nov 30, 2022

Explore to not use string but ArrayBuffer for the contents of a text editor which brings higher memory limits. See #167719 (comment)

Original issue (now outdated): I see how https://github.com//issues/42839 introduced a way to avoid opening large files by telling the user to quit and restart with a larger `--max-memory`. First of all, I think asking the user to confirm opening a very large file is good, because it may result in freezes or slowdowns. However, I am no longer able to reproduce the endless freeze, even when opening a very large file (see [1] below for how to create one). Maybe, given our work on sandboxing, this issue is now different or no longer exists?

At the same time, with the Electron 22 update we get the V8 memory cage (https://www.electronjs.org/blog/v8-memory-cage), so I wonder whether --max-memory is even still supported (maybe Deepak could clarify).

This issue is to:

  • validate that our original assumptions for adding this still apply
  • see if --max-memory still works with Electron 22
  • and if not, maybe we change the UX to still block opening the file at first by asking the user for confirmation (similar to how we ask for confirmation before opening a binary file)

[1] How to generate a large file
Inspired by this page, the following script, which you can run from a macOS terminal, produces a 4.23 GB random file for me in about 2 minutes:

ruby -e 'a=STDIN.readlines;100000000.times do;b=[];4.times do; b << a[rand(a.size)].chomp end; puts b.join(" "); end' < /usr/share/dict/words > file.txt

//cc @deepak1556

@bpasero (Member, Author) commented Dec 21, 2022

Some interesting discoveries:

  • I can open two random files of 4 GB each without problems from the E22 branch
  • ArrayBuffers are granted more space than 4 GB because (as far as I understand) they are not allocated on the V8 heap (see the illustration below)
  • we have maxFileSize and maxHeapSize driving the decision whether such a large file can be opened, and I had to disable that [1]

[1]

maxFileSize: arch === Arch.IA32 ? 300 * ByteSize.MB : 16 * ByteSize.GB, // https://github.com/microsoft/vscode/issues/30180
maxHeapSize: arch === Arch.IA32 ? 700 * ByteSize.MB : 2 * 700 * ByteSize.MB, // https://github.com/v8/v8/blob/5918a23a3d571b9625e5cce246bdd5b46ff7cd8b/src/heap/heap.cc#L149
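
A rough illustration of the off-heap point above (this sketch is not from the issue; it only demonstrates that a large ArrayBuffer is backed by memory outside the V8 heap, while strings live on the heap):

    // Illustrative sketch only: the ArrayBuffer's backing store is allocated
    // outside the V8 heap, so it does not count against the JS heap limit the
    // way a comparable string would.
    const offHeap = new ArrayBuffer(1024 * 1024 * 1024); // 1 GB, off-heap backing store
    const view = new Uint8Array(offHeap);
    view[0] = 65;

    // A string of comparable size must live on the V8 heap (and V8 additionally
    // caps the length of a single string well below the heap limit).
    const onHeap = 'a'.repeat(64 * 1024 * 1024); // 64 MB heap allocation
    console.log(view[0], onHeap.length);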

@rebornix (Member)

We discussed this task offline and one of the ideas is to use an ArrayBuffer directly as the backing store for the text buffer instead of a string, so we won't be affected by the heap size limit. I experimented a bit (in https://github.com/microsoft/vscode/tree/rebornix/anonymous-donkey) and found that moving PieceTree from string to ArrayBuffer is pretty easy.

The real challenge is that most of our internal editor parts currently read line contents eagerly (through getLineContent). If we cache the line content, it defeats the purpose of using an ArrayBuffer. If we don't, then a component that attempts to read every line of the text model might block the whole UI if the file is large. I have already moved getLineContent to getLineCharCode where possible in https://github.com/microsoft/vscode/tree/rebornix/anonymous-donkey, but some components are not easy to move off line content.
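
As an aside, here is a minimal sketch of the difference between the two access patterns, written against a hypothetical, simplified model interface (ILineScanModel and the helper functions are made up for illustration): getLineContent materializes a full line string on the heap, while getLineCharCode reads characters from the backing store directly.

    // Hypothetical, simplified interface for illustration only.
    interface ILineScanModel {
        getLineLength(lineNumber: number): number;
        getLineContent(lineNumber: number): string;                   // allocates a full line string
        getLineCharCode(lineNumber: number, column: number): number;  // reads a single char code
    }

    // Eager style: forces the (possibly huge) line onto the heap as a string.
    function countTabsEagerly(model: ILineScanModel, lineNumber: number): number {
        return model.getLineContent(lineNumber).split('\t').length - 1;
    }

    // Lazy style: reads char codes one by one, no per-line string allocation.
    function countTabsLazily(model: ILineScanModel, lineNumber: number): number {
        let tabs = 0;
        const length = model.getLineLength(lineNumber);
        for (let column = 1; column <= length; column++) {
            if (model.getLineCharCode(lineNumber, column) === 9 /* tab */) {
                tabs++;
            }
        }
        return tabs;
    }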

@bpasero (Member, Author) commented Dec 31, 2022

👏 for pushing this forward. It would be great if we could look into the 2 remaining ones; I am happy to help if possible.

@tahmidbintaslim

Hi everyone, I still see this issue happening to me whenever I try to open big files (for example, larger than 10 MB) with all installed plugins and settings. Any update on this?

@bpasero (Member, Author) commented Feb 10, 2023

I still see this issue happening to me whenever I try to open big files

What issue?

@bpasero (Member, Author) commented Apr 18, 2023

FYI, I have removed support for --max-memory and related editor integrations from our product via 6d5b854. It turns out that max_old_space_size has been unsupported for a very long time (electron/electron#31330).

@bpasero changed the title from "Revisit large file handling with max-memory" to "Explore using ArrayBuffer as backend for editor text contents" on Apr 18, 2023
@alexdima self-assigned this on Aug 23, 2023
@rebornix (Member) commented Sep 13, 2023

Continued the ArrayBuffer exploration today. The first thing I did was to try to find a large file that exceeds the heap memory limit, which could then be used to validate the ArrayBuffer implementation. I could not find such a file, as I didn't seem to trigger V8's heap memory cage.

  • Opening a file > 4 GB via the script in #167719 (comment) always works.
    • The heap size reported by performance.memory is very strange, as usedJSHeapSize can be larger than jsHeapSizeLimit:

      performance.memory
      MemoryInfo {totalJSHeapSize: 12169242247, usedJSHeapSize: 12165238223, jsHeapSizeLimit: 4294705152}
      jsHeapSizeLimit: 4294705152
      totalJSHeapSize: 12169242247
      usedJSHeapSize: 12165238223
      
    • @deepak1556 do you have any idea about this behavior? Do we have any customizations in our Electron build?

  • After the file is opened, the UI can freeze for minutes. This is mainly because we still have components attempting to read all lines
    • WordHighlighter
    • StickyModelFromCandidateIndentationFoldingProvider
  • We can modify the file, but save doesn't work; it throws Uncaught RangeError: Array buffer allocation failed
    • On Node.js 18 (64-bit), the maximum size of an ArrayBuffer is 4 GB.
    • Even though PieceTree holds the file content in an array of strings/buffers, we concatenate all buffers into a single ArrayBuffer before we save, which easily exceeds the limit:
      export function streamToBuffer(stream: streams.ReadableStream<VSBuffer>): Promise<VSBuffer> {
          return streams.consumeStream<VSBuffer>(stream, chunks => VSBuffer.concat(chunks));
      }
    • @bpasero I thought we had streaming support for file saving; did I miss anything here?

@bpasero (Member, Author) commented Sep 13, 2023

@rebornix yes, we stream contents when saving. There is a heuristic not to stream if the overall size is below some threshold, just to reduce the overhead of streaming and make 99% of saves fast by sending the buffer over entirely. But as soon as you are above the threshold, we use a stream.

// optimization: if the provider has unbuffered write capability and the data
// to write is not a buffer, we consume up to 3 chunks and try to write the data
// unbuffered to reduce the overhead. If the stream or readable has more data
// to provide we continue to write buffered.
let bufferOrReadableOrStreamOrBufferedStream: VSBuffer | VSBufferReadable | VSBufferReadableStream | VSBufferReadableBufferedStream;
if (hasReadWriteCapability(provider) && !(bufferOrReadableOrStream instanceof VSBuffer)) {
    if (isReadableStream(bufferOrReadableOrStream)) {
        const bufferedStream = await peekStream(bufferOrReadableOrStream, 3);
        if (bufferedStream.ended) {
            bufferOrReadableOrStreamOrBufferedStream = VSBuffer.concat(bufferedStream.buffer);
        } else {
            bufferOrReadableOrStreamOrBufferedStream = bufferedStream;
        }
    } else {
        bufferOrReadableOrStreamOrBufferedStream = peekReadable(bufferOrReadableOrStream, data => VSBuffer.concat(data), 3);
    }
} else {
    bufferOrReadableOrStreamOrBufferedStream = bufferOrReadableOrStream;
}

If you see it works differently with large files, please let me know.

@deepak1556 (Collaborator)

do you have any idea about this behavior? Do we have any customizations in our Electron build?

We don't have any customizations for the V8 memory cage in Electron builds; it follows the same restrictions as the web. The reason for the weird output in performance.memory is that the value is exposed differently.

The computation happens in https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/timing/memory_info.cc;l=50-54

used_heap_size = used_js_heap_size + external_memory

external_memory refers to allocations that happen outside the memory cage, hence the used heap size looks higher than the heap limit in the output.

On the other hand, Node.js separates the values properly in its process.memoryUsage() output: https://github.com/nodejs/node/blob/ccf46ba0f5ce02ae85b8bdb5e29acfc376caf9fc/src/node_process_methods.cc#L196-L208

For sandboxed renderers, you can use process.getHeapStatistics() from Electron which should give the expected output.
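
A minimal sketch of reading these numbers (illustrative only; which API is available depends on whether you are in Node.js, the Electron main process, or a sandboxed renderer):

    // In Node.js (and the Electron main process), process.memoryUsage() reports
    // external allocations separately from the V8 heap, so the heap numbers stay
    // below the heap limit even when large external strings/buffers exist.
    const usage = process.memoryUsage();
    console.log('heapUsed (MB):', Math.round(usage.heapUsed / 1024 / 1024));
    console.log('external (MB):', Math.round(usage.external / 1024 / 1024));
    console.log('arrayBuffers (MB):', Math.round(usage.arrayBuffers / 1024 / 1024));

    // In an Electron sandboxed renderer, process.getHeapStatistics() (values in
    // kilobytes) gives the V8 heap view without folding in external memory.
    const heap = process.getHeapStatistics();
    console.log('usedHeapSize (MB):', Math.round(heap.usedHeapSize / 1024));
    console.log('heapSizeLimit (MB):', Math.round(heap.heapSizeLimit / 1024));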

@bpasero (Member, Author) commented Sep 13, 2023

@rebornix I conducted a test and was able to verify that writes happen chunked (256 KB). However, if you make such a large editor dirty, we seem to write the contents entirely in one buffer to disk as backup, which is a really good find, and I reported it as #192970 👏. It's a relatively new regression though, not shipped to stable yet.

rebornix added a commit that referenced this issue Sep 13, 2023
rebornix added a commit that referenced this issue Sep 13, 2023
@deepak1556 (Collaborator)

@rebornix were you able to find a file test case that triggers an OOM crash due to the heap limit?

@rebornix (Member) commented Sep 14, 2023

However, if you make such a large editor dirty, we seem to write the contents entirely in one buffer to disk as backup

@bpasero this is exactly what I was seeing: backup kicks in and attempts to save the whole concatenated buffer. I'll give it another check once it's fixed.


@deepak1556 I could not find a test case that exceeds the heap limit; it seems the Node.js decoder always creates External* strings (as shown in the screenshot below) for performance reasons (string_bytes.cc). That's why, when we open a 4 GB file, VS Code doesn't crash at all (@bpasero pointed out the same in #167719 (comment)).

[screenshot: heap snapshot showing the decoded file contents as External* strings]

This means that for file opening we don't have to use an ArrayBuffer as the backing store: PieceTree never modifies the loaded strings, so they will always be ExternalStrings; modifications are stored in a new, separate string buffer (as an internal string). I'll try to trigger an OOM through editing large files and see if I can generate internal string objects > 4 GB.


@deepak1556 @bpasero are you aware of any duplicate issues where users run into crashes due to the heap limit (based on the crash dumps they shared)? My current hypothesis is that users ran into trouble because we have editor contributions that try to read every line (for example, the indent guide used to read every line's content), which triggers massive internal string objects to be created on the heap. Now that I have all of them disabled, the chances of running into OOMs are very low.
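
A small sketch of how one could observe this behavior (illustrative only; the file path is a placeholder, and the conditions under which Node.js creates external strings are an implementation detail):

    // Decode a large file and compare V8 heap growth vs. external memory growth.
    // If the decoder produces External* strings, most of the growth shows up in
    // external_memory rather than used_heap_size.
    import { readFileSync } from 'fs';
    import * as v8 from 'v8';

    const before = v8.getHeapStatistics();
    const text = readFileSync('file.txt', 'latin1'); // placeholder path, decoded to a string
    const after = v8.getHeapStatistics();

    const mb = (n: number) => Math.round(n / 1024 / 1024);
    console.log('decoded chars (M):', mb(text.length));
    console.log('heap growth (MB):', mb(after.used_heap_size - before.used_heap_size));
    console.log('external growth (MB):', mb(after.external_memory - before.external_memory));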

@deepak1556 (Collaborator)

The most recent one I got was #184298; I will check for other similar issues in my list.

lins0621 pushed a commit to lins0621/vscode that referenced this issue Sep 14, 2023
@rebornix (Member) commented Sep 15, 2023

We discussed the latest learnings and issues we found offline; here is a gist of them:

  • ✅ When VS Code opens a large file, the Node.js decoder creates External*Strings, which are not allocated on V8's heap. So technically we are not limited by the 4 GB cap for just reading the file.
    • VS Code can still crash if there is not enough memory on the system
  • Currently all core and contrib features that read line content are disabled, including word wrap, codelens, folding, indent guide, sticky scroll, word highlighting and link detection. This ensures a smooth scrolling experience once the file is loaded into PieceTree.
    • We might want to add some checks or tests to ensure that we don't regress this in the future.
    • In our codebase, we use | 0 in various places to ensure numbers are all integers (no float, undefined, NaN), but it limits the maximum line count. A file with >120M lines cannot be opened, tracked in #193164 (Editor is broken when opening files with >120M lines).
  • Editing: modifying the large text buffer can still lead to OOMs, as edits are allocated on the heap. We have two potential solutions:
    • Open large files (>1 GB) in read-only mode, and when users attempt to edit them, warn that this might not work (it could freeze or crash VS Code) and ask for confirmation to continue
    • Adopt an ArrayBuffer for the editing buffer in PieceTree, so we could support modifications up to 16 GB
    • We want to ensure that all saving operations use streaming (see the sketch after this list). Currently backup concatenates all buffers from the snapshot, so a snapshot larger than 4 GB cannot be saved; this also blocks the UI for seconds to minutes. Tracked in #193151 (Backup file system provider can not do buffered write).
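
A minimal, hypothetical sketch of the streaming idea mentioned in the last point (the ITextSnapshot shape is simplified and writeSnapshotStreamed is a made-up name; it only illustrates writing chunk by chunk so that no single >4 GB buffer is ever allocated):

    import { createWriteStream } from 'fs';

    // Simplified snapshot interface: read() returns the next chunk, or null when done.
    interface ITextSnapshot {
        read(): string | null;
    }

    async function writeSnapshotStreamed(snapshot: ITextSnapshot, path: string): Promise<void> {
        const out = createWriteStream(path);
        let chunk: string | null;
        while ((chunk = snapshot.read()) !== null) {
            // Respect back-pressure: wait for 'drain' if the internal buffer is full.
            if (!out.write(chunk)) {
                await new Promise<void>(resolve => out.once('drain', () => resolve()));
            }
        }
        await new Promise<void>((resolve, reject) => {
            out.once('error', reject);
            out.end(() => resolve());
        });
    }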

@alexdima (Member) commented Sep 15, 2023

Great finds @rebornix !

  • I agree we should keep the PieceTree unchanged given we can open files >4 GB
  • The | 0 thing has been in there for perhaps 10 years, so I don't think we could ever open more than 120M lines of code :)
  • IIRC the memory used for editing grows in direct proportion to the edit size, so small edits in such a large text buffer should work fine, right? I guess things like Find/Replace All will not work so well, because they could end up allocating a lot of pieces and then running out of memory. Maybe there's a cheap way to detect this case before applying an edit, before the crash happens? We could just refuse to apply it and inform the user.

@rebornix (Member)

Thanks @alexdima!

The | 0 thing has been in there for perhaps 10 years, so I don't think we could ever open more than 120M lines of code :)

Agreed, it's not worth optimizing for such a corner case.

Maybe there's a cheap way to detect this case before applying an edit, before the crash happens? We could just refuse to apply it and inform the user.

Great suggestion. I added an additional check in the text model via #193309 to validate whether the model is too large for any expensive heap operations:

  • The Find contrib notifies users that the model is too large, so we don't do a Replace All that might otherwise crash
  • getValue and getLinesContent now throw an error on a model larger than 512 MB, since those calls would allocate that much heap memory the moment they are invoked.
  • We could potentially check the heap usage for each edit on the model and validate whether it might exceed the heap limit, like what you did in
    private _ensureDisposedModelsHeapSize(maxModelsHeapSize: number): void {
        if (this._disposedModelsHeapSize > maxModelsHeapSize) {
            // we must remove some old undo stack elements to free up some memory
            const disposedModels: DisposedModelInfo[] = [];
            this._disposedModels.forEach(entry => {
                if (!entry.sharesUndoRedoStack) {
                    disposedModels.push(entry);
                }
            });
            disposedModels.sort((a, b) => a.time - b.time);
            while (disposedModels.length > 0 && this._disposedModelsHeapSize > maxModelsHeapSize) {
                const disposedModel = disposedModels.shift()!;
                this._removeDisposedModel(disposedModel.uri);
                if (disposedModel.initialUndoRedoSnapshot !== null) {
                    this._undoRedoService.restoreSnapshot(disposedModel.initialUndoRedoSnapshot);
                }
            }
        }
    }
    for _disposedModelsHeapSize. I didn't implement this yet, as it might be overkill.
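
For completeness, a minimal sketch of the size guard described above (GuardedTextModelSketch and its fields are hypothetical; the 512 MB figure is the threshold mentioned earlier, not an exact constant from the codebase):

    // Assumed threshold from the discussion above: operations that materialize the
    // whole model as a string are refused beyond this size.
    const HEAP_OPERATION_LIMIT = 512 * 1024 * 1024; // 512 MB

    class GuardedTextModelSketch {
        constructor(private readonly approximateByteSize: number) { }

        private assertNotTooLargeForHeapOperation(operation: string): void {
            if (this.approximateByteSize > HEAP_OPERATION_LIMIT) {
                throw new Error(`Model is too large for heap operation: ${operation}`);
            }
        }

        getValue(): string {
            this.assertNotTooLargeForHeapOperation('getValue');
            // ...materialize the full contents as a single string...
            return '';
        }
    }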

@jedwards1211 (Contributor)

@bpasero not just ArrayBuffer: VS Code should use SharedArrayBuffer for editors and terminals so that it can actually do long-running work on background threads.
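
For illustration only (this is not how VS Code is structured today): a SharedArrayBuffer can be handed to a worker_threads Worker without copying, so a long-running scan can run off the UI thread while both sides see the same bytes.

    import { Worker } from 'worker_threads';

    // Allocate shared memory and write to it from the main thread.
    const shared = new SharedArrayBuffer(1024 * 1024);
    const bytes = new Uint8Array(shared);
    bytes[0] = 42;

    // The worker receives the same memory (no copy) and can scan it in the background.
    const worker = new Worker(
        `const { workerData, parentPort } = require('worker_threads');
         const view = new Uint8Array(workerData);
         parentPort.postMessage(view[0]);`,
        { eval: true, workerData: shared }
    );
    worker.on('message', value => console.log('seen in worker:', value));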
