Explore using ArrayBuffer as backend for editor text contents #167719

Open · 3 tasks
bpasero opened this issue Nov 30, 2022 · 17 comments
Labels: debt (Code quality issues), file-io (File I/O)
@bpasero (Member) commented Nov 30, 2022

Explore to not use string but ArrayBuffer for the contents of a text editor which brings higher memory limits. See #167719 (comment)

Original issue (now outdated): I see how https://github.com//issues/42839 introduced a way to avoid opening large files by telling the user to quit and restart with a larger `--max-memory`. First of all, I think asking the user to confirm opening a very large file is good, because it may result in freezes or slowdowns. However, I am no longer able to reproduce the endless freeze, even when opening a very large file (see [1] below for how to create one). Maybe, given our work on sandboxing, this issue is now different or no longer exists?

At the same time, with the Electron 22 update we get the V8 memory cage (https://www.electronjs.org/blog/v8-memory-cage), so I wonder whether --max-memory is even still supported (maybe Deepak could clarify).

This issue is to:

  • validate that our original assumptions for adding this still apply
  • see if --max-memory still works with Electron 22
  • and if not, maybe we change the UX to still block opening the file at first by asking the user for confirmation (similar to how we ask for confirmation before opening a binary file)

[1] How to generate a large file
Inspired by this page, the following script, which you can run from a macOS terminal, produces a 4.23 GB random file for me in about 2 minutes:

ruby -e 'a=STDIN.readlines;100000000.times do;b=[];4.times do; b << a[rand(a.size)].chomp end; puts b.join(" "); end' < /usr/share/dict/words > file.txt

//cc @deepak1556

@bpasero (Member, Author) commented Dec 21, 2022

Some interesting discoveries:

  • I can open two random files of 4 GB each without problems from the E22 branch
  • ArrayBuffers are granted more space than 4 GB because (as far as I understand) they are not allocated on the V8 heap (see the illustration below)
  • we have maxFileSize and maxHeapSize driving the decision whether such a large file can be opened, and I had to disable that [1]

[1]

maxFileSize: arch === Arch.IA32 ? 300 * ByteSize.MB : 16 * ByteSize.GB, // https://github.com/microsoft/vscode/issues/30180
maxHeapSize: arch === Arch.IA32 ? 700 * ByteSize.MB : 2 * 700 * ByteSize.MB, // https://github.com/v8/v8/blob/5918a23a3d571b9625e5cce246bdd5b46ff7cd8b/src/heap/heap.cc#L149
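
A rough illustration of the off-heap point above (this sketch is not from the issue; it only demonstrates that a large ArrayBuffer is backed by memory outside the V8 heap, while strings live on the heap):

    // Illustrative sketch only: the ArrayBuffer's backing store is allocated
    // outside the V8 heap, so it does not count against the JS heap limit the
    // way a comparable string would.
    const offHeap = new ArrayBuffer(1024 * 1024 * 1024); // 1 GB, off-heap backing store
    const view = new Uint8Array(offHeap);
    view[0] = 65;

    // A string of comparable size must live on the V8 heap (and V8 additionally
    // caps the length of a single string well below the heap limit).
    const onHeap = 'a'.repeat(64 * 1024 * 1024); // 64 MB heap allocation
    console.log(view[0], onHeap.length);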

@rebornix (Member)

We discussed this task offline and one of the ideas is to use an ArrayBuffer directly as the backing store for the text buffer instead of a string, so we won't be affected by the heap size limit. I experimented a bit (in https://github.com/microsoft/vscode/tree/rebornix/anonymous-donkey) and found that moving PieceTree from string to ArrayBuffer is pretty easy.

The real challenge is that most of our internal editor parts currently read line contents eagerly (through getLineContent). If we cache the line content, it defeats the purpose of using an ArrayBuffer. If we don't, then a component that attempts to read every line of the text model might block the whole UI if the file is large. I have already moved getLineContent to getLineCharCode where possible in https://github.com/microsoft/vscode/tree/rebornix/anonymous-donkey, but some components are not easy to move off line content.
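
As an aside, here is a minimal sketch of the difference between the two access patterns, written against a hypothetical, simplified model interface (ILineScanModel and the helper functions are made up for illustration): getLineContent materializes a full line string on the heap, while getLineCharCode reads characters from the backing store directly.

    // Hypothetical, simplified interface for illustration only.
    interface ILineScanModel {
        getLineLength(lineNumber: number): number;
        getLineContent(lineNumber: number): string;                   // allocates a full line string
        getLineCharCode(lineNumber: number, column: number): number;  // reads a single char code
    }

    // Eager style: forces the (possibly huge) line onto the heap as a string.
    function countTabsEagerly(model: ILineScanModel, lineNumber: number): number {
        return model.getLineContent(lineNumber).split('\t').length - 1;
    }

    // Lazy style: reads char codes one by one, no per-line string allocation.
    function countTabsLazily(model: ILineScanModel, lineNumber: number): number {
        let tabs = 0;
        const length = model.getLineLength(lineNumber);
        for (let column = 1; column <= length; column++) {
            if (model.getLineCharCode(lineNumber, column) === 9 /* tab */) {
                tabs++;
            }
        }
        return tabs;
    }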

@bpasero (Member, Author) commented Dec 31, 2022

👏 for pushing this forward. It would be great if we could look into the 2 remaining ones; I am happy to help if possible.

@tahmidbintaslim

Hi everyone, I still see this issue happening to me whenever I try to open big files (for example, larger than 10 MB) with all installed plugins and settings. Any update on this?

@bpasero (Member, Author) commented Feb 10, 2023

I still see this issue happening to me whenever I try to open big files

What issue?

@bpasero (Member, Author) commented Apr 18, 2023

FYI, I have removed support for --max-memory and related editor integrations from our product via 6d5b854. It turns out that max_old_space_size has been unsupported for a very long time (electron/electron#31330).

@bpasero changed the title from "Revisit large file handling with max-memory" to "Explore using ArrayBuffer as backend for editor text contents" on Apr 18, 2023
@alexdima self-assigned this on Aug 23, 2023
@rebornix (Member) commented Sep 13, 2023

Continued the ArrayBuffer exploration today. The first thing I did was to try to find a large file that exceeds the heap memory limit, which could then be used to validate the ArrayBuffer implementation. I could not find such a file, as I didn't seem to trigger V8's heap memory cage.

  • Opening a file > 4 GB via the script in #167719 (comment) always works.
    • The heap size reported by performance.memory is very strange, as usedJSHeapSize can be larger than jsHeapSizeLimit:

      performance.memory
      MemoryInfo {totalJSHeapSize: 12169242247, usedJSHeapSize: 12165238223, jsHeapSizeLimit: 4294705152}
      jsHeapSizeLimit: 4294705152
      totalJSHeapSize: 12169242247
      usedJSHeapSize: 12165238223
      
    • @deepak1556 do you have any idea about this behavior? Do we have any customizations in our Electron build?

  • After the file is opened, the UI can freeze for minutes. This is mainly because we still have components attempting to read all lines
    • WordHighlighter
    • StickyModelFromCandidateIndentationFoldingProvider
  • We can modify the file, but save doesn't work; it throws Uncaught RangeError: Array buffer allocation failed
    • On Node.js 18 (64-bit), the maximum size of an ArrayBuffer is 4 GB.
    • Even though PieceTree holds the file content in an array of strings/buffers, we concatenate all buffers into a single ArrayBuffer before we save, which easily exceeds the limit:
      export function streamToBuffer(stream: streams.ReadableStream<VSBuffer>): Promise<VSBuffer> {
          return streams.consumeStream<VSBuffer>(stream, chunks => VSBuffer.concat(chunks));
      }
    • @bpasero I thought we had streaming support for file saving; did I miss anything here?

@bpasero (Member, Author) commented Sep 13, 2023

@rebornix yes, we stream contents when saving. There is a heuristic not to stream if the overall size is below some threshold, just to reduce the overhead of streaming and make 99% of saves fast by sending the buffer over entirely. But as soon as you are above the threshold, we use a stream.

// optimization: if the provider has unbuffered write capability and the data
// to write is not a buffer, we consume up to 3 chunks and try to write the data
// unbuffered to reduce the overhead. If the stream or readable has more data
// to provide we continue to write buffered.
let bufferOrReadableOrStreamOrBufferedStream: VSBuffer | VSBufferReadable | VSBufferReadableStream | VSBufferReadableBufferedStream;
if (hasReadWriteCapability(provider) && !(bufferOrReadableOrStream instanceof VSBuffer)) {
    if (isReadableStream(bufferOrReadableOrStream)) {
        const bufferedStream = await peekStream(bufferOrReadableOrStream, 3);
        if (bufferedStream.ended) {
            bufferOrReadableOrStreamOrBufferedStream = VSBuffer.concat(bufferedStream.buffer);
        } else {
            bufferOrReadableOrStreamOrBufferedStream = bufferedStream;
        }
    } else {
        bufferOrReadableOrStreamOrBufferedStream = peekReadable(bufferOrReadableOrStream, data => VSBuffer.concat(data), 3);
    }
} else {
    bufferOrReadableOrStreamOrBufferedStream = bufferOrReadableOrStream;
}

If you see it works differently with large files, please let me know.

@deepak1556 (Collaborator)

do you have any idea about this behavior? Do we have any customizations in our Electron build?

We don't have any customizations for the V8 memory cage in Electron builds; it follows the same restrictions as the web. The reason for the weird output in performance.memory is that the value is exposed differently.

The computation happens in https://source.chromium.org/chromium/chromium/src/+/main:third_party/blink/renderer/core/timing/memory_info.cc;l=50-54

used_heap_size = used_js_heap_size + external_memory

external_memory refers to allocations that happen outside the memory cage, hence the used heap size looks higher than the heap limit in the output.

On the other hand, Node.js separates the values properly in its process.memoryUsage() output: https://github.com/nodejs/node/blob/ccf46ba0f5ce02ae85b8bdb5e29acfc376caf9fc/src/node_process_methods.cc#L196-L208

For sandboxed renderers, you can use process.getHeapStatistics() from Electron which should give the expected output.
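
A minimal sketch of reading these numbers (illustrative only; which API is available depends on whether you are in Node.js, the Electron main process, or a sandboxed renderer):

    // In Node.js (and the Electron main process), process.memoryUsage() reports
    // external allocations separately from the V8 heap, so the heap numbers stay
    // below the heap limit even when large external strings/buffers exist.
    const usage = process.memoryUsage();
    console.log('heapUsed (MB):', Math.round(usage.heapUsed / 1024 / 1024));
    console.log('external (MB):', Math.round(usage.external / 1024 / 1024));
    console.log('arrayBuffers (MB):', Math.round(usage.arrayBuffers / 1024 / 1024));

    // In an Electron sandboxed renderer, process.getHeapStatistics() (values in
    // kilobytes) gives the V8 heap view without folding in external memory.
    const heap = process.getHeapStatistics();
    console.log('usedHeapSize (MB):', Math.round(heap.usedHeapSize / 1024));
    console.log('heapSizeLimit (MB):', Math.round(heap.heapSizeLimit / 1024));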

@bpasero (Member, Author) commented Sep 13, 2023

@rebornix I conducted a test and was able to verify that writes happen chunked (256 KB). However, if you make such a large editor dirty, we seem to write the contents entirely in one buffer to disk as backup, which is a really good find, and I reported it as #192970 👏. It's a relatively new regression though, not shipped to stable yet.

rebornix added a commit that referenced this issue Sep 13, 2023
rebornix added a commit that referenced this issue Sep 13, 2023
@deepak1556 (Collaborator)

@rebornix were you able to find a file test case that triggers an OOM crash due to the heap limit?

@rebornix (Member) commented Sep 14, 2023

However, if you make such a large editor dirty, we seem to write the contents entirely in one buffer to disk as backup

@bpasero this is exactly what I was seeing: backup kicks in and attempts to save the whole concatenated buffer. I'll give it another check once it's fixed.


@deepak1556 I could not find a test case that exceeds the heap limit; it seems the Node.js decoder always creates External* strings (as shown in the screenshot below) for performance reasons (string_bytes.cc). That's why, when we open a 4 GB file, VS Code doesn't crash at all (@bpasero pointed out the same in #167719 (comment)).

[screenshot: heap snapshot showing the decoded file contents as External* strings]

This means that for file opening we don't have to use an ArrayBuffer as the backing store: PieceTree never modifies the loaded strings, so they will always be ExternalStrings; modifications are stored in a new, separate string buffer (as an internal string). I'll try to trigger an OOM through editing large files and see if I can generate internal string objects > 4 GB.


@deepak1556 @bpasero are you aware of any duplicate issues where users run into crashes due to the heap limit (based on the crash dumps they shared)? My current hypothesis is that users ran into trouble because we have editor contributions that try to read every line (for example, the indent guide used to read every line's content), which triggers massive internal string objects to be created on the heap. Now that I have all of them disabled, the chances of running into OOMs are very low.
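
A small sketch of how one could observe this behavior (illustrative only; the file path is a placeholder, and the conditions under which Node.js creates external strings are an implementation detail):

    // Decode a large file and compare V8 heap growth vs. external memory growth.
    // If the decoder produces External* strings, most of the growth shows up in
    // external_memory rather than used_heap_size.
    import { readFileSync } from 'fs';
    import * as v8 from 'v8';

    const before = v8.getHeapStatistics();
    const text = readFileSync('file.txt', 'latin1'); // placeholder path, decoded to a string
    const after = v8.getHeapStatistics();

    const mb = (n: number) => Math.round(n / 1024 / 1024);
    console.log('decoded chars (M):', mb(text.length));
    console.log('heap growth (MB):', mb(after.used_heap_size - before.used_heap_size));
    console.log('external growth (MB):', mb(after.external_memory - before.external_memory));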

@deepak1556 (Collaborator)

The most recent one I got was #184298; I will check for other similar issues in my list.

lins0621 pushed a commit to lins0621/vscode that referenced this issue Sep 14, 2023
@rebornix (Member) commented Sep 15, 2023

We discussed the latest learnings and issues we found offline; here is a gist of them:

  • ✅ When VS Code opens a large file, the Node.js decoder creates External*Strings, which are not allocated on V8's heap. So technically we are not limited by the 4 GB cap for just reading the file.
    • VS Code can still crash if there is not enough memory on the system
  • Currently all core and contrib features that read line content are disabled, including word wrap, codelens, folding, indent guide, sticky scroll, word highlighting and link detection. This ensures a smooth scrolling experience once the file is loaded into PieceTree.
    • We might want to add some checks or tests to ensure that we don't regress this in the future.
    • In our codebase, we use | 0 in various places to ensure numbers are all integers (no float, undefined, NaN), but it limits the maximum line count. A file with >120M lines cannot be opened, tracked in #193164 (Editor is broken when opening files with >120M lines).
  • Editing: modifying the large text buffer can still lead to OOMs, as edits are allocated on the heap. We have two potential solutions:
    • Open large files (>1 GB) in read-only mode, and when users attempt to edit them, warn that this might not work (it could freeze or crash VS Code) and ask for confirmation to continue
    • Adopt an ArrayBuffer for the editing buffer in PieceTree, so we could support modifications up to 16 GB
    • We want to ensure that all saving operations use streaming (see the sketch after this list). Currently backup concatenates all buffers from the snapshot, so a snapshot larger than 4 GB cannot be saved; this also blocks the UI for seconds to minutes. Tracked in #193151 (Backup file system provider can not do buffered write).
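
A minimal, hypothetical sketch of the streaming idea mentioned in the last point (the ITextSnapshot shape is simplified and writeSnapshotStreamed is a made-up name; it only illustrates writing chunk by chunk so that no single >4 GB buffer is ever allocated):

    import { createWriteStream } from 'fs';

    // Simplified snapshot interface: read() returns the next chunk, or null when done.
    interface ITextSnapshot {
        read(): string | null;
    }

    async function writeSnapshotStreamed(snapshot: ITextSnapshot, path: string): Promise<void> {
        const out = createWriteStream(path);
        let chunk: string | null;
        while ((chunk = snapshot.read()) !== null) {
            // Respect back-pressure: wait for 'drain' if the internal buffer is full.
            if (!out.write(chunk)) {
                await new Promise<void>(resolve => out.once('drain', () => resolve()));
            }
        }
        await new Promise<void>((resolve, reject) => {
            out.once('error', reject);
            out.end(() => resolve());
        });
    }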

@alexdima (Member) commented Sep 15, 2023

Great finds @rebornix !

  • I agree we should keep the PieceTree unchanged given we can open files >4 GB
  • The | 0 thing has been in there for perhaps 10 years, so I don't think we could ever open more than 120M lines of code :)
  • IIRC the memory used for editing grows in direct proportion to the edit size, so small edits in such a large text buffer should work fine, right? I guess things like Find/Replace All will not work so well, because they could end up allocating a lot of pieces and then running out of memory. Maybe there's a cheap way to detect this case before applying an edit, before the crash happens? We could just refuse to apply it and inform the user.

@rebornix (Member)

Thanks @alexdima!

The | 0 thing has been in there for perhaps 10 years, so I don't think we could ever open more than 120M lines of code :)

Agreed, it's not worth optimizing for such a corner case.

Maybe there's a cheap way to detect this case before applying an edit, before the crash happens? We could just refuse to apply it and inform the user.

Great suggestion. I added an additional check in the text model via #193309 to validate whether the model is too large for any expensive heap operations:

  • The Find contrib notifies users that the model is too large, so we don't do a Replace All that might otherwise crash
  • getValue and getLinesContent now throw an error on a model larger than 512 MB, since those calls would allocate that much heap memory the moment they are invoked.
  • We could potentially check the heap usage for each edit on the model and validate whether it might exceed the heap limit, like what you did in
    private _ensureDisposedModelsHeapSize(maxModelsHeapSize: number): void {
        if (this._disposedModelsHeapSize > maxModelsHeapSize) {
            // we must remove some old undo stack elements to free up some memory
            const disposedModels: DisposedModelInfo[] = [];
            this._disposedModels.forEach(entry => {
                if (!entry.sharesUndoRedoStack) {
                    disposedModels.push(entry);
                }
            });
            disposedModels.sort((a, b) => a.time - b.time);
            while (disposedModels.length > 0 && this._disposedModelsHeapSize > maxModelsHeapSize) {
                const disposedModel = disposedModels.shift()!;
                this._removeDisposedModel(disposedModel.uri);
                if (disposedModel.initialUndoRedoSnapshot !== null) {
                    this._undoRedoService.restoreSnapshot(disposedModel.initialUndoRedoSnapshot);
                }
            }
        }
    }
    for _disposedModelsHeapSize. I didn't implement this yet, as it might be overkill.
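
For completeness, a minimal sketch of the size guard described above (GuardedTextModelSketch and its fields are hypothetical; the 512 MB figure is the threshold mentioned earlier, not an exact constant from the codebase):

    // Assumed threshold from the discussion above: operations that materialize the
    // whole model as a string are refused beyond this size.
    const HEAP_OPERATION_LIMIT = 512 * 1024 * 1024; // 512 MB

    class GuardedTextModelSketch {
        constructor(private readonly approximateByteSize: number) { }

        private assertNotTooLargeForHeapOperation(operation: string): void {
            if (this.approximateByteSize > HEAP_OPERATION_LIMIT) {
                throw new Error(`Model is too large for heap operation: ${operation}`);
            }
        }

        getValue(): string {
            this.assertNotTooLargeForHeapOperation('getValue');
            // ...materialize the full contents as a single string...
            return '';
        }
    }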

@jedwards1211 (Contributor)

@bpasero not just ArrayBuffer: VS Code should use SharedArrayBuffer for editors and terminals so that it can actually do long-running work on background threads.
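
For illustration only (this is not how VS Code is structured today): a SharedArrayBuffer can be handed to a worker_threads Worker without copying, so a long-running scan can run off the UI thread while both sides see the same bytes.

    import { Worker } from 'worker_threads';

    // Allocate shared memory and write to it from the main thread.
    const shared = new SharedArrayBuffer(1024 * 1024);
    const bytes = new Uint8Array(shared);
    bytes[0] = 42;

    // The worker receives the same memory (no copy) and can scan it in the background.
    const worker = new Worker(
        `const { workerData, parentPort } = require('worker_threads');
         const view = new Uint8Array(workerData);
         parentPort.postMessage(view[0]);`,
        { eval: true, workerData: shared }
    );
    worker.on('message', value => console.log('seen in worker:', value));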
