
Reduce pressure on editor when opening very large files #169288

Closed
wants to merge 3 commits

Conversation


@ckmilse ckmilse commented Dec 15, 2022

See #167719.
When I use VS Code to open a large file (around 500 MB of text content, e.g. a server log), the editor hangs and I cannot perform other operations.
This is a bad experience, especially in a web cloud IDE (remote development).
I think handling large files in a different way could work better.


ckmilse commented Dec 15, 2022

@microsoft-github-policy-service agree

@ckmilse ckmilse force-pushed the fix_large_file_pending branch from 85ddc80 to 3f1c904 on December 15, 2022 17:04
@bpasero bpasero assigned bpasero and unassigned rebornix Dec 16, 2022
@bpasero bpasero added this to the Backlog milestone Dec 16, 2022

bpasero commented Dec 16, 2022

This is a good initiative and I certainly see that when opening a very large file, e.g. right on startup with other files in other groups visible:

  • colorization does not kick in
  • you cannot save
  • extension host does not start

I am, however, a little worried about the complexity of the change, so I would need more time to understand it and to see whether it causes regressions. To be honest, I am not a big fan of the new stat call that we now need every time we read a file, which will have a perf impact even when opening small files.

I think the gist of this change, and what makes it perform better, is that today we pump the entire file contents into the stream, which can slow down the consumer. With this change we go back to disk and read chunks of 10 MB, which slows down the reading but reduces the pressure on the consumer (the window).

It would be awesome if we could somehow better leverage the highWaterMark option for this:

write(data: T): void | Promise<void> {
	if (this.state.destroyed) {
		return;
	}

	// flowing: directly send the data to listeners
	if (this.state.flowing) {
		this.emitData(data);
	}

	// not yet flowing: buffer data until flowing
	else {
		this.buffer.data.push(data);

		// highWaterMark: if configured, signal back when buffer reached limits
		if (typeof this.options?.highWaterMark === 'number' && this.buffer.data.length > this.options.highWaterMark) {
			return new Promise(resolve => this.pendingWritePromises.push(resolve));
		}
	}
}

We have our own simple stream implementation, and it is certainly possible that it misses the pieces to do this properly. In a nutshell, it would be nice if the consumer of the stream could signal that it only accepts N chunks of data, and the provider of the stream refrained from sending more data until those are processed. The possible downside of this approach is that the provider will see its buffers fill up until the consumer processes them, so maybe after all we need the solution where we read chunks from disk. Still, it would be great if that could happen behind the scenes automatically for large files, and not from the text file service.
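
To illustrate the idea, here is a minimal sketch of a producer honoring that backpressure signal. This is not the actual VS Code stream API; the interface and the readChunkFromDisk helper are assumptions for the example. The producer awaits the Promise that write() returns once the buffer exceeds highWaterMark, and only then reads more data:

// Minimal sketch, assuming a stream whose write() returns a Promise once
// more than `highWaterMark` chunks are buffered (see write() above).
interface WriteableStream<T> {
	write(data: T): void | Promise<void>;
	end(): void;
}

// `readChunkFromDisk` is a hypothetical helper that returns the next chunk,
// or null at end of file.
async function pumpFile(
	readChunkFromDisk: () => Promise<Uint8Array | null>,
	stream: WriteableStream<Uint8Array>
): Promise<void> {
	let chunk = await readChunkFromDisk();
	while (chunk) {
		// When the consumer falls behind, write() hands back a Promise that
		// resolves once buffered data was processed: await it to pause reading.
		const backpressure = stream.write(chunk);
		if (backpressure instanceof Promise) {
			await backpressure;
		}
		chunk = await readChunkFromDisk();
	}
	stream.end();
}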

@bpasero bpasero changed the title fix: large file pending Reduce pressure on editor when opening very large files Dec 16, 2022

bpasero commented Dec 16, 2022

Briefly looked into this further: it turns out we already do position-based reading in the main process, which is what the methods will eventually reach:

bytesRead = await provider.read(handle, posInFile, buffer.buffer, posInBuffer, buffer.byteLength - posInBuffer);

Here we read a chunk of 256 KB from disk to emit into a stream that eventually travels to the renderer process. If we were to add some yielding here, we could relieve some of the pressure, but it comes at the cost of slower loading of large files...
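
As a standalone sketch of what such yielding could look like (hypothetical Node.js code, not the actual fileService implementation; the onChunk callback is an assumption, and the 256 KB chunk size mirrors the read above):

import { open } from 'fs/promises';

const CHUNK_SIZE = 256 * 1024; // 256 KB, as in the provider read above

// Sketch: read a file chunk by chunk at explicit positions, yielding back
// to the event loop between chunks to relieve pressure on the window.
async function readFileChunked(path: string, onChunk: (chunk: Buffer) => void): Promise<void> {
	const handle = await open(path, 'r');
	try {
		const buffer = Buffer.alloc(CHUNK_SIZE);
		let posInFile = 0;
		while (true) {
			const { bytesRead } = await handle.read(buffer, 0, CHUNK_SIZE, posInFile);
			if (bytesRead === 0) {
				break; // end of file
			}
			onChunk(buffer.subarray(0, bytesRead));
			posInFile += bytesRead;

			// Yield to the event loop so other work (rendering, saving,
			// extension host startup) can proceed between chunks. This
			// trades slower loading for lower pressure.
			await new Promise<void>(resolve => setImmediate(resolve));
		}
	} finally {
		await handle.close();
	}
}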

@ckmilse ckmilse force-pushed the fix_large_file_pending branch from 7c0a3c3 to fbe9156 on December 19, 2022 13:28

ckmilse commented Dec 19, 2022

Hi, I changed the code: I removed the stat call before reading the file and changed the approach. VS Code now reads up to 100 MB on the first read; if the read is not complete, it then reads MAX_STREAM_SEGMENT_SIZE (2 MB or 10 MB) on each subsequent read.
MAX_STREAM_SEGMENT_SIZE and MIN_LARGE_FILE_SIZE are empirical values and may need to be changed.
Deep inside this.fileService.readFileStream (src/vs/platform/files/common/fileService.ts, line 516) there is one statPromise per read, so if we want to reduce the stat calls we may need to change the fileService code, but that could affect other features.
One more thing: I am sorry that my English skills may make this difficult to read.
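
For clarity, a minimal sketch of the segmented strategy described above (one reading of the comment; the constant values and the readSegment helper are assumptions, not the PR's actual code):

// Sketch of the described segmented reading (values assumed from the comment).
const MIN_LARGE_FILE_SIZE = 100 * 1024 * 1024;    // 100 MB: first segment
const MAX_STREAM_SEGMENT_SIZE = 10 * 1024 * 1024; // 10 MB (or 2 MB) per follow-up segment

// `readSegment` is a hypothetical helper that reads up to `size` bytes at
// `position`, returning null at end of file.
async function streamInSegments(
	readSegment: (position: number, size: number) => Promise<Uint8Array | null>,
	onSegment: (segment: Uint8Array) => void
): Promise<void> {
	// First read: one large segment, so files under 100 MB finish in a single pass.
	let position = 0;
	let segment = await readSegment(position, MIN_LARGE_FILE_SIZE);

	while (segment) {
		onSegment(segment);
		position += segment.byteLength;

		// Reading was not complete: continue with smaller segments to keep
		// the pressure on the consumer low.
		segment = await readSegment(position, MAX_STREAM_SEGMENT_SIZE);
	}
}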

@futurist
Contributor

Any update on this issue? We also need large file reading in vscode-web to perform as well as it does in local VS Code.

@ckmilse ckmilse closed this Sep 2, 2023

ckmilse commented Sep 2, 2023

This PR is not very good, so I will just close it for now.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 17, 2023