-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parser worker isolation considerations #3368
Comments
I've gone back and forth a bit over time on this and ended up landing here: I think we should not do only workers, but offload only when incoming data pressure is high (eg. if the parsing time limit was hit in a frame). If we took this approach is should be a lot easier to manage as we'd keep snappy latency, not needing to jump to another thread for most cases, but do switch to another thread when latency isn't as important. Additionally we don't need to go all in on ArrayBuffers for data transfer because of this, we would probably still need to refactor such that we have a model object or something that can easily be transferred back and forth. Graceful degradation is also extremely important here:
This approach would also be a lot easier to do incrementally; model refactors, worker adoption, sab support with fallback, etc. |
@Tyriar Yes, I agree with the degradation needs, a proper worker+SAB setup is not easy anymore. In any case we would need a failover logic (maybe just drop back to fully mainthread-driven variant). I am not so sure about your middle ground idea - doesnt that mean we'd have to implement the full deal for both cases, additional we'd need to implement the "high pressure" transition logic. And about "all in on ArrayBuffers" - you mean like partially using array buffers, but transmit other states by messaging? Doesn't that create a much more cumbersome situation, as we'd have to synchronize across several worker interaction models (e.g. SAB atomics + the message states). Furthermore as soon as we use messaging for any state transmission, we need active code parts on both ends providing/digesting the message. Not so with a pure SAB solution. Well I still have to think about that idea, maybe I just dont see the benefits yet. 🤔 |
I'd think the code would be almost entirely shared between main/worker.
It's trying to solve it only when it becomes a problem, if the main thread is about to be busy, offload to a worker until it calms down. If we do fallback right it probably wouldn't be too cumbersome to switch between the 2 modes. |
From may experimentation with rendering the terminal through a Monaco Editor, I just recently faced the same questions. My solution now is to run xterm.js in a node.js thread which also spawns the pty. I'm hooking to In my case, these changes operate on a line level granularity, and they basically contain the line text that has changed, and an array of line decorations (range, foreground, background colors, font styles etc.). I'm also sending some buffer state to the frontend that is necessary for styling (e.g. if mouse events are enabled, so we can draw a pointer): {
buffer: {
normal: boolean,
rows: number,
cols: number,
startId: number,
endId: number,
mouseEvents: boolean,
hasScrollback: boolean,
scrollbackVisible: boolean,
applicationCursorKeys: boolean,
bracketedPasteMode: boolean,
sendFocus: boolean,
cursor: {
visible: boolean,
lineId: number,
colIndex: number
}
},
edits: [
startLineNumber: number,
startColumn: number,
endLineNumber: number,
endColumn: number,
text?: string
][],
decorations: [
lineId: number,
decorations: [
lineNumber: number,
startColumn: number,
endColumn: number,
attrs: TerminalRendererDecorationAttributes
][]
][]
} Note that with this kind of design I don't have to synchronise the terminal buffer of the backend with the frontend. I only send "render" instructions to the frontend, which really lowers the memory consumption and overall computation time of the frontend. I discovered this design has several advantages:
Caveats Running xterm.js in a node.js process requires some hacks at the moment. There is internal code that relies on the global global.self = global.window = {};
const { Terminal } = require('xterm');
const { Unicode11Addon } = require('xterm-addon-unicode11');
delete global.self;
delete global.window; Additionally, you cannot initialise the rendering part of xterm.js via Last but not least you need a way to track an individual line in the buffer via an ID. This is very important for diff computing, because our Buffer is circular and will remove lines at the top of the viewport once the maximum number of scrollback lines are reached. It's therefore not feasible to use the line index of the buffer for this. I'm hooking to the Conclusion IMHO utilising a WebWorker for the parser may create more overhead than it tries to solve. I think we should rather optimize towards a scenario where the xterm.js core can be run in the same process as the pty (in the node.js backend), and then create an interface that is specifically crafted to update a renderer in the frontend. There also have to be interfaces in the other direction to propagate abstracted mouse and keyboard events from the renderer to the core. ** Extra ** Here's a capture of running Kapture.2021-06-11.at.17.05.05.mp4 |
Isn't
the same as utlizing a webworker for the core parts (+ IO chunk handling in your case, which only applies to electron/nodejs apps), while you say
Can you explain, what would be different to a webworker isolation of the parser/VT emulation? Edit: Maybe my text above was misleading - with parser/VT emulation I mean the whole inner state handling including the terminal buffer itself (not only the parser). |
Yes it's technically the same, except for the data flow. Using a Web Worker means that all of the PTY stream data (backend) has to get to the render thread of the browser (usually via WebSockets or IPC), then from there it has to get to the WebWorker where the xterm.js core is running (usually via postMessage). I think it would be smarter if the PTY stream and the xterm.js core were living in the same backend process, where the stream data Buffers could be easily shared (by reference) without getting copied across multiple communication layers (as outlined above). The idea is very simple: The parser (xterm.js core) must consume the whole PTY stream without gaps to keep the internal Terminal Buffer state consistent. A renderer does not. It can at any point in time simply get a view on that Terminal Buffer state and it can skip frames if the updates come in too often. Imagine you were implementing a new Canvas renderer for xterm.js. With every render cycle, you iterate the Buffer lines that have changed since your last render cycle and create draw instructions from them. Now, instead of actually drawing them to the Canvas, you serialize the instructions as JSON. Now imagine this renderer was living in the xterm.js core and runs in the backend. Now in the Browser frontend, you create an implementation that can request a render from your backend, gets the serialized instructions and draws them to the real Canvas. That is basically what I am proposing, and what I've done with the Monaco Editor renderer. Now the PTY data stream never has to pass the Browser render thread. It can stay in the backend with minimal overhead. All that passes to the browser are drawing instructions whenever desirable. |
Some greenfield out-of-the-box thinking - ideally things could work like this: VT emulation worker:
mainthread:
The rendering parts needs a bit more thinking, whether we are fully SAB'ed or not. Again thinking it from an ideal world we have everything in SABs. Then it is quite easy:
This SAB model has one big advantage - it does not need any active messaging code on both ends, it can be orchestrated solely by some locks on the SAB itself for the rendering, thus no stocking up of messages in queues, no need to drop back to event loop at all on parser side. Technically even the locks are not needed (well we still would want them to not get totally screwed up output in between). Ofc it is not that easy in conjunction with key/mouse event delegation anymore, they prolly need some messaging to the worker again (more tricky to abstract with SAB, though not impossible), then the worker would have to ground back to top level event loop from time to time again. Well that's basically what I think could work and would be most performant solution (not lock-free, but almost without any data copying between mainthread/worker). As far I understand Mofux's solution, it uses hardcopy messaging of changed bufferlines, which creates the need to store things on both ends (waste of memory) and is much slower in the transfer.
Why do you think so? Imho websockets are fine within webworkers. |
It's technically possible, but I think impractical for most application architectures - they likely establish only one single websocket connection for the whole frontend session (dealing with authentication and stuff) and dispatch messages over that single connection. Creating additional websocket connections means that you have to expose additional websocket endpoints on your server, map them to sessions, make sure they are authenticated and what not. It's certainly possible, but impractical IMHO. Also, when looking into the electron world, I'm not so sure if a WebWorker can create an IPC communication directly with Main thread. That being put aside - my point was to keep the PTY data stream as close to the core as possible, whenever possible 😁 Of course, if the backend is not node.js, then you'll likely run the xterm core in frontend anyway. I'd love to carry these discussions forward via a video call. Unfortunately my vacation starts tomorrow and I'll not be back on my PC until mid July 🥺 |
The biggest issue imo is that other actions are bound to input latency like scrolling, also a model like this break some assumptions made and end up breaking complex integrations like task problem matchers and live share in VS Code.
I'd like to support the ability to do this via the node target of xterm.js #2749 and the serialize addon (this could build smarter diffs too). I don't think it should be the default or recommended setup though because it's much more complicated to setup and comes with its own set of problems for little pay off imo. Having this as a it's possible to do thing seems like enough imo. Those caveats seem like improvements could be done in the lib so you don't need the hacks.
Another idea is to just allow almost all of xterm.js to live inside the worker, use
I don't think this is possible, I'd love it if it was though microsoft/vscode#113705 (comment)
We can always set one up 👍, it's just tricky aligning everyone's schedule. |
Kinda dont get it - wouldn't that be an argument against your own idea, to move the chunk IO closer to ??? - sorry, I currently dont know what ??? is for you, because you say worker is it not. But workers are the only option for a browser env, that dont involve really nasty mutli-page/iframe hacks (which is a clear nogo).
This topic is not pressing at all, we should take all the time we need to get through the cumbersome details. |
Doing the VT emulation in the backend, like on the server for browser based apps? 😱 Imho thats not a good idea as general purpose solution, kinda all web driven cluster orchestrators with potentially tons of terminals attached will fall short here. Beside the really high costs for the server side emulation, constantly pushing screen updates over the wire will create much more traffic and show bad latency. With the PTY data stream we kinda already have the best deal in terms of packrate/latency. What would be possible: Once we got the browser refs completely out of common, we can create several build targets:
This might be important to note: For me electron is the special case here and should not be the driving force behind all that, there are many other frameworks with different semantics, and I see a vanilla browser as the default integration env. You might tend to disagree here, I know that you all are very electron centric, but imho moving xterm.js from a pure browser frontend target to more electron specific needs by default would be a big mistake. |
(Slightly offtopic and more a reminder for later on ...) There is another possibility linked to the parser worker isolation - a WASM implementation of the utf32 converter + the parser class (inner sequence parsing logic). Not sure yet if that is a viable goal as it raises the maintenance complexity, yet it would be easy possible with a C or Rust replacement, since both classes are very C-ish operating on fixed borrowed memory. Some rough estimations of possible gains here:
For this 119 MB pile of data I see the following runtime numbers in Chrome and Firefox (all self time in ms):
Ofc most work load is created by image decoding on the worker, which is not further considered here. The interesting bits on main thread are the self time of These numbers are surprising for 2 reasons:
And thats where a WASM implementation can help to level out the field. With WASM there is no JS JIT involved anymore, the code runs from call one on pre-optimized. Also those big engine difference should go away (it still depends on the engine's WASM speed, which I dont know for Firefox). Ofc, whether we can gain anything for these tasks remains uncertain until we actually tried. Going back to my Rust/WASM experiments during the buffer reshaping 2ys ago, I remember that the parser logic was ~30% faster compared to JS numbers in nodejs/Chrome. I have no real numbers for Firefox here, but if the promise of WASM, to perform fairly equal, is true, that would speed up the inner parsing ~3 times on Firefox. TL;DR Edit: Edit2: const src = new Uint8Array(FIXED_SIZE);
const dst = new Uint32Array(FIXED_SIZE); and trying to pump the 119MB from above chunkified as 64kB blocks (all Chrome numbers):
Thats really surprising, both WASM impls are way faster than "native" JS loops, emscripten being quite ahead. Would not have thought, that a loop scope/context is that cheap in WASM (or this expensive in JS). Update: Oh dear - with Summary of the short WASM interlude (wont further hijack the thread): |
Out of scope for me, thus closing. |
Once again some considerations about the idea to separate the parser / VT emulation from the main thread, this time with its now dedicated issue.
Where are we at?
Currently xterm.js is 100% main thread bound, thus everything from reading chunks in
term.write
to rendering happens on the main thread.The Good...
This design works pretty nicely for casual integrations, where the terminal does not do much. Furthermore it is quite convenient to deal with the terminal object and its API for custom extensions, everything is at your hands in the main thread context.
The Bad...
The picture changes for more complex integrations, where the main thread is already under high pressure for whatever reason, where any further computational stress might cause lag spikes or make things feel sluggish. This scenario can be triggered quite easy with xterm.js - just try to run multiple terminal instances within one embedding document (like multiplexing) and give the terminals some IO. Here the terminals will behave greedy with the main thread runtime as limited resource, every instance will try to saturate the main thread on its own (based on a 60 fps approximation, thus 2 instances --> 30 fps, 4 --> 15 fps and so on) not taking outer stress into the equation. Any outer stress will simply prolong the update cycles further (up to lag spikes), multiple terminals in one page will slow down each other.
The Ugly...
Workers for the rescue? As we all know the real limiting resource beneath all that are the horse powers of your machine's CPU (plus a few more like memory, cache sizes, bus speed, ...). But JS with its strong single thread affiliation does not do a good job in utilizing CPU cores per se, we are basically bound to one core with cooperative multitasking in JS, while the underlying OS offers measures to saturate more cores with preemptive style.
Dont want to go into the "Why is it that bad in JS" rant, I understand that the language's flexibility creates a global context burden, thats hard to parallelize, still I wonder about the general future of JS here, as CPU designs clearly go the multicore path (Time to get an Erlang clone into the browser?)
Well in JS we got workers as a rather expensive way to still benefit from multicore design. Can we use it for xterm.js to lower the main thread stress? Well yes and no - welcome to the Ugly.
What could be "workered"?
The main workload in xterm.js for high input comes from the parsing and state changes itself, which is synchronous blocking code with none to a few references to the outer browser (main thread) env. With some rework of the internal interfaces this would be a hot candidate for a worker.
The second highest in the workload books is the rendering task with quite some differences across selected renderer type (as parsing:rendering ratio):
By its nature rendering has to deal with much more browser env main thread references, and it is almost impossible to offload this solely into a worker (beside going full OffscreenCanvas, which I dont consider a viable goal atm as the support is still pretty lousy across browser engines).
The remaining tasks are user input / event handling, which are typically <1% in workload, thus not worth to be delegated to a worker. (Some actions like resize still create very high workload, but are buffer bound, thus would automatically happen in a worker if we moved parsing there.)
As an upper approximation - we could relieve the mainthread of 60% - 90% workload created by xterm.js depending on the selected renderer.
Why "The Ugly"?
Offloading the parser/VT emulation introduces several hard to swallow downsides:
Whatever data transfer will be suitable here, it adds additional runtime on top lowering the relief numbers above again.
Atomics
, to coordinate the render update cycles (can be pretty lightweight). Still this raises tons of detail questions to avoid common lock/sync issues (like thread starvation).Is it worth the trouble?
I am not sure about that. The promise to lower main thread occupation by 60% - 90% is a really big one, on the other hand the coding efforts to get something reliable with SABs done are huge as well (baby steps into that direction are already outlined by those nodejs build target PRs). Furthermore we prolly would need a main thread fallback as SABs cannot be used everywhere due their security constraints (Do we need different build targets for SAB-worker based?). In any case a worker solution always creates higher integration needs due to separated JS resources.
Personally I like challenges like that and I am up for trying at least. But from a more realistic perspective I think this will involve tons of changes to xterm.js both internally and its API (>30% rewrite), and I am not sure if we can manage to maintain the same convenience for integrators in the end. So any attempt into the worker direction might turn out as a dead end for not being easy usable/customizable.
Up for discussion. 😸
The text was updated successfully, but these errors were encountered: