
Use a time-based limit in Terminal._innerWrite #1818

Merged: 6 commits merged into xtermjs:master on Mar 4, 2019

Conversation

juancampa
Contributor

I'm opening up this PR mostly to hear what you guys think of this proposal.

Problem

Currently WRITE_BATCH_SIZE limits the number of writes processed per batch so the renderer gets a chance to draw a frame. This keeps the terminal updating, but the resulting framerate varies widely.

Solution

The approach I'm proposing looks at the clock to determine when to stop as opposed to a fixed number of iterations. Effectively this allocates some time for writing, while the rest can be used for rendering.
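
For illustration, here is a minimal standalone sketch of the time-based loop (the constant name WRITE_TIMEOUT_MS mirrors this PR, but the buffer, parse callback, and rescheduling are simplified stand-ins for the real Terminal._innerWrite internals):

// A sketch, not the actual implementation: process chunks until a time budget
// is spent, then yield so the renderer can draw a frame.
const WRITE_TIMEOUT_MS = 12; // time budget for writing per frame, in ms

function innerWrite(writeBuffer: string[], parse: (data: string) => void): void {
  const startTime = Date.now();
  while (writeBuffer.length > 0) {
    parse(writeBuffer.shift()!);
    // Stop once the budget is spent so the renderer gets a chance to draw.
    if (Date.now() - startTime >= WRITE_TIMEOUT_MS) {
      break;
    }
  }
  if (writeBuffer.length > 0) {
    // Yield to the event loop, then continue with the remaining chunks.
    setTimeout(() => innerWrite(writeBuffer, parse), 0);
  }
}

With a 12 ms budget, roughly half of a 60 FPS frame (~16.7 ms) is left for rendering, which matches the FPS/duration trade-off in the table below.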

Tests run

From my tests, this change makes the terminal feel a lot smoother when processing a lot of output, with the drawback that it takes a bit longer to print it all. I ran tests with different limits and here are the results:

Test setup:

  • Using hyper
  • 165 rows x 295 cols @ 5px font
  • command: find ~
LIMIT                                 FPS     Duration
WRITE_TIMEOUT_MS=12ms                 33~40   11.7s
WRITE_TIMEOUT_MS=24ms                 25~30   11.5s
WRITE_TIMEOUT_MS=30ms                 20~25   10s
WRITE_TIMEOUT_MS=MAX_INT              9~13    8.6s
Existing approach (300 iterations)    8~18    8.8s
  • The first column is the value of the new constant that limits writing (or the existing approach)
  • The second column is the framerate as measured by Electron's devtools
  • The third column is the time it takes for the command to finish

As you can see, it takes a bit longer to display everything, which is expected because more frames are being drawn, but IMO it actually feels faster because you see a lot more of the output, and the terminal doesn't look like it's choking.

Caveats

This was only tested with the WebGL renderer by @Tyriar, which is still WIP (and also working amazingly, btw).

The idea is that writing should run for a bit and then let the renderer draw
a frame so that the terminal looks responsive. The existing approach
limits the work done using a fixed number of elements from the write
buffer, so the duration of a frame can vary widely.

This approach looks at the clock to determine when to stop: we basically
allocate an amount of time each frame for writing, while the rest can be
used for rendering.

From my tests this change makes the terminal feel a lot smoother.
@juancampa juancampa changed the title Use a time-based limit to Terminal._innerWrite Use a time-based limit in Terminal._innerWrite Dec 8, 2018
@juancampa
Contributor Author

I could also put this behind a flag so users can choose between smoothness and finishing quickly.

@jerch
Member

jerch commented Dec 8, 2018

@juancampa Yeah, the current buffer write time estimation needs some refinement; I like the idea of doing it time-based. The problem is that there are 3 major timing factors, all of which are under heavy change atm:

  • time to store incoming data in the terminal buffer
    With the new TypedArray buffer the average input rate doubles (from ~7 MB/s up to ~18 MB/s on my machine). There are pending changes that will further speed things up to >25 MB/s (Deprecate and remove JS array buffer version #1811).
  • time to draw one terminal page
    This depends on the renderer used; their speed is WebGL > canvas > DOM, where the DOM renderer takes 5 to 10 times longer than the WebGL renderer.
  • GC
    If it pops up, it easily takes many milliseconds to finish. We cannot control it by any means, but the transition to the TypedArray buffer already reduced the GC pressure by a great amount (dropped from ~20% to <3% of runtime during input flow).

Input flow hiccups occur whenever the sum of the 3 factors differs too much from the average sum for one rendered frame. I phrased it this vaguely because users' perception works more with differences than with absolute numbers - a constantly low framerate is typically perceived as a better experience than a very high framerate with random hiccups. It's all about difference: the smoother the flow, the better.
We are currently working on making all 3 factors less time consuming, so the framerate will go up (I have a playground branch which shows 40-60 fps without patching Terminal._innerWrite). But the faster we make the average fps, the more obvious a single frame that takes slightly longer becomes - a hiccup. This can happen when the input chunk is much bigger than average and/or a GC event occurs. The render time is typically pretty stable. Furthermore, the experience will go downhill on a less powerful machine (devtools has a great feature to simulate a 4x slower CPU; with it set, the input starts to "flicker").

To sum this up: I think we need a tiny benchmark that tracks the runtimes of typical input data chunks. Based on that data we can then adjust the chunk size to keep the fps at a certain level. We would have to leave some room for "random" GC events, something like this:

// target fps
// note: to guarantee a certain fps this has to be doubled due to the sampling theorem
const FPS = 60;
const RENDER_TIME = 2;         // in msec, depends on renderer
const DATA_THROUGHPUT = 20000; // in bytes/ms, benchmarked
const GC_GAP = 5;              // in msec, some room for GC
...
const time_to_spend = 1000 / FPS;  // in msec
const max_chunk_size = (time_to_spend - GC_GAP - RENDER_TIME) * DATA_THROUGHPUT;
// now fix the chunk size and parse
parse(chunk.slice(0, max_chunk_size));

Now what happens if the incoming data chunks are rather small and many? Yeah, this is a bigger problem than it might seem - it's actually what yes does (due to its small chunks and heavy context switches it's a good burn-in test for any data pumping, for other emulators as well). With a logic like the above, handling the next small data chunk competes with the renderer in a weird way - in the JS event loop. They will delay each other, which leads to poor perf overall. Therefore the incoming chunks should be aggregated to some sane size as well, preferably beforehand (not within the critical event loop, see #1744). Another way to deal with this would be to offload the input handling into a webworker.
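
As a rough illustration of that pre-aggregation idea (hypothetical names and thresholds; #1744 does this on the feeding side, outside the terminal), many small chunks could be coalesced before they ever reach the write path:

// Sketch only: collect incoming chunks and hand over one larger chunk, either
// once enough bytes have accumulated or after a short timeout.
const MIN_CHUNK_SIZE = 65536; // aggregate until roughly this many chars
const FLUSH_TIMEOUT_MS = 5;   // ...or until this much time has passed

function createAggregator(flush: (data: string) => void): (chunk: string) => void {
  let pending = '';
  let timer: ReturnType<typeof setTimeout> | undefined;

  const doFlush = (): void => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    if (pending.length > 0) {
      flush(pending);
      pending = '';
    }
  };

  return (chunk: string): void => {
    pending += chunk;
    if (pending.length >= MIN_CHUNK_SIZE) {
      doFlush();                                     // big enough, hand it over now
    } else if (timer === undefined) {
      timer = setTimeout(doFlush, FLUSH_TIMEOUT_MS); // flush small leftovers soon
    }
  };
}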

@Tyriar
Member

Tyriar left a comment

Thanks for the PR, I've been wanting this for a while. We may need to hold off on merging for a bit as it will probably mess up all our test numbers for the big PRs, but I definitely want this to go in 😃

src/Terminal.ts (outdated review thread, resolved)
-  const data = writeBatch.shift();
+  const time = Date.now();
+  while (this.writeBuffer.length > 0) {
+    const data = this.writeBuffer.shift();
Member

When #1796 goes in we can tweak this further to avoid the shift call (pass in a code(s) or start/offset).

src/Terminal.ts (outdated review thread, resolved)
@jerch
Member

jerch commented Dec 9, 2018

👍 for holding it off a bit until we are done with the big PRs.

Imho we should also establish some way to split data chunks to avoid the hiccup problem as indicated above. Currently a really big chunk would simply "stall" the terminal.

@juancampa
Contributor Author

Imho we should also establish some way to split data chunks to avoid the hiccup problem as indicated above. Currently a really big chunk would simply "stall" the terminal.

Agreed. Big chunks are uninterruptible. We should probably tackle that in a follow-up PR.

@juancampa
Contributor Author

Also, @jerch thanks for the detailed explanation! I agree with your proposal. This solution is not final as it doesn't achieve the desired FPS perfectly, but it does take us most of the way there. I'm down for refining it as a follow-up task.

@Tyriar
Member

Tyriar commented Dec 10, 2018

Currently a really big chunk would simply "stall" the terminal.

@jerch can you clarify what you mean here? Doesn't this PR solve this provided rendering is fast enough?

@jerch
Member

jerch commented Dec 10, 2018

@Tyriar
Sure. Currently the measurement is done on the number of chunks, which gives a good estimate when the chunks don't differ too much in size. The input runtime, though, scales with bytes, thus with the sum of the chunk sizes worked on. If there is one really big chunk, the input handling will take much longer - a hiccup.

We can easily avoid this by inspecting the chunk sizes and doing the estimate based on those instead of the number of chunks.
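
A hedged sketch of what that could look like (hypothetical names and budget; slicing an oversized chunk assumes the parser can resume across arbitrary cut points):

// Sketch only: stop on a byte budget rather than a chunk count, and slice an
// oversized chunk so a single huge chunk cannot blow past the budget alone.
const BYTE_BUDGET = 300000; // bytes we estimate the parser can handle per frame

function writeBatchByBytes(writeBuffer: string[], parse: (data: string) => void): void {
  let remaining = BYTE_BUDGET;
  while (writeBuffer.length > 0 && remaining > 0) {
    const data = writeBuffer.shift()!;
    if (data.length > remaining) {
      parse(data.slice(0, remaining));
      writeBuffer.unshift(data.slice(remaining)); // keep the rest for the next frame
      remaining = 0;
    } else {
      parse(data);
      remaining -= data.length;
    }
  }
}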

@juancampa
Contributor Author

Here's a screenshot that illustrates how a single call to parse can take a long time (in this case 19ms, already too much for 60 FPS). I'm going to try the mini-benchmark approach where we measure how long it takes to process, say, 1K, and then call parse with at most ~16ms worth of bytes.

[screenshot: 2018-12-09_23-22]
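
A hedged sketch of that measurement side (hypothetical names; instead of a fixed 1K probe, this version simply times whichever chunks get parsed and smooths the result into a byte budget):

// Sketch only: time each parse call, keep a smoothed bytes-per-ms estimate,
// and derive how many bytes fit into a ~16 ms frame budget.
const FRAME_BUDGET_MS = 16;
let bytesPerMs = 1000; // initial guess, refined as we measure

function parseMeasured(data: string, parse: (data: string) => void): void {
  const start = performance.now();
  parse(data);
  const elapsed = performance.now() - start;
  if (elapsed > 1) { // skip chunks handled too quickly to time reliably
    bytesPerMs = 0.9 * bytesPerMs + 0.1 * (data.length / elapsed); // exponential smoothing
  }
}

function currentByteBudget(): number {
  return Math.floor(bytesPerMs * FRAME_BUDGET_MS);
}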

@jerch
Member

jerch commented Dec 10, 2018

@juancampa I had a playground branch in summer that did the mini-benchmark approach with good results (it was not consuming much time of its own). It basically shimmed in from time to time, tracked the runtime for the bunch of chunks at hand and derived a throughput number from it. The only tricky thing was the time resolution of the browser due to the Spectre mitigations; most chunks were handled too quickly to get a reliable number.
The rest was like the stuff above, but without the GC gap.

@juancampa
Contributor Author

@jerch Mind sharing it? :)

@jerch
Member

jerch commented Dec 11, 2018

@juancampa Sorry, can't find it anymore; it was around the testbed PRs (#1528, #1529, #1530) but did not end up in one of them.

@jerch
Member

jerch commented Dec 15, 2018

@juancampa
Slightly offtopic: since you come from hyper, you might want to check whether hyper would benefit from a tiny buffer between the pty and the browser engine. We had the same problem in the demo app: the event loop got bombed by tons of small chunks from the websocket, which made testing totally unreliable. #1744 solved it.

(Hint: it will double the throughput of hyper, but fps will go down slightly without a big-chunk fix as proposed above.)

@juancampa
Contributor Author

juancampa commented Dec 15, 2018 via email

juancampa added a commit to vercel/hyper that referenced this pull request Dec 29, 2018
@Tyriar
Member

Tyriar left a comment

Most of the buffer-related changes are in, and WebGL will probably be an addon when it gets merged (which we can re-test). I say let's push this in for 3.12.0 🎉

@Tyriar Tyriar merged commit ce1c45f into xtermjs:master Mar 4, 2019
@Tyriar
Member

Tyriar commented Mar 4, 2019

Created #1955 to follow up the removal of .shift. Thanks again @juancampa 👌

@Tyriar Tyriar added this to the 3.12.0 milestone Mar 5, 2019