Vectorize ROW initialization #15501

lhecker · 2023-06-02T01:41:44Z

Performance of printing enwik8.txt at the following block sizes:
4KiB (printf): 51MB/s -> 54MB/s
128KiB (cat): 92MB/s -> 103MB/s

Validation Steps Performed

Rows are properly filled with whitespace at various
window sizes as observed under a debugger ✅

src/buffer/out/Row.cpp

DHowett · 2023-06-05T18:56:04Z

src/buffer/out/Row.hpp

+    // implement Reset() efficiently via SIMD and the latter is used to store the past-the-end offset
+    // into the `charsBuffer`. Even though the `charsBuffer` could be only `rowWidth` large we need them
+    // to be the same size so that the SIMD code can process both arrays in the same loop simultaneously.
+    // This wastes up to 5.8% memory but increases overall scrolling performance by around 40%.


Aye. SIMD is free real estate in our CPUs - Let's use it.

DHowett · 2023-06-09T00:08:04Z

Why does vectorization improve performance printing text so much? I thought that all rows were eagerly initialized before your recent work.

Is this literally all from reinitializing individual rows as we recycle/circle them?

lhecker · 2023-06-09T00:24:55Z

> Is this literally all from reinitializing individual rows as we recycle/circle them?

Yes, pretty much. It reduces the row initialization cost from around 80ns per 120 columns down to 5ns. OpenConsole with all these recent changes included processes something around 1.7M rows per second, so that's why it has such a big impact. 1.7M sounds like a lot, but it runs fairly close to spending an entire millisecond just initializing the text buffer on startup, whereas this new code won't even really show up in perf traces anymore. (And that other PR will make it a non-issue.)
Not clearing rows at all before writing them would also be nice, for instance by maintaining the "end of the row" column, and simply leaving the rest of the row as uninitialized memory. But I believe that doing this in a robust way is a long way out whereas this was fairly easy to implement, tune and test within about an hour.
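
As a back-of-the-envelope check of the figures quoted above (80ns vs. 5ns per row, about 1.7M rows per second), the sketch below computes how many milliseconds of each second of output would go into row initialization. The names `initMsPerSecond` and the `k...` constants are made up for this illustration and do not exist in the codebase; the inputs are the approximate numbers from the comment, not new measurements.

```cpp
// Approximate figures taken from the comment above.
constexpr double kRowsPerSecond = 1.7e6;  // rows processed per second
constexpr double kScalarNsPerRow = 80.0;  // per-row cost before this PR
constexpr double kSimdNsPerRow = 5.0;     // per-row cost with the SIMD path

// Milliseconds spent initializing rows during one second of output.
constexpr double initMsPerSecond(double rowsPerSecond, double nsPerRow)
{
    return rowsPerSecond * nsPerRow / 1e6; // nanoseconds -> milliseconds
}

// Roughly 136ms of every second before, roughly 8.5ms after.
static_assert(initMsPerSecond(kRowsPerSecond, kScalarNsPerRow) > 135.9 &&
              initMsPerSecond(kRowsPerSecond, kScalarNsPerRow) < 136.1,
              "scalar path: about 136ms per second");
static_assert(initMsPerSecond(kRowsPerSecond, kSimdNsPerRow) > 8.4 &&
              initMsPerSecond(kRowsPerSecond, kSimdNsPerRow) < 8.6,
              "SIMD path: about 8.5ms per second");
```

Under these assumptions the scalar path would account for roughly 13.6% of wall time at that throughput, which is in the same ballpark as the 92MB/s -> 103MB/s improvement reported in the PR description.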

DHowett · 2023-06-12T17:55:34Z

src/buffer/out/Row.cpp

+
+    // Fills _charsBuffer with whitespace and correspondingly _charOffsets
+    // with successive numbers from 0 to _columnCount+1.
+#if defined(TIL_SSE_INTRINSICS)


fwiw nobody ever sets this to true?

lhecker · 2023-06-12

That is part of #15498. I can pull that change into this branch and rebase it on main so we can merge it immediately.

DHowett · 2023-06-13

I'd love that! I'm personally OK merging these out of order, even if it means that the numbers in the perf discussion part are incorrect.

DHowett · 2023-06-13T18:04:09Z

src/buffer/out/Row.cpp

+            } while (chars < charsEndLoop);
+        }
+
+        _mm256_storeu_si256(reinterpret_cast<__m256i*>(charsEndLoop), whitespace);


so wait, this will write up to 15 things off the end of the buffer? and there's no risk that this is going to stomp anything important?

Like, if the buffer is 17 columns wide... the char offsets buffer starts at alignment 16 from the end of the chars buffer, and the next ROW starts at alignment 16 from the end of the char offsets buffer?

lhecker · 2023-06-13

It writes up to 15 bytes off the end of the buffer, which at a granularity of wchar_t is up to 7 items. It won't stomp anything due to our alignment guarantees in the buffer, which ensure that all buffers start at a 16-byte aligned offset and end on one. If we ever determine that this alignment is unneeded for our performance goals, there are a few techniques we can use to avoid writing outside of the buffer, the most common being that you write the remaining N items in a simple for loop.
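
For reference, the "simple for loop" alternative mentioned above could look roughly like the sketch below. This is not the code from the PR: `fillRow` and `fillRowMatchesScalar` are hypothetical names, `char16_t` stands in for the 16-bit `wchar_t` used on Windows, and the SSE2 path is only compiled where the compiler advertises SSE2 support. The vector loop handles full groups of 8 items, and the scalar loop writes whatever is left, so nothing is stored past `count`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#endif

// Hypothetical stand-in for the reset loop: fills chars[0..count) with spaces
// and offsets[0..count) with 0, 1, 2, ... without writing past either buffer.
inline void fillRow(char16_t* chars, std::uint16_t* offsets, std::size_t count)
{
    assert(count <= 0xFFFF); // offsets are 16 bits wide in this sketch
    std::size_t i = 0;
#if defined(SKETCH_HAS_SSE2)
    const __m128i whitespace = _mm_set1_epi16(0x0020);
    const __m128i increment = _mm_set1_epi16(8);
    __m128i values = _mm_setr_epi16(0, 1, 2, 3, 4, 5, 6, 7);
    // Full groups of 8 items (16 bytes) via unaligned stores.
    for (; i + 8 <= count; i += 8)
    {
        _mm_storeu_si128(reinterpret_cast<__m128i*>(chars + i), whitespace);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(offsets + i), values);
        values = _mm_add_epi16(values, increment);
    }
#endif
    // Scalar tail: the remaining 0 to 7 items (or everything without SSE2).
    for (; i < count; ++i)
    {
        chars[i] = u' ';
        offsets[i] = static_cast<std::uint16_t>(i);
    }
}

// Checks the fill against the expected contents and verifies that the
// sentinel item placed directly after each buffer was left untouched.
inline bool fillRowMatchesScalar(std::size_t count)
{
    std::vector<char16_t> chars(count + 1, u'x');
    std::vector<std::uint16_t> offsets(count + 1, 0xFFFF);
    fillRow(chars.data(), offsets.data(), count);
    for (std::size_t i = 0; i < count; ++i)
    {
        if (chars[i] != u' ' || static_cast<std::size_t>(offsets[i]) != i)
        {
            return false;
        }
    }
    return chars[count] == u'x' && offsets[count] == 0xFFFF;
}
```

The trade-off is an extra loop and branch per row in exchange for not depending on padded, aligned buffers; the PR as discussed here keeps the padding and lets the final vector store run past the logical end instead.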

lhecker · 2023-06-13T23:36:06Z

It was a bit of a messy rebase, but I think it's ready now.

DHowett · 2023-06-14T19:54:17Z

src/buffer/out/Row.cpp

+            offsets = _mm_add_epi16(offsets, increment);
+            chars += 8;
+            charOffsets += 8;
+            // If _columnCount is something like 120, the actual backing buffer for charOffsets is 121 items large.


see, i guess this is the part that scares me. every time we talk about the width of the backing buffers we're like, "YUP it's always +1" when in truth it is up to +16 or +32 or something.

lhecker · 2023-06-14

I could add an "at least" in there when I merge main in. (It's only up to +8 btw.)
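
To make the "+1" versus "up to +8" distinction concrete, here is a minimal sketch of one sizing rule that would produce those numbers. It assumes the backing buffers hold `_columnCount + 1` 16-bit items rounded up to a whole number of 16-byte vectors; that rounding rule is an assumption inferred from the alignment discussion in this thread, not something confirmed by the diff, and `paddedItemCount` is a hypothetical helper name.

```cpp
#include <cstddef>

// 16-byte vectors hold 8 items of 16 bits each.
constexpr std::size_t kItemsPerVector = 16 / sizeof(char16_t);

// Assumed rule: room for columnCount + 1 items, rounded up to a multiple of 8.
constexpr std::size_t paddedItemCount(std::size_t columnCount)
{
    return (columnCount + 1 + kItemsPerVector - 1) / kItemsPerVector * kItemsPerVector;
}

static_assert(paddedItemCount(120) == 128, "120 columns: 121 needed, 128 allocated (+8)");
static_assert(paddedItemCount(17) == 24, "17 columns: 18 needed, 24 allocated (+7)");
static_assert(paddedItemCount(127) == 128, "127 columns: 128 needed, 128 allocated (+1)");
```

Under that assumption the buffer is always at least `_columnCount + 1` items and at most `_columnCount + 8`, which is why "at least 121" is the more accurate wording for the 120-column example.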

lhecker added the Product-Conhost (For issues in the Console codebase) and Area-Performance (Performance-related issue) labels Jun 2, 2023

lhecker force-pushed the dev/lhecker/vt-perf5 branch from 6f1e61e to 9fbe3a6 Compare June 3, 2023 13:04

lhecker force-pushed the dev/lhecker/vt-perf5 branch 2 times, most recently from b1321d2 to a919562 Compare June 5, 2023 16:38

lhecker changed the title from "Initialize rows lazily" to "Vectorize ROW initialization" Jun 5, 2023

lhecker force-pushed the dev/lhecker/vt-perf5 branch from a919562 to b5c5804 Compare June 5, 2023 16:40

carlos-zamora approved these changes Jun 5, 2023

DHowett reviewed Jun 5, 2023

lhecker mentioned this pull request Jun 8, 2023

Add support for Erase Color Mode (DECECM) #15469

Merged

DHowett reviewed Jun 12, 2023

DHowett reviewed Jun 13, 2023

lhecker changed the base branch from dev/lhecker/vt-perf4 to main June 13, 2023 23:16

lhecker force-pushed the dev/lhecker/vt-perf5 branch from e83ef45 to 1aba49d Compare June 13, 2023 23:23

lhecker force-pushed the dev/lhecker/vt-perf5 branch from 1aba49d to 9deecc9 Compare June 13, 2023 23:29

Vectorize ROW initialization

a9763f7

lhecker force-pushed the dev/lhecker/vt-perf5 branch from 9deecc9 to a9763f7 Compare June 13, 2023 23:33

lhecker marked this pull request as ready for review June 13, 2023 23:34

DHowett reviewed Jun 14, 2023

DHowett approved these changes Jun 14, 2023

Merge remote-tracking branch 'origin/main' into dev/lhecker/vt-perf5

e1f9c14

lhecker added the AutoMerge (Marked for automatic merge by the bot when requirements are met) label Jun 15, 2023

microsoft-github-policy-service bot enabled auto-merge (squash) June 15, 2023 14:09

microsoft-github-policy-service bot merged commit f3e2890 into main Jun 15, 2023

microsoft-github-policy-service bot deleted the dev/lhecker/vt-perf5 branch June 15, 2023 14:45

zadjii-msft mentioned this pull request Sep 5, 2023

Very slow rendering of colored text #4129

Closed
