Perf: save text buffer size for frequent access #3351

skyline75489 · 2019-10-28T04:00:02Z

Summary of the Pull Request

Save buffer size as a property. Update it when necessary.

References

PR Checklist

Closes #xxx
CLA signed. If not, go over here and sign the CLA
Tests added/passed
Requires documentation to be updated
I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

Detailed Description of the Pull Request / Additional comments

The GetSize in text buffer is called intensively across the entire code base. The buffer size is frequently accessed but seldom modified. This is an attempt to make use of that.

Validation Steps Performed

zadjii-msft

This seems safe to me. IIRC, we don't ever really resize a single TextBuffer, we usually just make a new one, and copy the contents over. So this seems like it'll work to me.

In fact, we probably don't even need _UpdateSize at all, since we've only ever got a rectangular buffer, every row should be the same width. Thoughts 🤔?

As much as I'm usually in favor of having less redundant copies of information laying around, this seems like a scenario where caching this information would be beneficial. Plus, I won't have to drill 3 layers past the buffer in the debugger anymore to check it's size :P

skyline75489 · 2019-10-28T13:15:44Z

The static analysis is killing me. This noexcept propagation is lika a virus.

skyline75489 · 2019-10-28T13:41:37Z

Well the good thing is that, this insane noexcept propagation illustrates how widely GetSize is used and how much this PR can help simplify the code.

skyline75489 · 2019-10-28T14:15:48Z

Is there a way to cast function pointer with noexcept to one without noexcept? The code inside UiaTextRangeBase::Move is using a function pointer that I can't cast to the proper type:

    auto moveFunc = &_moveByDocument;
    if (unit == TextUnit::TextUnit_Character)
    {
        moveFunc = &_moveByCharacter;
    }
    else if (unit <= TextUnit::TextUnit_Line)
    {
        moveFunc = &_moveByLine;
    }

Here _moveByDocument should be annotated with noexcept but the _moveByCharacter cannot be annotated with noexcept because it may throw exceptions.

skyline75489 · 2019-10-29T05:08:35Z

src/types/UiaTextRangeBase.cpp

@@ -579,7 +579,7 @@ IFACEMETHODIMP UiaTextRangeBase::Move(_In_ TextUnit unit,
    _outputRowConversions();
 #endif

-    auto moveFunc = &_moveByDocument;
+    std::pair<unsigned, unsigned> (*moveFunc)(gsl::not_null<IUiaData*>, int, MoveState, gsl::not_null<int*>) = &_moveByDocument;


OK I have no idea it would end up like this. If anyone has better way to do this, please enlighten me.

miniksa · 2019-10-31T17:12:31Z

This seems safe to me. IIRC, we don't ever really resize a single TextBuffer, we usually just make a new one, and copy the contents over. So this seems like it'll work to me.

TextBuffer::ResizeTraditional does resize it in place.

As much as I'm usually in favor of having less redundant copies of information laying around, this seems like a scenario where caching this information would be beneficial. Plus, I won't have to drill 3 layers past the buffer in the debugger anymore to check it's size :P

We used to have 12 copies of the buffer size that got out of sync. I very intentionally didn't store it in a property here, so I'm not totally happy with this PR overall.

@skyline75489, do you have a performance test that shows this change is actually resulting in a performance improvement? A WPR trace? Or is this "by inspection" only?

@zadjii-msft, if it's hard for you to see it in the debugger, we should file an issue to add a .natvis rule for it, not store another copy.

I would like this PR blocked until I see actual proof of the performance gain. We have been burned SO HARD in the past by storing the size somewhere that isn't the inherent structural size that I really do NOT want to do this unless justified fully.

skyline75489 · 2019-11-01T00:31:28Z

@miniksa The GetSize implementation originally caught my eye because it shows up about 4% of CPU usage in profiling result. But I think you could say that it's not a obvious performance penalty. I'm OK to leave it like this for a better code structure.

Speaking of profiling, my previous PRs are also based on profiling result. Recently #3262 is filed. Just like this PR, it aims to reduce unnecessary vector access.

For safety reason, vectors and deques are used across the entire codebase. But the random access performance is not good enough, especially under very intensive calls. The trade-off, though, is still reasonable to me. I'd also prefer more PRs like #3262 that reduces the access, rather than reduing the usage of stl containers.

miniksa · 2019-11-01T17:28:13Z

@miniksa Michael Niksa FTE The GetSize implementation originally caught my eye because it shows up about 4% of CPU usage in profiling result. But I think you could say that it's not a obvious performance penalty. I'm OK to leave it like this for a better code structure.

Can you show me the profiling results here? Like the hot frames off the stack or whatnot? Or attach the ETW traces?

If it's 4% penalty, then maybe we should do this and just make sure it's well-guarded inside the TextBuffer. It feels worth the additional extra memory to gain that much CPU back. I'm just overly paranoid of how many copies of size there were in the past that I went full-force into not keeping any copies. This probably justifies the 1 extra copy.

Speaking of profiling, my previous PRs are also based on profiling result. Recently #3262 is filed. Just like this PR, it aims to reduce unnecessary vector access.

For safety reason, vectors and deques are used across the entire codebase. But the random access performance is not good enough, especially under very intensive calls. The trade-off, though, is still reasonable to me. I'd also prefer more PRs like #3262 that reduces the access, rather than reducing the usage of stl containers.

std::vector should be fine for random access performance. They should be contiguous. If we're having random access performance issues, I do agree that fewer calls = better, but we should also reconsider using a structure that can improve the call performance as well. We could use gsl::span<T>s or std::basic_string_view<T> over a memory space to get most of the safety benefits of the standard collections. (We could write a support collection of our own in the Types library that abstracts these details).

If you are seeing massive random access problems, please show me the proof and let me know so we can consider.

miniksa · 2019-11-01T17:29:05Z

Also, it was my understanding that the .size() property of the STL structures didn't require calculation.... it should just be there statically as a part of the structure overhead. So I'm somewhat surprised there's a penalty.

skyline75489 · 2019-11-02T03:52:05Z

@miniksa There you go, the screenshot.

Also, it was my understanding that the .size() property of the STL structures didn't require calculation.... it should just be there statically as a part of the structure overhead. So I'm somewhat surprised there's a penalty.

Come to think about it, perhaps you are right. It could be that the super intensive access that causes the CPU usage. For example in #3262 the _storage.at call may not be very expensive, but InsertAttrRuns is called very frequently.

My guess it that, GetSize is called from various places and for many many times, which makes slightly performance drawback of containers magnified to a visible 3.42% usage.

skyline75489 · 2019-11-04T05:04:31Z

With this patch the CPU usage dropped to somewhere around 0.10%.

skyline75489 · 2019-11-04T05:14:14Z

I'm using the Debug build for profiling. What I did was simply just cating a very long file (/etc/services in my case). With Release build the CPU usage may not be this high since the compiler would optimize C++ code way more aggresively.

If with Release build the usage drops to a noise level then maybe it's not worth the effort to go through this patching thing.

DHowett-MSFT · 2019-11-04T05:16:14Z

Oh heck yeah, the STL is absolutely loaded with runtime checks in debug builds. I wouldn't use debug builds for any type of perf-sensitive measurements, especially where the STL is involved 😄

skyline75489 · 2019-11-04T05:20:59Z

@DHowett-MSFT That's a relief. The noexcept nightmare still haunts me, believe it or not.

miniksa · 2019-11-04T17:12:28Z

So @skyline75489, is there still a dramatic difference between the before-your-fix and after-your-fix with a Release build?

Also, sorry about the noexcept thing. I experienced that too when I was setting up the static analysis.

skyline75489 · 2019-11-04T22:38:49Z

@miniksa The performance differences doses drop to a noise level with a Release build. I’m gonna close this for now. I’m just really glad the noexcept things will not be merged.

beviu · 2020-06-13T19:20:10Z

@skyline75489 @miniksa

I am testing this on 19fcbce, but also with the patches from #6493 and #6492 applied, while priting a file with a lot of text and colors, in Release mode.

Before (it's using 7% of the CPU):

After (it's gone):

miniksa · 2020-06-13T19:36:05Z

Yes, we need to revisit this. This is another of the things I saw while working on #6483 on Friday. It's harder to see because it happens in multiple places and I'm really not sure why it's such a pig at pulling these out of that function, but we should bring this back and get it gone.

skyline75489 · 2020-06-16T10:14:37Z

It's been too long since this PR is born. I can not find my local branch anymore. I'm still fighting my techinical issues with my PC and I'm not able to offer help for now. (sorry about that 😢 )

miniksa · 2020-06-16T15:27:18Z

It's been too long since this PR is born. I can not find my local branch anymore. I'm still fighting my techinical issues with my PC and I'm not able to offer help for now. (sorry about that 😢 )

No problem, I'll make my own version. Thanks!

skyline75489 force-pushed the fix/do-not-calculate-size branch 3 times, most recently from dd10a15 to ca02845 Compare October 28, 2019 05:10

skyline75489 changed the title ~~Perf: save text buffer size to prevent frequent vector access~~ Perf: save text buffer size for frequent access Oct 28, 2019

skyline75489 force-pushed the fix/do-not-calculate-size branch 3 times, most recently from aa791b0 to b2fe70d Compare October 28, 2019 13:08

zadjii-msft reviewed Oct 28, 2019

View reviewed changes

zadjii-msft requested review from DHowett-MSFT, miniksa and carlos-zamora October 28, 2019 13:11

skyline75489 force-pushed the fix/do-not-calculate-size branch 2 times, most recently from 8d2a3a2 to c6f12ff Compare October 28, 2019 13:38

skyline75489 force-pushed the fix/do-not-calculate-size branch from c6f12ff to e7b1cc5 Compare October 28, 2019 13:57

skyline75489 force-pushed the fix/do-not-calculate-size branch from e7b1cc5 to b890846 Compare October 29, 2019 02:45

Perf: save text buffer size for frequent access

ef7df97

skyline75489 force-pushed the fix/do-not-calculate-size branch from b890846 to ef7df97 Compare October 29, 2019 03:01

skyline75489 commented Oct 29, 2019

View reviewed changes

skyline75489 closed this Nov 4, 2019

skyline75489 deleted the fix/do-not-calculate-size branch November 5, 2019 03:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perf: save text buffer size for frequent access #3351

Perf: save text buffer size for frequent access #3351

skyline75489 commented Oct 28, 2019

zadjii-msft left a comment

skyline75489 commented Oct 28, 2019

skyline75489 commented Oct 28, 2019

skyline75489 commented Oct 28, 2019

skyline75489 Oct 29, 2019

miniksa commented Oct 31, 2019

skyline75489 commented Nov 1, 2019

miniksa commented Nov 1, 2019

miniksa commented Nov 1, 2019

skyline75489 commented Nov 2, 2019

skyline75489 commented Nov 4, 2019

skyline75489 commented Nov 4, 2019

DHowett-MSFT commented Nov 4, 2019

skyline75489 commented Nov 4, 2019

miniksa commented Nov 4, 2019

skyline75489 commented Nov 4, 2019

beviu commented Jun 13, 2020

miniksa commented Jun 13, 2020

skyline75489 commented Jun 16, 2020

miniksa commented Jun 16, 2020

Perf: save text buffer size for frequent access #3351

Perf: save text buffer size for frequent access #3351

Conversation

skyline75489 commented Oct 28, 2019

Summary of the Pull Request

References

PR Checklist

Detailed Description of the Pull Request / Additional comments

Validation Steps Performed

zadjii-msft left a comment

Choose a reason for hiding this comment

skyline75489 commented Oct 28, 2019

skyline75489 commented Oct 28, 2019

skyline75489 commented Oct 28, 2019

skyline75489 Oct 29, 2019

Choose a reason for hiding this comment

miniksa commented Oct 31, 2019

skyline75489 commented Nov 1, 2019

miniksa commented Nov 1, 2019

miniksa commented Nov 1, 2019

skyline75489 commented Nov 2, 2019

skyline75489 commented Nov 4, 2019

skyline75489 commented Nov 4, 2019

DHowett-MSFT commented Nov 4, 2019

skyline75489 commented Nov 4, 2019

miniksa commented Nov 4, 2019

skyline75489 commented Nov 4, 2019

beviu commented Jun 13, 2020

miniksa commented Jun 13, 2020

skyline75489 commented Jun 16, 2020

miniksa commented Jun 16, 2020