Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perf: save text buffer size for frequent access #3351

Closed

Conversation

skyline75489
Copy link
Collaborator

Summary of the Pull Request

Save buffer size as a property. Update it when necessary.

References

PR Checklist

  • Closes #xxx
  • CLA signed. If not, go over here and sign the CLA
  • Tests added/passed
  • Requires documentation to be updated
  • I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

Detailed Description of the Pull Request / Additional comments

The GetSize in text buffer is called intensively across the entire code base. The buffer size is frequently accessed but seldom modified. This is an attempt to make use of that.

Validation Steps Performed

@skyline75489 skyline75489 force-pushed the fix/do-not-calculate-size branch 3 times, most recently from dd10a15 to ca02845 Compare October 28, 2019 05:10
@skyline75489 skyline75489 changed the title Perf: save text buffer size to prevent frequent vector access Perf: save text buffer size for frequent access Oct 28, 2019
@skyline75489 skyline75489 force-pushed the fix/do-not-calculate-size branch 3 times, most recently from aa791b0 to b2fe70d Compare October 28, 2019 13:08
Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems safe to me. IIRC, we don't ever really resize a single TextBuffer, we usually just make a new one, and copy the contents over. So this seems like it'll work to me.

In fact, we probably don't even need _UpdateSize at all, since we've only ever got a rectangular buffer, every row should be the same width. Thoughts 🤔?

As much as I'm usually in favor of having less redundant copies of information laying around, this seems like a scenario where caching this information would be beneficial. Plus, I won't have to drill 3 layers past the buffer in the debugger anymore to check it's size :P

@skyline75489
Copy link
Collaborator Author

The static analysis is killing me. This noexcept propagation is lika a virus.

@skyline75489 skyline75489 force-pushed the fix/do-not-calculate-size branch 2 times, most recently from 8d2a3a2 to c6f12ff Compare October 28, 2019 13:38
@skyline75489
Copy link
Collaborator Author

Well the good thing is that, this insane noexcept propagation illustrates how widely GetSize is used and how much this PR can help simplify the code.

@skyline75489
Copy link
Collaborator Author

Is there a way to cast function pointer with noexcept to one without noexcept? The code inside UiaTextRangeBase::Move is using a function pointer that I can't cast to the proper type:

    auto moveFunc = &_moveByDocument;
    if (unit == TextUnit::TextUnit_Character)
    {
        moveFunc = &_moveByCharacter;
    }
    else if (unit <= TextUnit::TextUnit_Line)
    {
        moveFunc = &_moveByLine;
    }

Here _moveByDocument should be annotated with noexcept but the _moveByCharacter cannot be annotated with noexcept because it may throw exceptions.

@@ -579,7 +579,7 @@ IFACEMETHODIMP UiaTextRangeBase::Move(_In_ TextUnit unit,
_outputRowConversions();
#endif

auto moveFunc = &_moveByDocument;
std::pair<unsigned, unsigned> (*moveFunc)(gsl::not_null<IUiaData*>, int, MoveState, gsl::not_null<int*>) = &_moveByDocument;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I have no idea it would end up like this. If anyone has better way to do this, please enlighten me.

@miniksa
Copy link
Member

miniksa commented Oct 31, 2019

This seems safe to me. IIRC, we don't ever really resize a single TextBuffer, we usually just make a new one, and copy the contents over. So this seems like it'll work to me.

TextBuffer::ResizeTraditional does resize it in place.

As much as I'm usually in favor of having less redundant copies of information laying around, this seems like a scenario where caching this information would be beneficial. Plus, I won't have to drill 3 layers past the buffer in the debugger anymore to check it's size :P

We used to have 12 copies of the buffer size that got out of sync. I very intentionally didn't store it in a property here, so I'm not totally happy with this PR overall.

@skyline75489, do you have a performance test that shows this change is actually resulting in a performance improvement? A WPR trace? Or is this "by inspection" only?

@zadjii-msft, if it's hard for you to see it in the debugger, we should file an issue to add a .natvis rule for it, not store another copy.

I would like this PR blocked until I see actual proof of the performance gain. We have been burned SO HARD in the past by storing the size somewhere that isn't the inherent structural size that I really do NOT want to do this unless justified fully.

@skyline75489
Copy link
Collaborator Author

@miniksa The GetSize implementation originally caught my eye because it shows up about 4% of CPU usage in profiling result. But I think you could say that it's not a obvious performance penalty. I'm OK to leave it like this for a better code structure.

Speaking of profiling, my previous PRs are also based on profiling result. Recently #3262 is filed. Just like this PR, it aims to reduce unnecessary vector access.

For safety reason, vectors and deques are used across the entire codebase. But the random access performance is not good enough, especially under very intensive calls. The trade-off, though, is still reasonable to me. I'd also prefer more PRs like #3262 that reduces the access, rather than reduing the usage of stl containers.

@miniksa
Copy link
Member

miniksa commented Nov 1, 2019

@miniksa Michael Niksa FTE The GetSize implementation originally caught my eye because it shows up about 4% of CPU usage in profiling result. But I think you could say that it's not a obvious performance penalty. I'm OK to leave it like this for a better code structure.

Can you show me the profiling results here? Like the hot frames off the stack or whatnot? Or attach the ETW traces?

If it's 4% penalty, then maybe we should do this and just make sure it's well-guarded inside the TextBuffer. It feels worth the additional extra memory to gain that much CPU back. I'm just overly paranoid of how many copies of size there were in the past that I went full-force into not keeping any copies. This probably justifies the 1 extra copy.

Speaking of profiling, my previous PRs are also based on profiling result. Recently #3262 is filed. Just like this PR, it aims to reduce unnecessary vector access.

For safety reason, vectors and deques are used across the entire codebase. But the random access performance is not good enough, especially under very intensive calls. The trade-off, though, is still reasonable to me. I'd also prefer more PRs like #3262 that reduces the access, rather than reducing the usage of stl containers.

std::vector should be fine for random access performance. They should be contiguous. If we're having random access performance issues, I do agree that fewer calls = better, but we should also reconsider using a structure that can improve the call performance as well. We could use gsl::span<T>s or std::basic_string_view<T> over a memory space to get most of the safety benefits of the standard collections. (We could write a support collection of our own in the Types library that abstracts these details).

If you are seeing massive random access problems, please show me the proof and let me know so we can consider.

@miniksa
Copy link
Member

miniksa commented Nov 1, 2019

Also, it was my understanding that the .size() property of the STL structures didn't require calculation.... it should just be there statically as a part of the structure overhead. So I'm somewhat surprised there's a penalty.

@skyline75489
Copy link
Collaborator Author

@miniksa There you go, the screenshot.

screenshot

Also, it was my understanding that the .size() property of the STL structures didn't require calculation.... it should just be there statically as a part of the structure overhead. So I'm somewhat surprised there's a penalty.

Come to think about it, perhaps you are right. It could be that the super intensive access that causes the CPU usage. For example in #3262 the _storage.at call may not be very expensive, but InsertAttrRuns is called very frequently.

My guess it that, GetSize is called from various places and for many many times, which makes slightly performance drawback of containers magnified to a visible 3.42% usage.

@skyline75489
Copy link
Collaborator Author

With this patch the CPU usage dropped to somewhere around 0.10%.

图片

@skyline75489
Copy link
Collaborator Author

I'm using the Debug build for profiling. What I did was simply just cating a very long file (/etc/services in my case). With Release build the CPU usage may not be this high since the compiler would optimize C++ code way more aggresively.

If with Release build the usage drops to a noise level then maybe it's not worth the effort to go through this patching thing.

@DHowett-MSFT
Copy link
Contributor

Oh heck yeah, the STL is absolutely loaded with runtime checks in debug builds. I wouldn't use debug builds for any type of perf-sensitive measurements, especially where the STL is involved 😄

@skyline75489
Copy link
Collaborator Author

@DHowett-MSFT That's a relief. The noexcept nightmare still haunts me, believe it or not.

@miniksa
Copy link
Member

miniksa commented Nov 4, 2019

So @skyline75489, is there still a dramatic difference between the before-your-fix and after-your-fix with a Release build?

Also, sorry about the noexcept thing. I experienced that too when I was setting up the static analysis.

@skyline75489
Copy link
Collaborator Author

@miniksa The performance differences doses drop to a noise level with a Release build. I’m gonna close this for now. I’m just really glad the noexcept things will not be merged.

@skyline75489 skyline75489 deleted the fix/do-not-calculate-size branch November 5, 2019 03:17
@beviu
Copy link
Contributor

beviu commented Jun 13, 2020

@skyline75489 @miniksa

I am testing this on 19fcbce, but also with the patches from #6493 and #6492 applied, while priting a file with a lot of text and colors, in Release mode.

Before (it's using 7% of the CPU):
image

After (it's gone):
image

@miniksa
Copy link
Member

miniksa commented Jun 13, 2020

Yes, we need to revisit this. This is another of the things I saw while working on #6483 on Friday. It's harder to see because it happens in multiple places and I'm really not sure why it's such a pig at pulling these out of that function, but we should bring this back and get it gone.

@skyline75489
Copy link
Collaborator Author

It's been too long since this PR is born. I can not find my local branch anymore. I'm still fighting my techinical issues with my PC and I'm not able to offer help for now. (sorry about that 😢 )

@miniksa
Copy link
Member

miniksa commented Jun 16, 2020

It's been too long since this PR is born. I can not find my local branch anymore. I'm still fighting my techinical issues with my PC and I'm not able to offer help for now. (sorry about that 😢 )

No problem, I'll make my own version. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants