Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PR 4071 resulted ~100ms perf regression (time to first idle) #4419

Closed
zhangyiatmicrosoft opened this issue Jul 17, 2024 · 11 comments · Fixed by #4599
Closed

PR 4071 resulted ~100ms perf regression (time to first idle) #4419

zhangyiatmicrosoft opened this issue Jul 17, 2024 · 11 comments · Fixed by #4599
Labels
need more info Further information is requested

Comments

@zhangyiatmicrosoft
Copy link
Contributor

In an effort to upgrade from MapLibre 4.1.2 to 4.3.2, while measuring time to first idle we observed roughly 100ms (or 2%) perf regression at 90 percentile.
After extensive search and experiments, we narrowed it down to PR 4071: #4071
The above data were based on 3 million user hits on Bing maps over the period of 14 days.

maplibre-gl-js version: 4.3.2

browser: edge/chrome/ff/safari

Tile to first idle is affected by many factors such as the tiles, styles and individual CPU/GPU, that is probably why benchmark testing in MapLibre was not able to catch.

We have eliminated at least two factors:

  1. Shader compilation: the time it takes to compile shader did not change much (less than 10 ms).
  2. Shader run time performance measured in FPS (frames per second) is not affected.

This PR is quite sophisticated. Many changes we do not understand and need help. Specifically, while the goal is to fix bugs related to variable-anchors and collision boxes, can it maintain the same performance when the above two conditions do not apply?

@kubapelc would you please help review how many more ops were added per symbol, and give some suggestions regarding how to cut them down as much as possible?

Thanks a lot!

@HarelM
Copy link
Collaborator

HarelM commented Jul 17, 2024

Thanks for the input @zhangyiatmicrosoft!
The PR is big no doubt, but we'll need a way to test these performace changes, otherwise we will not be able to locally develop and test, which will make the process of fixing this very long or even impossible.
I would also reccomend trying out the globe branch before we merge it if you want the performace to be kept similar to today.

Bettom line, we need a better way to test performace impact before we ship a version and not afterwards, otherwise you'll get stuck with trying to fix perfromance issues after we release a version...

@HarelM HarelM added the need more info Further information is requested label Jul 17, 2024
@zhangyiatmicrosoft
Copy link
Contributor Author

Thanks for the input @zhangyiatmicrosoft! The PR is big no doubt, but we'll need a way to test these performace changes, otherwise we will not be able to locally develop and test, which will make the process of fixing this very long or even impossible. I would also reccomend trying out the globe branch before we merge it if you want the performace to be kept similar to today.

Bettom line, we need a better way to test performace impact before we ship a version and not afterwards, otherwise you'll get stuck with trying to fix perfromance issues after we release a version...

TBH in every codebase, we struggle on how to "gate" the perf of each PR, because it is way too tricky and affected by so many factors at runtime, on top of the difficulties to statistically measure it.
This one in particular, I was not able to repo on any of dev machines and only observed after A/B testing of real-world users.

We need continue to seek for solutions for sure; meanwhile this request is not a complaint, just asking guidance and help to dissect the changes in PR 4071. THANKS!

@HarelM
Copy link
Collaborator

HarelM commented Jul 18, 2024

Can you clarify how we can help then?

@oberhamsi
Copy link
Contributor

oberhamsi commented Jul 18, 2024

Can you maybe produce/provide a (minimal?) style which triggers a strong performance degradation? That could help narrow it down

i also noticed the slowdown

@kubapelc
Copy link
Collaborator

Hi @zhangyiatmicrosoft I think the most impactful change in the PR in terms of ops per symbol is in collision_index.ts, especially the _projectCollisionBox function which computes accurate symbol collision boxes by projecting 8 point along the symbol's bounding box, and computing the AABB for those projected points. It projects 4 corners and 4 edge midpoints of the symbol's box, which can potentially be rotated or pitched, and in case of globe projection otherwise deformed.

Maybe a fast path can be added for the most common type of symbols? I would assume the default to be the most common, which is symbol-placement: point, text-rotation-alignment viewport (or auto) and text-pitch-alignment: viewport (or auto), and those are just a screenspace box, so that 8 point projection things isn't needed for them. Probably just the projected and translated anchor point is enough, and that is already computed in placeCollisionBox, see projectedPoint.

@zhangyiatmicrosoft
Copy link
Contributor Author

@kubapelc thanks for the tip. We found that the said the above roughly accounted for 50% of the hit. Is there anything else?

@kubapelc
Copy link
Collaborator

Hi @zhangyiatmicrosoft looking at the diffs and profiler data now, sadly I don't know where else to look. It seems that placeCollisionBox is still the culprit. But I am looking at profiler data without implementing the fast path I suggested.

What exactly did you measure to find the 50% figure? Have you implemented the fast path for common symbols?

@zhangyiatmicrosoft
Copy link
Contributor Author

zhangyiatmicrosoft commented Aug 12, 2024

Hi @zhangyiatmicrosoft looking at the diffs and profiler data now, sadly I don't know where else to look. It seems that placeCollisionBox is still the culprit. But I am looking at profiler data without implementing the fast path I suggested.

What exactly did you measure to find the 50% figure? Have you implemented the fast path for common symbols?

@kubapelc I have not implemented anything; just simply commented out _projectCollisionBox and did profiling test --- it won't render properly of course, but it is the first step.
Our profile test is conducted in lab environment and measure the time spent from page load to first map idle event. Each round is 25 repeated tests and take the mean value.

After comparing (1) before PR4071, (2) after 4071, and (3) 4071 with above changes, looks like (3) performed halfway between the first two.
Can you implement the fast path please? Draft PR is ok, and we can take it to go through more lab tests. Thanks

@zhangyiatmicrosoft
Copy link
Contributor Author

@HarelM @kubapelc any update please?
If 4071 is ported from Globe branch, it means we have perf issues there as well, and more important to fix it.

@kubapelc
Copy link
Collaborator

I'll probably get around to it next week. Globe branch definitely has this as well, and I also noticed other potential perf problems in its symbol changes. Globe will definitely have much worse time to first idle though, due to extra processing needed per tile, but only when globe projection is enabled.

@HarelM
Copy link
Collaborator

HarelM commented Aug 14, 2024

@zhangyiatmicrosoft since we have a hard time seeing the performance issues, I would advise you to take a more active approach in solving them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need more info Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants