-
Notifications
You must be signed in to change notification settings - Fork 1.3k
[core] Partially restore render performance #15231
[core] Partially restore render performance #15231
Conversation
…by source or if sourceless. Remove orchestrator property vector of layer propertes which was intended to avoid reallocation per frame, but the cost of resizing reduced frame rate by half, and even so, without per frame resizing is no faster than local instantiation per frame. Only evaluate zoom state on a sourceless layer after first verifying the layer is sourceless and visible.
f96e587
to
76f6fd4
Compare
@johnkassebaum Thanks for report and PR! We are looking into it at the moment. Our benchmarks do not show any regressions, therefore, performance drop could be related to the style being used. It would be helpful if you could provide more information, so that we can reproduce an issue locally. The best option would be to provide the style, if that is not possible, would be good to know how many sources and layers style has and how they interact while map is panned / zoomed (e.g., what are the expressions used for layer visibility, source zoom extents, etc.) Another possibility is that there is a custom layer whose overridden |
@johnkassebaum Thanks for your analysis and the PR! We've just made a patch, which moves shrinking of As for |
And, no, we have not overriden it anywhere. We are using the framework as is.
Our stylesheet defines: We also add 16 runtime In the measurements recorded above, I limited the visible sources and style layers to our most common use case:
Compared to:
As of release 5.1., the removal of the recomputation of |
It is totally understandable, however having a benchmark with a mock style sheet would prevent us from experiencing similar issues in future. Would it be an option to add a test to https://github.com/mapbox/mapbox-gl-native/blob/master/benchmark/api/render.benchmark.cpp that illustrates performance benefit of the proposed changes? Think the key point is just having multiple sources with large amount of layers. As I can see from the proposed changes, the mock style sheet should not depend on particular type of either layers or sources, therefore layers could be easily added using runtime style API in a cycle (as many as required). |
No promise on a timeline to provide you with tests, but I can assure you that your refactored code is causing a significant performance hit on iOS and, as reported elsewhere, on Android. In the meantime, besides removing the memory re-allocations within the render pass loop, perhaps you could investigate removing repeated layer lookup work that is more often than not unnecessary due to the layer not being associated with the source of the outer loop iteration. |
I also repeat the request for means of conducting and the results gathered on your own internal benchmarks (" Our benchmarks do not show any regressions") that showed no performance hit. The iOS API has no current way of accurately measuring frame rate. |
I triggered the bots for the PR. |
@johnkassebaum I made the following benchmark to try the proposed changes
Unfortunately, it did not show any significant performance changes. Could you please check whether this test shows performance benefit in your environment, or fix the test, so that it better simulates your style data.
I can assure you that we take performance and UI responsiveness very seriously and use various metrics to check it on all the supported platforms. However, the engine performance depends a lot on the tile data and styles being used and this might explain the difference of what we experience. |
Performance for the styles that contain lots of sources had been improved with #15756. Please feel free to rebase&reopen this PR if you've measurements showing that it still makes a performance benefit. |
On iOS, recent changes (since 4.10.0) to refactor
renderer_impl.cpp
into RenderTree and RenderOrchestrator have resulted in:The frame rate reduction has also been noted on Android #14902.
To detect the frame rate reduction on iOS, we added this code (per #14277) to our
MGLMapViewDelegate
:This approach reduces the cost of recording the measurement. Linked against a release build and run in Xcode until the buffer overflows,
timings
can be output in the console, copied, and analyzed to compute 999 frame durations. To assure that rendering is triggered as often as possible, the device is put into compass mode and continuously waved from side to side.Compass mode + waving also reveals the presentation lag: After ~20 seconds of continuous waving and then stopping all movement, the map will very noticeably continue to rotate for what feels like more than one second. The presentation lag is also apparent from a visible accumulating incongruence between displayed orientation and device orientation.
Besides a handful of outliers during launch, the resulting frame durations are very consistent:
Here are the mulitple-run-averaged median frame durations inverted to get FPS:
This PR proposes 3 changes that raise median frame rate to 45 fps (measured as described above) and removes the presentation lag on iOS.
layerIsVisible
andzoomFitsLayer
if the layer is required by the source.filteredLayersForSource
property and return to local allocation in the source loop.layerIsVisible
andzoomFitsLayer
for a sourceless layer after verifying it is sourceless.The first change we tested in 5.1.0 (before the introduction of the
filteredLayersForSource
property and before much more refactoring for RenderOrchestrator) and the result was a full return to 58 fps.After the introduction of
filteredLayersForSource
and other changes for render orchestration, tho, the first change no longer resulted in an any improvement in frame rate, i.e., it remained at 29 fps. We then removed the per source loop iterationshrink_to_fit
onfilteredLayersForSource
and this resulted in the 45 fps measurement — but as the point offilteredLayersForSource
is memory use optimization which is nullified without the costlyshrink_to_fit
, the property all but loses its value, which is why we removed it. Using a local, per source loop iteration createdfilteredLayersForSource
yields the same frame rate as keeping the property without theshrink_to_fit
.Finally we have a question as to whether a 4th change is possible: In 4.10.0, sourceless layer processing was outside of the source loop/layer loop. With it moved inside, the same sourceless layers are reconsidered for each source. An efficient approach to include sourceless processing in the source loop nested layer loop but to only process each sourceless layer once is unclear to us, and we welcome your suggestions.