This repository has been archived by the owner on Aug 8, 2023. It is now read-only.

[core] Partially restore render performance #15231


johnkassebaum

On iOS, recent changes (since 4.10.0) to refactor renderer_impl.cpp into RenderTree and RenderOrchestrator have resulted in:

  • frame rate cut in half: from ~58fps to ~29fps
  • a very noticeable presentation lag

The frame rate reduction has also been noted on Android #14902.

To detect the frame rate reduction on iOS, we added this code (per #14277) to our MGLMapViewDelegate:

    private let df: DateFormatter = {
        let df = DateFormatter()
        df.dateFormat = "H:m:ss.SSSS"
        return df
    }()

    private var timings = ContiguousArray<String>(repeating: "", count: 1000)
    private var timingIdx: Int = 0

    func mapViewDidFinishRenderingFrame(_ mapView: MGLMapView, fullyRendered: Bool) {
        timings[timingIdx] = df.string(from: Date())
        timingIdx += 1
    }

Writing into a preallocated buffer keeps the cost of recording each measurement low. With the app linked against a release build and run from Xcode until the 1000-entry buffer overflows, the timings can be printed to the console, copied, and analyzed to compute 999 frame durations. To ensure that rendering is triggered as often as possible, the device is put into compass mode and continuously waved from side to side.

Compass mode + waving also reveals the presentation lag: After ~20 seconds of continuous waving and then stopping all movement, the map will very noticeably continue to rotate for what feels like more than one second. The presentation lag is also apparent from a visible accumulating incongruence between displayed orientation and device orientation.

Besides a handful of outliers during launch, the resulting frame durations are very consistent:

[image: plot of the recorded frame durations]

Here are the multiple-run-averaged median frame durations, inverted to get FPS:

  • 4.10.0: 58 fps
  • 5.1.0: 29 fps
  • Current master: 29 fps

This PR proposes 3 changes that raise the median frame rate to 45 fps (measured as described above) and remove the presentation lag on iOS.

  • in the loop over layers nested inside the loop over sources, only compute layerIsVisible and zoomFitsLayer if the layer is required by the source.
  • remove the filteredLayersForSource property and return to local allocation in the source loop.
  • in the loop over layers nested inside the loop over sources, only compute layerIsVisible and zoomFitsLayer for a sourceless layer after verifying it is sourceless.
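The ordering behind the first and third changes can be sketched in isolation. The types below (`MockLayer`, `FilterSummary`) are hypothetical stand-ins, not the mbgl classes, and the zoom test merely imitates a min/max comparison. The point is only that the source-membership test runs first, so the per-layer evaluation happens solely for layers that belong to the source of the current iteration.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a style layer; not the mbgl type.
struct MockLayer {
    std::string id;
    std::string sourceId;  // empty means the layer needs no source
    float minZoom;
    float maxZoom;
    bool visible;
};

struct FilterSummary {
    std::vector<const MockLayer*> layers;  // layers handed to the source
    bool needsRendering = false;           // true if any of them renders at `zoom`
    std::size_t evaluated = 0;             // layers that reached the zoom/visibility checks
};

// Collect the layers of one source, rejecting foreign and sourceless layers
// before any visibility or zoom evaluation takes place.
FilterSummary filterLayersForSource(const std::vector<MockLayer>& layers,
                                    const std::string& sourceId,
                                    float zoom) {
    FilterSummary summary;
    for (const auto& layer : layers) {
        if (layer.sourceId != sourceId) {
            continue;  // cheap rejection: sourceless or owned by another source
        }
        ++summary.evaluated;
        const bool layerIsVisible = layer.visible;
        const bool zoomFitsLayer = layer.minZoom <= zoom && layer.maxZoom >= zoom;
        summary.needsRendering = summary.needsRendering || (layerIsVisible && zoomFitsLayer);
        summary.layers.push_back(&layer);
    }
    return summary;
}
```

With S sources and L layers, the inner checks then run about L times per pass in total instead of S × L times.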

We tested the first change in 5.1.0 (before the introduction of the filteredLayersForSource property and before much of the refactoring for RenderOrchestrator), and the result was a full return to 58 fps.

After the introduction of filteredLayersForSource and the other render orchestration changes, though, the first change no longer produced any improvement in frame rate; it remained at 29 fps. We then removed the shrink_to_fit call made on filteredLayersForSource in each iteration of the source loop, and this produced the 45 fps measurement. However, the point of the filteredLayersForSource property is to optimize memory use, and that benefit is nullified without the costly shrink_to_fit, so the property all but loses its value, which is why we removed it. A local filteredLayersForSource created in each iteration of the source loop yields the same frame rate as keeping the property without the shrink_to_fit.
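To make the allocation pattern concrete, here is a small standalone sketch using a plain `std::vector<int>` rather than the orchestrator's actual property. `fillBuffer` is a hypothetical helper. Note that `shrink_to_fit` is a non-binding request in the standard, so the exact capacity afterwards is implementation-dependent; on common implementations it reallocates and copies.

```cpp
#include <cstddef>
#include <vector>

// Refill a reused buffer: reserve room for `reserveFor` items, push `count`
// items, and optionally ask the vector to release the unused capacity.
// Returns the capacity the vector ends up with.
std::size_t fillBuffer(std::vector<int>& buffer,
                       std::size_t reserveFor,
                       std::size_t count,
                       bool shrink) {
    buffer.clear();              // drops the elements, keeps the allocation
    buffer.reserve(reserveFor);  // allocates only if capacity is too small
    for (std::size_t i = 0; i < count; ++i) {
        buffer.push_back(static_cast<int>(i));
    }
    if (shrink) {
        // Non-binding request; typically allocates a smaller block, copies the
        // elements over, and frees the old block.
        buffer.shrink_to_fit();
    }
    return buffer.capacity();
}
```

If the buffer is shrunk at the end of every iteration, the next `reserve` has to allocate the large block again, so each source costs two allocations and a copy per frame. Without the shrink, the allocation made on the first iteration is simply reused.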

Finally, we have a question as to whether a 4th change is possible: in 4.10.0, sourceless layer processing was outside of the source loop/layer loop. With it moved inside, the same sourceless layers are reconsidered for each source. It is unclear to us how to keep sourceless processing inside the nested source/layer loops while processing each sourceless layer only once, and we welcome your suggestions.
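One possible direction, offered only as a sketch and not as how the orchestrator works, is to bucket the layers in a single pass before the source loop. Each source then walks only its own layers, and the sourceless layers are collected exactly once. `StyleLayer`, `LayerBuckets`, and `bucketLayers` are hypothetical names, and the sketch assumes an empty source id marks a sourceless layer.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a style layer; not the mbgl type.
struct StyleLayer {
    std::string id;
    std::string sourceId;  // assumption: empty id marks a sourceless layer
    std::size_t index;     // position in the style, kept to restore draw order
};

struct LayerBuckets {
    std::unordered_map<std::string, std::vector<const StyleLayer*>> bySource;
    std::vector<const StyleLayer*> sourceless;
};

// One pass over all layers: O(L) instead of revisiting every layer per source.
LayerBuckets bucketLayers(const std::vector<StyleLayer>& layers) {
    LayerBuckets buckets;
    for (const auto& layer : layers) {
        if (layer.sourceId.empty()) {
            buckets.sourceless.push_back(&layer);
        } else {
            buckets.bySource[layer.sourceId].push_back(&layer);
        }
    }
    return buckets;
}
```

Order is preserved within each bucket, but the interleaving across buckets is lost, which is why the sketch carries an `index` field: render items built from the buckets would need to be sorted by it if draw order depends on style order.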

…by source or if sourceless.

Remove the orchestrator's property vector of layer properties, which was intended to avoid reallocation per frame. The cost of resizing it cut the frame rate in half, and even without per-frame resizing it is no faster than local instantiation per frame.
Only evaluate zoom state on a sourceless layer after first verifying the layer is sourceless and visible.
@johnkassebaum johnkassebaum force-pushed the jkassebaum-half_restore_render_performance branch from f96e587 to 76f6fd4 Compare July 26, 2019 19:25
@julianrex julianrex added the Core The cross-platform C++ core, aka mbgl label Jul 26, 2019
@astojilj astojilj requested review from alexshalamov and pozdnyakov and removed request for alexshalamov July 27, 2019 16:50
@alexshalamov
Contributor

alexshalamov commented Jul 29, 2019

@johnkassebaum Thanks for the report and the PR! We are looking into it at the moment. Our benchmarks do not show any regressions; therefore, the performance drop could be related to the style being used. It would be helpful if you could provide more information so that we can reproduce the issue locally.

The best option would be to provide the style. If that is not possible, it would be good to know how many sources and layers the style has and how they interact while the map is panned or zoomed (e.g., which expressions are used for layer visibility, source zoom extents, etc.).

Another possibility is that there is a custom layer whose overridden supportsZoom(...) is quite heavy.

@pozdnyakov
Contributor

@johnkassebaum Thanks for your analysis and the PR! We've just made a patch that moves the shrinking of filteredLayersForSource out of the rendering call chain (#15240). It might be worth testing on top of it.

As for the layerIsVisible and zoomFitsLayer computations, both are just numeric comparisons, which makes me unsure how they could cause such a significant FPS drop. It would be great to see the style that reproduces the issue. This style could also become part of our benchmarks, which would help to avoid regressions in the future.

@johnkassebaum
Author

johnkassebaum commented Jul 29, 2019

@pozdnyakov, @alexshalamov

  • What are the contributing factors to a 'heavy' supportsZoom? The implementation seems very light:
    bool RenderLayer::supportsZoom(float zoom) const {
        // TODO: shall we use rounding or epsilon comparisons?
        return baseImpl->minZoom <= zoom && baseImpl->maxZoom >= zoom;
    }

And, no, we have not overridden it anywhere. We are using the framework as is.

  • The numerical computations for layerIsVisible and zoomFitsLayer, no matter how lightweight, are being computed and recomputed for every source, which is unnecessary for sourced layers that are not associated with the source of the current source loop iteration.

  • How did you gauge that there was no performance hit on iOS? The framework does not offer any tool to measure FPS. According to the issue cited above, the framework measures the firing of the CADisplayLink. That is a constant and not an indication of rendered frame rate.

  • We wish we could share our stylesheet but it represents a considerable investment. I can provide this information:

Our stylesheet defines:
- 14 tile sources (10 raster, 4 vector)
- At any given time, 2-6 raster tile sources are visible, but a majority of the time it is only 2-3.
- The 4 vector tile sources are always visible, and have 1062, 1950, 63, and 182 style layers. The larger style layer sets are further broken down into 50 visible subsets each, of which generally only 1 subset is visible at a time.
- Only 1 raster tile source is visible through all zoom levels we present; the others cover smaller subsets of, on average, 10 zoom levels, generally in the range 7-17.
- The vector tile sources have zoom extents of 5-15, but their style layers process the tile data generally up to level 17, and have a paint function for fill and line that interpolates opacity and line width, respectively.
- None of the 3257 style layers define a zoom extent in order to exploit under- and over-zooming.

We also add 16 runtime MGLShapeSources for annotations. All shape sources are always visible, but only 6 regularly have content. The other 10 are for temporary annotations and unless specifically and temporarily invoked, the source's shape property is nil. Each shape source has an associated fill style layer and 1-3 line style layers, only 1 of which is visible at a time.

In the measurements recorded above, I limited the visible sources and style layers to our most common use case:
- 2 visible raster tile sources
- 4 visible vector tile sources each with 10-15 visible style layers
- 6 visible MGLShapeSources and their associated style layers. Four had content in frame.
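As a rough sense of scale, the figures above can be turned into a count of (source, layer) pairs per update pass. This is a back-of-the-envelope sketch that assumes the pass visits every pair, as a nested loop over sources and layers does, and that hidden layers still appear in the layer list. Raster layers are left out because their count is not given, so the result is a lower bound.

```cpp
#include <cstddef>

// Figures taken from the description above.
constexpr std::size_t kVectorLayers = 1062 + 1950 + 63 + 182;  // 4 vector sources
constexpr std::size_t kTileSources = 14;                        // 10 raster + 4 vector
constexpr std::size_t kShapeSources = 16;                       // runtime annotation sources
constexpr std::size_t kSources = kTileSources + kShapeSources;

// Each shape source has 1 fill layer plus 1-3 line layers.
constexpr std::size_t kMinRuntimeLayers = kShapeSources * (1 + 1);
constexpr std::size_t kMaxRuntimeLayers = kShapeSources * (1 + 3);

// Number of (source, layer) pairs a full nested loop visits in one pass.
constexpr std::size_t pairsPerPass(std::size_t sources, std::size_t layers) {
    return sources * layers;
}

static_assert(kVectorLayers == 3257, "matches the layer total quoted above");
static_assert(kSources == 30, "14 tile sources plus 16 shape sources");
static_assert(pairsPerPass(kSources, kVectorLayers + kMinRuntimeLayers) == 98670,
              "lower estimate of pairs per pass");
static_assert(pairsPerPass(kSources, kVectorLayers + kMaxRuntimeLayers) == 99630,
              "upper estimate of pairs per pass");
```

Under these assumptions, any work done before the source-membership test runs on the order of 100,000 times per pass, even though each layer belongs to at most one source.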

  • Consider the original simplicity of the render loop:
    for (const auto& source : *sourceImpls) {
        std::vector<Immutable<Layer::Impl>> filteredLayers;
        bool needsRendering = false;
        bool needsRelayout = false;

        for (const auto& layer : *layerImpls) {
            if (layer->getTypeInfo()->source == LayerTypeInfo::Source::NotRequired
                || layer->source != source->id) {
                continue;
            }

            if (!needsRendering && getRenderLayer(layer->id)->needsRendering(zoomHistory.lastZoom)) {
                needsRendering = true;
            }

            if (!needsRelayout && (hasImageDiff || hasLayoutDifference(layerDiff, layer->id))) {
                needsRelayout = true;
            }

            filteredLayers.push_back(layer);
        }

        renderSources.at(source->id)->update(source,
                                             filteredLayers,
                                             needsRendering,
                                             needsRelayout,
                                             tileParameters);
    }

Compared to:

    for (const auto& sourceImpl : *sourceImpls) {
        RenderSource* source = renderSources.at(sourceImpl->id).get();
        std::vector<Immutable<Layer::Impl>> filteredLayersForSource;
        filteredLayersForSource.reserve(layerImpls->size());
        bool sourceNeedsRendering = false;
        bool sourceNeedsRelayout = false;

        for (const auto& layerImpl : *layerImpls) {
            RenderLayer* layer = getRenderLayer(layerImpl->id);
            const auto* layerInfo = layerImpl->getTypeInfo();
            bool layerNeedsRendering = layer->needsRendering(zoomHistory.lastZoom);

            parameters.staticData.has3D = (parameters.staticData.has3D || layerInfo->pass3d == LayerTypeInfo::Pass3D::Required);

            if (layerInfo->source != LayerTypeInfo::Source::NotRequired) {
                if (layerImpl->source == sourceImpl->id) {
                    sourceNeedsRendering |= layerNeedsRendering;
                    sourceNeedsRelayout = (sourceNeedsRelayout || hasImageDiff || hasLayoutDifference(layerDiff, layerImpl->id));
                    filteredLayersForSource.push_back(layerImpl);

                    if (layerNeedsRendering) {
                        renderItems.emplace_back(*layer, source);
                    }
                }
                continue;
            }

            // Handle layers without source.
            if (layerNeedsRendering && sourceImpl.get() == sourceImpls->at(0).get()) {
                if (parameters.contextMode == GLContextMode::Unique
                    && layerImpl.get() == layerImpls->at(0).get()) {
                    const auto& solidBackground = layer->getSolidBackground();
                    if (solidBackground) {
                        backgroundColor = *solidBackground;
                        continue; // This layer is shown with background color, and it shall not be added to render items.
                    }
                }
                renderItems.emplace_back(*layer, nullptr);
            }
        }
        source->update(sourceImpl,
                       filteredLayersForSource,
                       sourceNeedsRendering,
                       sourceNeedsRelayout,
                       tileParameters);
    }

As of release 5.1.0, the removal of the recomputation of layerIsVisible and zoomFitsLayer restored frame rate to ~58 fps.

@pozdnyakov
Contributor

We wish we could share our stylesheet but it represents a considerable investment.

That is totally understandable. However, having a benchmark with a mock style sheet would prevent us from experiencing similar issues in the future. Would it be an option to add a test to https://github.com/mapbox/mapbox-gl-native/blob/master/benchmark/api/render.benchmark.cpp that illustrates the performance benefit of the proposed changes? I think the key point is just having multiple sources with a large number of layers.

As far as I can see from the proposed changes, the mock style sheet should not depend on a particular type of layer or source, so layers could easily be added in a loop using the runtime style API (as many as required).

@chloekraw chloekraw added the needs changelog Indicates PR needs a changelog entry prior to merging. label Jul 30, 2019
@johnkassebaum
Author

No promise on a timeline to provide you with tests, but I can assure you that your refactored code is causing a significant performance hit on iOS and, as reported elsewhere, on Android.

In the meantime, besides removing the memory re-allocations within the render pass loop, perhaps you could investigate removing repeated layer lookup work that is more often than not unnecessary due to the layer not being associated with the source of the outer loop iteration.

@johnkassebaum
Author

johnkassebaum commented Aug 6, 2019

I also repeat the request for means of conducting and the results gathered on your own internal benchmarks ("Our benchmarks do not show any regressions") that showed no performance hit. The iOS API currently has no way of accurately measuring frame rate.

@tmpsantos
Contributor

I triggered the bots for the PR.

@pozdnyakov
Contributor

@johnkassebaum I made the following benchmark to try the proposed changes:

static void API_renderStill_many_invisible_layers(::benchmark::State& state) {
    using namespace mbgl::style;
    RenderBenchmark bench;
    HeadlessFrontend frontend { size, pixelRatio };
    Map map { frontend, MapObserver::nullObserver(),
              MapOptions().withMapMode(MapMode::Static).withSize(size).withPixelRatio(pixelRatio),
              ResourceOptions().withCachePath(cachePath).withAccessToken("foobar") };
    prepare(map);
    auto& style = map.getStyle();
    auto source = std::make_unique<GeoJSONSource>("GeoJSONSource");
    style.addSource(std::move(source));
    for (int i = 0; i < 10000; ++i) {
        auto layer = std::make_unique<SymbolLayer>(std::to_string(i), "GeoJSONSource");
        layer->setVisibility(VisibilityType::None);
        style.addLayer(std::move(layer));
    }

    while (state.KeepRunning()) {
        frontend.render(map);
    }
}

Unfortunately, it did not show any significant performance changes. Could you please check whether this test shows a performance benefit in your environment, or fix the test so that it better simulates your style data?

I also repeat the request for means of conducting and the results gathered on your own internal benchmarks ("Our benchmarks do not show any regressions") that showed no performance hit.

I can assure you that we take performance and UI responsiveness very seriously and use various metrics to check them on all the supported platforms. However, the engine's performance depends a lot on the tile data and styles being used, and this might explain the difference in what we each observe.

@pozdnyakov
Contributor

Performance for styles that contain lots of sources has been improved with #15756. Please feel free to rebase and reopen this PR if you have measurements showing that it still provides a performance benefit.

@pozdnyakov pozdnyakov closed this Feb 6, 2020