Style benchmarks #7047

ryanhamley · 2018-07-30T19:25:49Z

Launch Checklist

briefly describe the changes in this PR
- Adds the ability to run benchmark tests against Mapbox styles by supplying a styleURL
- Does not change functionality of existing benchmark suites that run on various versions of the code
document any changes to public APIs

Layout and Paint tests run once per tile over a list of tiles/locations (center and zoom) supplied by @nickidlugash and the Carto team to exercise a number of different relevant areas of style performance (CJK text, urban, rural, zoom levels, etc).

QueryPoint, QueryBox, StyleLayerCreate and StyleLayerValidate run the same as the existing non-style benchmark suite with the exception that the two Query tests use the list of locations that Carto defined. The other benchmarks are not used by the style suite.

ChrisLoer · 2018-07-31T18:04:28Z

bench/lib/benchmark.js

+    styleURL: string;
+    locations: ?Location;
+
+    constructor(styleURL: string, locations: ?Location) {


I'm confused by the plurality/singularity of locations: ?Location. The type and the way it's used in paint/layout make it seem like this should be a singular "location" member. Where it's plural (non-paint/layout), it seems like the type would actually be Array<Location> (although I guess there's also the separate case of one "location" with multiple tiles; that doesn't look like what gets returned from import { locations } from './lib/style_locations';).

I tried yesterday to define the type as ?Location | ?Array<Location> and kept getting various Flow errors. I can revisit this and see if I can standardize the language or perhaps split a single Location and an array of them into separate properties to make things clearer.

I would guess that it's easy to get flow errors because if it can be singular or plural that has to be explicitly checked all the time before you work with it. If it can be one or more, standardizing on plural seems good -- probably "plural but only one item" doesn't need to be handled as a special case?

By "standardizing on plural", do you mean making this.locations always be an array of Location objects, even if it's just an array with a length of 1? That seems like the most logical way to square this circle in my brain.
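The "always an array" convention discussed above can be sketched roughly as follows. This is only an illustration, not code from this PR: `normalizeLocations` is a hypothetical helper, and the location shape (`zoom`, `center`) is assumed from the description of the location list.

```javascript
// Hypothetical helper: always hand benchmarks an array of locations,
// so callers never branch on "one location" versus "many locations".
function normalizeLocations(input) {
    if (input == null) return [];                    // nothing supplied
    return Array.isArray(input) ? input : [input];   // wrap a single location
}

// Assumed location shape: a zoom level and a [lng, lat] center.
const single = {zoom: 11, center: [-77.032194, 38.912753]};
const second = {zoom: 15, center: [-77.032194, 38.912753]};

const fromSingle = normalizeLocations(single);
const fromArray = normalizeLocations([single, second]);
const fromNothing = normalizeLocations(undefined);

console.log(fromSingle.length, fromArray.length, fromNothing.length);
```

With this shape, an array of length 1 needs no special handling, and the type can stay a plain array instead of a union that has to be narrowed before every use.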

ChrisLoer · 2018-07-31T18:07:39Z

bench/lib/parameters.js

@@ -0,0 +1,33 @@
+const isStyleBench = process.env.STYLE_BENCHMARK;


Why do we need two separate "isStyleBench" flags set? (process.env.STYLE_BENCHMARK and also ?style_bench=true) Shouldn't one of them be enough?

Basically, because this flag needs to be available both in index.html and all the benchmark .js files. You can see in index.html that we need to somehow determine the type of test in order to know how to set the tests up. The only way I could think to make this available in a script tag on an html page was to put it in the URL. I recognized this redundancy and I hate it, so I will gladly change it if we can think of a simpler way to make the test type parameter available in both places.

mapbox-gl-js/bench/index.html

Line 20 in 6933576

if (params.get('style_bench')) {

Hmm yeah I see the challenge. I guess you could wait for the local script to resolve and then look up the isStyleBench used there, but that would be a weird timing dependency? Or you could make index.html into something templated and have the watch script populate the parameter accordingly? It's probably not too big a deal if it's hard...

This is one of those things that seems like a huge issue when you've been deep in the weeds for a couple weeks. Then you take a step back and realize the solution was obvious. Just using the URL parameter works fine and simplifies the API considerably. D'oh!
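For reference, reading the flag from the URL alone might look like the sketch below. The two-argument `getURLParameter` here is a hypothetical implementation written for illustration (the PR's own helper is not shown in this thread); `URLSearchParams` is the standard browser and Node API.

```javascript
// Hypothetical implementation of a getURLParameter-style helper.
// `search` is the query string of the page, e.g. window.location.search.
function getURLParameter(name, search) {
    const params = new URLSearchParams(search);
    return params.get(name); // null when the parameter is absent
}

// Treat the flag as set only when the parameter is literally "true".
function isStyleBench(search) {
    return getURLParameter('style_bench', search) === 'true';
}

console.log(isStyleBench('?style_bench=true'));
console.log(isStyleBench('?filter=Layout'));
```

Comparing against the string 'true' is a deliberate choice in this sketch: a bare truthiness check such as `if (params.get('style_bench'))` also passes for `?style_bench=false`, because the value is the non-empty string "false".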

It looks like there's more divergent branches than shared code in some of these files, so rather than trying to share all the same files between the two benchmarking modes, what about an approach where we use different paths/index.html/benchmarks.js files?

For example, we could have the version-comparing benchmarks at /bench/versions/ and the style-comparing benchmarks at /bench/styles/.

Where we do want to share code between the modes, we can do that by extracting components or functions and importing them into both.

ChrisLoer · 2018-07-31T18:23:27Z

bench/benchmarks_view.js

+                    }
+                });
+
+                promise = promise.then(() => {


It looks like there's a lot of near-duplicate code here between the isStyleBench && name === 'Layout' || name === 'Paint' case and the regular case. Is there a way to collapse it to one inner loop and just factor out the style-specific version lookup logic?

This seems to be in tension with @jfirebaugh's suggestion above.

It looks like there's more divergent branches than shared code in some of these files, so rather than trying to share all the same files between the two benchmarking modes, what about an approach where we use different paths/index.html/benchmarks.js files?

The biggest changes are this section and the register function in bench/benchmarks.js. For the individual benchmark tests, the simplest thing from a code standpoint might be to just standardize on the style_locations as input. We don't have to run the tests individually since that takes a long time, but we could still use the style_locations as opposed to a simple array of zoom numbers with a hardcoded center all averaged together into a single test. You can see this in action in the QueryBox and QueryPoint benches because that's exactly what they do in a style test. This might give us better real-world tests in version benchmarking while simplifying the code base some.

As for this section, is it better to attempt to factor out the similar code here or break it apart the way John suggests?

🤔 Yeah there is some tension, but maybe not that much? To me it makes sense that specifying the set of benchmarks (e.g. benchmarks.js) would have different implementations, but rendering a benchmark (e.g. benchmarks_view.js) would have a shared/generalized implementation.

What do you think about standardizing on the locations/tiles in style_locations.js? I updated the handful of tests that rely on iterating over a few different zoom levels to use the ones in style_locations and it all still works. The code is simpler as well. The setup method in Paint, for example, went from 25 to 11 lines of code. The tradeoff is that the tests take longer now, which you'd expect from benching 15 maps as opposed to 4 or 5. It's most dramatic in Layout.

I like the idea of using a standardized code pathway/definition format for specifying multiple locations. I think we should probably have separate location definitions for style benchmarking vs. the regular benchmarks because:

- We'll probably want to change the style locations more often
- In style benchmarking we'll typically care about more locations and be willing to spend more time testing them
- Changing the location set isn't a big deal when you're comparing multiple styles with the same code version, but since it's baked into the benchmark, changing the location set is a big deal for comparing between code versions (as shown in your screenshot above: it breaks comparisons for Layout whenever different versions have different location sets).

ChrisLoer · 2018-07-31T20:29:50Z

bench/lib/parameters.js

@@ -1,4 +1,4 @@
-const isStyleBench = process.env.STYLE_BENCHMARK;
+const isStyleBench = getURLParameter('style_bench');


ryanhamley · 2018-08-13T22:11:26Z

Everything still works as before, but I've done some significant refactoring to create separate pathways for style and version tests. It's no longer necessary to add a parameter to the URL, which makes switching between the types of tests easier: the type now depends on the npm command you run. The newest commits also factor the business logic out of the view.

jfirebaugh · 2018-08-14T18:56:49Z

package.json

    "watch-benchmarks-view": "BENCHMARK_VERSION=${BENCHMARK_VERSION:-\"$(git rev-parse --abbrev-ref HEAD) $(git rev-parse --short=7 HEAD)\"} rollup -c bench/rollup_config_benchmarks_view.js --watch",
    "start-server": "st --no-cache -H 0.0.0.0 --port 9966 --index index.html .",
+    "start-bench-server": "st --no-cache -H 0.0.0.0 --port 9966 --index index.versions.html .",


Instead of separate script commands, could we have http://0.0.0.0:9966/bench/versions/ and http://0.0.0.0:9966/bench/styles/?

I can anticipate situations in which we want to switch quickly between the two, plus we already have a lot of script commands.

I've reduced the scripts to just start-server and start-bench but switching between bench/versions/ and bench/styles/ doesn't rebuild the generated files because Rollup isn't run again. Is there a straightforward way to run Rollup when you change URLs? Or is it ok to have to run start-bench again?

I guess bench/versions/ and bench/styles/ should load different bundles, both of which are watched by start-bench?

jfirebaugh · 2018-08-14T18:59:00Z

bench/index.styles.html

+            });
+        }
+
+        ['/bench/benchmarks_generated.js', '/bench/benchmarks_view_generated.js'].reduce((p, script) => p.then(() => loadScript(script)), Promise.resolve());


Since there's no dynamic URL generation necessary, these can use plain old <script> tags.

jfirebaugh · 2018-08-14T19:02:25Z

bench/lib/benchmark.js

+    styleURL: string;
+    locations: ?Array<Location>;
+
+    constructor(styleURL: string, locations: ?Array<Location>) {


Putting this in the base class feels a little over-generalized to me -- not all benchmarks have a style URL. I'd rather have concrete subclasses initialize the properties they use. There'll be a bit of repetition, but it's more explicit and flexible.
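A rough sketch of what "concrete subclasses initialize the properties they use" could look like. The class names come from this thread, but the method bodies are placeholders and the base-class shape is an assumption, not the actual bench/lib/benchmark.js.

```javascript
// Base class: only what every benchmark shares. No style or location fields.
class Benchmark {
    setup() { return Promise.resolve(); }
    bench() { throw new Error('bench() must be implemented by subclasses'); }
}

// A benchmark that needs a style and locations declares them itself.
class Layout extends Benchmark {
    constructor(style, locations) {
        super();
        this.style = style;
        this.locations = locations;
    }
    bench() { return this.locations.length; } // placeholder work
}

// A benchmark that needs neither takes no constructor arguments.
class FilterCreate extends Benchmark {
    bench() { return 0; } // placeholder work
}

const layout = new Layout('mapbox://styles/mapbox/streets-v10', [{zoom: 4}]);
const filterCreate = new FilterCreate();
console.log('style' in layout, 'style' in filterCreate);
```

The cost is a few repeated assignments in each subclass; the benefit is that a benchmark with no style never carries an unused, nullable `styleURL`.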

jfirebaugh · 2018-08-14T19:03:39Z

bench/version-tests/benchmarks.js

 mapboxgl.accessToken = accessToken;

 window.mapboxglVersions = window.mapboxglVersions || [];
 window.mapboxglBenchmarks = window.mapboxglBenchmarks || {};

+const url = styleURL();
 const version = process.env.BENCHMARK_VERSION;
 window.mapboxglVersions.push(version);

 function register(Benchmark) {


Feel free to change the parameter type from a class to an instance, so that different subclasses can accept different constructor parameters. And keep pulling out things that differ between version and style benchmarks.

    const style = 'mapbox://styles/mapbox/streets-v10';
    const center = [-77.032194, 38.912753];
    const locations = [4, 8, 11, 13, 15, 17].map(zoom => ({zoom, center, style}));

    register(new Layout(url));
    register(new Paint(locations));
    register(new QueryPoint(locations));

jfirebaugh

Getting there! Thanks for plugging along with this.

jfirebaugh · 2018-08-17T22:25:33Z

bench/lib/benchmark.js

@@ -1,5 +1,6 @@
 // @flow
-
+import { OverscaledTileID } from '../../src/source/tile_id';
+import type { LngLatLike } from '../../src/geo/lng_lat';


These changes are no longer necessary.

jfirebaugh · 2018-08-17T22:35:22Z

bench/versions/benchmarks.js

+const version = process.env.BENCHMARK_VERSION;
+window.mapboxglVersions.push(version);
+
+function register(Benchmark) {


    function register(benchmark) {
        const name = benchmark.constructor.name;
        window.mapboxglBenchmarks[name] = window.mapboxglBenchmarks[name] || {};
        window.mapboxglBenchmarks[name][version] = benchmark;
    }

    import Layout from '../benchmarks/layout';
    import LayoutDDS from '../benchmarks/layout_dds';
    ...

    register(new Layout(style, locations));
    register(new LayoutDDS());
    ...

jfirebaugh · 2018-08-17T22:37:49Z

bench/benchmarks/layout.js

@@ -46,7 +52,8 @@ export default class Layout extends Benchmark {
        return fetch(normalizeSourceURL(sourceURL))
            .then(response => response.json())
            .then((tileJSON: TileJSON) => {
-                return Promise.all(this.tileIDs().map(tileID => {
+                const tileIDs = this.locations && this.locations[0].tileID ? this.locations[0].tileID : this.tileIDs();


Is there a way we can parameterize this via the constructor and avoid the conditionals here...

jfirebaugh · 2018-08-17T22:38:00Z

bench/benchmarks/layout.js

@@ -102,9 +109,10 @@ export default class Layout extends Benchmark {

        for (const {tileID, buffer} of this.tiles) {
            promise = promise.then(() => {
+                const zoom = this.locations ? this.locations[0].zoom : tileID.overscaledZ;


...and here?
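One way to read the two comments above is to pass the tile list in through the constructor with a default, so the benchmark body never has to check `this.locations`. The sketch below assumes that reading; `DEFAULT_TILE_IDS`, `tilesToFetch` and the tile ID strings are placeholders, not the real values or methods in layout.js.

```javascript
// Placeholder stand-in for the benchmark's built-in tile list.
const DEFAULT_TILE_IDS = ['default-tile-a', 'default-tile-b'];

class Layout {
    // Callers that care about specific tiles pass them in; everyone else
    // gets the defaults. The decision is made once, at construction time.
    constructor(style, tileIDs = DEFAULT_TILE_IDS) {
        this.style = style;
        this.tileIDs = tileIDs;
    }

    // No `this.locations ? ... : ...` branching is needed here.
    tilesToFetch() {
        return this.tileIDs;
    }
}

const versionBench = new Layout('mapbox://styles/mapbox/streets-v10');
const styleBench = new Layout('mapbox://styles/mapbox/streets-v10', ['style-tile']);
console.log(versionBench.tilesToFetch().length, styleBench.tilesToFetch().length);
```

The same approach would apply to the zoom lookup: whatever differs between style and version runs is supplied by the caller instead of being inferred inside the benchmark.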

jfirebaugh · 2018-08-17T22:45:23Z

bench/benchmarks_shared_viewmodel.js

+export const benchmarks = [];
+
+let finished = false;
+let promise = Promise.resolve();


Seems like these should go inside setupTestRun.
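A minimal sketch of the suggestion, assuming `setupTestRun` can own all per-run state. The signature and the returned object are hypothetical; only the variable names (`benchmarks`, `finished`, `promise`) come from the diff above.

```javascript
// Hypothetical shape of setupTestRun: all per-run state is created inside
// the function, so each call starts clean and nothing is shared at module level.
function setupTestRun(names) {
    const benchmarks = names.map(name => ({name, status: 'waiting'}));
    let finished = false;
    let promise = Promise.resolve();

    // Chain one step per benchmark; here a step only marks it as ended.
    for (const benchmark of benchmarks) {
        promise = promise.then(() => { benchmark.status = 'ended'; });
    }
    promise = promise.then(() => { finished = true; });

    return {benchmarks, isFinished: () => finished, done: promise};
}

const runA = setupTestRun(['Layout', 'Paint']);
const runB = setupTestRun(['QueryPoint']);
runA.done.then(() => console.log(runA.isFinished(), runA.benchmarks.length, runB.benchmarks.length));
```

Because the state lives in the closure, two runs cannot interfere with each other, and importing the module has no side effects.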

jfirebaugh · 2018-08-17T22:59:31Z

bench/styles/benchmarks.js

+
+mapboxgl.accessToken = accessToken;
+
+window.mapboxglVersions = window.mapboxglVersions || [];


For style benchmarks, I think we can use a simpler registration mechanism. Using window.mapboxglVersions as a global variable is needed for the version benchmarks because they load remote scripts which must register themselves at a particular version but correlate named benchmarks across versions. For style benchmarks this isn't necessary, so we could use normal JS variables and combine styles/benchmarks.js and styles/benchmarks_viewmodel.js, which in turn should simplify the conditional logic unique to Layout and Paint.

jfirebaugh · 2018-08-23T14:36:46Z

bench/benchmarks/paint.js

@@ -1,24 +1,36 @@
+// @flow


jfirebaugh · 2018-08-23T16:11:43Z

bench/styles/benchmarks.js

+    if (filter && name !== filter)
+        return;
+
+    switch (name) {


Do something similar to what you did for versions/benchmarks.js:

    function register(Benchmark, locations, options) {
        const name = Benchmark.name;

        if (filter && name !== filter)
            return;

        // inline createBenchmark here
    }

    locations.forEach(location => {
        register(Layout, location.tileID, {location});
        register(Paint, [location], {location});
    });

    register(QueryPoint, locations);
    register(QueryBox, locations);
    register(StyleLayerCreate);
    register(Validate);

Doing locations.forEach(location => { register(Layout...); register(Paint...); }); works, but the result is that when you run all the tests, you get a pattern in the UI of Layout, Paint, Layout, Paint.... This could be useful since you can see the Layout and Paint tests for each tile grouped together. Or would we expect that all Layout and all Paint tests are grouped together?

If you want to keep the existing order, this is fine too:

    locations.forEach(location => register(Layout...));
    locations.forEach(location => register(Paint...));
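To make the difference between the two registration orders concrete, here is a small self-contained sketch. The location names and the string-based `register` are placeholders for illustration; the real `register` takes benchmark classes and location objects.

```javascript
// Placeholder location names and a plain local registry (no window globals).
const locations = ['urban', 'rural', 'cjk'];

function registrationOrder(grouped) {
    const registered = [];
    const register = (name, location) => { registered.push(`${name}:${location}`); };

    if (grouped) {
        // All Layout entries first, then all Paint entries.
        locations.forEach(location => register('Layout', location));
        locations.forEach(location => register('Paint', location));
    } else {
        // Layout and Paint interleaved per location.
        locations.forEach(location => {
            register('Layout', location);
            register('Paint', location);
        });
    }
    return registered;
}

console.log(registrationOrder(false).join(', '));
// Layout:urban, Paint:urban, Layout:rural, Paint:rural, Layout:cjk, Paint:cjk
console.log(registrationOrder(true).join(', '));
// Layout:urban, Layout:rural, Layout:cjk, Paint:urban, Paint:rural, Paint:cjk
```

Both variants register the same set of benchmarks; only the order in which they appear, and therefore run, changes.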

jfirebaugh · 2018-08-23T16:14:20Z

bench/styles/benchmarks.js

+
+import getWorkerPool from '../../src/util/global_worker_pool';
+
+setTimeout(() => {


I don't know if it's guaranteed that this will run before the first promise. Let's put it at the beginning of the promise chain so the order is clear:

    let promise = Promise.resolve().then(() => {
        // Ensure the global worker pool is never drained. ...
        // ...
    });
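The ordering guarantee can be sketched as follows. `acquireWorkerPool` is a stand-in for `getWorkerPool().acquire(-1)` that only records that it ran, and the step functions are hypothetical placeholders for the real benchmarks.

```javascript
const order = [];

// Stand-in for getWorkerPool().acquire(-1); it only records that it ran.
function acquireWorkerPool() { order.push('acquire worker pool'); }

// Hypothetical benchmark steps.
const steps = ['Layout', 'Paint'].map(name => () => { order.push(`run ${name}`); });

// The acquisition is the first link, so every later step is chained after it.
let promise = Promise.resolve().then(acquireWorkerPool);
for (const step of steps) {
    promise = promise.then(step);
}

promise.then(() => console.log(order.join(' -> ')));
// acquire worker pool -> run Layout -> run Paint
```

With `setTimeout(..., 0)` the callback is scheduled as a separate task, so whether it runs before the first benchmark step depends on what that step is waiting on. Chaining removes that dependency on timing.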

jfirebaugh · 2018-08-23T16:14:39Z

bench/styles/benchmarks.js

+    getWorkerPool().acquire(-1);
+}, 0);
+
+export default mapboxgl;


Probably not necessary.

Removing this causes an error in the console, "Cannot set property 'workerUrl' of undefined", thrown from benchmarks_generated.js, specifically the line mapboxgl.workerUrl = window.URL.createObjectURL(new Blob([workerBundleString], { type: 'text/javascript' }));. This doesn't actually seem to cause any issues in running the tests, though, so I'm not sure what this error is really about.

EDIT: I take it back. It does seem to break certain tests so I think this has to stay.

jfirebaugh · 2018-08-23T16:15:07Z

bench/styles/benchmarks.js

+register(QueryPoint);
+register(QueryBox);
+
+runBenchmarks();


Inline the function body here.

jfirebaugh · 2018-08-23T16:19:21Z

bench/versions/benchmarks.js

+register(new FilterCreate());
+register(new FilterEvaluate());
+
+runBenchmarks();


Won't this cause duplicate runs, once the benchmarks_generated.js on S3 for prior versions also contain this change? The version benchmarks need to differ from styles here -- maybe index.html should execute the run?

jfirebaugh

Nice work @ryanhamley! The ability to benchmark style variations will be valuable for both the cartography team and the GL team. Thanks for working through the subsequent refactoring -- the result is clean code that we'll be able to extend easily in the future.

ryanhamley requested a review from ChrisLoer July 30, 2018 22:23

ChrisLoer reviewed Jul 31, 2018

jfirebaugh reviewed Aug 14, 2018

jfirebaugh reviewed Aug 17, 2018

jfirebaugh reviewed Aug 23, 2018

jfirebaugh approved these changes Aug 27, 2018

ryanhamley force-pushed the style-benchmarks branch 2 times, most recently from 32e0d22 to 075d950 Compare August 27, 2018 22:49

Refactor benchmark test suite and add style benchmark tests

b3eb7da

ryanhamley force-pushed the style-benchmarks branch from 0be97e1 to b3eb7da Compare August 27, 2018 23:00

ryanhamley merged commit b3eb7da into master Aug 27, 2018

ryanhamley deleted the style-benchmarks branch August 27, 2018 23:16
