Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Style benchmarks #7047

Merged
merged 1 commit into from
Aug 27, 2018
Merged

Style benchmarks #7047

merged 1 commit into from
Aug 27, 2018

Conversation

ryanhamley
Copy link
Contributor

Launch Checklist

  • briefly describe the changes in this PR
    • Adds the ability to run benchmark tests against Mapbox styles by supplying a styleURL
    • Does not change functionality of existing benchmark suites that run on various versions of the code
  • document any changes to public APIs

Layout and Paint tests run once per tile over a list of tiles/locations (center and zoom) supplied by @nickidlugash and the Carto team to exercise a number of different relevant areas of style performance (CJK text, urban, rural, zoom levels, etc).
screen shot 2018-07-30 at 12 20 07 pm
screen shot 2018-07-30 at 12 20 38 pm

QueryPoint, QueryBox, StyleLayerCreate and StyleLayerValidate run the same as the existing non-style benchmark suite with the exception that the two Query tests use the list of locations that Carto defined. The other benchmarks are not used by the style suite.
screen shot 2018-07-30 at 12 20 58 pm

styleURL: string;
locations: ?Location;

constructor(styleURL: string, locations: ?Location) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by the plurality/singularity of locations: ?Location. The type and the way it's used in paint/layout makes it seem like this should be a singular "location" member. Where it's plural (non-paint/layout), it seems like the type would actually be Array<Location> (although I guess there's also the separate case of one "location" with multiple tiles, that doesn't look like what gets returned from import { locations } from './lib/style_locations';

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried yesterday to define the type as ?Location | ?Array<Location> and kept getting various Flow errors. I can revisit this and see if I can standardize the language or perhaps split a single Location and an array of them into separate properties to make things clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess that it's easy to get flow errors because if it can be singular or plural that has to be explicitly checked all the time before you work with it. If it can be one or more, standardizing on plural seems good -- probably "plural but only one item" doesn't need to be handled as a special case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By "standardizing on plural", do you mean making this.locations always be an array of Location objects, even if it's just an array with a length of 1? That seems like the most logical way to square this circle in my brain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. 👍

@@ -0,0 +1,33 @@
const isStyleBench = process.env.STYLE_BENCHMARK;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need two separate "isStyleBench" flags set? (process.env.STYLE_BENCHMARK and also ?style_bench=true) Shouldn't one of them be enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, because this flag needs to be available both in index.html and all the benchmark .js files. You can see in index.html that we need to somehow determine the type of test in order to know how to set the tests up. The only way I could think to make this available in a script tag on an html page was to put it in the URL. I recognized this redundancy and I hate it, so I will gladly change it if we can think of a simpler way to make the test type parameter available in both places.

if (params.get('style_bench')) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm yeah I see the challenge. I guess you could wait for the local script to resolve and then look up the isStyleBench used there, but that would be a weird timing dependency? Or you could make index.html into something templated and have the watch script populate the parameter accordingly? It's probably not too big a deal if it's hard...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of those things that seems like a huge issue when you've been deep in the weeds for a couple weeks. Then you take a step back and realize the solution was obvious. Just using the URL parameter works fine and simplifies the API considerably. D'oh!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there's more divergent branches than shared code in some of these files, so rather than trying to share all the same files between the two benchmarking modes, what about an approach where we use different paths/index.html/benchmarks.js files?

For example, we could have the version-comparing benchmarks at /bench/versions/ and the style-comparing benchmarks at /bench/styles/.

Where we do want to share code between the modes, we can do that by extracting components or functions and importing them into both.

}
});

promise = promise.then(() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there's a lot of near-duplicate code here between the isStyleBench && name === 'Layout' || name === 'Paint' case and the regular case. Is there a way to collapse it to one inner loop and just factor out the style-specific version lookup logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be in tension with @jfirebaugh's suggestion above.

It looks like there's more divergent branches than shared code in some of these files, so rather than trying to share all the same files between the two benchmarking modes, what about an approach where we use different paths/index.html/benchmarks.js files?

The biggest changes are this section and the register function in bench/benchmarks.js. For the individual benchmark tests, the simplest thing from a code standpoint might be to just standardize on the style_locations as input. We don't have to run the tests individually since that takes a long time, but we could still use the style_locations as opposed to a simple array of zoom numbers with a hardcoded center all averaged together into a single test. You can see this in action in the QueryBox and QueryPoint benches because that's exactly what they do in a style test. This might give us better real-world tests in version benchmarking while simplifying the code base some.

As for this section, is it better to attempt to factor out the similar code here or break it apart the way John suggests?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Yeah there is some tension, but maybe not that much? To me it makes sense that specifying the set of benchmarks (e.g. benchmarks.js) would have different implementations, but rendering a benchmark (e.g. benchmarks_view.js) would have a shared/generalized implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about standardizing on the locations/tiles in style_locations.js? I updated the handful of tests that rely on iterating over a few different zoom levels to use the ones in style_locations and it all still works. The code is simpler as well. The setup method in Paint, for example, went from 25 to 11 lines of code. The tradeoff is that the tests take longer now, which you'd expect from benching 15 maps as opposed to 4 or 5. It's most dramatic in Layout.

screen shot 2018-07-31 at 3 25 55 pm

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of using a standardized code pathway/definition format for specifying multiple locations. I think we should probably have separate location definitions for style benchmarking vs. the regular benchmarks because:

  • We'll probably want to change the style locations more often
  • In style benchmarking we'll typically care about more locations and be willing to spend more time testing them
  • Changing the location set isn't a big deal when you're comparing multiple styles with the same code version, but since it's baked into the benchmark, changing the location set is a big deal for comparing between code versions (as shown in your screenshot above: it breaks comparisons for Layout whenever different versions have different location sets).

@@ -1,4 +1,4 @@
const isStyleBench = process.env.STYLE_BENCHMARK;
const isStyleBench = getURLParameter('style_bench');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. 🙂

@ryanhamley
Copy link
Contributor Author

ryanhamley commented Aug 13, 2018

Everything still works as before, but I've done some significant refactoring to create separate pathways for style and version tests. It's no longer required to add any additional parameter to the URL which makes switching between the types of tests easier because it's dependent on the npm command you run. The newest commits also refactor the business logic from the view.

package.json Outdated
"watch-benchmarks-view": "BENCHMARK_VERSION=${BENCHMARK_VERSION:-\"$(git rev-parse --abbrev-ref HEAD) $(git rev-parse --short=7 HEAD)\"} rollup -c bench/rollup_config_benchmarks_view.js --watch",
"start-server": "st --no-cache -H 0.0.0.0 --port 9966 --index index.html .",
"start-bench-server": "st --no-cache -H 0.0.0.0 --port 9966 --index index.versions.html .",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of separate script commands, could we have http://0.0.0.0:9966/bench/versions/ and http://0.0.0.0:9966/bench/styles/?

I can anticipate situations in which we want to switch quickly between the two, plus we already have a lot of script commands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've reduced the scripts to just start-server and start-bench but switching between bench/versions/ and bench/styles/ doesn't rebuild the generated files because Rollup isn't run again. Is there a straightforward way to run Rollup when you change URLs? Or is it ok to have to run start-bench again?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess bench/versions/ and bench/styles/ should load different bundles, both of which are watched by start-bench?

});
}

['/bench/benchmarks_generated.js', '/bench/benchmarks_view_generated.js'].reduce((p, script) => p.then(() => loadScript(script)), Promise.resolve());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since there's no dynamic URL generation necessary, these can use plain old <script> tags.

styleURL: string;
locations: ?Array<Location>;

constructor(styleURL: string, locations: ?Array<Location>) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Putting this in the base class feels a little over-generalized to me -- not all benchmarks have a style URL. I'd rather have concrete subclasses initialize the properties they use. There'll be a bit of repetition, but it's more explicit and flexible.

mapboxgl.accessToken = accessToken;

window.mapboxglVersions = window.mapboxglVersions || [];
window.mapboxglBenchmarks = window.mapboxglBenchmarks || {};

const url = styleURL();
const version = process.env.BENCHMARK_VERSION;
window.mapboxglVersions.push(version);

function register(Benchmark) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to change the parameter type from a class to an instance, so that different subclasses can accept different constructor parameters. And keep pulling out things that differ between version and style benchmarks.

const style = 'mapbox://styles/mapbox/streets-v10';
const center = [-77.032194, 38.912753];
const locations = [4, 8, 11, 13, 15, 17].map(zoom => ({zoom, center, style});
register(new Layout(url));
register(new Paint(locations));
register(new QueryPoint(locations));

Copy link
Contributor

@jfirebaugh jfirebaugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting there! Thanks for plugging along with this.

@@ -1,5 +1,6 @@
// @flow

import { OverscaledTileID } from '../../src/source/tile_id';
import type { LngLatLike } from '../../src/geo/lng_lat';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are no longer necessary.

const version = process.env.BENCHMARK_VERSION;
window.mapboxglVersions.push(version);

function register(Benchmark) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function register(benchmark) {
    const name = benchmark.constructor.name;
    window.mapboxglBenchmarks[name] = window.mapboxglBenchmarks[name] || {};
    window.mapboxglBenchmarks[name][version] = benchmark;
}

import Layout from '../benchmarks/layout';
import LayoutDDS from '../benchmarks/layout_dds';
...

register(new Layout(style, locations));
register(new LayoutDDS());
...

@@ -46,7 +52,8 @@ export default class Layout extends Benchmark {
return fetch(normalizeSourceURL(sourceURL))
.then(response => response.json())
.then((tileJSON: TileJSON) => {
return Promise.all(this.tileIDs().map(tileID => {
const tileIDs = this.locations && this.locations[0].tileID ? this.locations[0].tileID : this.tileIDs();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way we can parameterize this via the constructor and avoid the conditionals here...

@@ -102,9 +109,10 @@ export default class Layout extends Benchmark {

for (const {tileID, buffer} of this.tiles) {
promise = promise.then(() => {
const zoom = this.locations ? this.locations[0].zoom : tileID.overscaledZ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...and here?

export const benchmarks = [];

let finished = false;
let promise = Promise.resolve();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like these should go inside setupTestRun.


mapboxgl.accessToken = accessToken;

window.mapboxglVersions = window.mapboxglVersions || [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For style benchmarks, I think we use a simpler registration mechanism. Using window.mapboxglVersions as a global variable is needed for the version benchmarks because they load remote scripts which must register themselves at a particular version but correlate named benchmarks across versions. For style benchmarks this isn't necessary, so we could use normal JS variables and combine styles/benchmarks.js and styles/benchmarks_viewmodel.js, which in turn should simplify the conditional logic unique to Layout and Paint.

@@ -1,24 +1,36 @@
// @flow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

if (filter && name !== filter)
return;

switch (name) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do something similar to what you did for versions/benchmarks.js:

function register(Benchmark, locations, options) {
    const name = Benchmark.name;

    if (filter && name !== filter)
        return;

    // inline createBenchmark here
}

locations.forEach(location => {
    register(Layout, location.tileID, {location});
    register(Paint, [location], {location});
});

register(QueryPoint, locations);
register(QueryBox, locations);

register(StyleLayerCreate);
register(Validate);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing locations.forEach(location => { register(Layout...); register(Paint...); }); works but the result is that you when you run all the tests, you get a pattern in the UI of Layout, Paint, Layout, Paint.... This could be useful since you can see the Layout and Paint tests for each tile grouped together. Or would we expect that all Layout and all Paint tests are grouped together?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to keep the existing order, this is fine too:

locations.forEach(location => register(Layout...));
locations.forEach(location => register(Paint...));


import getWorkerPool from '../../src/util/global_worker_pool';

setTimeout(() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if it's guaranteed that this will run before the first promise. Let's put it at the beginning of the promise chain so the order is clear:

let promise = Promise.resolve()
    then(() => {
        // Ensure the global worker pool is never drained. ...
        // ...
    });

getWorkerPool().acquire(-1);
}, 0);

export default mapboxgl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not necessary.

Copy link
Contributor Author

@ryanhamley ryanhamley Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this causes an error in the console Cannot set property 'workerUrl' of undefined that's thrown from benchmarks_generated.js, specifically a line mapboxgl.workerUrl = window.URL.createObjectURL(new Blob([workerBundleString], { type: 'text/javascript' }));. This doesn't actually seem to cause any issues in running the tests though so I'm not sure what this error is really about.

EDIT: I take it back. It does seem to break certain tests so I think this has to stay.

register(QueryPoint);
register(QueryBox);

runBenchmarks();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inline the function body here.

register(new FilterCreate());
register(new FilterEvaluate());

runBenchmarks();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this cause duplicate runs, once the benchmarks_generated.js on S3 for prior versions also contain this change? The version benchmarks need to differ from styles here -- maybe index.html should execute the run?

Copy link
Contributor

@jfirebaugh jfirebaugh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @ryanhamley! The ability to benchmark style variations will be valuable for both the cartography team and the GL team. Thanks for working through the subsequent refactoring -- the result is clean code that we'll be able to extend easily in the future.

@ryanhamley ryanhamley force-pushed the style-benchmarks branch 2 times, most recently from 32e0d22 to 075d950 Compare August 27, 2018 22:49
@ryanhamley ryanhamley merged commit b3eb7da into master Aug 27, 2018
@ryanhamley ryanhamley deleted the style-benchmarks branch August 27, 2018 23:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants