
Perf issues after m9 deployment #232

Closed
rafrombrc opened this issue Aug 29, 2017 · 15 comments

@rafrombrc
Member

Folks are reporting major lag and STMO performance issues that started conspicuously after the last deployment went out; the reports have come in via Bugzilla (https://bugzilla.mozilla.org/show_bug.cgi?id=1394468) as well as multiple times in various IRC and Slack channels. This is impacting people's work, so hopefully we can figure this out and get a 9.1 (or whatever the next minor rev is) out ASAP.

@alison985 added this to the 10 milestone Aug 29, 2017
@alison985 added the bug label Aug 29, 2017
@alison985 self-assigned this Aug 29, 2017
@madalincm

The issue is most visible on the new query page and reproduces for all of the data sources in production. On stage the issue is only visible for the Athena data source. Here is a link to a profile for a query written against the Athena data source: http://bit.ly/2vrU6Ag

I have also observed another issue that might be related: when writing a query, the first letter the user types is duplicated. This only reproduces in production. Please view this screencast:
[screencast: duplicatefirstletter]

@alison985

I can reproduce the slowness in prod on OS X in Nightly and 55.0.3 (64-bit). Nightly is giving a JavaScript error, though it only points at a minified variable name. I'm going to create/update a local test environment on my Nightly machine.

@alison985

So... in Nightly with Postgres, autocomplete on current Mozilla master works fine and shows the drop-down both on my local environment and in production.

Production is missing the dropdown and has a speed issue on Athena and Presto.

@alison985

Just merged PR #233 to staging. I don't think it will fix the issue, but it will be helpful for diagnostic purposes, since Athena and Presto behave the same on production and Athena is available on both staging and production.

@wlach

wlach commented Aug 30, 2017

I did a quick profile of this yesterday, and it seems like the client-side lag is coming from some long-running JavaScript in Angular's digest cycle: https://www.sitepoint.com/understanding-angulars-apply-digest/ (a rough sketch of that failure mode follows the list below).

Several possibilities here come to mind:

  1. We're processing more data in the digest cycle than we were before due to a server-side change
  2. The digest cycle is being triggered more often than before due to a client-side change
  3. We're doing more in the digest cycle due to a client-side change
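
For illustration only, here is a minimal sketch (not Redash's actual code) of how heavy synchronous work inside Angular's digest produces this kind of lag: anything registered through $watch re-runs on every digest, so an expensive recomputation there blocks the UI on every keystroke.

// Hypothetical AngularJS snippet, for illustration only -- not the Redash code.
// The watcher body re-runs on every digest; if it walks thousands of schema
// tokens synchronously, each keystroke pays that cost before the page repaints.
angular.module('demo', []).controller('EditorCtrl', ['$scope', function ($scope) {
  $scope.$watch('queryText', function (newText) {
    // stand-in for a large synchronous map/filter over every table and column
    $scope.suggestions = ($scope.schema || []).map(function (table) {
      return table.name;
    });
  });
}]);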

@alison985

So the Redash JavaScript currently has a 5000-token limit that turns the autocompleter on and off, with a code comment noting that the limit exists for performance reasons. Also, this isn't reproducible in Nightly on staging or prod for the metadata data source.

I lowered the Redash JavaScript token-count check to 3000 as part of PR #234 and will see if that helps on staging (a rough sketch of that kind of guard follows the list below). Given that the problem doesn't happen in Chrome (and in at least one older version of Firefox), I currently think this was happening because of a combination of:

  1. the list of fields/tables in Presto and Athena got longer, and
  2. the latest Firefox/Firefox Nightly is doing something different with JavaScript interpretation that makes it take longer.
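
To make the mechanism concrete, a guard like the one described above could look roughly like this; the names, the counting rule, and the option wiring are assumptions for illustration, not the actual PR #234 diff.

// Rough sketch of a token-count guard (illustrative names, not the Redash code).
// Above the limit, live autocompletion is switched off so the editor never pays
// the per-keystroke cost of rebuilding suggestions from a huge schema.
const AUTOCOMPLETE_TOKEN_LIMIT = 3000; // lowered from 5000

function shouldEnableAutocomplete(schema) {
  // count one token per table plus one per column
  const tokenCount = (schema || []).reduce(
    (sum, table) => sum + 1 + (table.columns ? table.columns.length : 0),
    0
  );
  return tokenCount <= AUTOCOMPLETE_TOKEN_LIMIT;
}

// ace exposes a live-autocompletion option that can be toggled per editor
editor.setOptions({
  enableLiveAutocompletion: shouldEnableAutocomplete($scope.schema),
});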

@wlach

wlach commented Aug 31, 2017

This problem is totally reproducible in Chrome, so I think we can rule out (2). The problem is less grave there, but it's still definitely present.

Losing autocompletion kind of sucks. :( Looking at a Firefox performance profile, it seems like much of the time is being spent in this function:

const schemaCompleter = {

(I'm not sure why, but Chrome's profiler is refusing to give me more detail on this machine...)

If you have time, it might be worth investigating where the time is being spent using an unminified version of the site sources to see if there's anything we could improve. In particular, I'm wondering if we're unnecessarily recomputing stuff every time the autocomplete callback is executed -- for instance, why are we recreating $scope.autoCompleteSchema every time the function is called?

$scope.autoCompleteSchema = removeExtraSchemaInfo($scope.schema);

Could we not get away with only calling that function if it does not already exist? In addition to likely being expensive in itself, recreating it also requires us to rebuild the keywords property later in the function:

$scope.autoCompleteSchema.keywords = map(keywords, (v, k) =>

@alison985

Hi @wlach. Thanks for continuing to investigate. This is what I ended up doing: https://github.com/mozilla/redash/compare/master@%7B1day%7D...master It seems to be working nicely on staging in Nightly with the Athena data source, but it does eliminate autocomplete.

FWIW:
A) The getCompletions function is inside a const variable declaration (schemaCompleter), so it should only have gotten called once anyway. I had also tried moving it elsewhere within the file, but that's the only place where $scope.schema wasn't undefined.
B) Autocomplete was already lost above the 5000-token mark for some data sources, but the function was still being called in the background, adding lag without any benefit. Now there is no autocomplete, but there is also no lag.
C) The next time I do a push to staging I'll try raising it to the 5000 mark again to see if autocomplete can come back when the schema browser toggle is turned on (to eliminate the _v tables).

@wlach

wlach commented Aug 31, 2017

@alison985 have you verified that getCompletions is only being called once? It looked like it was being called multiple times from what I could tell in my profile.

@alison985

@wlach I tested again and it is getting called multiple times. Firefox organizes the dev console differently than Chrome (it aggregates repeated prints), so I missed it the first time around.

However, if you print langTools before the langTools.addCompleter call, you can see that it only stores the function itself, not the computed values. I've tried changing that to no avail. I think this is the way the autocomplete framework was built to work, and it has also always been true.

The thing that changed within Redash related to autocomplete was adding logic to strip the [P] (indicating a partition key) and (column_type) characters out of the autocomplete list.

The thing that would have changed on the data source side is the number of tokens generated by the table and field count in a data source. I did find a line of code that was duplicating some tokens, but removing it still hasn't put us beneath the 5000-token mark for Athena.
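
For context on the langTools observation above: ace's language_tools extension keeps the completer object itself and calls its getCompletions callback on every completion request, so printing langTools only ever shows the registered function, never any precomputed suggestion values. Here is a minimal sketch of that registration, with illustrative helper names rather than the Redash source:

const langTools = ace.require('ace/ext/language_tools');

const schemaCompleter = {
  // ace calls this each time it needs suggestions; addCompleter stores the
  // completer object itself, so any caching has to live inside this callback.
  getCompletions(editor, session, pos, prefix, callback) {
    const suggestions = buildKeywordList(); // hypothetical helper
    callback(null, suggestions);
  },
};

langTools.addCompleter(schemaCompleter);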

@wlach

wlach commented Aug 31, 2017

@alison985 So my suggestion was that maybe we could cache some or all of what getCompletions returns, so subsequent calls to it don't have to regenerate the set of search suggestions from the schema. :) This might be as easy as putting something like this logic into the function:

getCompletions(state, session, pos, prefix, callback) {
  if ($scope.autoCompleteSchema === undefined) {
    // make a variable for the auto completion in the query editor
    $scope.autoCompleteSchema = removeExtraSchemaInfo($scope.schema);
  }
  ...

Could you give that a try and see if it helps? It looks like whoever wrote this originally was trying to cache the result in $scope, but the extra call to removeExtraSchemaInfo totally defeated that effort.
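
Spelled out a little more fully, the cached version might look roughly like the following; the completion-object shape and the source of keywords are assumptions pieced together from the snippets quoted earlier, not the actual Redash code.

const schemaCompleter = {
  getCompletions(state, session, pos, prefix, callback) {
    if ($scope.autoCompleteSchema === undefined) {
      // build the pruned schema once and cache it on the scope
      $scope.autoCompleteSchema = removeExtraSchemaInfo($scope.schema);
      // rebuild the keyword list only when the schema itself was rebuilt
      $scope.autoCompleteSchema.keywords = map(keywords, (v, k) => ({
        name: k,
        value: k,
        score: 0,
        meta: v,
      }));
    }
    // later calls reuse the cached keyword list
    callback(null, $scope.autoCompleteSchema.keywords);
  },
};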

@alison985

I had actually already tried that if statement, along with many others. Regardless of what I do, the framework stores the function call, not the results.

Have you seen the latest code? https://github.com/mozilla/redash/blob/master/client/app/pages/queries/query-editor.js#L37

@wlach

wlach commented Aug 31, 2017

Yeah, I think it's ok that the editor calls the function repeatedly -- that's probably just what it's designed to do. Your latest code looks like it should be fast, though -- is the function still showing up in profiles?

@alison985

@wlach I have absolutely no idea how to check a profile. It's on staging if you want to take a look.
