Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sqllab] Help sqllab forget query history #4833

Merged
merged 2 commits into from
May 7, 2018

Conversation

timifasubaa
Copy link
Contributor

@timifasubaa timifasubaa commented Apr 16, 2018

In the sqllab asynchronous polling logic, it looks through the query history and continues to poll if there is a query in the history that is not successful.

This PR makes it look only at the last 6 hours of queries when checking for their states. The rationale is that a query that was started 6 hours ago is probably dead or problematic and should be rerun.

@graceguo-supercat @mistercrunch @john-bodley

@timifasubaa timifasubaa force-pushed the help_sqllab_forget_the_past branch 6 times, most recently from 163ddda to 5ea9dd9 Compare April 16, 2018 22:30
@mistercrunch
Copy link
Member

I've been meaning to do something like this. I wonder if the state should be altered though, like setting the state to failed or timeout. Also build is failing...

@john-bodley
Copy link
Member

I agree with @mistercrunch it would be great to change the state in the queries table for clarity and auditing purposes. Note this would actually be a backend change as opposed to a frontend change.

@timifasubaa
Copy link
Contributor Author

timifasubaa commented Apr 18, 2018

Okay, I will change the logic to set all pending or running queries older than 6 hours old to failed.

@codecov-io
Copy link

codecov-io commented Apr 18, 2018

Codecov Report

Merging #4833 into master will decrease coverage by 0.01%.
The diff coverage is 85.71%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #4833      +/-   ##
==========================================
- Coverage   77.09%   77.07%   -0.02%     
==========================================
  Files          44       44              
  Lines        8551     8558       +7     
==========================================
+ Hits         6592     6596       +4     
- Misses       1959     1962       +3
Impacted Files Coverage Δ
superset/db_engine_specs.py 52.47% <ø> (ø) ⬆️
superset/views/core.py 74.62% <85.71%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8933365...ab958c6. Read the comment docs.

@john-bodley
Copy link
Member

john-bodley commented Apr 18, 2018

@timifasubaa I wonder if timeout or maybe timedout(past tense) is more descriptive that failed since the later can mean multiple things.

@timifasubaa
Copy link
Contributor Author

There is a lot of repeated logic with stop_query for the new timedout query logic.

@timifasubaa timifasubaa force-pushed the help_sqllab_forget_the_past branch 6 times, most recently from 138dc75 to ab18e5f Compare April 20, 2018 22:20
@@ -16,12 +17,23 @@ class QueryAutoRefresh extends React.PureComponent {
componentWillUnmount() {
this.stopTimer();
}
timeoutOldQueries() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic should not be in front-end (browser-side).
Query table knows each query started time. it doesn't need front-end sends ajax call to set query state.
when front-end ask a query's status, server side should first check the query's start time, if

  • longer than 6 hours old: set it timeout and return timeout state.
  • < 6 hours: return its current status.

@timifasubaa timifasubaa force-pushed the help_sqllab_forget_the_past branch 8 times, most recently from 1ed8142 to 9d2e516 Compare April 24, 2018 01:09
@timifasubaa
Copy link
Contributor Author

I changed the logic to update the db's sate to timeout before it responds to polls.

@timifasubaa
Copy link
Contributor Author

@john-bodley @mistercrunch @graceguo-supercat ready for review

@timifasubaa timifasubaa force-pushed the help_sqllab_forget_the_past branch 5 times, most recently from dc1ef37 to a67faa0 Compare April 29, 2018 00:24
@timifasubaa
Copy link
Contributor Author

PING

Quick update on the state of the logic.
What is really happening is that the frontend ceases to poll for queries older than 6 hours old.
This is because on polls, it checks for all queries by that user whose state has changed after the last updated time. Meaning if a query is on pending for a very long time (> 6hours) and another query that's running at the same time changes state. We will no longer poll and also no longer receive any more updates.

Is this backend code worth having? Especially since it may trick people into believing we actually timeout the queries when the query will most likely not be updated to timeout.

@graceguo-supercat
Copy link

current auto query refresh component work like this:

  • in client-side, from all queries in local storage, it find out

    • status in 'running', 'started', 'pending', 'fetching', and
    • started in < 6 hours
  • if has any queries, polling with latest updated timestamp

  • if there is no such queries, stop polling.

  • in server-side, for every poll request, we get all queries

    • by user_id, and
    • changed after updated timestamp
  • if some query went wrong, and query status kept as running but never got to updated after 6 hours, when client-side polling with a newer timestamp (< 6 hours), this query will not have a chance to be set as timeout state. But it won't cause client-side keep polling because now client-side added timeout threshold.

this PR LGTM. but will not merge until @mistercrunch feel comfortable. Thank you!

@timifasubaa
Copy link
Contributor Author

PING @mistercrunch @john-bodley

return Object.values(queries)
.some(
q => ['running', 'started', 'pending', 'fetching'].indexOf(q.state) >= 0 &&
now - q.startDttm < 21600000,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No hard coded values please. Put it in src/SqlLab/constants.js

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@timifasubaa timifasubaa force-pushed the help_sqllab_forget_the_past branch from a67faa0 to ab958c6 Compare May 7, 2018 17:14
@timifasubaa timifasubaa merged commit d87504c into apache:master May 7, 2018
@mistercrunch
Copy link
Member

Any specific reason to not follow the default "Squash and merge" ?
screen shot 2018-05-07 at 11 19 50 pm

wenchma pushed a commit to wenchma/incubator-superset that referenced this pull request Nov 16, 2018
…e_past

[sqllab] Help sqllab forget query history
graceguo-supercat pushed a commit to graceguo-supercat/superset that referenced this pull request May 24, 2019
graceguo-supercat pushed a commit that referenced this pull request May 24, 2019
john-bodley pushed a commit that referenced this pull request May 24, 2019
(cherry picked from commit 2014329)
michellethomas pushed a commit to michellethomas/panoramix that referenced this pull request May 31, 2019
(cherry picked from commit 2014329)
michellethomas pushed a commit that referenced this pull request Jun 1, 2019
(cherry picked from commit 2014329)
cccs-rc pushed a commit to CybercentreCanada/superset that referenced this pull request Mar 6, 2024
(cherry picked from commit 03cae41)
cccs-rc pushed a commit to CybercentreCanada/superset that referenced this pull request Mar 6, 2024
(cherry picked from commit 2014329)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants