Async support for the queries in the SQL Lab. #974

Merged: 3 commits, Aug 18, 2016


bkyryliuk
Member

Added async support:

  • Refactored tasks.py to return the query id and added the ability to run queries asynchronously.
  • Changed celery_tests to be more end-to-end (calling the run_sql endpoint instead of the celery function).
  • Added an implementation of async runs for Hive and Presto queries.
  • Added the dependencies required by pyhive.
  • Implemented a progress bar for async queries.
  • Added extra fields to the Query object to improve traceability: limit_used, select_as_cta_used, executed_sql, select_sql.
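
For readers skimming the thread, the flow described in the list above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the `Query` dataclass, the in-memory `QUERIES` registry, the `wait` helper, and the thread standing in for a celery worker are all assumptions made for the example. Only the field names (limit_used, select_as_cta_used, executed_sql, select_sql) and the idea of returning a query id come from the description.

```python
import threading
import uuid
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Query:
    """Hypothetical stand-in for the Query model, with the traceability fields."""
    id: str
    select_sql: str            # SQL as submitted by the user
    executed_sql: str = ""     # SQL actually sent to the engine
    limit_used: bool = False
    select_as_cta_used: bool = False
    status: str = "pending"
    progress: int = 0          # percent, what a progress bar would read


# Assumption: an in-memory registry instead of a database table.
QUERIES: Dict[str, Query] = {}
_WORKERS: Dict[str, threading.Thread] = {}


def _execute(query_id: str, limit: Optional[int], tmp_table: Optional[str]) -> None:
    """Rewrite the SQL, record what was executed, and update progress."""
    query = QUERIES[query_id]
    sql = query.select_sql.strip().rstrip(";")
    if tmp_table:
        sql = f"CREATE TABLE {tmp_table} AS {sql}"
        query.select_as_cta_used = True
    elif limit is not None:
        sql = f"SELECT * FROM ({sql}) AS inner_qry LIMIT {limit}"
        query.limit_used = True
    query.executed_sql = sql
    query.status = "running"
    # A real engine (Presto, Hive) would be polled here; we only simulate it.
    for pct in (25, 50, 75, 100):
        query.progress = pct
    query.status = "finished"


def run_sql(select_sql: str, run_async: bool = False,
            limit: Optional[int] = None, tmp_table: Optional[str] = None) -> str:
    """Register the query and return its id, in both sync and async mode."""
    query_id = uuid.uuid4().hex
    QUERIES[query_id] = Query(id=query_id, select_sql=select_sql)
    if run_async:
        # Assumption: a thread plays the role of the celery worker.
        worker = threading.Thread(target=_execute, args=(query_id, limit, tmp_table))
        _WORKERS[query_id] = worker
        worker.start()
    else:
        _execute(query_id, limit, tmp_table)
    return query_id


def wait(query_id: str) -> None:
    """Block until an async query has finished (no-op for sync queries)."""
    worker = _WORKERS.pop(query_id, None)
    if worker is not None:
        worker.join()
```

The caller gets the query id back immediately in either mode and reads status, progress, and executed_sql off the stored record, which is what lets a client poll for a progress bar.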

Tackles the backend of: #858, #746, #886

Implement progress bar for the presto / hive queries

Implement remote query execution

This is a preliminary PR and some more work remains.
TODO:

  • Unit test async queries (presto and hive engines)
  • Implement cancel_query endpoint
  • Add query_results endpoint

@bkyryliuk bkyryliuk force-pushed the bogdan/cta_as_separate_end_point branch 10 times, most recently from 37b5ba6 to 369c8e4 Compare August 18, 2016 02:21
@bkyryliuk bkyryliuk force-pushed the bogdan/cta_as_separate_end_point branch from 369c8e4 to 17a3044 Compare August 18, 2016 03:04
@bkyryliuk bkyryliuk merged this pull request into carapal Aug 18, 2016
mistercrunch pushed a commit that referenced this pull request Aug 18, 2016
* Refactor the query runner to enable async mode.

* Refactor the SQL-calling functions into the QueryRunner class.

* Clean up the celery tests.
@clayallsopp

Hi, does this still work? The SQL Lab queries are running on the celery worker, but (as the code is now) the results aren't returned and I'm not quite seeing how they get persisted anywhere. I tried forking and always returning the result from get_sql_results, but no dice either =\

@mistercrunch mistercrunch deleted the bogdan/cta_as_separate_end_point branch October 14, 2016 06:51
@mistercrunch
Member

@clayallsopp async works for "CREATE TABLE AS" queries only, when enabled.

We're planning to add support for result backends, probably next week, where query results are persisted in a key-value store of your choice. That will probably be S3 for us, but we may support more backends out of the box. I still have to figure out whether flask-cache is the right abstraction for it, or whether we use something else or write our own...

The challenge is that the web request scope is short, and that we want to support long-running queries even when we don't have write access to the DB.
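
To make the result-backend idea concrete, here is a rough sketch of what such an abstraction could look like. Nothing here is an existing API: `ResultBackend`, `InMemoryBackend`, `store_results`, `fetch_results`, and the `results/<query_id>` key layout are all hypothetical names for illustration, and the in-memory dict stands in for S3 or any other key-value store.

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class ResultBackend(ABC):
    """Hypothetical key-value interface a results store would implement."""

    @abstractmethod
    def set(self, key: str, value: bytes) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        ...


class InMemoryBackend(ResultBackend):
    """Assumption: a dict stands in for S3 or another key-value store."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)


def _key(query_id: str) -> str:
    # Hypothetical key layout; any stable mapping from query id would do.
    return f"results/{query_id}"


def store_results(backend: ResultBackend, query_id: str,
                  columns: List[str], rows: List[List[Any]]) -> str:
    """Called by the worker once the long-running query has finished."""
    payload = json.dumps({"columns": columns, "data": rows}).encode("utf-8")
    key = _key(query_id)
    backend.set(key, payload)
    return key


def fetch_results(backend: ResultBackend, query_id: str) -> Optional[Dict[str, Any]]:
    """Called from a later, short web request; None means not ready yet."""
    raw = backend.get(_key(query_id))
    if raw is None:
        return None
    return json.loads(raw.decode("utf-8"))
```

The point of the split is that the worker writes under the query id whenever it finishes, while each web request only does a quick lookup, so neither the request scope nor write access to the queried database is a constraint.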

@wyndhblb wyndhblb mentioned this pull request Oct 28, 2016