fix: stop query in SQL Lab with impala engine (#20950) #22441

wanghong1314 · 2022-12-16T14:48:24Z

fix(sqllab): The Stop button for queries has no effect in SQL Lab when the query runs on the Impala engine. This PR makes Stop work for Impala and adds progress information.
Fixes #20950


wanghong1314 · 2022-12-16T15:27:29Z

@rusackas @eschutho @yousoph Could you take some time to check whether this solution is appropriate?

codecov · 2022-12-16T15:43:34Z

Codecov Report

Merging #22441 (55fbb6d) into master (b6d39d1) will decrease coverage by 0.05%.
The diff coverage is 25.71%.

@@            Coverage Diff             @@
##           master   #22441      +/-   ##
==========================================
- Coverage   66.91%   66.85%   -0.06%     
==========================================
  Files        1851     1850       -1     
  Lines       70715    70768      +53     
  Branches     7766     7750      -16     
==========================================
- Hits        47320    47315       -5     
- Misses      21373    21437      +64     
+ Partials     2022     2016       -6

Flag	Coverage Δ
hive	`52.40% <25.71%> (-0.06%)`	⬇️
mysql	`?`
postgres	`77.90% <25.71%> (-0.13%)`	⬇️
presto	`52.30% <25.71%> (-0.06%)`	⬇️
python	`81.11% <25.71%> (-0.16%)`	⬇️
sqlite	`76.39% <25.71%> (-0.11%)`	⬇️
unit	`51.13% <25.71%> (-0.06%)`	⬇️

Impacted Files	Coverage Δ
superset-frontend/src/SqlLab/actions/sqlLab.js	`63.47% <ø> (-0.10%)`	⬇️
superset/views/core.py	`74.69% <0.00%> (-0.29%)`	⬇️
superset/db_engine_specs/impala.py	`40.21% <25.00%> (-43.12%)`	⬇️
superset/config.py	`91.51% <100.00%> (+0.02%)`	⬆️
superset/common/utils/dataframe_utils.py	`90.47% <0.00%> (-4.77%)`	⬇️
superset/db_engine_specs/mysql.py	`94.04% <0.00%> (-4.77%)`	⬇️
superset/reports/commands/log_prune.py	`85.71% <0.00%> (-3.58%)`	⬇️
...a/dataset/AddDataset/DatasetPanel/DatasetPanel.tsx	`87.23% <0.00%> (-2.97%)`	⬇️
...frontend/src/views/CRUD/welcome/DashboardTable.tsx	`54.09% <0.00%> (-1.68%)`	⬇️
superset/commands/importers/v1/utils.py	`92.20% <0.00%> (-1.30%)`	⬇️
... and 30 more

eschutho · 2022-12-16T20:50:27Z

superset-frontend/src/SqlLab/actions/sqlLab.js

@@ -229,6 +229,9 @@ export function startQuery(query) {

 export function querySuccess(query, results) {
  return function (dispatch) {
+    if (!results.query) {


Is this line here to prevent an error on line 239? If so, I think it's still useful to store the query on line 245 and just use optional chaining in case results.query doesn't exist: results?.query?.sqlEditorId

Yes, it prevents an error when query.sqlEditorId is undefined. I tested your suggestion and it works, thank you. Do I need to submit a new PR?

The code has been updated; please review it again.

docker/pythonpath_dev/superset_config.py

eschutho · 2022-12-16T21:00:08Z

superset/db_engine_specs/impala.py

+    ) -> None:  # pylint: disable=arguments-differ
+        # kwargs = {"async": async_}
+        try:
+            cursor.execute_async(query)


is the intention here to run execute_async if _async=True is passed in? If so, it looks like we should still keep the option to run this query synchronously.

The Impala engine's execute does not accept an async option, so my code comment may have been ambiguous.

Because the query runs asynchronously on Impala, its status can be polled.

Can you run execute if synchronous and execute_async only if _async is True? That way we can still run synchronously as well.

A synchronous call gives no way to read the query status, so the query cannot actually be stopped. That is why the code executes asynchronously by default.
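
The reviewer's request in this thread (keep a synchronous path and use execute_async only when asked) can be sketched as below. This is a minimal illustration, not the PR's code: FakeCursor and run_query are hypothetical names, and the async_ keyword is assumed from the commented-out kwargs line in the diff.

```python
from typing import List, Tuple


class FakeCursor:
    """Hypothetical stand-in for a database cursor; records which method ran."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def execute(self, query: str) -> None:
        # Blocking path: returns only once the query has finished.
        self.calls.append(("execute", query))

    def execute_async(self, query: str) -> None:
        # Non-blocking path: returns immediately so the caller can poll status.
        self.calls.append(("execute_async", query))


def run_query(cursor: FakeCursor, query: str, async_: bool = False) -> None:
    """Dispatch to the async method only when async_ is set (assumed flag name)."""
    if async_:
        cursor.execute_async(query)
    else:
        cursor.execute(query)
```

The author's counterpoint still applies: only the asynchronous path leaves room to poll the status and stop the query, so a synchronous run under this sketch would not be cancellable.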

eschutho · 2022-12-16T21:04:26Z

superset/db_engine_specs/impala.py

+        # Refresh session so that the `query.status` and `query.extra.get(is_stopped)`
+        # modified in stop_query in views / core.py is reflected  here.
+        # stop query
+        if cls.is_cancel_query(cls, query, session, query_id):


I'm not too familiar with impala, to be honest, but for the other databases, we usually handle the cancel query functionality when the cancel_query method is called. It looks like it would be more efficient to move this logic into cancel_query in this db engine spec so that it is only run when we know that a cancel query has been requested, instead of checking on each cursor operation.

Cancelling an Impala query has to go through the cursor, similar to how query cancellation works for the Hive engine.

This is my own helper method, not the cancel_query method, because it is called repeatedly.

@john-bodley or @betodealmeida this is the way that we cancel queries on hive, but I'm wondering if it would be more efficient to use the cancel_query method instead, providing that you can get the cursor. WDYT?

also cc @villebro
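
For readers unfamiliar with the pattern debated in this thread, here is a rough sketch of checking for a stop request on each poll of the cursor. Everything in it is an assumption for illustration: FakeCursor, its is_executing and cancel_operation methods, and the stop_requested callback are hypothetical stand-ins, not the PR's is_cancel_query implementation.

```python
import time
from typing import Callable


class FakeCursor:
    """Hypothetical cursor that reports 'executing' for a fixed number of polls."""

    def __init__(self, polls_until_done: int) -> None:
        self.remaining = polls_until_done
        self.cancelled = False

    def is_executing(self) -> bool:
        if self.cancelled or self.remaining <= 0:
            return False
        self.remaining -= 1
        return True

    def cancel_operation(self) -> None:
        self.cancelled = True


def poll_until_done(
    cursor: FakeCursor,
    stop_requested: Callable[[], bool],
    poll_interval: float = 0.0,
) -> str:
    """Poll the cursor; cancel through the cursor as soon as a stop is requested."""
    while cursor.is_executing():
        if stop_requested():
            cursor.cancel_operation()
            return "stopped"
        time.sleep(poll_interval)
    return "finished"
```

The cost the reviewer points out is visible here: stop_requested runs on every iteration, whereas a dedicated cancel_query hook would run only when a stop is actually requested.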

eschutho

Thanks for the contribution @wanghong1314! I left a few comments/questions. Do you mind also adding some tests? There are examples for my_sql, postgres, snowflake, etc..

eschutho · 2022-12-16T21:07:26Z

@wanghong1314 there are also some failing CI checks. Let us know if you need any help resolving them. cc @betodealmeida for another set of eyes on this review.

wanghong1314 · 2022-12-17T03:02:57Z

@wanghong1314 there are also some failing CI checks. Let us know if you need any help resolving them. cc @betodealmeida for another set of eyes on this review.

@baldoalessandro I have not changed any code in the MySQL or Postgres engine specs, so why does CI report an error? Could you please check?

wanghong1314 · 2022-12-17T10:39:52Z

@eschutho @bolkedebruin The CI checks now pass. Please help review the code, thank you.

eschutho · 2023-01-03T23:01:14Z

@eschutho @bolkedebruin @eschutho I saw that the ci test passed, please help review the code, thank you

Thanks @wanghong1314. I'm going to defer to @betodealmeida or @villebro on whether the hive pattern is still the best option here.

waiting for other feedback

villebro

Left a few comments. I would suggest looking at a very recently merged PR of mine #22498 that may already solve this problem for you. More specifically, you may be able to leverage that QUERY_EARLY_CANCEL_KEY and avoid having to introduce the is_cancel_query method. Please ping me on Slack if you want to discuss this sync (we can hop on a zoom or similar if needed).

superset-frontend/src/SqlLab/actions/sqlLab.js

villebro · 2023-01-05T08:15:18Z

superset/config.py

+# Interval between consecutive polls when using Impala Engine
+IMPALA_POLL_INTERVAL = int(timedelta(seconds=5).total_seconds())


To make sure we don't clutter config.py with too many disparate config keys, maybe we should remove these *_POLL_INTERVAL keys and refactor this to something like

DB_POLL_INTERVAL_SECONDS: Dict[str, int] = {}

This could be used to specify these per engine name in your superset_config.py (here I'd be overriding polling to 1 second for Hive):

DB_POLL_INTERVAL_SECONDS = {
    "hive": int(timedelta(seconds=1).total_seconds()),
}

I know it's a breaking change, so we can probably fall back to HIVE_POLL_INTERVAL in the hive spec for now. So maybe change the following line

superset/superset/db_engine_specs/hive.py

Line 378 in 01671b9

time.sleep(current_app.config["HIVE_POLL_INTERVAL"])

to something like this (see how I've moved the default to the engine spec, rather than having it in config.py):

if sleep_interval := current_app.config.get("HIVE_POLL_INTERVAL"):
    logger.warning(
        "HIVE_POLL_INTERVAL is deprecated and will be removed in 3.0. "
        "Please use DB_POLL_INTERVAL_SECONDS instead"
    )
else:
    sleep_interval = current_app.config["DB_POLL_INTERVAL_SECONDS"].get(cls.engine, 5)
time.sleep(sleep_interval)

This is a good idea. I will try it locally and then submit the code
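
The fallback proposed in this thread can be tried outside Superset with a plain dict standing in for current_app.config. This is a sketch under that assumption: resolve_poll_interval and DEFAULT_POLL_INTERVAL_SECONDS are hypothetical names, and restricting the legacy key to the hive engine is an assumption about where the fallback would live.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger(__name__)

# Default kept next to the engine logic rather than in config.py (assumed value).
DEFAULT_POLL_INTERVAL_SECONDS = 5


def resolve_poll_interval(config: Mapping[str, Any], engine: str) -> int:
    """Return the poll interval for an engine, honouring the deprecated Hive key."""
    legacy = config.get("HIVE_POLL_INTERVAL")
    if engine == "hive" and legacy:
        logger.warning(
            "HIVE_POLL_INTERVAL is deprecated. "
            "Please use DB_POLL_INTERVAL_SECONDS instead"
        )
        return legacy
    per_engine = config.get("DB_POLL_INTERVAL_SECONDS", {})
    return per_engine.get(engine, DEFAULT_POLL_INTERVAL_SECONDS)
```

With this shape, an Impala deployment would set DB_POLL_INTERVAL_SECONDS = {"impala": 1} in superset_config.py instead of adding a separate IMPALA_POLL_INTERVAL key.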

pull-request-size bot added the size/L label Dec 16, 2022

wanghong1314 marked this pull request as draft December 16, 2022 15:01

wanghong1314 marked this pull request as ready for review December 16, 2022 15:04

rusackas mentioned this pull request Dec 16, 2022

fix: Stop query in SQL Lab with impala engine (#20950) #22409

Closed

9 tasks

eschutho reviewed Dec 16, 2022

eschutho reviewed Dec 16, 2022

eschutho previously requested changes Dec 16, 2022

wanghong1314 requested a review from eschutho December 17, 2022 10:28

rusackas requested a review from villebro December 21, 2022 17:54

rusackas requested a review from betodealmeida January 5, 2023 03:57

villebro reviewed Jan 5, 2023

wanghong1314 closed this Jan 7, 2023

wanghong1314 deleted the fix/impala_stop_query branch January 7, 2023 14:39

wanghong1314 restored the fix/impala_stop_query branch January 7, 2023 14:40

wanghong1314 deleted the fix/impala_stop_query branch January 7, 2023 14:40

wanghong1314 restored the fix/impala_stop_query branch January 7, 2023 14:40
