-
Notifications
You must be signed in to change notification settings - Fork 14.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Stop query in SQL Lab with impala engine #22635
Conversation
Codecov Report
@@ Coverage Diff @@
## master #22635 +/- ##
==========================================
- Coverage 67.14% 67.11% -0.04%
==========================================
Files 1869 1869
Lines 71523 71582 +59
Branches 7814 7814
==========================================
+ Hits 48022 48040 +18
- Misses 21460 21501 +41
Partials 2041 2041
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
superset/config.py
Outdated
# Interval between consecutive polls when using Hive Engine | ||
HIVE_POLL_INTERVAL = int(timedelta(seconds=5).total_seconds()) | ||
# customize the polling time of each engine. The default time is 5 seconds | ||
DB_POLL_INTERVAL_SECONDS = { | ||
"hive": int(timedelta(seconds=5).total_seconds()), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't really define defaults in config.py
. Imagine if I want to override the poll interval for "impala" and add the following to superset_config.py
:
DB_POLL_INTERVAL_SECONDS = {"impala": 1}
In this case the default for "hive" in config.py
would be lost. This is why I recommended not defining the defaults in config.py
, and rather placing them in their respective db engine specs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thank you for your suggestion. I can let the user set it without setting the default value. I want to change it to this?
DB_POLL_INTERVAL_SECONDS = {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add the type as that's the convention in config.py
and mypy can't infer the type from an empty dict:
DB_POLL_INTERVAL_SECONDS: Dict[str, int] = {}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed it to this way. It's true that mypy failed
# customize the polling time of each engine. The default time is 5 seconds | ||
DB_POLL_INTERVAL_SECONDS = { | ||
"hive": int(timedelta(seconds=5).total_seconds()), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this is needed, as it's not changing the defaults.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can restore the previous code without changing this file
superset/db_engine_specs/hive.py
Outdated
time.sleep(current_app.config["HIVE_POLL_INTERVAL"]) | ||
if sleep_interval := current_app.config.get("HIVE_POLL_INTERVAL"): | ||
logger.warning( | ||
"HIVE_POLL_INTERVAL is deprecated and will be removed in 3.0. Please use DB_POLL_INTERVAL instead" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The deprecation warning is incorrect:
"HIVE_POLL_INTERVAL is deprecated and will be removed in 3.0. Please use DB_POLL_INTERVAL instead" | |
"HIVE_POLL_INTERVAL is deprecated and will be removed in 3.0. Please use DB_POLL_INTERVAL_SECONDS instead" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I will check it carefully before submitting next time, and I will change it
superset/db_engine_specs/impala.py
Outdated
|
||
class ImpalaEngineSpec(BaseEngineSpec): | ||
"""Engine spec for Cloudera's Impala""" | ||
|
||
engine = "impala" | ||
engine_name = "Apache Impala" | ||
# Query 5543ffdf692b7d02:f78a944000000000: 3% Complete (17 out of 547) | ||
query_progress_r = re.compile(r".*Query.*: (?P<query_progress>[0-9]+)%.*") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we really need the leading and trailing .*
here?
query_progress_r = re.compile(r".*Query.*: (?P<query_progress>[0-9]+)%.*") | |
query_progress_r = re.compile(r"Query.*: (?P<query_progress>[0-9]+)%") |
Also, I know this is in line with what's being done in hive.py
, but I would consider moving this outside the class into a constant QUERY_PROGRESS_REGEX
in impala.py
, as it's not defined in BaseEngineSpec
(defining it here makes it appear like we're overriding a class attribute in BaseEngineSpec
). While at it, maybe do the same for stage_progress_r
in HiveEngineSpec
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for all the iterations 👍
fix(sqllab): Stop button for queries doesn't work in SQL Lab when using SQL Lab with impala engine and adding Progress Information
SUMMARY
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
BEFORE
AFTER
progress info
TESTING INSTRUCTIONS
ADDITIONAL INFORMATION