fix: ParsedQuery subselect edge case #13602

Merged: 1 commit, Mar 12, 2021

Member

@etr2460 etr2460 commented Mar 12, 2021

SUMMARY

I found that aliased subselects break the logic that extracts table names from a query. With sqlparse, adding the alias nests the subselect deeper than I expected: the Parenthesis ends up wrapped in an Identifier inside an IdentifierList, as the second tree below shows:

>>> import sqlparse
>>> sqlparse.parse("SELECT f1, (SELECT count(1) FROM t2) FROM t1")[0]._pprint_tree()
|- 0 DML 'SELECT'
|- 1 Whitespace ' '
|- 2 Identifier 'f1'
|  `- 0 Name 'f1'
|- 3 Punctuation ','
|- 4 Whitespace ' '
|- 5 Parenthesis '(SELEC...'
|  |- 0 Punctuation '('
|  |- 1 DML 'SELECT'
|  |- 2 Whitespace ' '
|  |- 3 Function 'count(...'
|  |  |- 0 Identifier 'count'
|  |  |  `- 0 Name 'count'
|  |  `- 1 Parenthesis '(1)'
|  |     |- 0 Punctuation '('
|  |     |- 1 Integer '1'
|  |     `- 2 Punctuation ')'
|  |- 4 Whitespace ' '
|  |- 5 Keyword 'FROM'
|  |- 6 Whitespace ' '
|  |- 7 Identifier 't2'
|  |  `- 0 Name 't2'
|  `- 8 Punctuation ')'
|- 6 Whitespace ' '
|- 7 Keyword 'FROM'
|- 8 Whitespace ' '
`- 9 Identifier 't1'
   `- 0 Name 't1'
>>> sqlparse.parse("SELECT f1, (SELECT count(1) FROM t2) as f2 FROM t1")[0]._pprint_tree()
|- 0 DML 'SELECT'
|- 1 Whitespace ' '
|- 2 IdentifierList 'f1, (S...'
|  |- 0 Identifier 'f1'
|  |  `- 0 Name 'f1'
|  |- 1 Punctuation ','
|  |- 2 Whitespace ' '
|  `- 3 Identifier '(SELEC...'
|     |- 0 Parenthesis '(SELEC...'
|     |  |- 0 Punctuation '('
|     |  |- 1 DML 'SELECT'
|     |  |- 2 Whitespace ' '
|     |  |- 3 Function 'count(...'
|     |  |  |- 0 Identifier 'count'
|     |  |  |  `- 0 Name 'count'
|     |  |  `- 1 Parenthesis '(1)'
|     |  |     |- 0 Punctuation '('
|     |  |     |- 1 Integer '1'
|     |  |     `- 2 Punctuation ')'
|     |  |- 4 Whitespace ' '
|     |  |- 5 Keyword 'FROM'
|     |  |- 6 Whitespace ' '
|     |  |- 7 Identifier 't2'
|     |  |  `- 0 Name 't2'
|     |  `- 8 Punctuation ')'
|     |- 1 Whitespace ' '
|     |- 2 Keyword 'as'
|     |- 3 Whitespace ' '
|     `- 4 Identifier 'f2'
|        `- 0 Name 'f2'
|- 3 Whitespace ' '
|- 4 Keyword 'FROM'
|- 5 Whitespace ' '
`- 6 Identifier 't1'
   `- 0 Name 't1'

This PR adds logic to recursively explore sections wrapped in parens. That seems safe: it didn't break any of the existing tests, and it lets my new one pass. That said, I would love any thoughts on a better way to do this.
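The recursion described above can be sketched on a hand-built token tree. This is a minimal illustration, not Superset's `_extract_from_token`: `Node`, `ident`, `kw`, and `extract_tables` are hypothetical stand-ins, the trees mirror the `_pprint_tree()` output above with whitespace and punctuation omitted, and always descending into `IdentifierList` is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for a sqlparse token (not the real class)."""
    kind: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)

    @property
    def is_group(self) -> bool:
        return bool(self.children)


def ident(name: str) -> Node:
    return Node("Identifier", name, [Node("Name", name)])


def kw(value: str) -> Node:
    return Node("Keyword", value)


def extract_tables(token: Node, recurse_aliased_parens: bool,
                   found: Optional[Set[str]] = None) -> Set[str]:
    """Collect identifiers that directly follow a FROM keyword."""
    found = set() if found is None else found
    after_from = False
    for item in token.children:
        wraps_paren = item.is_group and item.children[0].kind == "Parenthesis"
        # Descend into every non-Identifier group; with the flag on, also
        # descend into an Identifier whose first child is a Parenthesis.
        if item.is_group and (
            item.kind != "Identifier" or (recurse_aliased_parens and wraps_paren)
        ):
            extract_tables(item, recurse_aliased_parens, found)
        if item.kind == "Keyword":
            after_from = item.value.upper() == "FROM"
        elif after_from and item.kind == "Identifier" and not wraps_paren:
            found.add(item.value)
            after_from = False
    return found


# (SELECT count(1) FROM t2)
subselect = Node("Parenthesis", "(SELECT count(1) FROM t2)", [
    Node("DML", "SELECT"),
    Node("Function", "count(1)", [
        ident("count"),
        Node("Parenthesis", "(1)", [Node("Integer", "1")]),
    ]),
    kw("FROM"),
    ident("t2"),
])

# SELECT f1, (SELECT count(1) FROM t2) FROM t1
plain = Node("Statement", "", [
    Node("DML", "SELECT"), ident("f1"), subselect, kw("FROM"), ident("t1"),
])

# SELECT f1, (SELECT count(1) FROM t2) as f2 FROM t1
aliased = Node("Statement", "", [
    Node("DML", "SELECT"),
    Node("IdentifierList", "f1, (...) as f2", [
        ident("f1"),
        Node("Identifier", "(...) as f2", [subselect, kw("as"), ident("f2")]),
    ]),
    kw("FROM"),
    ident("t1"),
])

print(sorted(extract_tables(plain, False)))    # ['t1', 't2']
print(sorted(extract_tables(aliased, False)))  # ['t1']
print(sorted(extract_tables(aliased, True)))   # ['t1', 't2']
```

Without the extra descent, the aliased form loses `t2` because the subselect is hidden behind an `Identifier` node that the walker treats as a leaf.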

TEST PLAN

CI

to: @villebro @lilykuang @serenajiang @bkyryliuk @michellethomas

codecov bot commented Mar 12, 2021

Codecov Report

Merging #13602 (4cc8dc5) into master (de0c6c9) will decrease coverage by 9.97%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #13602      +/-   ##
==========================================
- Coverage   80.93%   70.96%   -9.98%     
==========================================
  Files         304      828     +524     
  Lines       24807    41443   +16636     
  Branches        0     4300    +4300     
==========================================
+ Hits        20077    29409    +9332     
- Misses       4730    12034    +7304     
Flag       Coverage Δ
cypress    56.72% <ø> (?)
hive       ?
mysql      80.37% <100.00%> (ø)
postgres   80.42% <100.00%> (ø)
presto     ?
python     80.51% <100.00%> (-0.42%) ⬇️
sqlite     80.03% <100.00%> (ø)

Impacted Files                                          Coverage Δ
superset/sql_parse.py                                   99.38% <100.00%> (ø)
superset/db_engines/hive.py                             0.00% <0.00%> (-85.72%) ⬇️
superset/db_engine_specs/hive.py                        74.23% <0.00%> (-16.54%) ⬇️
superset/db_engine_specs/presto.py                      81.79% <0.00%> (-6.91%) ⬇️
superset/views/database/mixins.py                       81.03% <0.00%> (-1.73%) ⬇️
superset/db_engine_specs/base.py                        85.67% <0.00%> (-0.49%) ⬇️
superset/models/core.py                                 88.55% <0.00%> (-0.28%) ⬇️
...ontrols/DndColumnSelectControl/DndMetricSelect.tsx   4.42% <0.00%> (ø)
...rset-frontend/src/components/ListView/ListView.tsx   92.04% <0.00%> (ø)
...teFilterControl/components/DateFunctionTooltip.tsx   100.00% <0.00%> (ø)
... and 521 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Contributor

@serenajiang serenajiang left a comment

I thiiink this should be safe. Can you check to make sure other uses of parentheses like SELECT f1, (x + y) AS f2 FROM t1 work as expected?
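One way to reason about the `(x + y) AS f2` case is with a hand-built tree. This is only a sketch: `Node`, `ident`, `kw`, and `extract_tables` are hypothetical stand-ins rather than sqlparse or Superset code, and the tree shape is an assumption (real sqlparse may group `x + y` differently, for example under an operation node). The point it illustrates is that descending into a parenthesis that contains no `FROM` adds no table names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for a sqlparse token (not the real class)."""
    kind: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)

    @property
    def is_group(self) -> bool:
        return bool(self.children)


def ident(name: str) -> Node:
    return Node("Identifier", name, [Node("Name", name)])


def kw(value: str) -> Node:
    return Node("Keyword", value)


def extract_tables(token: Node, found: Optional[Set[str]] = None) -> Set[str]:
    """Collect identifiers that follow FROM, descending into aliased parens."""
    found = set() if found is None else found
    after_from = False
    for item in token.children:
        wraps_paren = item.is_group and item.children[0].kind == "Parenthesis"
        if item.is_group and (item.kind != "Identifier" or wraps_paren):
            extract_tables(item, found)
        if item.kind == "Keyword":
            after_from = item.value.upper() == "FROM"
        elif after_from and item.kind == "Identifier" and not wraps_paren:
            found.add(item.value)
            after_from = False
    return found


# Assumed shape for: SELECT f1, (x + y) AS f2 FROM t1
expr = Node("Parenthesis", "(x + y)", [ident("x"), Node("Operator", "+"), ident("y")])
statement = Node("Statement", "", [
    Node("DML", "SELECT"),
    Node("IdentifierList", "f1, (x + y) AS f2", [
        ident("f1"),
        Node("Identifier", "(x + y) AS f2", [expr, kw("AS"), ident("f2")]),
    ]),
    kw("FROM"),
    ident("t1"),
])

print(sorted(extract_tables(statement)))  # ['t1']
```

Neither `x` nor `y` is reported, since no `FROM` keyword precedes them inside the parenthesis.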

@@ -278,7 +285,11 @@ def _extract_from_token( # pylint: disable=too-many-branches
         table_name_preceding_token = False

         for item in token.tokens:
-            if item.is_group and not self._is_identifier(item):
+            if (item.is_group and not self._is_identifier(item)) or (
Contributor

item.is_group and (not self._is_identifier(item) or isinstance(item.tokens[0], Parenthesis))?

Member Author

I think it's equivalent. Do you think it's as readable?
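The suggested condition can be checked against its distributed form, which begins the same way as the added line in the hunk. The second clause of that line is cut off above, so this sketch does not claim to reproduce it; it only shows what the suggestion expands to. The function and parameter names are illustrative.

```python
from itertools import product


def suggested(is_group: bool, is_identifier: bool, starts_with_paren: bool) -> bool:
    # Form proposed in the review comment.
    return is_group and (not is_identifier or starts_with_paren)


def expanded(is_group: bool, is_identifier: bool, starts_with_paren: bool) -> bool:
    # Same condition with `is_group and` distributed over the `or`.
    return (is_group and not is_identifier) or (is_group and starts_with_paren)


# Exhaustively compare both forms over all eight boolean combinations.
mismatches = [
    combo
    for combo in product([False, True], repeat=3)
    if suggested(*combo) != expanded(*combo)
]
print(mismatches)  # []
```

An empty list means the two forms agree on every input, so choosing between them is a readability question rather than a behavioral one.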

@etr2460 etr2460 force-pushed the erik-ritter--sql-parse-subselect branch from 62d304b to 4cc8dc5 Compare March 12, 2021 20:54
Member Author

etr2460 commented Mar 12, 2021

Added the parens test you recommended.

@etr2460 etr2460 merged commit 06d6d7f into apache:master Mar 12, 2021
allanco91 pushed a commit to allanco91/superset that referenced this pull request May 21, 2021
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 1.2.0 labels Mar 12, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/S 🚢 1.2.0