fix: ParsedQuery subselect edge case #13602

etr2460 · 2021-03-12T17:04:59Z

SUMMARY

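I found that aliased subselects break the logic that extracts table names from a query. With sqlparse, adding the alias changes the parse tree more than I expected: the subselect's Parenthesis ends up nested inside an Identifier (within an IdentifierList) instead of sitting directly under the statement: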

>>> import sqlparse
>>> sqlparse.parse("SELECT f1, (SELECT count(1) FROM t2) FROM t1")[0]._pprint_tree()
|- 0 DML 'SELECT'
|- 1 Whitespace ' '
|- 2 Identifier 'f1'
|  `- 0 Name 'f1'
|- 3 Punctuation ','
|- 4 Whitespace ' '
|- 5 Parenthesis '(SELEC...'
|  |- 0 Punctuation '('
|  |- 1 DML 'SELECT'
|  |- 2 Whitespace ' '
|  |- 3 Function 'count(...'
|  |  |- 0 Identifier 'count'
|  |  |  `- 0 Name 'count'
|  |  `- 1 Parenthesis '(1)'
|  |     |- 0 Punctuation '('
|  |     |- 1 Integer '1'
|  |     `- 2 Punctuation ')'
|  |- 4 Whitespace ' '
|  |- 5 Keyword 'FROM'
|  |- 6 Whitespace ' '
|  |- 7 Identifier 't2'
|  |  `- 0 Name 't2'
|  `- 8 Punctuation ')'
|- 6 Whitespace ' '
|- 7 Keyword 'FROM'
|- 8 Whitespace ' '
`- 9 Identifier 't1'
   `- 0 Name 't1'
>>> sqlparse.parse("SELECT f1, (SELECT count(1) FROM t2) as f2 FROM t1")[0]._pprint_tree()
|- 0 DML 'SELECT'
|- 1 Whitespace ' '
|- 2 IdentifierList 'f1, (S...'
|  |- 0 Identifier 'f1'
|  |  `- 0 Name 'f1'
|  |- 1 Punctuation ','
|  |- 2 Whitespace ' '
|  `- 3 Identifier '(SELEC...'
|     |- 0 Parenthesis '(SELEC...'
|     |  |- 0 Punctuation '('
|     |  |- 1 DML 'SELECT'
|     |  |- 2 Whitespace ' '
|     |  |- 3 Function 'count(...'
|     |  |  |- 0 Identifier 'count'
|     |  |  |  `- 0 Name 'count'
|     |  |  `- 1 Parenthesis '(1)'
|     |  |     |- 0 Punctuation '('
|     |  |     |- 1 Integer '1'
|     |  |     `- 2 Punctuation ')'
|     |  |- 4 Whitespace ' '
|     |  |- 5 Keyword 'FROM'
|     |  |- 6 Whitespace ' '
|     |  |- 7 Identifier 't2'
|     |  |  `- 0 Name 't2'
|     |  `- 8 Punctuation ')'
|     |- 1 Whitespace ' '
|     |- 2 Keyword 'as'
|     |- 3 Whitespace ' '
|     `- 4 Identifier 'f2'
|        `- 0 Name 'f2'
|- 3 Whitespace ' '
|- 4 Keyword 'FROM'
|- 5 Whitespace ' '
`- 6 Identifier 't1'
   `- 0 Name 't1'

This adds logic to recursively explore sections wrapped in parens. That seems safe: it didn't break any of the existing tests, and it lets my new test pass. That said, I would love any thoughts on a better way to do this.
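To make the effect of the extra nesting concrete, here is a minimal, self-contained sketch of the idea. It uses neither sqlparse nor Superset's actual code: `Node`, `extract_tables`, `select_from_t1`, `aliased`, and the `descend_into_parens` flag are all hypothetical names, and the trees are hand-built, simplified versions of the aliased `_pprint_tree()` output above (whitespace dropped, and an IdentifierList always descended into, which is an assumption of this toy model).

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical stand-in for a sqlparse token (not the real class)."""

    kind: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)

    @property
    def is_group(self) -> bool:
        return bool(self.children)


def extract_tables(node: Node, descend_into_parens: bool) -> Set[str]:
    """Collect identifiers that directly follow a FROM keyword."""
    tables: Set[str] = set()
    after_from = False
    for item in node.children:
        is_identifier = item.kind == "Identifier"
        wraps_parens = item.is_group and item.children[0].kind == "Parenthesis"
        if item.is_group and (
            not is_identifier or (descend_into_parens and wraps_parens)
        ):
            # Recurse into groups, optionally including an Identifier whose
            # first child is a Parenthesis, i.e. an aliased "(...) AS name".
            tables |= extract_tables(item, descend_into_parens)
        elif item.kind == "Keyword":
            after_from = item.value.upper() == "FROM"
        elif after_from and is_identifier:
            tables.add(item.value)
            after_from = False
    return tables


def select_from_t1(column: Node) -> Node:
    """Build SELECT f1, <column> FROM t1 around the given second column."""
    columns = Node("IdentifierList", children=[
        Node("Identifier", "f1"), Node("Punctuation", ","), column,
    ])
    return Node("Statement", children=[
        Node("DML", "SELECT"), columns,
        Node("Keyword", "FROM"), Node("Identifier", "t1"),
    ])


def aliased(inner: List[Node]) -> Node:
    """Build (<inner>) AS f2: an Identifier whose first child is a Parenthesis."""
    return Node("Identifier", children=[
        Node("Parenthesis", children=inner),
        Node("Keyword", "as"),
        Node("Identifier", "f2"),
    ])


subselect = aliased([
    Node("DML", "SELECT"), Node("Function", "count(1)"),
    Node("Keyword", "FROM"), Node("Identifier", "t2"),
])
arithmetic = aliased([
    Node("Identifier", "x"), Node("Operator", "+"), Node("Identifier", "y"),
])

# Without the extra check, the aliased subselect is skipped and t2 is missed.
print(sorted(extract_tables(select_from_t1(subselect), descend_into_parens=False)))
# With the check, the walk reaches the FROM inside the parentheses.
print(sorted(extract_tables(select_from_t1(subselect), descend_into_parens=True)))
# A parenthesised expression contains no FROM, so nothing extra is picked up.
print(sorted(extract_tables(select_from_t1(arithmetic), descend_into_parens=True)))
```

In this toy model the three calls print `['t1']`, `['t1', 't2']`, and `['t1']`. The real sqlparse trees carry more token types, so treat this only as an illustration of why descending into a parenthesis-led Identifier helps.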

TEST PLAN

CI

to: @villebro @lilykuang @serenajiang @bkyryliuk @michellethomas

codecov · 2021-03-12T17:22:15Z

Codecov Report

Merging #13602 (4cc8dc5) into master (de0c6c9) will decrease coverage by 9.97%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #13602      +/-   ##
==========================================
- Coverage   80.93%   70.96%   -9.98%     
==========================================
  Files         304      828     +524     
  Lines       24807    41443   +16636     
  Branches        0     4300    +4300     
==========================================
+ Hits        20077    29409    +9332     
- Misses       4730    12034    +7304

Flag	Coverage Δ
cypress	`56.72% <ø> (?)`
hive	`?`
mysql	`80.37% <100.00%> (ø)`
postgres	`80.42% <100.00%> (ø)`
presto	`?`
python	`80.51% <100.00%> (-0.42%)`	⬇️
sqlite	`80.03% <100.00%> (ø)`

Impacted Files	Coverage Δ
superset/sql_parse.py	`99.38% <100.00%> (ø)`
superset/db_engines/hive.py	`0.00% <0.00%> (-85.72%)`	⬇️
superset/db_engine_specs/hive.py	`74.23% <0.00%> (-16.54%)`	⬇️
superset/db_engine_specs/presto.py	`81.79% <0.00%> (-6.91%)`	⬇️
superset/views/database/mixins.py	`81.03% <0.00%> (-1.73%)`	⬇️
superset/db_engine_specs/base.py	`85.67% <0.00%> (-0.49%)`	⬇️
superset/models/core.py	`88.55% <0.00%> (-0.28%)`	⬇️
...ontrols/DndColumnSelectControl/DndMetricSelect.tsx	`4.42% <0.00%> (ø)`
...rset-frontend/src/components/ListView/ListView.tsx	`92.04% <0.00%> (ø)`
...teFilterControl/components/DateFunctionTooltip.tsx	`100.00% <0.00%> (ø)`
... and 521 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

serenajiang

I thiiink this should be safe. Can you check to make sure other uses of parentheses, like `SELECT f1, (x + y) AS f2 FROM t1`, work as expected?

serenajiang · 2021-03-12T19:07:45Z

superset/sql_parse.py

@@ -278,7 +285,11 @@ def _extract_from_token(  # pylint: disable=too-many-branches
        table_name_preceding_token = False

        for item in token.tokens:
-            if item.is_group and not self._is_identifier(item):
+            if (item.is_group and not self._is_identifier(item)) or (


`item.is_group and (not self._is_identifier(item) or isinstance(item.tokens[0], Parenthesis))`?

I think it's equivalent. Do you think it's as readable?
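The equivalence follows from the distributive law, and it can be checked exhaustively. The sketch below assumes the second clause of the original condition (cut off in the hunk above) has the form `item.is_group and isinstance(item.tokens[0], Parenthesis)`. That is an assumption, not something the diff shows. The three booleans stand in for `item.is_group`, `self._is_identifier(item)`, and the parenthesis check, and the function names are hypothetical.

```python
from itertools import product


def original(is_group: bool, is_identifier: bool, starts_with_paren: bool) -> bool:
    # Assumed shape of the condition in the diff: (A and not B) or (A and C)
    return (is_group and not is_identifier) or (is_group and starts_with_paren)


def suggested(is_group: bool, is_identifier: bool, starts_with_paren: bool) -> bool:
    # Reviewer's factored form: A and (not B or C)
    return is_group and (not is_identifier or starts_with_paren)


# Try all 8 combinations of the three booleans and collect any disagreement.
mismatches = [
    combo
    for combo in product([False, True], repeat=3)
    if original(*combo) != suggested(*combo)
]
print(mismatches)  # [] means the two forms agree on every input
```

One practical difference survives the refactor: in the factored form, `item.tokens[0]` is only evaluated when `item.is_group` is true, thanks to short-circuiting, which matters if non-group tokens have no `tokens` attribute.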

etr2460 · 2021-03-12T20:55:09Z

Added the parens test you recommended.

pull-request-size bot added the size/S label Mar 12, 2021

serenajiang approved these changes Mar 12, 2021

fix: ParsedQuery subselect edge case

4cc8dc5

etr2460 force-pushed the erik-ritter--sql-parse-subselect branch from 62d304b to 4cc8dc5 Compare March 12, 2021 20:54

etr2460 merged commit 06d6d7f into apache:master Mar 12, 2021

allanco91 pushed a commit to allanco91/superset that referenced this pull request May 21, 2021

fix: ParsedQuery subselect edge case (apache#13602)

c3072ce

mistercrunch added the 🏷️ bot and 🚢 1.2.0 labels Mar 12, 2024
