Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: count(distinct column_name) in metrics #19842

Merged
merged 1 commit into from
Apr 26, 2022

Conversation

zhaoyongjie
Copy link
Member

@zhaoyongjie zhaoyongjie commented Apr 26, 2022

SUMMARY

The aggregate function count_distinct isn't ANSI SQL. Currently, when the user uses simple metric, the AdhocMetric control will generate a count_distinct(column) for the label. If the user switches to the SQL tab, this label will apply to SQL. It isn't ANSI SQL, so directly use this SQL snippet, the error will appear on many databases.

This PR transform count distinct metric from count_distinct(column) to count(distinct column), but does not change the original metric label(verbose name).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

After

count.distinct.mov

TESTING INSTRUCTIONS

  1. select birth_name dataset.
  2. select count_distinct as AGGREGATE, and num as COLUMN in metric popover.
  3. query
  4. open count_distinct metric, and switch to custom SQL, verify the SQL is COUNT(DISTINCT num), but the label is still count_distinct(num).
  5. copy and paste the COUNT(DISTINCT num) in custom SQL, verify the label has been changed.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov
Copy link

codecov bot commented Apr 26, 2022

Codecov Report

Merging #19842 (d6369fc) into master (523bd8b) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master   #19842   +/-   ##
=======================================
  Coverage   66.55%   66.55%           
=======================================
  Files        1692     1692           
  Lines       64802    64804    +2     
  Branches     6657     6657           
=======================================
+ Hits        43129    43131    +2     
  Misses      19973    19973           
  Partials     1700     1700           
Flag Coverage Δ
javascript 51.25% <100.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...e/components/controls/MetricControl/AdhocMetric.js 95.65% <100.00%> (+0.19%) ⬆️
...ols/MetricControl/AdhocMetricEditPopover/index.jsx 76.28% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 523bd8b...d6369fc. Read the comment docs.

@zhaoyongjie zhaoyongjie requested review from kgabryje and ktmud April 26, 2022 05:32
Copy link
Member

@villebro villebro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing! I propose making a minor refactor of the code to make it more approachable for new developers, other than that LGTM 👍

Comment on lines 102 to +115
const column =
useVerboseName && this.column?.verbose_name
params.useVerboseName && this.column?.verbose_name
? `(${this.column.verbose_name})`
: this.column?.column_name
? `(${this.column.column_name})`
: '';
// transform from `count_distinct(column)` to `count(distinct column)`
if (
params.transformCountDistinct &&
aggregate === AGGREGATES.COUNT_DISTINCT &&
/^\(.*\)$/.test(column)
) {
return `COUNT(DISTINCT ${column.slice(1, -1)})`;
}
Copy link
Member

@villebro villebro Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to reread these lines a few times to understand what was going on (mostly there from before this PR so not your fault!). IMO it would be more readable if we could first do something like

const column =
  params.useVerboseName && this.column?.verbose_name
    this.column.verbose_name
    : this.column?.column_name
    ? this.column.column_name
    : '';

and then something like

if (params.transformCountDistinct && aggregate === AGGREGATES.COUNT_DISTINCT) {
  return `COUNT(DISTINCT ${column})`;
}
return `${aggregate}($column)`;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The previous logic may not have been covered by UT, so I skipped these very carefully. I think it would be better to add some UTs to this part of the code and then modify it all together.

Of course, if this needs to be done in this PR, I'm all for it. What do you think about this?

Copy link
Member

@villebro villebro Apr 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, if you think this is dangerous to refactor let's do it in a separate PR 👍

Copy link
Member

@kgabryje kgabryje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm! 1 suggestion - should we rename COUNT_DISTINCT to COUNT DISTINCT on our list of aggregates in Simple tab?

@zhaoyongjie
Copy link
Member Author

Lgtm! 1 suggestion - should we rename COUNT_DISTINCT to COUNT DISTINCT on our list of aggregates in Simple tab?

Thanks @kgabryje! the count_distinct is used for a lot of places, and COUNT_DISTINCT also affects label(verbose name in query), so I didn't modify it.

@zhaoyongjie zhaoyongjie merged commit 25e572a into apache:master Apr 26, 2022
sadpandajoe pushed a commit to preset-io/superset that referenced this pull request Apr 27, 2022
@sadpandajoe
Copy link
Member

🏷️ preset:2022.17

hughhhh pushed a commit to hve-labs/superset that referenced this pull request May 11, 2022
philipher29 pushed a commit to ValtechMobility/superset that referenced this pull request Jun 9, 2022
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.0.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels preset:2022.17 size/M 🚢 2.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants