feat(db_engine_specs): big query cost estimation #21325

Merged
merged 19 commits into from
Jan 9, 2023

zamar-roura
Contributor

SUMMARY

This PR adds four methods to superset/db_engine_specs/bigquery.py.

Until now, only Postgres and Presto had cost estimation capabilities, and both rely on the cursor object.
BigQuery needs a dry run to get the query cost estimate. Sadly, you can't use the cursor to execute dry runs (I have tried), which is why two functions need to be overridden: estimate_statement_cost and estimate_query_cost. The other two methods, get_allow_cost_estimate and query_cost_formatter, are added because the base class either has no functionality for them or returns False.
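
For readers unfamiliar with dry runs, here is a minimal sketch of the idea using the google-cloud-bigquery client library. It is not the code from this PR: the function name `estimate_bytes_processed` is hypothetical, and it assumes default credentials are configured when no client is passed in.

```python
from typing import Any, Optional


def estimate_bytes_processed(sql: str, client: Optional[Any] = None) -> int:
    """Dry-run `sql` on BigQuery and return the bytes it would process.

    Hypothetical helper for illustration only, not the method added in this PR.
    """
    # Imported lazily so this module still loads where the optional
    # google-cloud-bigquery package is not installed.
    from google.cloud import bigquery

    # Assumption: application default credentials exist if no client is given.
    client = client or bigquery.Client()

    # dry_run=True asks BigQuery to validate and plan the query without running it.
    # use_query_cache=False keeps a cached result from hiding the real scan size.
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    query_job = client.query(sql, job_config=job_config)

    # A dry-run job completes immediately and reports the estimated bytes.
    return query_job.total_bytes_processed
```

The cursor-based approach used for Postgres and Presto has no equivalent of the `dry_run` job option, which matches the reason given above for overriding the estimation methods.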

Also, the QueryEditor.py object no longer has sqlEditorId among its variables, but the Cost Estimation Reducer still looks for it and therefore raises an error. That is fixed as well.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

BEFORE

image

AFTER

image

TESTING INSTRUCTIONS

Add ESTIMATE_QUERY_COST = True inside superset/config.py.
Add a valid BigQuery database and activate the Enable query cost estimation option under the advanced options.
Go to SQL Lab, enter a valid query, then press the Estimate Cost button.

ADDITIONAL INFORMATION

  • [ X] Has associated issue: [SIP-88] BigQuery Estimate Cost Button for SQL Lab #20832
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • [ X] Introduces new feature or API
  • Removes existing feature or API

Contributor

@github-actions github-actions bot left a comment

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️
We hope to see you in our Slack community too!

@villebro villebro self-requested a review September 6, 2022 07:52
@zamar-roura
Contributor Author

Thanks for running the tests.

I'll check the lint errors and try to see why the Cypress test is showing the "can't create chart from query" error. Maybe the Estimate Cost button is changing the selector of the other button? Let's see.

@zamar-roura zamar-roura changed the title Feat(db_engine_specs) big query cost estimation Feat(db_engine_specs): Big query cost estimation Sep 27, 2022
@zamar-roura zamar-roura changed the title Feat(db_engine_specs): Big query cost estimation feat(db_engine_specs): Big query cost estimation Sep 27, 2022
@codecov

codecov bot commented Sep 27, 2022

Codecov Report

Merging #21325 (ef5c3b1) into master (9cfbc22) will decrease coverage by 0.03%.
The diff coverage is 18.75%.

@@            Coverage Diff             @@
##           master   #21325      +/-   ##
==========================================
- Coverage   67.03%   66.99%   -0.04%     
==========================================
  Files        1859     1859              
  Lines       71043    71090      +47     
  Branches     7776     7776              
==========================================
+ Hits        47622    47630       +8     
- Misses      21397    21436      +39     
  Partials     2024     2024              
Flag         Coverage Δ
hive         52.43% <18.75%> (-0.05%) ⬇️
javascript   53.85% <ø> (ø)
mysql        77.96% <18.75%> (-0.09%) ⬇️
postgres     78.03% <18.75%> (-0.09%) ⬇️
presto       52.33% <18.75%> (-0.05%) ⬇️
python       81.29% <18.75%> (-0.09%) ⬇️
sqlite       76.41% <18.75%> (-0.09%) ⬇️
unit         51.43% <18.75%> (-0.05%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                                          Coverage Δ
superset-frontend/src/SqlLab/reducers/sqlLab.js         36.87% <ø> (ø)
.../CRUD/data/database/DatabaseModal/ExtraOptions.tsx   70.37% <ø> (ø)
superset/config.py                                      91.64% <ø> (-0.03%) ⬇️
superset/db_engine_specs/bigquery.py                    69.60% <18.75%> (-12.09%) ⬇️

@zamar-roura zamar-roura changed the title feat(db_engine_specs): Big query cost estimation feat(db_engine_specs): big query cost estimation Sep 29, 2022
@zamar-roura
Contributor Author

@villebro is it okay like this? The Cypress E2E tests pass only intermittently, but everything else is fixed and working.

@zamar-roura
Contributor Author

Do I need to do something else to merge this?

@fernandofernandezgonzalez

Wowwww!!! This is such a much-needed feature!!

@villebro
Member

@zamar-roura would you be able to rebase the PR to resolve the conflicts?

@zamar-roura zamar-roura force-pushed the feat--Big-Query-Cost-Estimation branch from f855f45 to 58705fa Compare November 26, 2022 23:33
@zamar-roura
Contributor Author

@villebro Tell me if you need anything else. I'll be

The reducer/SqlLab.js needed some fixes in the ESTIMATE QUERY action as action.query didn't have any sqlEditorId parameter. It always comes as action.query.id not action.query.sqlEditorId.

@villebro
Member

Thanks @zamar-roura - I'll do a review + test pass tomorrow 👍

@villebro villebro self-assigned this Nov 28, 2022
@zamar-roura
Contributor Author

Hi @villebro, I'm at your full disposal to improve the PR as much as needed. It's my first merge in Superset, so if anything is not as it should be, tell me and I'll put my full effort into it.

@villebro
Member

villebro commented Dec 7, 2022

@zamar-roura I haven't forgotten about this, I've just been insanely busy. But I'll do my absolute best to review this today!

@zamar-roura
Contributor Author

@villebro I totally understand, thank you so much for the quick reply!

@villebro
Member

villebro commented Dec 8, 2022

Restarted CI, the cypress workflow seems to be having issues

Member

@villebro villebro left a comment

Thanks for the contribution and for fixing this, as this feature appears to be broken right now 🙁 Added a few comments after my first pass review and test.

Comment on lines 876 to 877
# The feature is off by default, and currently only supported in Presto and Postgres,
# and Bigquery.
Member

The current config, docs and the examples in config.py are in a pretty horrible state right now. I observed the following:

  1. The ESTIMATE_QUERY_COST config is in fact a feature flag and needs to be moved to DEFAULT_FEATURE_FLAGS:
    ESTIMATE_QUERY_COST = False
    This should be updated.
  2. This example is broken:
    # "QUERY_COST_FORMATTERS_BY_ENGINE": {"postgresql": postgres_query_cost_formatter},
    It should in fact be:
    # QUERY_COST_FORMATTERS_BY_ENGINE = {"postgresql": postgres_query_cost_formatter}
    This should also be updated.
  3. There are no docs for this feature. In the deprecated docs there's a SQL Lab section that seems to have been lost over the years, and the content is also incorrect: https://apache-superset.readthedocs.io/en/latest/sqllab.html I don't expect you to add docs for this, but just mentioning this here to call attention to it. FYI @rusackas
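
As a rough illustration of points 1 and 2 above, the corrected config could look something like the sketch below. This is not the final code from the PR: the body of `postgres_query_cost_formatter` and the "Total cost" column name are invented for the example, and only the flag name, the dictionary name, and the assignment form come from the review comment.

```python
from typing import Any, Callable, Dict, List

# Point 1: the setting sits with the other feature flags instead of being
# a bare module-level constant.
DEFAULT_FEATURE_FLAGS: Dict[str, bool] = {
    "ESTIMATE_QUERY_COST": False,
}


def postgres_query_cost_formatter(
    result: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Hypothetical formatter: add a display string next to each raw cost."""
    # "Total cost" is an assumed column name used only for this illustration.
    return [
        {**row, "Total cost (formatted)": f'{row["Total cost"]:.2f}'}
        for row in result
    ]


# Point 2: a plain assignment, not a quoted key followed by a trailing comma.
QUERY_COST_FORMATTERS_BY_ENGINE: Dict[
    str, Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]]
] = {"postgresql": postgres_query_cost_formatter}
```

The broken commented-out example used dictionary-entry syntax (`"KEY": value,`), which is not valid as a standalone statement once uncommented. The assignment form is.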

Member

Thanks for the callout... putting it on my very long to-do list!

Contributor Author

Fixed these and added some more comments.

About the docs: I would love to fix the content. The only problem is that the superset/blob/master/docs/sqllab.rst file doesn't even exist anymore.

Member

@zamar-roura let's leave the docs for a follow-up PR 👍

Member

@villebro villebro left a comment

A few last minute comments/questions

superset/config.py (outdated, resolved)
Comment on lines 427 to 446
# Format Bytes.
# TODO: In case more db engine specs need to be added,
# this should be made a function outside this scope.
byte_division = 1024
if hasattr(query_job, "total_bytes_processed"):
    query_bytes_processed = query_job.total_bytes_processed
    if query_bytes_processed // byte_division == 0:
        byte_type = "B"
        total_bytes_processed = query_bytes_processed
    elif query_bytes_processed // (byte_division**2) == 0:
        byte_type = "KB"
        total_bytes_processed = round(query_bytes_processed / byte_division, 2)
    elif query_bytes_processed // (byte_division**3) == 0:
        byte_type = "MB"
        total_bytes_processed = round(
            query_bytes_processed / (byte_division**2), 2
        )
    else:
        byte_type = "GB"
        total_bytes_processed = round(
Member

For a future cleanup PR, we should really harmonize these across specs (I seem to recall someone already did this, but I was unable to find it...)

Contributor Author

Perfect. After pushing this I saw that a PR from early 2022 had already humanized it: #18694
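
The TODO in the snippet quoted in this thread suggests moving the byte formatting into a function outside the engine spec. One possible shape for that helper is sketched below, assuming the same 1024 divisor and the same B/KB/MB/GB cutoffs. The name `format_bytes` is hypothetical, and this is neither the code merged here nor the approach taken in #18694.

```python
def format_bytes(num_bytes: int, divisor: int = 1024) -> str:
    """Render a byte count with the largest fitting unit, capped at GB.

    Hypothetical standalone version of the inline B/KB/MB/GB branching.
    """
    units = ("B", "KB", "MB", "GB")
    value = float(num_bytes)
    index = 0
    # Step up one unit at a time until the value fits or GB is reached.
    while value >= divisor and index < len(units) - 1:
        value /= divisor
        index += 1
    if index == 0:
        # Plain bytes stay an integer, as in the inline version.
        return f"{num_bytes} B"
    return f"{round(value, 2)} {units[index]}"


print(format_bytes(512))           # 512 B
print(format_bytes(1536))          # 1.5 KB
print(format_bytes(5 * 1024**3))   # 5.0 GB
```

A loop over a unit tuple keeps the thresholds in one place, so adding TB later would be a one-element change rather than another `elif` branch.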

superset/db_engine_specs/bigquery.py (outdated, resolved)
@zamar-roura
Contributor Author

Hi @villebro. I don't remember if there was anything else we needed to do. It's a busy end of the year and also full of festivities, so if you want we can see what's needed after all this ends. Cheers!

@zamar-roura
Contributor Author

@villebro I'll be checking from time to time to resolve merge conflicts

Member

@villebro villebro left a comment

LGTM, thanks for your patience with this PR. Also, thanks for bringing up #18694 which I appear to have dropped the ball on 🙁

@villebro villebro merged commit 001100d into apache:master Jan 9, 2023
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 and removed 🚢 2.1.3 labels Mar 13, 2024
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 2.1.0
6 participants