Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: the samples endpoint supports filters and pagination #20683

Merged
merged 12 commits into from
Jul 22, 2022

Conversation

zhaoyongjie
Copy link
Member

@zhaoyongjie zhaoyongjie commented Jul 12, 2022

SUMMARY

The original Superset had been using a query API(/api/v1/chart/data) to get data samples, but was refactored to use a separate API for getting data samples. The purpose of the refactoring is to get samples by passing only dataset/datasouce id.

The recent PR introduced multiple datasouce type and duplicate samples codes.

The purpose of this PR is to merge all samples logic and to provide filtering and pagination on top of the original samples API.

We can remove the original sample function from the /api/v1/chart/data, In the separate PR.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

pure backend feature, so that there aren't screenshots.

TESTING INSTRUCTIONS

  1. query without post body
curl -X 'POST' \
  'http://localhost:8088/datasource/samples?force=false&datasource_id=2&datasource_type=table' \
  -H 'accept: application/json' \
  -H 'Authorization: some-token'
  1. query with the post body
curl -X 'POST' \
  'http://localhost:8088/datasource/samples?force=false&datasource_id=2&datasource_type=table' \
  -H 'accept: application/json' \
  -H 'Authorization: some-token' \
  -H 'Content-Type: application/json' \
  -d '{"filters": [{"col": "foo", "op": "==", "val": "foo2"}]}'
  1. query with the post body and pagination
curl -X 'POST' \
  'http://localhost:8088/datasource/samples?force=false&datasource_id=2&datasource_type=table&page=1&per_page=100' \
  -H 'accept: application/json' \
  -H 'Authorization: some-token' \
  -H 'Content-Type: application/json' \
  -d '{"filters": [{"col": "foo", "op": "==", "val": "foo2"}]}'

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@zhaoyongjie zhaoyongjie marked this pull request as ready for review July 12, 2022 14:20
@zhaoyongjie zhaoyongjie changed the title feat: the samples endpoint support filters feat: the samples endpoint supports filters Jul 12, 2022
@codecov
Copy link

codecov bot commented Jul 12, 2022

Codecov Report

Merging #20683 (2677af5) into master (922b4b8) will decrease coverage by 0.15%.
The diff coverage is 97.53%.

❗ Current head 2677af5 differs from pull request most recent head 3d8cd90. Consider uploading reports for the commit 3d8cd90 to get more accurate results

@@            Coverage Diff             @@
##           master   #20683      +/-   ##
==========================================
- Coverage   66.29%   66.14%   -0.16%     
==========================================
  Files        1758     1757       -1     
  Lines       66801    66742      -59     
  Branches     7055     7055              
==========================================
- Hits        44288    44148     -140     
- Misses      20713    20794      +81     
  Partials     1800     1800              
Flag Coverage Δ
hive ?
mysql ?
postgres 81.15% <97.46%> (+0.10%) ⬆️
presto ?
python 81.21% <97.46%> (-0.29%) ⬇️
sqlite 79.77% <97.46%> (+0.22%) ⬆️
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/views/datasource/utils.py 95.34% <95.34%> (ø)
...erset-frontend/src/components/Chart/chartAction.js 55.92% <100.00%> (ø)
superset/datasets/api.py 88.34% <100.00%> (-0.19%) ⬇️
superset/explore/api.py 86.36% <100.00%> (+15.15%) ⬆️
superset/views/datasource/schemas.py 100.00% <100.00%> (ø)
superset/views/datasource/views.py 97.08% <100.00%> (+0.42%) ⬆️
superset/db_engines/hive.py 0.00% <0.00%> (-85.19%) ⬇️
superset/db_engine_specs/hive.py 70.22% <0.00%> (-15.65%) ⬇️
superset/db_engine_specs/presto.py 83.64% <0.00%> (-5.39%) ⬇️
superset/common/utils/dataframe_utils.py 90.47% <0.00%> (-4.77%) ⬇️
... and 7 more

@zhaoyongjie zhaoyongjie requested a review from ktmud July 12, 2022 15:27
@zhaoyongjie zhaoyongjie force-pushed the feat/samples_filter branch 3 times, most recently from 594745a to 6dca2b9 Compare July 13, 2022 08:56
@zhaoyongjie zhaoyongjie force-pushed the feat/samples_filter branch from 6dca2b9 to b7edacf Compare July 14, 2022 08:01
@zhaoyongjie zhaoyongjie changed the title feat: the samples endpoint supports filters feat: the samples endpoint supports filters and pagination Jul 14, 2022
Comment on lines 130 to 132
if limit < 0 or limit > samples_row_limit:
# reset limit value if input is invalid
limit = samples_row_limit
Copy link
Member Author

@zhaoyongjie zhaoyongjie Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be easily refactored to raise an exception if needed

Comment on lines 135 to 137
if offset < 0:
# reset offset value if input is invalid
offset = 0
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same before

Copy link
Member

@michael-s-molina michael-s-molina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @zhaoyongjie. I left some first-pass comments.

superset/datasets/api.py Outdated Show resolved Hide resolved
superset/datasets/commands/samples.py Outdated Show resolved Hide resolved
superset/datasets/commands/samples.py Outdated Show resolved Hide resolved
superset/datasets/commands/samples.py Outdated Show resolved Hide resolved
# reset limit value if input is invalid
limit = samples_row_limit

offset = (int(page) - 1) * limit
Copy link
Member

@codyml codyml Jul 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the frontend defaults to 0-indexed page numbers.

Copy link
Member Author

@zhaoyongjie zhaoyongjie Jul 19, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The page number starts from 1 instead of 0.

Copy link
Member

@geido geido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits. Would love to test this out once CI passes

superset/views/datasource/utils.py Show resolved Hide resolved
tests/integration_tests/datasource_tests.py Outdated Show resolved Hide resolved
@zhaoyongjie zhaoyongjie force-pushed the feat/samples_filter branch from db8382b to ea4aa7f Compare July 20, 2022 13:14
@zhaoyongjie zhaoyongjie force-pushed the feat/samples_filter branch from ea4aa7f to 1a7b381 Compare July 21, 2022 01:19
@codyml
Copy link
Member

codyml commented Jul 21, 2022

@zhaoyongjie This looks good to me! One question, are we not considering this part of the public API, and that's why it's not under the /v1 prefix?

@michael-s-molina
Copy link
Member

@zhaoyongjie Thank you so much for cleaning the duplicated samples endpoint in datasets/api and explore/api. May I suggest creating the endpoint under api/v1/dataset/samples instead of superset/views? This would help with the new API migration.

@zhaoyongjie
Copy link
Member Author

zhaoyongjie commented Jul 22, 2022

@zhaoyongjie Thank you so much for cleaning the duplicated samples endpoint in datasets/api and explore/api. May I suggest creating the endpoint under api/v1/dataset/samples instead of superset/views? This would help with the new API migration.

@codyml @michael-s-molina this is an interesting question, because now it is not dataset samples but different types of datasouce samples, so it should not be put under dataset, ----- introduced different datasource from PR ---- That's why I moved it to datasource.

v1 API still has a lot of work to do, for example:

  1. a lot of error handles(try...except statement) v1 are not well solved
  2. /api/v1/chart/data has lots of duplicated samples code
  3. the datasource has many api's waiting to be migrated

So the view and v1 api will coexist for a long time. I will address these issues in the seprated PR.

@michael-s-molina
Copy link
Member

@codyml @michael-s-molina this is an interesting question, because now it is not dataset samples but different types of datasouce samples, so it should not be put under dataset, ----- introduced different datasource from PR ---- That's why I moved it to datasource.

Thank you for the explanation @zhaoyongjie. It seems we're missing this new concept in v1 as you said.

About deleting the old endpoints. I did something slightly different for the old /superset/explore endpoint in views/core. I added a warning indicating that the endpoint will be deprecated in 3.0 and redirected the user to the new endpoint. This is debatable because I'm not sure we're considering changes to endpoints that are not under v1 breaking changes. I took the safest approach but I leave the decision to you if you also want to follow a similar pattern or if you don't consider this a breaking change.

@zhaoyongjie zhaoyongjie merged commit f011aba into apache:master Jul 22, 2022
@AAfghahi AAfghahi added the logging Creates a UI or API endpoint that could benefit from logging. label Aug 5, 2022
@Smallhi
Copy link

Smallhi commented Dec 10, 2023

Does the superset-fronted support sample filters when request sample ? I've tested the v3.1, it did't work. For hive, presto, this may cause all partitions scan.

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 and removed 🚢 2.1.3 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels logging Creates a UI or API endpoint that could benefit from logging. size/XL 🚢 2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants