Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: row limits & row count labels are confusing #27700

Merged
merged 7 commits into from
Apr 2, 2024
Merged

Conversation

mistercrunch
Copy link
Member

@mistercrunch mistercrunch commented Mar 27, 2024

SUMMARY

Currently, in explore & dashboard, we usually apply a row limit
on the query we issue against the database.

Following this, some visualization backend code does some
post-processing on data frames using pandas, typically pivoting
some things, which affects the result set's row count in
intricate capacities.

Now from a UX standpoint the user is exposed with:

  • row limits in the control panels
  • row count in various areas of the UI where visualization, preview,
    samples and raw results are shown

Currently we show the rowcount that's from row-processing. So maybe
a hard limit was applied at 1000 rows on the database, but we
pivot and it goes does to say 532, and we show the user that
we haven't hit the limit when we actually did.

Also note that the component that shows rowcount is supposed
to turn red if limit is hit, letting the user know they are looking at
truncated data.

This PR carries the right row count to the RowCountLabel above the visualization in explore so that it can turn red if the limit is reached reliably.

For now, this PR does not do anything new around the other RowCountLabel located in the RESULTS (south) pane. This area show the post-processed rows from cache, and seem like is has to show its matching post-processed row count. Note that the row count here is not guaranteed to match the one above.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Passing the actual row count returned from the database and using it to surface the [improved] tooltip
Screenshot 2024-03-28 at 5 43 43 PM

TESTING INSTRUCTIONS

@github-actions github-actions bot added the api Related to the REST API label Mar 27, 2024
Copy link

codecov bot commented Mar 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 69.82%. Comparing base (5b1d6b2) to head (e5de7e1).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #27700      +/-   ##
==========================================
+ Coverage   67.38%   69.82%   +2.43%     
==========================================
  Files        1920     1920              
  Lines       75242    75247       +5     
  Branches     8423     8423              
==========================================
+ Hits        50705    52544    +1839     
+ Misses      22476    20642    -1834     
  Partials     2061     2061              
Flag Coverage Δ
hive 48.92% <0.00%> (?)
javascript 57.40% <100.00%> (-0.02%) ⬇️
mysql 77.88% <100.00%> (?)
postgres 78.00% <100.00%> (+<0.01%) ⬆️
presto 53.64% <100.00%> (?)
python 83.15% <100.00%> (+5.07%) ⬆️
sqlite 77.43% <100.00%> (+<0.01%) ⬆️
unit 56.77% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@villebro
Copy link
Member

This has definitely caused confusion over the years. Great initiative @mistercrunch !

@mistercrunch mistercrunch force-pushed the sql_rowcount branch 2 times, most recently from 6a1f175 to ac1d5f7 Compare March 29, 2024 00:07
@pull-request-size pull-request-size bot added size/L and removed size/M labels Mar 29, 2024
@mistercrunch mistercrunch marked this pull request as ready for review March 29, 2024 17:05
@eschutho
Copy link
Member

eschutho commented Apr 1, 2024

/testenv up

Copy link
Contributor

github-actions bot commented Apr 1, 2024

@eschutho Ephemeral environment spinning up at http://35.162.20.252:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and it's a great improvement!

I wonder if we should standardize the name, though, using sql_rowcount in the backend and something like sqlRowCount in the frontend? (instead of just rowcount)

@mistercrunch
Copy link
Member Author

Yeah I wasn't sure either and there was not clear pattern established for me to follow. Looked that the person who wrote the code before went out of their way to go flat-case (coltypes, colnames, rowcount, ....).

I'm not sure what best practices are and what the most common on our repo is. If I was to take on more refactoring I think i'd tackle something around grouping objects under say a queryMetadata object and passing the object around as much as possible instead of spelling out all the attributes at every step of the way... That would allow say augmenting queryMetadata in the backend with an extra attribute and having it flow all the way through a bunch of objects. Merging for now, and we could revisit the camel-case-on-frontend-policy

Currently, in explore & dashboard, we usually apply a row limit
on the query we issue against the database.

Following this, some visualization backend code does some
post-processing on data frames using pandas, typically pivoting
some things, which affects the result set's row count in
intricate capacities.

Now from a UX standpoint the user is exposed with:
- row limits in the control panels
- row count in various areas of the UI where visualiztion, preview,
  samples and raw results are shown

Currently we show the rowcount that's from row-processing. So maybe
a hard limit was applied at 1000 rows on the database, but we
pivot and it goes does to say 532, and we show the user that
we haven't hit the limit when we actually did.

Also note that the component that shows rowcount is supposed
to turn red if limit is hit, letting the user know they are looking at
truncated data.
@github-actions github-actions bot added the github_actions Pull requests that update GitHub Actions code label Apr 2, 2024
@mistercrunch mistercrunch merged commit 12fe292 into master Apr 2, 2024
42 checks passed
@mistercrunch mistercrunch deleted the sql_rowcount branch April 2, 2024 20:58
Copy link
Contributor

github-actions bot commented Apr 2, 2024

Ephemeral environment shutdown and build artifacts deleted.

@michael-s-molina michael-s-molina added the v4.0 Label added by the release manager to track PRs to be included in the 4.0 branch label Apr 3, 2024
jzhao62 pushed a commit to jzhao62/superset that referenced this pull request Apr 4, 2024
EandrewJones pushed a commit to UMD-ARLIS/superset that referenced this pull request Apr 5, 2024
michael-s-molina pushed a commit that referenced this pull request Apr 8, 2024
EnxDev pushed a commit to EnxDev/superset that referenced this pull request Apr 12, 2024
qleroy pushed a commit to qleroy/superset that referenced this pull request Apr 28, 2024
@dosubot dosubot bot mentioned this pull request May 23, 2024
3 tasks
@mistercrunch mistercrunch added 🍒 4.0.1 🍒 4.0.2 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels labels Jul 24, 2024
vinothkumar66 pushed a commit to vinothkumar66/superset that referenced this pull request Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Related to the REST API 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels github_actions Pull requests that update GitHub Actions code packages plugins preset-io size/L v4.0 Label added by the release manager to track PRs to be included in the 4.0 branch 🍒 4.0.1 🍒 4.0.2 🚢 4.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants