Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: pandas bug when data is blank on post-processing #20629

Merged
merged 2 commits into from
Jul 8, 2022

Conversation

eschutho
Copy link
Member

@eschutho eschutho commented Jul 6, 2022

SUMMARY

There's a bug in post processing for tables and pivot-table charts when the data is empty for both json and csv formats. We'll now just return the original results instead of trying to apply any post-processing on it.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Currently raises a 500

TESTING INSTRUCTIONS

Create a chart with no results for a pivot table and then try to export pivoted results as csv. This would also break on alert/reports when formatting the report as a csv.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

try:
df = pd.DataFrame.from_dict(query["data"])
except ValueError: # no data error
return result
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to continue here and in line 336, since other queries might have data (and if they also don't we'll end up returning result unmodified).

Copy link
Member

@john-bodley john-bodley Jul 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is query["data"] in this case? Would it be preferable to first check whether query["data"] is valid? Per your unit tests it seems like this might be an empty string—the worst of the worst—and maybe we could/should fix this upstream and have it be None, i.e., the following works:

>>> import pandas as pd
>>> pd.DataFrame.from_dict(None)
Empty DataFrame
Columns: []
Index: []

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betodealmeida if the data is None or '', is there any value in continuing this process rather than returning early? AFAICT we'll continue to get more errors down below as well. Per @john-bodley's point, I can do a nullish check instead of the try/except if we want to be more specific to these errors.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you continue here it would skip to the next step of the for query in result["queries"] loop, so it wouldn't get more errors. There could be other queries in result["queries"] that have non-blank data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see what you're saying. I also added a new test for multiple queries.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

based on @john-bodley's comment, i added a few more tests when data is None and found a case where it errors later in the code, so I put a nullish check like he suggested instead of the try/except.

@codecov
Copy link

codecov bot commented Jul 6, 2022

Codecov Report

Merging #20629 (5a632fd) into master (b39a3d8) will decrease coverage by 0.14%.
The diff coverage is 89.06%.

❗ Current head 5a632fd differs from pull request most recent head c701d6f. Consider uploading reports for the commit c701d6f to get more accurate results

@@            Coverage Diff             @@
##           master   #20629      +/-   ##
==========================================
- Coverage   66.82%   66.67%   -0.15%     
==========================================
  Files        1752     1752              
  Lines       65616    65570      -46     
  Branches     6938     6938              
==========================================
- Hits        43849    43722     -127     
- Misses      20007    20088      +81     
  Partials     1760     1760              
Flag Coverage Δ
hive ?
mysql 82.45% <93.86%> (+0.07%) ⬆️
postgres 82.53% <93.86%> (+0.07%) ⬆️
presto ?
python 82.61% <93.86%> (-0.29%) ⬇️
sqlite 82.31% <93.86%> (+0.07%) ⬆️
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...ackages/superset-ui-core/src/utils/featureFlags.ts 100.00% <ø> (ø)
...frontend/plugins/plugin-chart-table/src/Styles.tsx 100.00% <ø> (ø)
...nd/plugins/plugin-chart-table/src/controlPanel.tsx 43.47% <ø> (ø)
...perset-frontend/src/addSlice/AddSliceContainer.tsx 66.66% <ø> (ø)
...frontend/src/components/DatabaseSelector/index.tsx 88.88% <ø> (ø)
...end/src/components/Datasource/DatasourceEditor.jsx 65.61% <ø> (ø)
...et-frontend/src/components/EditableTitle/index.tsx 64.28% <ø> (ø)
...frontend/src/components/ImportModal/ErrorAlert.tsx 33.33% <ø> (ø)
...src/dashboard/components/PropertiesModal/index.tsx 62.98% <ø> (ø)
...ersConfigModal/FiltersConfigForm/DatasetSelect.tsx 40.00% <ø> (ø)
... and 141 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b39a3d8...c701d6f. Read the comment docs.

@eschutho eschutho force-pushed the elizabeth/fix-pandas-bug branch from f6cebfe to 1cf4df0 Compare July 7, 2022 23:21
@eschutho eschutho force-pushed the elizabeth/fix-pandas-bug branch from 1cf4df0 to c701d6f Compare July 7, 2022 23:50
@pull-request-size pull-request-size bot added size/XL and removed size/L labels Jul 7, 2022
Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great!

@eschutho eschutho merged commit c2be54c into apache:master Jul 8, 2022
@eschutho eschutho deleted the elizabeth/fix-pandas-bug branch July 8, 2022 23:33
jinghua-qa pushed a commit to preset-io/superset that referenced this pull request Jul 8, 2022
* fix pandas bug when data is blank on post-processing

* account for multiple queries when data is blank

(cherry picked from commit c2be54c)
jinghua-qa pushed a commit to preset-io/superset that referenced this pull request Jul 12, 2022
* fix pandas bug when data is blank on post-processing

* account for multiple queries when data is blank

(cherry picked from commit c2be54c)
jinghua-qa pushed a commit to preset-io/superset that referenced this pull request Jul 12, 2022
* fix pandas bug when data is blank on post-processing

* account for multiple queries when data is blank

(cherry picked from commit c2be54c)
akshatsri pushed a commit to charan1314/superset that referenced this pull request Jul 19, 2022
* fix pandas bug when data is blank on post-processing

* account for multiple queries when data is blank
@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 2.1.0 labels Mar 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels preset:2022.27 preset-io size/XL 🚢 2.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants