fix: pandas bug when data is blank on post-processing #20629

eschutho · 2022-07-06T18:16:01Z

SUMMARY

Post-processing for table and pivot-table charts fails when the query data is empty, in both JSON and CSV formats. We now return the original results instead of attempting any post-processing on them.

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Currently raises a 500 error.

TESTING INSTRUCTIONS

Create a pivot table chart that returns no results, then try to export the pivoted results as CSV. The same failure also breaks alerts/reports when the report is formatted as CSV.
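For readers who want to see the CSV failure mode in isolation, here is a minimal sketch. It assumes the CSV payload is parsed with `pd.read_csv` over a `StringIO` buffer, which is an assumption about the post-processing code rather than something shown in this PR; `parse_csv_payload` is a hypothetical helper name.

```python
from io import StringIO

import pandas as pd
from pandas.errors import EmptyDataError


def parse_csv_payload(payload: str) -> pd.DataFrame:
    """Hypothetical helper: parse a CSV payload the way a post-processor might."""
    return pd.read_csv(StringIO(payload))


# An empty payload has no header row, so pandas refuses to parse it.
try:
    parse_csv_payload("")
    outcome = "parsed"
except EmptyDataError:
    outcome = "EmptyDataError"

print(outcome)

# A non-empty payload parses normally.
frame = parse_csv_payload("a,b\n1,2\n")
print(frame.shape)
```

`EmptyDataError` is a subclass of `ValueError`, so an `except ValueError` clause would also catch it.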

ADDITIONAL INFORMATION

Has associated issue:
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

betodealmeida · 2022-07-06T18:19:15Z

superset/charts/post_processing.py

+            try:
+                df = pd.DataFrame.from_dict(query["data"])
+            except ValueError:  # no data error
+                return result


I think it would be better to continue here and in line 336, since other queries might have data (and if they also don't we'll end up returning result unmodified).

john-bodley · Jul 6, 2022

What is query["data"] in this case? Would it be preferable to first check whether query["data"] is valid? Per your unit tests it seems like this might be an empty string (the worst of the worst), and maybe we could/should fix this upstream and have it be None, i.e., the following works:

>>> import pandas as pd
>>> pd.DataFrame.from_dict(None)
Empty DataFrame
Columns: []
Index: []
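The contrast between the two blank values can be checked directly. This is a small sketch of pandas behavior only; it does not use any Superset code, and the variable names are illustrative.

```python
import pandas as pd

# None is accepted and yields an empty DataFrame.
empty_from_none = pd.DataFrame.from_dict(None)
print(empty_from_none.empty)

# An empty string is a scalar, not a mapping or list-like,
# so the DataFrame constructor rejects it with a ValueError.
try:
    pd.DataFrame.from_dict("")
    raised_for_empty_string = False
except ValueError:
    raised_for_empty_string = True

print(raised_for_empty_string)
```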

eschutho · Jul 6, 2022

@betodealmeida if the data is None or '', is there any value in continuing this process rather than returning early? AFAICT we'll continue to get more errors down below as well. Per @john-bodley's point, I can do a nullish check instead of the try/except if we want to be more specific to these errors.

betodealmeida · Jul 7, 2022

If you continue here it would skip to the next step of the for query in result["queries"] loop, so it wouldn't get more errors. There could be other queries in result["queries"] that have non-blank data.
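The difference between returning early and continuing can be sketched with a simplified loop. `post_process` below is a hypothetical stand-in, not the function in superset/charts/post_processing.py, and the `rowcount` bookkeeping is a placeholder for the real processing.

```python
from typing import Any

import pandas as pd


def post_process(result: dict) -> dict:
    """Hypothetical, simplified stand-in for the real post-processing loop."""
    for query in result["queries"]:
        try:
            df = pd.DataFrame.from_dict(query["data"])
        except ValueError:  # no data for this query
            continue  # skip it, but keep processing the remaining queries
        # Placeholder "processing": record the row count.
        query["rowcount"] = len(df.index)
    return result


payload: dict[str, Any] = {
    "queries": [
        {"data": ""},                    # blank query, skipped
        {"data": [{"a": 1}, {"a": 2}]},  # non-blank query, still processed
    ]
}
processed = post_process(payload)
print(processed["queries"][1]["rowcount"])
```

With `return result` in place of `continue`, the second query would come back unprocessed because the loop would exit at the first blank one.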

eschutho · Jul 7, 2022

OK, I see what you're saying. I also added a new test for multiple queries.

eschutho · Jul 7, 2022

Based on @john-bodley's comment, I added a few more tests for when data is None and found a case where it errors later in the code, so I added a nullish check as he suggested instead of the try/except.
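A nullish check of this kind might look like the sketch below. It is an illustration under assumptions, not the merged code: `post_process_with_guard` and the `colnames`/`rowcount` bookkeeping are hypothetical. The guard matters for None in particular, because `pd.DataFrame.from_dict(None)` returns an empty DataFrame without raising, so a try/except around it would let the empty frame flow into later steps.

```python
import pandas as pd


def post_process_with_guard(result: dict) -> dict:
    """Hypothetical sketch: skip blank queries before building a DataFrame."""
    for query in result["queries"]:
        # Falsy check covers None, "" and [] in one place.
        if not query["data"]:
            continue
        df = pd.DataFrame.from_dict(query["data"])
        # Placeholder "processing": record column names and row count.
        query["colnames"] = list(df.columns)
        query["rowcount"] = len(df.index)
    return result


guarded = post_process_with_guard(
    {
        "queries": [
            {"data": None},
            {"data": ""},
            {"data": [{"a": 1, "b": 2}]},
        ]
    }
)
print([sorted(q) for q in guarded["queries"]])
```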

codecov · 2022-07-06T18:23:13Z

Codecov Report

Merging #20629 (5a632fd) into master (b39a3d8) will decrease coverage by 0.14%.
The diff coverage is 89.06%.

❗ Current head 5a632fd differs from pull request most recent head c701d6f. Consider uploading reports for the commit c701d6f to get more accurate results

@@            Coverage Diff             @@
##           master   #20629      +/-   ##
==========================================
- Coverage   66.82%   66.67%   -0.15%     
==========================================
  Files        1752     1752              
  Lines       65616    65570      -46     
  Branches     6938     6938              
==========================================
- Hits        43849    43722     -127     
- Misses      20007    20088      +81     
  Partials     1760     1760

| Flag | Coverage Δ | |
|---|---|---|
| hive | `?` | |
| mysql | `82.45% <93.86%> (+0.07%)` | ⬆️ |
| postgres | `82.53% <93.86%> (+0.07%)` | ⬆️ |
| presto | `?` | |
| python | `82.61% <93.86%> (-0.29%)` | ⬇️ |
| sqlite | `82.31% <93.86%> (+0.07%)` | ⬆️ |
| unit | `?` | |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
|---|---|
| ...ackages/superset-ui-core/src/utils/featureFlags.ts | `100.00% <ø> (ø)` |
| ...frontend/plugins/plugin-chart-table/src/Styles.tsx | `100.00% <ø> (ø)` |
| ...nd/plugins/plugin-chart-table/src/controlPanel.tsx | `43.47% <ø> (ø)` |
| ...perset-frontend/src/addSlice/AddSliceContainer.tsx | `66.66% <ø> (ø)` |
| ...frontend/src/components/DatabaseSelector/index.tsx | `88.88% <ø> (ø)` |
| ...end/src/components/Datasource/DatasourceEditor.jsx | `65.61% <ø> (ø)` |
| ...et-frontend/src/components/EditableTitle/index.tsx | `64.28% <ø> (ø)` |
| ...frontend/src/components/ImportModal/ErrorAlert.tsx | `33.33% <ø> (ø)` |
| ...src/dashboard/components/PropertiesModal/index.tsx | `62.98% <ø> (ø)` |
| ...ersConfigModal/FiltersConfigForm/DatasetSelect.tsx | `40.00% <ø> (ø)` |

... and 141 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b39a3d8...c701d6f.

betodealmeida

Looks great!

* fix pandas bug when data is blank on post-processing
* account for multiple queries when data is blank

(cherry picked from commit c2be54c)

fix pandas bug when data is blank on post-processing

7e0ec8f

superset-github-bot bot added the preset-io label Jul 6, 2022

pull-request-size bot added the size/L label Jul 6, 2022

eschutho requested a review from betodealmeida July 6, 2022 18:16

betodealmeida reviewed Jul 6, 2022

eschutho force-pushed the elizabeth/fix-pandas-bug branch from f6cebfe to 1cf4df0 Compare July 7, 2022 23:21

account for multiple queries when data is blank

c701d6f

eschutho force-pushed the elizabeth/fix-pandas-bug branch from 1cf4df0 to c701d6f Compare July 7, 2022 23:50

pull-request-size bot added size/XL and removed size/L labels Jul 7, 2022

betodealmeida approved these changes Jul 8, 2022

eschutho merged commit c2be54c into apache:master Jul 8, 2022

eschutho deleted the elizabeth/fix-pandas-bug branch July 8, 2022 23:33

jinghua-qa added the preset:2022.27 label Jul 8, 2022

akshatsri pushed a commit to charan1314/superset that referenced this pull request Jul 19, 2022

fix: pandas bug when data is blank on post-processing (apache#20629)

2d33c2f

* fix pandas bug when data is blank on post-processing
* account for multiple queries when data is blank

mistercrunch added the 🏷️ bot and 🚢 2.1.0 labels Mar 13, 2024
