fix: rolling and cum operator on multiple series #16945

Merged
3 commits merged into apache:master on Oct 7, 2021


@zhaoyongjie (Member) commented Oct 3, 2021

SUMMARY

Currently, the rolling window and cumulative operators return incorrect results when a chart has multiple series.

Frontend changes: apache-superset/superset-ui#1386

[screenshot]
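One way such a bug can arise (a hedged sketch, not the PR's code): when several series share one long-format column, a rolling window that is not scoped to each series sums values across series boundaries. The data below is invented and the column names only mimic birth_names; `groupby(...).rolling(...)` is used here as one way to keep each window inside a single series.

```python
import pandas as pd

# Hypothetical long-format query result: one row per (year, gender) pair,
# with the two series interleaved in a single metric column.
df = pd.DataFrame(
    {
        "ds": [2000, 2000, 2001, 2001, 2002, 2002],
        "gender": ["boy", "girl", "boy", "girl", "boy", "girl"],
        "sum__num": [10, 1, 20, 2, 30, 3],
    }
)

# Wrong: a window over the whole column mixes rows from both series.
mixed = df["sum__num"].rolling(window=2, min_periods=2).sum()

# Right: compute the window separately within each series.
per_series = (
    df.groupby("gender")["sum__num"]
    .rolling(window=2, min_periods=2)
    .sum()
    .reset_index(level=0, drop=True)  # drop the group key so rows realign with df
)

df["mixed"] = mixed
df["per_series"] = per_series
print(df)
```

The `mixed` column adds a boy value to a girl value in every window, while `per_series` only ever combines consecutive years of the same gender.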

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

After

[screenshot]

Before

[screenshot]

TESTING INSTRUCTIONS

  1. Select the birth_names dataset.
  2. Switch to the time-series chart.
  3. Set the time grain to year.
  4. Set the time range to 1970 to 2000.
  5. Set the metric to sum__num.
  6. Set group by to gender.
  7. Set the rolling function to sum.
  8. Set periods to 2.
  9. Set min periods to 2.
  10. Use the south (bottom) data panel to verify that the values are correct.
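A sketch of what step 10 should show per series, assuming "periods" and "min periods" map to the `window` and `min_periods` arguments of pandas' `DataFrame.rolling`. The numbers are made up for illustration and are not birth_names data.

```python
import pandas as pd

# Hypothetical yearly sum__num values per gender (invented, not real data).
long_df = pd.DataFrame(
    {
        "ds": [1970, 1970, 1971, 1971, 1972, 1972],
        "gender": ["boy", "girl", "boy", "girl", "boy", "girl"],
        "sum__num": [100, 90, 110, 95, 120, 99],
    }
)

# Pivot so each series (gender) becomes its own column, indexed by year.
wide = long_df.pivot(index="ds", columns="gender", values="sum__num")

# Rolling sum with periods=2 and min periods=2: each value is the sum of the
# current and previous year within one series; the first year is NaN because
# only one observation is available there.
rolled = wide.rolling(window=2, min_periods=2).sum()
print(rolled)
```

With correct per-series behaviour, the boy column reads NaN, 210, 230 and the girl column NaN, 185, 194; neither column ever includes the other's values.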

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@codecov bot commented Oct 3, 2021

Codecov Report

Merging #16945 (f00e4eb) into master (87baac7) will decrease coverage by 0.05%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master   #16945      +/-   ##
==========================================
- Coverage   76.97%   76.91%   -0.06%     
==========================================
  Files        1022     1027       +5     
  Lines       54903    55089     +186     
  Branches     7485     7485              
==========================================
+ Hits        42262    42374     +112     
- Misses      12393    12467      +74     
  Partials      248      248              
Flag Coverage Δ
hive 81.46% <100.00%> (+<0.01%) ⬆️
mysql 81.91% <100.00%> (+0.06%) ⬆️
postgres 81.92% <100.00%> (+0.01%) ⬆️
presto ?
python 82.27% <100.00%> (-0.15%) ⬇️
sqlite 81.59% <100.00%> (+0.06%) ⬆️


Impacted Files Coverage Δ
superset/utils/pandas_postprocessing.py 85.80% <100.00%> (+1.37%) ⬆️
superset/sqllab/exceptions.py 67.39% <0.00%> (-13.57%) ⬇️
superset/db_engine_specs/presto.py 84.30% <0.00%> (-6.07%) ⬇️
superset/errors.py 94.20% <0.00%> (-5.80%) ⬇️
superset/exceptions.py 91.26% <0.00%> (-2.92%) ⬇️
superset/dao/base.py 95.12% <0.00%> (-1.76%) ⬇️
superset/common/query_object.py 92.85% <0.00%> (-1.59%) ⬇️
superset/connectors/sqla/models.py 85.83% <0.00%> (-1.42%) ⬇️
superset/models/core.py 89.26% <0.00%> (-0.74%) ⬇️
superset/views/database/views.py 88.41% <0.00%> (-0.05%) ⬇️
... and 15 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 87baac7...f00e4eb.

@villebro (Member) left a comment

Could we add a quick unit test for is_pivot_df = False to ensure this doesn't break later?

@zhaoyongjie (Member Author) replied:

> Could we add a quick unit test for is_pivot_df = False to ensure this doesn't break later?

Sure, I will add a unit test for that.
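A sketch of the kind of unit test requested, built around a hypothetical `rolling_sum` helper (not Superset's actual rolling operator) whose `is_pivot_df` flag only controls whether MultiIndex columns are flattened afterwards. `flatten_column` is likewise a hypothetical simplification that joins column levels with ", ".

```python
from typing import Any, Tuple, Union

import pandas as pd


def flatten_column(column: Union[str, Tuple[Any, ...]]) -> str:
    # Hypothetical flattener: join MultiIndex levels into one string.
    if isinstance(column, tuple):
        return ", ".join(str(level) for level in column)
    return str(column)


def rolling_sum(df: pd.DataFrame, window: int, is_pivot_df: bool = False) -> pd.DataFrame:
    # Hypothetical helper: rolling sum, flattening columns only for pivoted input.
    result = df.rolling(window=window, min_periods=window).sum()
    if is_pivot_df:
        result.columns = [flatten_column(col) for col in result.columns]
    return result


def test_rolling_sum_not_pivoted() -> None:
    df = pd.DataFrame({"metric": [1, 2, 3, 4]})
    result = rolling_sum(df, window=2, is_pivot_df=False)
    # Plain column names must pass through untouched.
    assert result.columns.tolist() == ["metric"]
    assert result["metric"].tolist()[1:] == [3.0, 5.0, 7.0]


def test_rolling_sum_pivoted() -> None:
    columns = pd.MultiIndex.from_tuples([("sum__num", "boy"), ("sum__num", "girl")])
    df = pd.DataFrame([[1, 10], [2, 20], [3, 30]], columns=columns)
    result = rolling_sum(df, window=2, is_pivot_df=True)
    assert result.columns.tolist() == ["sum__num, boy", "sum__num, girl"]
    assert result["sum__num, girl"].tolist()[1:] == [30.0, 50.0]


if __name__ == "__main__":
    test_rolling_sum_not_pivoted()
    test_rolling_sum_pivoted()
```

Testing both flag values guards against a later change to the pivoted path silently rewriting column names for non-pivoted frames.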

@pull-request-size bot added the size/L label and removed size/M on Oct 6, 2021
@zhaoyongjie requested a review from @villebro on October 6, 2021 05:28
Comment on lines +581 to +586
df_cum = getattr(df_cum, operation)()
agg_in_pivot_df = df.columns.get_level_values(0).drop_duplicates().to_list()
agg: Dict[str, Dict[str, Any]] = {col: {} for col in agg_in_pivot_df}
df_cum.columns = [
_flatten_column_after_pivot(col, agg) for col in df_cum.columns
]
@zhaoyongjie (Member Author):

I will create a separate PR for refactoring _flatten_column_after_pivot
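The excerpt above dispatches the cumulative operator by name with `getattr`. A minimal sketch of that pattern on a hypothetical pivoted frame; the `cumulative` function and its allowlist guard are assumptions added for illustration and do not appear in the excerpt.

```python
import pandas as pd

# Hypothetical allowlist; each name is a real pandas DataFrame method.
CUM_OPERATIONS = {"cumsum", "cumprod", "cummax", "cummin"}


def cumulative(df: pd.DataFrame, operation: str) -> pd.DataFrame:
    # Reject anything that is not a known cumulative method before dispatching.
    if operation not in CUM_OPERATIONS:
        raise ValueError(f"Unsupported cumulative operation: {operation}")
    # Same dispatch as the excerpt: getattr(df_cum, operation)().
    return getattr(df, operation)()


# Hypothetical pivoted frame: level 0 = metric, level 1 = group-by value.
columns = pd.MultiIndex.from_tuples([("sum__num", "boy"), ("sum__num", "girl")])
df = pd.DataFrame([[1, 10], [2, 20], [3, 30]], index=[2000, 2001, 2002], columns=columns)

df_cum = cumulative(df, "cumsum")
print(df_cum)
```

Because pandas applies cumulative methods column by column, each series accumulates on its own, and the MultiIndex columns survive until a flattening step renames them.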

Comment on lines +422 to +426
agg_in_pivot_df = df.columns.get_level_values(0).drop_duplicates().to_list()
agg: Dict[str, Dict[str, Any]] = {col: {} for col in agg_in_pivot_df}
df_rolling.columns = [
_flatten_column_after_pivot(col, agg) for col in df_rolling.columns
]
@zhaoyongjie (Member Author) commented Oct 6, 2021

I will create a separate PR for refactoring _flatten_column_after_pivot
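The excerpt builds a placeholder `agg` dict from the first column level, presumably so the flattening helper can tell how many metrics are present. The sketch below assumes `_flatten_column_after_pivot` joins levels with ", " and drops the metric level when there is exactly one metric; `flatten_column` is a hypothetical stand-in written under that assumption, not Superset's helper.

```python
from typing import Any, Dict, Tuple, Union

import pandas as pd


def flatten_column(
    column: Union[str, Tuple[Any, ...]], aggregates: Dict[str, Dict[str, Any]]
) -> str:
    # Hypothetical flattener: treat plain names as one-level tuples.
    if not isinstance(column, tuple):
        column = (column,)
    # With a single metric, the metric level adds no information, so drop it.
    if len(aggregates) == 1 and len(column) > 1:
        column = column[1:]
    return ", ".join(str(level) for level in column)


columns = pd.MultiIndex.from_tuples([("sum__num", "boy"), ("sum__num", "girl")])
df = pd.DataFrame([[1, 10], [2, 20], [3, 30]], columns=columns)

df_rolling = df.rolling(window=2, min_periods=2).sum()

# Distinct metric names from level 0; the dict values stay empty because only
# the keys (how many metrics there are) matter to the flattener here.
agg_in_pivot_df = df.columns.get_level_values(0).drop_duplicates().to_list()
agg: Dict[str, Dict[str, Any]] = {col: {} for col in agg_in_pivot_df}
df_rolling.columns = [flatten_column(col, agg) for col in df_rolling.columns]
print(df_rolling.columns.tolist())  # ['boy', 'girl']
```

With two or more metrics the metric name would be kept, giving names such as "sum__num, boy", so series from different metrics stay distinguishable.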

@villebro (Member) left a comment

LGTM, thanks for this fix+added code coverage! 🎉

@zhaoyongjie merged commit fd84614 into apache:master on Oct 7, 2021
@villebro added the v1.4 label on Oct 12, 2021
eschutho pushed a commit to preset-io/superset that referenced this pull request Oct 27, 2021
* fix: rolling and cum operator on multiple series

* add UT

* updates

(cherry picked from commit fd84614)
opus-42 pushed a commit to opus-42/incubator-superset that referenced this pull request Nov 14, 2021
* fix: rolling and cum operator on multiple series

* add UT

* updates
QAlexBall pushed a commit to QAlexBall/superset that referenced this pull request Dec 28, 2021
* fix: rolling and cum operator on multiple series

* add UT

* updates
@mistercrunch added the 🍒 1.4.0, 🍒 1.4.1, 🍒 1.4.2, 🏷️ bot and 🚢 1.5.0 labels on Mar 13, 2024
Labels: 🏷️ bot (used by `supersetbot` to track which PRs were auto-tagged with release labels), size/L, v1.4, 🍒 1.4.0, 🍒 1.4.1, 🍒 1.4.2, 🚢 1.5.0
3 participants