fix(pvt2): migrations from legacy pivot table error when form_data have pieces of pvt2 #24710

Merged

Always-prog
Contributor

SUMMARY

Hello!
Migrating from the legacy pivot table to Pivot Table v2 sometimes fails because the legacy chart's form_data still contains leftover Pivot Table v2 keys. This PR fixes that. Such leftover keys can appear when users switch a chart from Pivot Table v2 to the legacy pivot table.

TESTING INSTRUCTIONS

  1. Create and save a Pivot Table v2 visualization.
  2. Switch the chart to the legacy pivot table and save it.
  3. Try to migrate it and observe the error "Duplicate key in target viz".

Alternatively, try migrating the form_data below, which I generated this way, and you will see the same error.

{
    'adhoc_filters': [],
    'aggregateFunction': 'Sum',
    'applied_time_extras': {},
    'colOrder': 'key_a_to_z',
    'colTotals': True,
    'columns': ['date_obr'],
    'conditional_formatting': [],
    'dashboards': [],
    'datasource': '2858__table',
    'date_format': 'smart_date',
    'granularity_sqla': 'date_',
    'groupby': ['Name_gr'],
    'groupbyColumns': ['date_obr'],
    'groupbyRows': ['id_group', 'TT'],
    'label_colors': {},
    'metrics': ['count'],
    'metricsLayout': 'COLUMNS',
    'number_format': 'SMART_NUMBER',
    'order_desc': True,
    'pandas_aggfunc': 'Sum',
    'pivot_margins': True,
    'rowOrder': 'value_z_to_a',
    'rowTotals': True,
    'row_limit': 1000,
    'series_limit_metric': ['count'],
    'shared_label_colors': {},
    'slice_id': 16619,
    'time_grain_sqla': 'P1D',
    'time_range': 'Last week',
    'timeseries_limit_metric': 'count',
    'valueFormat': ',d',
    'viz_type': 'pivot_table_v2',
    'extra_filters': [],
    'dataMask': {},
    'extraControls': {}
}
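To see which entries collide, the sketch below lists every legacy key whose renamed Pivot Table v2 key is also present in the form_data. The rename map is an assumed, partial mapping for illustration only (of these pairs, only row_limit -> series_limit is mentioned in this thread), the helper name find_conflicts is hypothetical, and the form_data is trimmed to the relevant keys in their original order.

```python
# ASSUMED, partial legacy -> Pivot Table v2 rename map (illustration only,
# not copied from Superset's migration code).
ASSUMED_RENAME_KEYS = {
    "columns": "groupbyColumns",
    "groupby": "groupbyRows",
    "number_format": "valueFormat",
    "pandas_aggfunc": "aggregateFunction",
    "row_limit": "series_limit",
    "timeseries_limit_metric": "series_limit_metric",
}


def find_conflicts(form_data: dict, rename_keys: dict) -> list:
    """Return (legacy_key, new_key) pairs where both keys are present."""
    return [
        (source, target)
        for source, target in rename_keys.items()
        if source in form_data and target in form_data
    ]


# Trimmed copy of the form_data from the PR description.
form_data = {
    "aggregateFunction": "Sum",
    "columns": ["date_obr"],
    "groupby": ["Name_gr"],
    "groupbyColumns": ["date_obr"],
    "groupbyRows": ["id_group", "TT"],
    "number_format": "SMART_NUMBER",
    "pandas_aggfunc": "Sum",
    "row_limit": 1000,
    "series_limit_metric": ["count"],
    "timeseries_limit_metric": "count",
    "valueFormat": ",d",
}

for legacy, new in find_conflicts(form_data, ASSUMED_RENAME_KEYS):
    print(f"{legacy} -> {new}")
```

Under this assumed map, five pairs collide; row_limit does not, because there is no series_limit key in the data.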

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

Always-prog requested a review from a team as a code owner July 16, 2023 10:06
@codecov

codecov bot commented Jul 16, 2023

Codecov Report

Merging #24710 (8851c4c) into master (cb9b865) will increase coverage by 8.69%.
The diff coverage is 66.66%.

❗ Current head 8851c4c differs from pull request most recent head 3b23515. Consider uploading reports for the commit 3b23515 to get more accurate results

@@            Coverage Diff             @@
##           master   #24710      +/-   ##
==========================================
+ Coverage   58.35%   67.04%   +8.69%     
==========================================
  Files        1901     1901              
  Lines       73933    73936       +3     
  Branches     8183     8183              
==========================================
+ Hits        43146    49573    +6427     
+ Misses      28666    22242    -6424     
  Partials     2121     2121              
Flag       Coverage Δ
hive       ?
mysql      79.22% <66.66%> (?)
postgres   79.30% <66.66%> (?)
presto     ?
python     79.44% <66.66%> (+18.19%) ⬆️
sqlite     77.89% <66.66%> (?)
unit       ?

Flags with carried forward coverage won't be shown.

Impacted Files                                    Coverage Δ
superset/migrations/shared/migrate_viz/base.py    79.04% <66.66%> (-5.27%) ⬇️

... and 347 files with indirect coverage changes

@michael-s-molina
Member

Thanks for the fix @Always-prog! If this behavior happens for Pivot Table, it may also happen for other chart types. Would you mind resolving this in a more global way? You can skip rename keys in base.py:

def _migrate(self) -> None:
    ...
    rv_data = {}
    for key, value in self.data.items():
        # ADD SOMETHING HERE TO SKIP NEW VERSION KEYS

        if key in self.rename_keys and self.rename_keys[key] in rv_data:
            raise ValueError("Duplicate key in target viz")
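A minimal, standalone sketch of the rename loop above, assuming data and rename_keys are plain dicts; the function name rename_form_data and the mappings used are hypothetical, and the real _migrate does more than this. It shows when the duplicate check fires: a leftover Pivot Table v2 key that appears before its legacy counterpart, or a rename map that sends two legacy keys to the same target.

```python
def rename_form_data(data: dict, rename_keys: dict) -> dict:
    """Hypothetical, simplified version of the rename loop in _migrate."""
    rv_data: dict = {}
    for key, value in data.items():
        # The renamed target is already in the output: two inputs would
        # write the same key, so refuse instead of silently overwriting.
        if key in rename_keys and rename_keys[key] in rv_data:
            raise ValueError("Duplicate key in target viz")
        # Rename legacy keys; keep every other key unchanged.
        rv_data[rename_keys.get(key, key)] = value
    return rv_data


# Assumed mapping, for illustration only.
rename_keys = {"pandas_aggfunc": "aggregateFunction"}

# Clean legacy form_data migrates without complaint.
print(rename_form_data({"pandas_aggfunc": "Sum"}, rename_keys))

# A leftover v2 key placed before its legacy counterpart trips the check.
try:
    rename_form_data({"aggregateFunction": "Sum", "pandas_aggfunc": "Sum"}, rename_keys)
except ValueError as exc:
    print(exc)

# A buggy map that sends two legacy keys to one target is caught the same way.
try:
    rename_form_data({"a": 1, "b": 2}, {"a": "x", "b": "x"})
except ValueError as exc:
    print(exc)
```

In this sketch the check depends on key order: if the leftover v2 key came after its legacy counterpart, it would silently overwrite the renamed value instead of raising.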

@Always-prog
Contributor Author

Always-prog commented Jul 17, 2023

Then, instead of raising the "Duplicate key in target viz" error, could we just skip keys that are in the rename dictionary?

@michael-s-molina
Member

No, because that check exists to prevent different pieces of information from being mapped to the same key, which would be a development error.

@Always-prog
Contributor Author

@michael-s-molina I moved the removal of rename keys from data into base.py, placing it before the for key, value in self.data.items(): loop. Can you take a look at the code?

@@ -64,6 +64,12 @@ def _migrate(self) -> None:
if "viz_type" in self.data:
self.data["viz_type"] = self.target_viz_type

# Sometimes visualizations have same keys in the source form_data and rename_keys
# We need to remove them from data to allow the migration to work properly with rename_keys
for key in self.rename_keys.values():
Member

michael-s-molina, Jul 17, 2023

I'm worried we might have keys that belong to the new version without a corresponding legacy key. In that case, removing those keys would lose information. Taking Pivot Table as an example, we might have series_limit (new) but no row_limit (legacy). For safety, could we remove a key only if its legacy counterpart also exists?
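The concern above can be sketched with two hypothetical cleanup helpers (neither is Superset code). The first drops every new-version key named in rename_keys; the second drops one only when its legacy counterpart is also present. With form_data that holds series_limit but no row_limit, only the guarded helper keeps the value. The single-entry map mirrors the example in the comment; everything else is assumed.

```python
# Mapping taken from the example in the review comment.
RENAME_KEYS = {"row_limit": "series_limit"}


def drop_all_targets(data: dict, rename_keys: dict) -> dict:
    """Naive cleanup: remove every new-version key that appears in rename_keys."""
    targets = set(rename_keys.values())
    return {k: v for k, v in data.items() if k not in targets}


def drop_redundant_targets(data: dict, rename_keys: dict) -> dict:
    """Guarded cleanup: remove a new-version key only if its legacy key also exists."""
    redundant = {t for s, t in rename_keys.items() if s in data and t in data}
    return {k: v for k, v in data.items() if k not in redundant}


only_new = {"series_limit": 50}
print(drop_all_targets(only_new, RENAME_KEYS))        # series_limit is lost
print(drop_redundant_targets(only_new, RENAME_KEYS))  # series_limit is kept

both = {"row_limit": 1000, "series_limit": 50}
print(drop_redundant_targets(both, RENAME_KEYS))      # only legacy row_limit remains
```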

Contributor Author

Got it! I changed it so that a target key is removed only if both the source key and its renamed target key exist in form_data.
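Putting the two steps together, a standalone sketch of the agreed approach might look like this: first drop a leftover target key only when its source key is also present, then run the rename loop with the duplicate check left in place. The function name migrate_form_data and the partial rename map are hypothetical; the keys and values come from the form_data in the PR description.

```python
def migrate_form_data(data: dict, rename_keys: dict) -> dict:
    """Hypothetical sketch: guarded cleanup followed by the rename loop."""
    data = dict(data)  # work on a copy so the caller's dict is untouched
    # Drop a leftover new-version key only when its legacy counterpart exists,
    # so the legacy value (presumably what the legacy chart used) is kept.
    for source_key, target_key in rename_keys.items():
        if source_key in data and target_key in data:
            data.pop(target_key)
    rv_data: dict = {}
    for key, value in data.items():
        # Still guards against development errors in rename_keys itself.
        if key in rename_keys and rename_keys[key] in rv_data:
            raise ValueError("Duplicate key in target viz")
        rv_data[rename_keys.get(key, key)] = value
    return rv_data


# Assumed, partial rename map for illustration.
rename_keys = {
    "groupby": "groupbyRows",
    "pandas_aggfunc": "aggregateFunction",
    "row_limit": "series_limit",
}
data = {
    "aggregateFunction": "Sum",
    "groupby": ["Name_gr"],
    "groupbyRows": ["id_group", "TT"],
    "pandas_aggfunc": "Sum",
    "row_limit": 1000,
}
print(migrate_form_data(data, rename_keys))
```

In this sketch the legacy values win (groupbyRows ends up as ['Name_gr'], not ['id_group', 'TT']), and series_limit is produced from row_limit because no conflicting series_limit key existed.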

Member

michael-s-molina left a comment

LGTM. Thank you for the fix @Always-prog!

@Always-prog
Contributor Author

@michael-s-molina Hi! If everything is ok with my PR now, can we merge it?

@michael-s-molina
Member

Hey. Could you rebase it to fix the Docker issue? I can't merge it with that CI check failing.

@Always-prog
Contributor Author

@michael-s-molina There is no need to rebase or merge (I already tried); my branch was created from the latest version of the code. It looks like CI failed spuriously and just needs a rerun.

@michael-s-molina
Member

You need to have #24731 in your commit history to fix the CI error. That's why a rebase is needed.

@Always-prog
Contributor Author

@michael-s-molina Got it, thank you. Fixed!

michael-s-molina merged commit df106aa into apache:master Jul 20, 2023
29 checks passed
michael-s-molina added the v3.0 label Jul 21, 2023
michael-s-molina pushed a commit that referenced this pull request Jul 26, 2023
mistercrunch added the 🍒 3.0.4, 🏷️ bot and 🚢 3.1.0 labels Mar 8, 2024
Labels
🏷️ bot (a label used by `supersetbot` to keep track of which PRs were auto-tagged with release labels), size/XS, v3.0 (label added by the release manager to track PRs to be included in the 3.0 branch), 🍒 3.0.0, 🍒 3.0.1, 🍒 3.0.2, 🍒 3.0.3, 🍒 3.0.4, 🚢 3.1.0
3 participants