Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[#14909] Fix for duplicate key isue when diffing catalogs with arrays #14922

Merged
merged 1 commit into from
Jul 22, 2022

Conversation

jackie-numerade
Copy link
Contributor

@jackie-numerade jackie-numerade commented Jul 21, 2022

What

Fix for duplicateKeyException thrown when diffing catalogs that contain arrays in the schema (see #14909 ). This exception is due to the CatalogHelpers.getFullyQualifiedFieldNamesWithTypes() method returning the same names for both the array node and also the child "items" node, causing a collision during Stream.collect()' in CatalogHelpers.getStreamDiff()`.

Please see the updated unit test and also #14460 for examples of schemas that trigger this exception with the existing code.

How

The issue was caused by getFullyQualifiedFieldNamesWithTypes() filtering out the FieldNameOrList instances from the path list (so that the "items" node ends up with the same name list as the parent array node). The proposed fix here just adds a string "items" to the name list instead of ignoring that list instance altogether. However, this does introduce a new "items" node that could be included in the final Catalog diff. I honestly don't know how/where this diff is used upstream (including the frontend), so I'm not sure if this would cause any issues.

Another alternative would be to just exclude either the array node or the "items" node from the result of the getFullyQualifiedFieldNamesWithTypes() method. The only downside to that approach is that it wouldn't be able to detect diffs in cases where an array field turned into an object field (with the same child schema), or vice versa, since the resulting field list would look the same (though this is probably an edge case).

Recommended reading order

  1. CatalogHelpers.java
  2. CatalogHelpersTest.java

🚨 User Impact 🚨

Unknown, depending on how/where the results of CatalogHelpers.getCatalogDiff() are used.

Close #14909

@jackie-numerade jackie-numerade requested a review from a team as a code owner July 21, 2022 13:58
@CLAassistant
Copy link

CLAassistant commented Jul 21, 2022

CLA assistant check
All committers have signed the CLA.

Copy link
Contributor

@evantahler evantahler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this fix & the tests! This looks 👍 to me, but tagging @jdpgrailsdev for review as well, who knows a lot more about Java than I do.

Copy link
Contributor

@jdpgrailsdev jdpgrailsdev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/platform issues related to the platform area/protocol community
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Duplicate key error when diffing catalogs that contains arrays
5 participants