Generalize storage formats in file metastore #17368

Merged: 2 commits, Mar 8, 2022

@7c00 7c00 commented Feb 28, 2022

In this PR, we changed the storage format class from HiveStorageFormat to StorageFormat, to make it possible to create tables and partitions whose storage formats are not listed in HiveStorageFormat. For example, a Hudi COW table is stored in the metastore with HoodieParquetInputFormat as its input format.

This PR is extracted from a legacy version of #17149.

Test plan - Unit tests.

== NO RELEASE NOTE ==
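
To illustrate the motivation in the description above, here is a minimal sketch of why a closed enum cannot describe a Hudi table while an open value class can. `KnownFormat` and `FormatTriple` are hypothetical stand-ins, not the real `HiveStorageFormat` and `StorageFormat` classes, and their fields are assumptions made for the example. Only the Hadoop and Hudi class names are real.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.Optional;

// Hypothetical stand-in for a closed enum of known formats (not the real HiveStorageFormat).
enum KnownFormat
{
    PARQUET(
            "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe",
            "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat",
            "org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat");

    final String serde;
    final String inputFormat;
    final String outputFormat;

    KnownFormat(String serde, String inputFormat, String outputFormat)
    {
        this.serde = serde;
        this.inputFormat = inputFormat;
        this.outputFormat = outputFormat;
    }
}

// Hypothetical stand-in for an open value class (not the real StorageFormat).
final class FormatTriple
{
    final String serde;
    final String inputFormat;
    final String outputFormat;

    FormatTriple(String serde, String inputFormat, String outputFormat)
    {
        this.serde = Objects.requireNonNull(serde, "serde is null");
        this.inputFormat = Objects.requireNonNull(inputFormat, "inputFormat is null");
        this.outputFormat = Objects.requireNonNull(outputFormat, "outputFormat is null");
    }

    static FormatTriple fromKnownFormat(KnownFormat format)
    {
        return new FormatTriple(format.serde, format.inputFormat, format.outputFormat);
    }

    // Reverse lookup through the enum: empty for any triple the enum does not list.
    Optional<KnownFormat> toKnownFormat()
    {
        return Arrays.stream(KnownFormat.values())
                .filter(format -> this.equals(fromKnownFormat(format)))
                .findFirst();
    }

    @Override
    public boolean equals(Object other)
    {
        if (!(other instanceof FormatTriple)) {
            return false;
        }
        FormatTriple that = (FormatTriple) other;
        return serde.equals(that.serde)
                && inputFormat.equals(that.inputFormat)
                && outputFormat.equals(that.outputFormat);
    }

    @Override
    public int hashCode()
    {
        return Objects.hash(serde, inputFormat, outputFormat);
    }
}
```

In this sketch, a triple built with `org.apache.hudi.hadoop.HoodieParquetInputFormat` as its input format can be constructed and stored directly, but `toKnownFormat()` returns an empty `Optional` for it, because no enum constant matches.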

@7c00 7c00 mentioned this pull request Feb 28, 2022

7c00 commented Mar 3, 2022

ping @arunthirupathi

storageFormat = Arrays.stream(HiveStorageFormat.values())
.filter(format -> tableFormat.equals(StorageFormat.fromHiveStorageFormat(format)))
.findFirst();
storageFormat = Optional.of(partition.getStorage().getStorageFormat());

Looks like this is always present, so why keep storageFormat as an Optional?

Member Author

Right. The Optional here is unnecessary. I have removed it.

storageFormat = Arrays.stream(HiveStorageFormat.values())
.filter(format -> tableFormat.equals(StorageFormat.fromHiveStorageFormat(format)))
.findFirst();
storageFormat = Optional.of(tableFormat);

Same comment.

Member Author

Fixed as well.

@7c00 7c00 force-pushed the presto-hudi-hms branch from 7d0fa28 to b24190c Compare March 3, 2022 04:27
@7c00 7c00 requested a review from arunthirupathi March 3, 2022 04:31

@arunthirupathi arunthirupathi left a comment

This change is backward incompatible.

@JsonCreator is the annotation Jackson uses to construct an object from JSON. This change modifies the type of an annotated parameter, so existing JSON blobs will fail to deserialize.

Also, it looks like no one originally wrote serialization tests for this, so none of the tests broke.

Can you please investigate how to make these changes without breaking the existing assumptions? Generally you have to introduce a new property, deprecate the existing property, and optionally remove the deprecated property after a few months.
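
As a rough sketch of the "new property, deprecated old property" pattern described in this review, assuming a simplified class that tracks only an input format: `LegacyFormat` and `TableMetadata` are hypothetical names, not the classes touched by this PR, and Jackson annotations are left out so the example runs on the JDK alone.

```java
import java.util.Arrays;
import java.util.Objects;
import java.util.Optional;

// Hypothetical legacy enum: the type the old property used to carry.
enum LegacyFormat
{
    ORC("org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"),
    PARQUET("org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat");

    final String inputFormat;

    LegacyFormat(String inputFormat)
    {
        this.inputFormat = inputFormat;
    }
}

// Hypothetical metadata class showing the migration pattern.
final class TableMetadata
{
    private final String inputFormat;

    // New entry point: accepts any input format class name.
    TableMetadata(String inputFormat)
    {
        this.inputFormat = Objects.requireNonNull(inputFormat, "inputFormat is null");
    }

    // Old entry point, kept so existing callers and old data keep working.
    @Deprecated
    TableMetadata(LegacyFormat legacy)
    {
        this(legacy.inputFormat);
    }

    String getInputFormat()
    {
        return inputFormat;
    }

    // Old accessor, kept for callers that still expect the enum; empty for unlisted formats.
    @Deprecated
    Optional<LegacyFormat> getLegacyFormat()
    {
        return Arrays.stream(LegacyFormat.values())
                .filter(legacy -> legacy.inputFormat.equals(inputFormat))
                .findFirst();
    }
}
```

The deprecated constructor delegates to the new one, so there is a single source of truth for the stored value, and the deprecated members can be deleted later without touching the new code path.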

@7c00 7c00 force-pushed the presto-hudi-hms branch from b24190c to 7e6afee Compare March 3, 2022 12:57

7c00 commented Mar 3, 2022

@arunthirupathi Thanks for your comments.

I have updated the code so that legacy JSON (serialized from the previous type definition) remains readable. The legacy constructors and methods are kept but marked as deprecated. Could you please take a second look?

@7c00 7c00 requested a review from arunthirupathi March 3, 2022 13:09
@arunthirupathi

Thanks for making this backward compatible, but I don't see tests that capture the old JSON and verify that it is successfully deserialized.

I want to see the following things in this PR.

  1. Check out master and serialize some examples (Optional absent, and Optional present with different values).
  2. Create tests that instantiate the new objects from these examples.
  3. Make your code changes.
  4. The tests should still pass for the old values.

Apologies for asking you to create tests, but this code is widely used and we need to be careful. This has the potential to break almost all our users, and it is sad that the existing code had no tests covering this.

@7c00 7c00 force-pushed the presto-hudi-hms branch 2 times, most recently from 21952ef to 8b883c2 Compare March 4, 2022 15:25

7c00 commented Mar 4, 2022

It makes sense to add tests to ensure compatibility! After refactoring the code, I split it into two commits: the first adds JSON round-trip tests and encodes instances of the legacy classes to files; the second adds tests that decode those files with the new classes. Could you take a look and check whether this is OK?


@arunthirupathi arunthirupathi left a comment

Left some minor nits. Once they are addressed, I will merge this in.

serdeParameters,
externalLocation,
columnStatistics,
eligibleToIgnore, sealedPartition);

formatting: one parameter per line

Member Author

Fixed.

public StorageFormat deserialize(JsonParser p, DeserializationContext ctxt)
throws IOException, JsonProcessingException
{
if (p.currentToken() == VALUE_STRING) {

Please add a comment here: prior to version 0.271 this was HiveStorageFormat, and this class exists to ensure backward compatibility.

Member Author

Added.
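
The deserializer under review branches on whether the current token is a string. A JDK-only sketch of that idea follows, with the Jackson parser replaced by a plain `Object` that is either a `String` (the legacy enum name) or a `Map` (the new object form). `DecodedFormat`, `LegacyAwareDecoder`, and the field names `serDe`, `inputFormat`, and `outputFormat` are assumptions for illustration, not the merged code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical result type holding the three class names.
final class DecodedFormat
{
    final String serde;
    final String inputFormat;
    final String outputFormat;

    DecodedFormat(String serde, String inputFormat, String outputFormat)
    {
        this.serde = serde;
        this.inputFormat = inputFormat;
        this.outputFormat = outputFormat;
    }
}

// Hypothetical decoder that accepts both the legacy and the new shape.
final class LegacyAwareDecoder
{
    // Legacy enum names mapped to class-name triples; only ORC is listed for brevity.
    private static final Map<String, DecodedFormat> LEGACY = new HashMap<>();

    static {
        LEGACY.put("ORC", new DecodedFormat(
                "org.apache.hadoop.hive.ql.io.orc.OrcSerde",
                "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
                "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"));
    }

    static DecodedFormat decode(Object node)
    {
        if (node instanceof String) {
            // Legacy shape: the value was written as a bare enum name.
            DecodedFormat format = LEGACY.get(node);
            if (format == null) {
                throw new IllegalArgumentException("Unknown legacy format: " + node);
            }
            return format;
        }
        if (node instanceof Map) {
            // New shape: an object with one field per class name (field names assumed).
            Map<?, ?> fields = (Map<?, ?>) node;
            return new DecodedFormat(
                    (String) fields.get("serDe"),
                    (String) fields.get("inputFormat"),
                    (String) fields.get("outputFormat"));
        }
        throw new IllegalArgumentException("Unexpected value: " + node);
    }
}
```

Feeding the decoder values captured from the old writer, such as the bare string `"ORC"`, alongside values in the new object form is one way to exercise both branches in a compatibility test.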

@7c00 7c00 force-pushed the presto-hudi-hms branch from 8b883c2 to a289c26 Compare March 7, 2022 02:41
@7c00 7c00 requested a review from arunthirupathi March 7, 2022 02:48
@7c00 7c00 force-pushed the presto-hudi-hms branch from a289c26 to 92d46bd Compare March 8, 2022 04:52

7c00 commented Mar 8, 2022

@arunthirupathi The PR is ready to merge. Could you take a look when you have time? Thanks in advance.

@arunthirupathi arunthirupathi merged commit 733d444 into prestodb:master Mar 8, 2022
@7c00 7c00 deleted the presto-hudi-hms branch March 8, 2022 07:30