-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add hudi connector #17149
Add hudi connector #17149
Conversation
dc1acb2
to
75b1762
Compare
fbb24fd
to
df82b6d
Compare
Thanks @codope @rohanpednekar! |
...-hive-metastore/src/main/java/com/facebook/presto/hive/metastore/file/PartitionMetadata.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiMetadata.java
Outdated
Show resolved
Hide resolved
d3c130c
to
06ab601
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@7c00 Thanks for the contribution. This would be pretty useful for the Hudi community.
I took one pass and left some comments. I have a high-level question. Can you please add to the description, what all Hudi table types and query types are currently supported?
presto-hive-common/src/main/java/org/apache/hadoop/fs/HadoopExtendedFileSystem.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hive/HudiRecordCursors.java
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/ParquetPageSourceUtil.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/ParquetPageSourceUtil.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/RecordCursorUtil.java
Outdated
Show resolved
Hide resolved
private static InputFormat<?, ?> createInputFormat(Configuration conf, String inputFormat) | ||
{ | ||
try { | ||
Class<?> clazz = conf.getClassByName(inputFormat); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of reflection-based check, we can use annotations provided by Hudi. Check these two annotations in Hudi:
UseRecordReaderFromInputFormat
UseFileSplitsFromInputFormat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The annotations UseRecordReaderFromInputFormat
and UseFileSplitsFromInputFormat
indicate that we could delegate the splitting to InputFormat. As we discussed in the design proposal, there are some performance issues in the splitting implementation from InputFormat. So we will ignore these annotations and provide a new splitting implementation optimized for Presto.
format("Invalid partition name %s for partition columns %s", partitionName, partitionColumns)); | ||
Partition partition = metastore.getPartition(context, databaseName, tableName, partitionValues) | ||
.orElseThrow(() -> new PrestoException(HUDI_INVALID_METADATA, format("Partition %s expected but not found", partitionName))); | ||
Map<String, String> keyValues = zipPartitionKeyValues(partitionColumns, partitionValues); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you're considering different partition extractors in hudi-hive sync. Depending on partition extractor, partition values can change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are now retrieving partition metadata through HiveMetaStore, where the partition values stored have been extracted by some partition extractors from their partition locations and thus we dont need to take consisder of the partition extractors.
83edce7
to
23c5fd9
Compare
@pratyakshsharma @codope Thanks for your comments. I have updated the code as the comments. Could you please take a second review? Thanks in advance. |
@codope Can you please take another pass on this? |
Will review by Wednesday. |
ping @codope |
@7c00 I have taken one more pass. The code look fine from Hudi perspective. |
This contains 2 set of changes, changes to hive/metastore and hudi connector. Can you please do a PR for hive/metastore changes alone and hudi connector. I will review the hive/metastore changes closely. |
@arunthirupathi I have extracted the metastore related changes to a new PR at #17368. Could you take a review on it? Thank you in advance ❤️ . |
aa78a34
to
db07aa6
Compare
I have refactored the tests. Instead of dynamically generating tables during testing, the new tests are using the pre-generated data from Hudi Docker demo. The reasons for that are
@codope @pratyakshsharma Could you take a second review? Nearly all the changes comes from the test part. Then I'd like to invite Presto fellows to review the PR. |
I am actually travelling this weekend. Will take a look early next week. |
What is the size of the jar that is getting built ?few 10's of MB is fine, but if it is producing really large then please look into it in reducing the size. When it is ready for review, Please ping me again. For specific connectors, we require it to be thoroughly reviewed by someone else familiar with the connector. Once that review is done, I will make a pass to ensure it meets the coding guidelines. |
@arunthirupathi Sorry for late to reply.
As we are reusing dependencies from presto-hive module and have added no new dependency. The total size of this hudi connector will be less than that of hive connector.
Now this PR is blocked the Avro dependency issue (found in #17463) to be fixed in next Hudi release 0.11. When Hudi 0.11 is released, we are going to continue to work on this PR and invite you to review. Thanks you in advance. cc @codope |
@7c00 Hudi v0.11 is released https://mvnrepository.com/artifact/org.apache.hudi/hudi-presto-bundle/0.11.0 |
The code has been rebased to sync with master branch after #17463 . It's ready to review now. @codope @pratyakshsharma @arunthirupathi @kewang1024 @highker Could you take a look at this PR? Thanks in advance! ❤️ |
@pettyjamesm Please take a look when you get a chance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good, left some nits.
Sorry my notification in Presto is messed up, I will fix it. so that I will have a quicker turn around time next time.
@Override | ||
public String toString() | ||
{ | ||
return id + ":" + name + ":" + hiveType + ":" + columnType; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please StringHelper to convert to String ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The toString method here is implemented followed the convention of HiveColumnHandle#toString. In this way, we could have a more compacted text (refer to a Hudi column) when output logs.
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiConnectorFactory.java
Outdated
Show resolved
Hide resolved
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiFile.java
Outdated
Show resolved
Hide resolved
@Override | ||
public String toString() | ||
{ | ||
return path + ":" + start + "+" + length; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use toStringHelper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same reason as HudiColumnHandle#toString
int batchSize = dataPage.getPositionCount(); | ||
Block[] blocks = new Block[prefilledBlocks.length]; | ||
for (int i = 0; i < prefilledBlocks.length; i++) { | ||
if (prefilledBlocks[i] != null) { | ||
blocks[i] = new RunLengthEncodedBlock(prefilledBlocks[i], batchSize); | ||
} | ||
else { | ||
blocks[i] = dataPage.getBlock(delegateIndexes[i]); | ||
} | ||
} | ||
return new Page(batchSize, blocks); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code though is right, is inefficient. This is one of the hot paths. If there is no prefilled blocks, short circuit and return the delegate page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, thank you. Add short circuit when no actual prefilled block.
if (valueString.equalsIgnoreCase("false")) { | ||
return false; | ||
} | ||
throw new IllegalArgumentException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the value here on what was illegal, so that it makes debugging easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IllegalArgumentException
is catched later in current method, then a PrestoException with more context infomation is thrown.
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPageSource.java
Outdated
Show resolved
Hide resolved
BigDecimal decimal = new BigDecimal(valueString); | ||
decimal = decimal.setScale(decimalType.getScale(), BigDecimal.ROUND_UNNECESSARY); | ||
if (decimal.precision() > decimalType.getPrecision()) { | ||
throw new IllegalArgumentException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include the decimalPrecision and decimalType precision in the error message, along with the valueString.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same reaseon as above.
presto-hudi/src/main/java/com/facebook/presto/hudi/HudiPlugin.java
Outdated
Show resolved
Hide resolved
Thanks for reviewing @arunthirupathi |
Looks like there were some changes and this was broken, can you please fix them as well ? |
@arunthirupathi Thanks for your reviewing. The comments have been addressed. Let me ping you again when all CI checks passed. |
Hey @arunthirupathi All CI passed. Let's merge this PR? |
Thanks @arunthirupathi @codope @pratyakshsharma @rohanpednekar for all your time and help! ❤️ |
Add hudi connector
Ref: #17006
Test plan - Unit tests.