-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-24360][SQL] Support Hive 3.1 metastore #23694
Conversation
@@ -1179,3 +1180,128 @@ private[client] class Shim_v2_1 extends Shim_v2_0 { | |||
private[client] class Shim_v2_2 extends Shim_v2_1 | |||
|
|||
private[client] class Shim_v2_3 extends Shim_v2_1 | |||
|
|||
private[client] class Shim_v3_1 extends Shim_v2_3 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine if the signatures are matched and the tests pass. Let me double check them within tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Test build #101874 has finished for PR 23694 at commit
|
Test build #101876 has finished for PR 23694 at commit
|
Retest this please. |
@@ -100,6 +100,7 @@ private[hive] object IsolatedClientLoader extends Logging { | |||
case "2.1" | "2.1.0" | "2.1.1" => hive.v2_1 | |||
case "2.2" | "2.2.0" => hive.v2_2 | |||
case "2.3" | "2.3.0" | "2.3.1" | "2.3.2" | "2.3.3" | "2.3.4" => hive.v2_3 | |||
case "3.1" | "3.1.0" | "3.1.1" => hive.v3_1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason we do not support 3.0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not stable like Hadoop 3.0. Spark skipped Hadoop 3.0 and go with Hadoop 3.1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we face the same issue in 1.0, 2.0, but we still support them. Could we simply support it and let the end-users decide it by themselves?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, hopefully, I can proceed that in another PR.
Test build #101877 has finished for PR 23694 at commit
|
Also, cc @wangyum since he is working on Hive upgrade. |
numDP: JInteger, listBucketingLevel, isAcid, writeIdInLoadTableOrPartition, | ||
stmtIdInLoadTableOrPartition, hasFollowingStatsTask, AcidUtils.Operation.NOT_ACID, | ||
replace: JBoolean) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun, not a big deal but how about adding dropIndex
with, say, unsupported exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible, but it doesn't help much. As we see here, Hive.getIndexes
raises NoSuchMethodError
before we call shim.dropIndex
.
- Hive.java in 2.3 has
getIndexes
. - Hive.java in 3.0 and master doesn't have it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I noticed it. I am leaving this comment orthogonally just as a sanity check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except one question
classOf[AcidUtils.Operation], | ||
JBoolean.TYPE) | ||
|
||
override def loadPartition( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checked
assert(session.nonEmpty) | ||
val database = session.get.sessionState.catalog.getCurrentDatabase | ||
val table = hive.getTable(database, tableName) | ||
val loadFileType = if (replace) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checked
clazzLoadFileType.getEnumConstants.find(_.toString.equalsIgnoreCase("KEEP_EXISTING")) | ||
} | ||
assert(loadFileType.isDefined) | ||
loadDynamicPartitionsMethod.invoke(hive, loadPath, tableName, partSpec, loadFileType.get, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checked
writeIdInLoadTableOrPartition, stmtIdInLoadTableOrPartition: JInteger, replace: JBoolean) | ||
} | ||
|
||
override def loadDynamicPartitions( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checked
|
||
private[client] class Shim_v3_1 extends Shim_v2_3 { | ||
// Spark supports only non-ACID operations | ||
protected lazy val isAcidIUDoperation = JBoolean.FALSE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dongjoon-hyun, isn't it isAcid
defined at Shim_v0_14
? Was wondering why this was separate variable again. Do you see any possibility that this is different specifically for 3.1? Then, it's fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically, isAcidIUDoperation
is an evolved one from isAcid
.
In Hive code, isAcid
was a general ACID operation and isAcidIUDoperation
is now used for ACID Insert/Update/Delete operations. Also, they checks isFullAcidTable
and use it together like this.
else if(!isAcidIUDoperation && isFullAcidTable) {
destPath = fixFullAcidPathForLoadData(loadFileType, destPath, txnId, stmtId, tbl);
}
And yes for your last question. We don't know the future of Hive. So, for the different parameter name, we had better handle differently. That was my logic.
writeIdInLoadTableOrPartition, stmtIdInLoadTableOrPartition, replace: JBoolean) | ||
} | ||
|
||
override def loadTable( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checked
clazzLoadFileType.getEnumConstants.find(_.toString.equalsIgnoreCase("KEEP_EXISTING")) | ||
} | ||
assert(loadFileType.isDefined) | ||
loadTableMethod.invoke(hive, loadPath, tableName, loadFileType.get, isSrcLocal: JBoolean, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checked
Thank you so much for review, @HyukjinKwon and @gatorsmile . |
## What changes were proposed in this pull request? Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore. Please note that Hive 3.0.0 Metastore is skipped intentionally. ## How was this patch tested? Pass the Jenkins with the updated test cases including 3.1. Closes apache#23694 from dongjoon-hyun/SPARK-24360-3.1. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore. Please note that Hive 3.0.0 Metastore is skipped intentionally. Pass the Jenkins with the updated test cases including 3.1. Closes apache#23694 from dongjoon-hyun/SPARK-24360-3.1. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
* Compiles with Hops Hive * [SPARK-24360][SQL] Support Hive 3.1 metastore Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore. Please note that Hive 3.0.0 Metastore is skipped intentionally. Pass the Jenkins with the updated test cases including 3.1. Closes apache#23694 from dongjoon-hyun/SPARK-24360-3.1. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
* Compiles with Hops Hive * [SPARK-24360][SQL] Support Hive 3.1 metastore Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore. Please note that Hive 3.0.0 Metastore is skipped intentionally. Pass the Jenkins with the updated test cases including 3.1. Closes apache#23694 from dongjoon-hyun/SPARK-24360-3.1. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org> [HOPSWORKS-385] append - Bump Hive dependency [HOPSWORKS-385] append - Exclude log4j dependency (#10) * [HOPSWORKS-1105] Bump Hops version to 2.8.2.8 * [HOPSWORKS-385] append - Exclude log4j dependency
## What changes were proposed in this pull request? Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore. Please note that Hive 3.0.0 Metastore is skipped intentionally. ## How was this patch tested? Pass the Jenkins with the updated test cases including 3.1. Closes apache#23694 from dongjoon-hyun/SPARK-24360-3.1. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
## What changes were proposed in this pull request? Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore. Please note that Hive 3.0.0 Metastore is skipped intentionally. ## How was this patch tested? Pass the Jenkins with the updated test cases including 3.1. Closes apache#23694 from dongjoon-hyun/SPARK-24360-3.1. Authored-by: Dongjoon Hyun <dongjoon@apache.org> Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
What changes were proposed in this pull request?
Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore.
Please note that Hive 3.0.0 Metastore is skipped intentionally.
How was this patch tested?
Pass the Jenkins with the updated test cases including 3.1.