Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24360][SQL] Support Hive 3.1 metastore #23694

Closed
wants to merge 2 commits into from
Closed

[SPARK-24360][SQL] Support Hive 3.1 metastore #23694

wants to merge 2 commits into from

Conversation

dongjoon-hyun
Copy link
Member

What changes were proposed in this pull request?

Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore.
Please note that Hive 3.0.0 Metastore is skipped intentionally.

How was this patch tested?

Pass the Jenkins with the updated test cases including 3.1.

@@ -1179,3 +1180,128 @@ private[client] class Shim_v2_1 extends Shim_v2_0 {
private[client] class Shim_v2_2 extends Shim_v2_1

private[client] class Shim_v2_3 extends Shim_v2_1

private[client] class Shim_v3_1 extends Shim_v2_3 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine if the signatures are matched and the tests pass. Let me double check them within tomorrow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@SparkQA
Copy link

SparkQA commented Jan 30, 2019

Test build #101874 has finished for PR 23694 at commit a6a0810.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jan 30, 2019

Test build #101876 has finished for PR 23694 at commit 9f621e7.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Retest this please.

@@ -100,6 +100,7 @@ private[hive] object IsolatedClientLoader extends Logging {
case "2.1" | "2.1.0" | "2.1.1" => hive.v2_1
case "2.2" | "2.2.0" => hive.v2_2
case "2.3" | "2.3.0" | "2.3.1" | "2.3.2" | "2.3.3" | "2.3.4" => hive.v2_3
case "3.1" | "3.1.0" | "3.1.1" => hive.v3_1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason we do not support 3.0?

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Jan 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not stable like Hadoop 3.0. Spark skipped Hadoop 3.0 and go with Hadoop 3.1.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we face the same issue in 1.0, 2.0, but we still support them. Could we simply support it and let the end-users decide it by themselves?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, hopefully, I can proceed that in another PR.

@SparkQA
Copy link

SparkQA commented Jan 30, 2019

Test build #101877 has finished for PR 23694 at commit 9f621e7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member Author

Also, cc @wangyum since he is working on Hive upgrade.

numDP: JInteger, listBucketingLevel, isAcid, writeIdInLoadTableOrPartition,
stmtIdInLoadTableOrPartition, hasFollowingStatsTask, AcidUtils.Operation.NOT_ACID,
replace: JBoolean)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun, not a big deal but how about adding dropIndex with, say, unsupported exception?

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible, but it doesn't help much. As we see here, Hive.getIndexes raises NoSuchMethodError before we call shim.dropIndex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I noticed it. I am leaving this comment orthogonally just as a sanity check.

Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except one question

classOf[AcidUtils.Operation],
JBoolean.TYPE)

override def loadPartition(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked

assert(session.nonEmpty)
val database = session.get.sessionState.catalog.getCurrentDatabase
val table = hive.getTable(database, tableName)
val loadFileType = if (replace) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked

clazzLoadFileType.getEnumConstants.find(_.toString.equalsIgnoreCase("KEEP_EXISTING"))
}
assert(loadFileType.isDefined)
loadDynamicPartitionsMethod.invoke(hive, loadPath, tableName, partSpec, loadFileType.get,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked

writeIdInLoadTableOrPartition, stmtIdInLoadTableOrPartition: JInteger, replace: JBoolean)
}

override def loadDynamicPartitions(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked


private[client] class Shim_v3_1 extends Shim_v2_3 {
// Spark supports only non-ACID operations
protected lazy val isAcidIUDoperation = JBoolean.FALSE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dongjoon-hyun, isn't it isAcid defined at Shim_v0_14? Was wondering why this was separate variable again. Do you see any possibility that this is different specifically for 3.1? Then, it's fine.

Copy link
Member Author

@dongjoon-hyun dongjoon-hyun Jan 31, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically, isAcidIUDoperation is an evolved one from isAcid.

In Hive code, isAcid was a general ACID operation and isAcidIUDoperation is now used for ACID Insert/Update/Delete operations. Also, they checks isFullAcidTable and use it together like this.

else if(!isAcidIUDoperation && isFullAcidTable) {
    destPath = fixFullAcidPathForLoadData(loadFileType, destPath, txnId, stmtId, tbl);
}

And yes for your last question. We don't know the future of Hive. So, for the different parameter name, we had better handle differently. That was my logic.

writeIdInLoadTableOrPartition, stmtIdInLoadTableOrPartition, replace: JBoolean)
}

override def loadTable(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked

clazzLoadFileType.getEnumConstants.find(_.toString.equalsIgnoreCase("KEEP_EXISTING"))
}
assert(loadFileType.isDefined)
loadTableMethod.invoke(hive, loadPath, tableName, loadFileType.get, isSrcLocal: JBoolean,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked

@dongjoon-hyun
Copy link
Member Author

dongjoon-hyun commented Jan 31, 2019

Thank you so much for review, @HyukjinKwon and @gatorsmile .
Merged to master.

jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
## What changes were proposed in this pull request?

Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore.
Please note that Hive 3.0.0 Metastore is skipped intentionally.

## How was this patch tested?

Pass the Jenkins with the updated test cases including 3.1.

Closes apache#23694 from dongjoon-hyun/SPARK-24360-3.1.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
SirOibaf pushed a commit to SirOibaf/spark that referenced this pull request May 26, 2019
Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore.
Please note that Hive 3.0.0 Metastore is skipped intentionally.

Pass the Jenkins with the updated test cases including 3.1.

Closes apache#23694 from dongjoon-hyun/SPARK-24360-3.1.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
tkakantousis pushed a commit to logicalclocks/spark that referenced this pull request Jun 3, 2019
* Compiles with Hops Hive

* [SPARK-24360][SQL] Support Hive 3.1 metastore

Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore.
Please note that Hive 3.0.0 Metastore is skipped intentionally.

Pass the Jenkins with the updated test cases including 3.1.

Closes apache#23694 from dongjoon-hyun/SPARK-24360-3.1.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
tkakantousis pushed a commit to logicalclocks/spark that referenced this pull request Jul 16, 2019
* Compiles with Hops Hive

* [SPARK-24360][SQL] Support Hive 3.1 metastore

Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore.
Please note that Hive 3.0.0 Metastore is skipped intentionally.

Pass the Jenkins with the updated test cases including 3.1.

Closes apache#23694 from dongjoon-hyun/SPARK-24360-3.1.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

[HOPSWORKS-385] append - Bump Hive dependency

[HOPSWORKS-385] append - Exclude log4j dependency  (#10)

* [HOPSWORKS-1105] Bump Hops version to 2.8.2.8

* [HOPSWORKS-385] append - Exclude log4j dependency
m0hit-aggarwal pushed a commit to flipkart-incubator/spark that referenced this pull request Apr 13, 2020
## What changes were proposed in this pull request?

Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore.
Please note that Hive 3.0.0 Metastore is skipped intentionally.

## How was this patch tested?

Pass the Jenkins with the updated test cases including 3.1.

Closes apache#23694 from dongjoon-hyun/SPARK-24360-3.1.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
m0hit-aggarwal pushed a commit to flipkart-incubator/spark that referenced this pull request Apr 15, 2020
## What changes were proposed in this pull request?

Hive 3.1.1 is released. This PR aims to support Hive 3.1.x metastore.
Please note that Hive 3.0.0 Metastore is skipped intentionally.

## How was this patch tested?

Pass the Jenkins with the updated test cases including 3.1.

Closes apache#23694 from dongjoon-hyun/SPARK-24360-3.1.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants