[#5585] fix(catalog-hadoop): Test and make fileset with cloud storage can work with Spark 3.2.0~3.5.3 #5630

yuqi1129 · 2024-11-20T11:34:18Z

What changes were proposed in this pull request?

Update the Hadoop version from 3.3.0 to 3.3.1 to avoid bugs in Hadoop 3.3.0. We use 3.3.1 rather than 3.3.6 because hadoop-aws 3.3.6 is a very recent version and requires the matching Hadoop version, which would make it difficult to use in production.
Replace the hadoop-common and hadoop-client dependencies with hadoop-client-api and hadoop-client-runtime to avoid compatibility issues with third-party dependencies.
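
For illustration only, the dependency swap described above might look roughly like the following Gradle Kotlin DSL fragment. This is a sketch under assumptions, not the actual diff: `libs.hadoop3.client.runtime` appears in the change to bundles/aws-bundle/build.gradle.kts, while `libs.hadoop3.common`, `libs.hadoop3.client`, and `libs.hadoop3.client.api` are assumed version-catalog aliases, and the right configuration (`compileOnly` vs. `implementation`) depends on the module.

```kotlin
// build.gradle.kts (fragment, not a complete build script)
dependencies {
  // Before: the unshaded Hadoop artifacts, which expose their third-party
  // dependencies transitively. Both aliases below are hypothetical.
  // implementation(libs.hadoop3.common)
  // implementation(libs.hadoop3.client)

  // After: the shaded client artifacts, which relocate most third-party
  // classes and so reduce the chance of version clashes on the classpath.
  // `libs.hadoop3.client.api` is an assumed alias; `libs.hadoop3.client.runtime`
  // is the alias shown in the aws-bundle diff.
  compileOnly(libs.hadoop3.client.api)
  compileOnly(libs.hadoop3.client.runtime)
}
```

hadoop-client-api carries the public Hadoop client classes, and hadoop-client-runtime carries the shaded third-party libraries they need at run time, so the two are normally used together.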

Why are the changes needed?

To make filesets usable in production.

Fix: #5585

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

Tested locally and with the existing UTs and ITs.

yuqi1129 · 2024-11-20T11:36:03Z

...hadoop3/src/main/java/org/apache/gravitino/filesystem/hadoop/GravitinoVirtualFileSystem.java

+                // Reset the FileSystem service loader to make sure the FileSystem will reload the
+                // service file systems, this is a temporary solution to fix the issue
+                // https://github.com/apache/gravitino/issues/5609
+                resetFileSystemServiceLoader(scheme);


@FANNG1 and I are working on this issue, and it will not block this PR.

jerryshao · 2024-11-21T03:56:17Z

bundles/aws-bundle/build.gradle.kts

+  compileOnly(libs.hadoop3.client.runtime)
+
+  implementation(libs.commons.lang3)
+  implementation(libs.guava)


Please keep all the changes in alphabetical order.

yuqi1129 added 2 commits November 20, 2024 19:26

Test and make fileset with cloud storage can work with Spark 3.2.1~3.5.x

72a846d

fix

837cc83

yuqi1129 commented Nov 20, 2024

fix test error.

f87fec5

yuqi1129 self-assigned this Nov 20, 2024

yuqi1129 requested review from FANNG1, xloya and jerryshao November 20, 2024 13:44

yuqi1129 changed the title ~~[#5585] fix(catalog-hadoop): Test and make fileset with cloud storage can work with Spark 3.2.1~3.5.x~~ [#5585] fix(catalog-hadoop): Test and make fileset with cloud storage can work with Spark 3.2.0~3.5.x Nov 21, 2024

yuqi1129 changed the title ~~[#5585] fix(catalog-hadoop): Test and make fileset with cloud storage can work with Spark 3.2.0~3.5.x~~ [#5585] fix(catalog-hadoop): Test and make fileset with cloud storage can work with Spark 3.2.0~3.5.3 Nov 21, 2024

jerryshao reviewed Nov 21, 2024

jerryshao mentioned this pull request Nov 21, 2024

[#5557] improvement(CI): Add some docs and tests about how to use ADLS in Hive #5558

Open
