[SPARK-45506][CONNECT] Add ivy URI support to SparkConnect addArtifact #43354
Conversation
```xml
@@ -51,6 +51,20 @@
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-text</artifactId>
    </dependency>
    <dependency>
```
How many dependencies do we add to the client by doing this?
~400 KB from commons-io and ~1 MB from Ivy
```xml
      <artifactId>ivy</artifactId>
    </dependency>
    <dependency>
      <groupId>oro</groupId>
```
BTW oro is pretty much dead. Why does ivy need it?
Glob matchers (`ivySettings.getMatcher("glob")`) require oro: https://ant.apache.org/ivy/history/latest-milestone/concept.html
Also, if I remove this dep then tests start failing:

```
[error] Uncaught exception when running org.apache.spark.util.MavenUtilsSuite: java.lang.NoClassDefFoundError: org/apache/oro/text/regex/MalformedPatternException
[error] sbt.ForkMain$ForkError: java.lang.NoClassDefFoundError: org/apache/oro/text/regex/MalformedPatternException
[error] at org.apache.ivy.plugins.matcher.GlobPatternMatcher.newMatcher(GlobPatternMatcher.java:57)
[error] at org.apache.ivy.plugins.matcher.AbstractPatternMatcher.getMatcher(AbstractPatternMatcher.java:44)
[error] at org.apache.ivy.plugins.matcher.MatcherHelper.matches(MatcherHelper.java:34)
[error] at org.apache.ivy.plugins.matcher.MatcherHelper.matches(MatcherHelper.java:39)
[error] at org.apache.ivy.plugins.matcher.MatcherHelper.matches(MatcherHelper.java:49)
[error] at org.apache.ivy.core.module.descriptor.DefaultModuleDescriptor.doesExclude(DefaultModuleDescriptor.java:783)
[error] at org.apache.ivy.core.resolve.IvyNode.directlyExcludes(IvyNode.java:439)
[error] at org.apache.ivy.core.resolve.IvyNode.doesExclude(IvyNode.java:421)
[error] at org.apache.ivy.core.resolve.IvyNode.isDependencyModuleExcluded(IvyNode.java:411)
[error] at org.apache.ivy.core.resolve.IvyNode.getDependencies(IvyNode.java:361)
[error] at org.apache.ivy.core.resolve.VisitNode.getDependencies(VisitNode.java:305)
[error] at org.apache.ivy.core.resolve.ResolveEngine.doFetchDependencies(ResolveEngine.java:797)
[error] at org.apache.ivy.core.resolve.ResolveEngine.fetchDependencies(ResolveEngine.java:729)
[error] at org.apache.ivy.core.resolve.ResolveEngine.getDependencies(ResolveEngine.java:607)
[error] at org.apache.ivy.core.resolve.ResolveEngine.resolve(ResolveEngine.java:250)
[error] at org.apache.ivy.Ivy.resolve(Ivy.java:522)
[error] at org.apache.spark.util.MavenUtils$.resolveMavenCoordinates(MavenUtils.scala:456)
```
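For context, the glob matching that Ivy delegates to oro works roughly like this. The sketch below is purely illustrative: it reimplements the glob-to-regex idea with `java.util.regex` instead of oro's `GlobCompiler`, and `globToRegex` / `globMatches` are made-up helper names, not Ivy's actual API.

```scala
// Translate an Ivy-style glob (supporting '*' and '?') into a Java regex.
// Ivy itself delegates this to oro's GlobCompiler; this is only a sketch
// of the behaviour, not the real implementation.
def globToRegex(glob: String): String =
  glob.flatMap {
    case '*' => ".*"
    case '?' => "."
    case c if "\\.[]{}()+-^$|".contains(c) => "\\" + c
    case c => c.toString
  }

def globMatches(glob: String, input: String): Boolean =
  input.matches(globToRegex(glob))

// e.g. an exclude rule like org="org.apache.*" matches via glob
assert(globMatches("org.apache.*", "org.apache.ivy"))
assert(!globMatches("org.apache.*", "com.example"))
```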
```scala
  }

  /** Creates a compiled class with the source file. Class file will be placed in destDir. */
  def createCompiledClass(
```
Where is this used?
`IvyTestUtils.createLocalRepository` -> `IvyTestUtils.createJavaClass`, to populate a local repository with some JARs. Also in `SparkSubmitSuite` and a bunch of other suites in spark-core.
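The essence of a `createCompiledClass`-style helper is invoking the JDK compiler on a generated source file. A minimal self-contained sketch using `javax.tools` (not Spark's actual `TestUtils` code; the class name and directory are illustrative):

```scala
import java.io.File
import java.nio.file.Files
import javax.tools.ToolProvider

// Compile a trivial Java class into destDir and return the resulting
// .class file -- the core of what createCompiledClass does when
// IvyTestUtils populates a local repository with JARs.
def createCompiledClass(className: String, destDir: File): File = {
  val source = new File(destDir, s"$className.java")
  Files.write(source.toPath, s"public class $className {}".getBytes("UTF-8"))
  val compiler = ToolProvider.getSystemJavaCompiler // requires a JDK, not a bare JRE
  val result = compiler.run(null, null, null, "-d", destDir.getPath, source.getPath)
  require(result == 0, s"Compilation of $className failed")
  new File(destDir, s"$className.class")
}

val tmpDir = Files.createTempDirectory("ivy-test").toFile
val clazz = createCompiledClass("MyLib", tmpDir)
```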
```scala
  test("resolve ivy") {
    val artifacts =
      Artifact.newIvyArtifacts(URI.create("ivy://org.apache.hive:hive-storage-api:2.7.0"))
```
Is there a risk that we hammer an external repo here?
Probably, but many tests in `SparkContextSuite` also do this, though that's not a strong supporting argument.

I'm wondering if we should add a `?repos=...` parameter (or something similar) to the Ivy URI to support providing additional private / local repositories. Spark-core does support this through Spark confs, but we can't do that in Connect. This would add some useful functionality and would allow us to test this without relying on remote repositories.
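A rough sketch of what parsing such a `?repos=` parameter could look like. This is a hypothetical illustration of the suggestion, not Spark's implementation: `parseIvyUri` is a made-up helper, and the actual parameter name and semantics would need to align with what spark-core already supports.

```scala
import java.net.URI

// Parse an Ivy URI of the form ivy://group:artifact:version?repos=url1,url2
// into its Maven coordinate plus an optional list of extra repositories.
// Illustrative sketch only.
def parseIvyUri(uri: String): (String, Seq[String]) = {
  val parsed = URI.create(uri)
  require(parsed.getScheme == "ivy", s"Expected an ivy:// URI, got: $uri")
  val coordinate = parsed.getAuthority // "group:artifact:version"
  val repos: Seq[String] = Option(parsed.getQuery).toSeq
    .flatMap(_.split("&").toSeq)
    .collect { case kv if kv.startsWith("repos=") =>
      kv.stripPrefix("repos=").split(",").toSeq
    }
    .flatten
  (coordinate, repos)
}

val (coord, repos) =
  parseIvyUri("ivy://org.apache.hive:hive-storage-api:2.7.0?repos=https://repo.example.com/maven")
```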
Looks good overall. I left a few comments.
LGTM
…tests to use local repo instead of remote maven
@vsevolodstep-db can you retrigger tests? I am not able to do so.
# Conflicts:
#	common/utils/src/main/scala/org/apache/spark/util/SparkStreamUtils.scala
@HyukjinKwon are the current python failures a known problem?
Yeah, the test failure at https://github.com/vsevolodstep-db/spark/actions/runs/6691537121/job/18179083580 seems unrelated.
Merged to master.
I found that after moving MavenUtilsSuite.scala to the common-utils module, it cannot pass the tests. Do you know why? The current GA does not test this case (this issue will be fixed later), and it can be reproduced locally.
also cc @HyukjinKwon @hvanhovell
@@ -230,12 +226,11 @@ private[deploy] object IvyTestUtils { | |||
val inside = deps.map(ivyArtifactWriter).mkString("\n") | |||
"\n <dependencies>\n" + inside + "\n </dependencies>" | |||
}.getOrElse("") | |||
content += "\n</ivy-module>" |
Why do we need to delete this line? It will make the content of `ivy.xml` incomplete...
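The effect of the missing end tag can be demonstrated with the JDK's XML parser. This is a minimal sketch: `makeDescriptor` is an illustrative stand-in for `IvyTestUtils#createIvyDescriptor`, and the descriptor body is reduced to its bare bones.

```scala
import java.io.ByteArrayInputStream
import javax.xml.parsers.DocumentBuilderFactory

// Build a minimal ivy.xml body; `complete` controls whether the
// </ivy-module> end tag (the line deleted by mistake) is emitted.
def makeDescriptor(complete: Boolean): String = {
  val header =
    """<ivy-module version="2.0">
      |  <info organisation="my.great.lib" module="mylib" revision="0.1"/>
      |""".stripMargin
  if (complete) header + "</ivy-module>" else header
}

// Returns true iff the XML is well-formed; without the end tag the parser
// fails with "XML document structures must start and end within the same
// entity" -- the exact error seen in the failing MavenUtilsSuite tests.
def parses(xml: String): Boolean =
  try {
    DocumentBuilderFactory.newInstance().newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")))
    true
  } catch { case _: org.xml.sax.SAXException => false }
```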
Here's a fix: #43834
…ction and make `common-utils` module can run tests on GitHub Action

### What changes were proposed in this pull request?

This PR mainly does two things:

1. It reverts a line of code in `IvyTestUtils.scala` that was mistakenly deleted in SPARK-45506 | #43354, to ensure that the `ivy.xml` file generated by `IvyTestUtils#createIvyDescriptor` is complete. Before this PR, the generated `ivy.xml` file was missing the `</ivy-module>` end tag, which caused two test cases in `MavenUtilsSuite` to fail. We can reproduce the problem by executing the `build/sbt "common-utils/test"` command:

```
[info] MavenUtilsSuite:
[info] - incorrect maven coordinate throws error (8 milliseconds)
[info] - create repo resolvers (24 milliseconds)
[info] - create additional resolvers (3 milliseconds)
:: loading settings :: url = jar:file:/Users/yangjie01/Library/Caches/Coursier/v1/https/repo1.maven.org/maven2/org/apache/ivy/ivy/2.5.1/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
[info] - add dependencies works correctly (35 milliseconds)
[info] - excludes works correctly (2 milliseconds)
[info] - ivy path works correctly (3 seconds, 759 milliseconds)
[info] - search for artifact at local repositories *** FAILED *** (2 seconds, 833 milliseconds)
[info]   java.lang.RuntimeException: [unresolved dependency: my.great.lib#mylib;0.1: java.text.ParseException: [[Fatal Error] ivy-0.1.xml.original:22:18: XML document structures must start and end within the same entity. in file:/SourceCode/git/spark-mine-sbt/target/tmp/ivy-8b860aca-a9c4-4af9-b15a-ac8c6049b773/cache/my.great.lib/mylib/ivy-0.1.xml.original
[info] ]]
[info]   at org.apache.spark.util.MavenUtils$.resolveMavenCoordinates(MavenUtils.scala:459)
[info]   at org.apache.spark.util.MavenUtilsSuite.$anonfun$new$25(MavenUtilsSuite.scala:173)
[info]   at org.apache.spark.util.MavenUtilsSuite.$anonfun$new$25$adapted(MavenUtilsSuite.scala:172)
[info]   at org.apache.spark.util.IvyTestUtils$.withRepository(IvyTestUtils.scala:373)
[info]   at org.apache.spark.util.MavenUtilsSuite.$anonfun$new$18(MavenUtilsSuite.scala:172)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
[info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
[info]   at org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
[info]   at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
[info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
[info]   at scala.collection.immutable.List.foreach(List.scala:333)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
[info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
[info]   at org.scalatest.Suite.run(Suite.scala:1114)
[info]   at org.scalatest.Suite.run$(Suite.scala:1096)
[info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
[info]   at org.apache.spark.util.MavenUtilsSuite.org$scalatest$BeforeAndAfterAll$$super$run(MavenUtilsSuite.scala:36)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.util.MavenUtilsSuite.run(MavenUtilsSuite.scala:36)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info]   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info]   at java.base/java.lang.Thread.run(Thread.java:833)
[info] - dependency not found throws RuntimeException (2 seconds, 274 milliseconds)
[info] - neglects Spark and Spark's dependencies (1 second, 934 milliseconds)
[info] - exclude dependencies end to end (953 milliseconds)
:: loading settings :: file = /Users/yangjie01/SourceCode/git/spark-mine-sbt/target/tmp/ivy-8b860aca-a9c4-4af9-b15a-ac8c6049b773/ivysettings.xml
[info] - load ivy settings file *** FAILED *** (167 milliseconds)
[info]   java.lang.RuntimeException: [unresolved dependency: my.great.lib#mylib;0.1: java.text.ParseException: [[Fatal Error] ivy-0.1.xml.original:22:18: XML document structures must start and end within the same entity. in file:/SourceCode/git/spark-mine-sbt/target/tmp/ivy-8b860aca-a9c4-4af9-b15a-ac8c6049b773/cache/my.great.lib/mylib/ivy-0.1.xml.original
[info] ]]
[info]   at org.apache.spark.util.MavenUtils$.resolveMavenCoordinates(MavenUtils.scala:459)
[info]   at org.apache.spark.util.MavenUtilsSuite.$anonfun$new$40(MavenUtilsSuite.scala:260)
[info]   at org.apache.spark.util.MavenUtilsSuite.$anonfun$new$40$adapted(MavenUtilsSuite.scala:259)
[info]   at org.apache.spark.util.IvyTestUtils$.withRepository(IvyTestUtils.scala:373)
[info]   at org.apache.spark.util.MavenUtilsSuite.$anonfun$new$39(MavenUtilsSuite.scala:259)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
[info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
[info]   at org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
[info]   at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
[info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
[info]   at scala.collection.immutable.List.foreach(List.scala:333)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
[info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
[info]   at org.scalatest.Suite.run(Suite.scala:1114)
[info]   at org.scalatest.Suite.run$(Suite.scala:1096)
[info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
[info]   at org.apache.spark.util.MavenUtilsSuite.org$scalatest$BeforeAndAfterAll$$super$run(MavenUtilsSuite.scala:36)
[info]   at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:213)
[info]   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
[info]   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
[info]   at org.apache.spark.util.MavenUtilsSuite.run(MavenUtilsSuite.scala:36)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info]   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info]   at java.base/java.lang.Thread.run(Thread.java:833)
[info] - SPARK-10878: test resolution files cleaned after resolving artifact (880 milliseconds)
[info] - SPARK-34624: should ignore non-jar dependencies (145 milliseconds)
[info] Run completed in 13 seconds, 493 milliseconds.
[info] Total number of tests run: 13
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 11, failed 2, canceled 0, ignored 0, pending 0
[info] *** 2 TESTS FAILED ***
[error] Failed tests:
[error] 	org.apache.spark.util.MavenUtilsSuite
```

2. It adds `sbt_test_goals` to the `utils` module in `modules.py` to ensure that the unit tests in the `common-utils` module will be verified by GitHub Action.

### Why are the changes needed?

Fix the failed test cases in `MavenUtilsSuite` and let GitHub Action verify the unit tests in the `common-utils` module.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Monitor GA: the unit tests in the `common-utils` module should be run, and `MavenUtilsSuite` should pass.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #43834 from LuciferYang/utils-ga-test.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
```scala
-private[deploy] object IvyTestUtils {
+private[spark] object IvyTestUtils {
```
This file should be in `src/test` instead of `src/main`.
+1
```scala
-ConfigBuilder("spark.jars.ivySettings")
+ConfigBuilder(MavenUtils.JAR_IVY_SETTING_PATH_KEY)
```
Isn't it more appropriate to have `MavenUtils` refer to this config entry and get the key name from it? It seems weird, and inconsistent with the remaining configs, for this config to get its key name from somewhere else.
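The direction the comment suggests could look roughly like this. Everything here is hypothetical: `IvyConfig`, `ConfigEntry`, and `MavenStyleUtils` are stand-ins just to show the wiring (the config entry owning its key, and the utility referring to the entry), not Spark's actual `ConfigBuilder` machinery.

```scala
// Stand-in for the place where the key string would be defined once.
object IvyConfig {
  val JAR_IVY_SETTING_PATH_KEY = "spark.jars.ivySettings"
}

// Minimal stand-in for Spark's config entry type.
final case class ConfigEntry(key: String, doc: String = "")

// The utility refers to the config entry for the key name,
// instead of the config entry fetching its key from the utility.
object MavenStyleUtils {
  val ivySettingsEntry: ConfigEntry = ConfigEntry(IvyConfig.JAR_IVY_SETTING_PATH_KEY)
  def settingsKey: String = ivySettingsEntry.key
}
```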
…test` directory

### What changes were proposed in this pull request?

This pr moves `IvyTestUtils` back to the `src/test` directory, because it was in the `src/test` directory before the refactoring work of #43354. Meanwhile, in order to let the `core` and `connect-client-jvm` modules use `IvyTestUtils` in their tests, this pr adds the corresponding Maven dependencies to the respective `pom.xml` files.

### Why are the changes needed?

Move `IvyTestUtils` back to the `src/test` directory.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #44440 from LuciferYang/mv-IvyTestUtils-to-test-dir.

Authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
What changes were proposed in this pull request?

This PR extends the existing SparkConnect `SparkSession.addArtifact` API to support Apache Ivy URIs, in addition to the existing local `.jar` & `.class` file support. It leverages the existing Apache Ivy support that was previously a part of spark-core (`SparkSubmitUtils`).

This PR contains the following changes:

1. Refactoring `SparkSubmitUtils` and moving the Ivy-related parts to the `common` module. This results in adding two small dependencies to `common` (apache `ivy` and `commons-io`)
2. Moving some parts of `Utils` & `TestUtils` to `common`
3. Extending `SparkSession.addArtifact` to support Ivy URIs

The refactoring changes done in 1-2 do not alter code functionality. They are mostly about moving code between files & packages and adjusting some dependencies.
Why are the changes needed?

It's possible to add Maven artifacts to Spark, but SparkConnect currently lacks this functionality.
Does this PR introduce any user-facing change?

Yes: it extends the existing SparkConnect `SparkSession.addArtifact` API to support more URI types.

How was this patch tested?

Existing tests covering Ivy resolution functionality.

New test in `ArtifactSuite` testing Ivy support in `ArtifactManager`.

New E2E test in `ReplE2ESuite` testing adding a Maven library and using it from a UDF.

Was this patch authored or co-authored using generative AI tooling?

No