Maven classpath order is non-deterministic and Spark UI is affected #7472

Closed
jpallas opened this issue Jun 4, 2018 · 3 comments

@jpallas
Contributor

jpallas commented Jun 4, 2018

It seems that every time I run %classpath add mvn org.apache.spark spark-sql_2.11 2.3.0 I get a different order for the classpath.

This is a problem because the Spark dependencies seem to contain a version conflict that is sensitive to classpath order. I'm not 100% sure where the problem lies, but here's what I have so far. When Spark starts up, it logs this stack trace:

2018-06-03 19:28:54:572 -0700 [SparkUI-40] WARN  ServletHandler - Error for /api/v1/applications/local-1528079332364/allexecutors
java.lang.NoSuchMethodError: javax.ws.rs.core.Application.getProperties()Ljava/util/Map;
	at org.glassfish.jersey.server.ApplicationHandler.<init>(ApplicationHandler.java:331)
	at org.glassfish.jersey.servlet.WebComponent.<init>(WebComponent.java:392)
	at org.glassfish.jersey.servlet.ServletContainer.init(ServletContainer.java:177)
	at org.glassfish.jersey.servlet.ServletContainer.init(ServletContainer.java:369)
	at javax.servlet.GenericServlet.init(GenericServlet.java:244)
	at org.spark_project.jetty.servlet.ServletHolder.initServlet(ServletHolder.java:643)
	at org.spark_project.jetty.servlet.ServletHolder.getServlet(ServletHolder.java:499)
	at org.spark_project.jetty.servlet.ServletHolder.ensureInstance(ServletHolder.java:791)
	at org.spark_project.jetty.servlet.ServletHolder.prepare(ServletHolder.java:776)
	at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:579)
	at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
	at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
	at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
	at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
	at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
	at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.spark_project.jetty.server.Server.handle(Server.java:534)
	at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320)
	at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
	at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
	at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
	at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
	at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
	at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
	at java.lang.Thread.run(Thread.java:748)

Subsequently, visiting the Spark UI and opening the Executors tab dumps this:

2018-06-03 19:30:24:433 -0700 [SparkUI-40] WARN  ServletHandler - /api/v1/applications
java.lang.NullPointerException
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
	at org.spark_project.jetty.servlet.ServletHolder.handle(ServletHolder.java:848)
	at org.spark_project.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584)
	at org.spark_project.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
	at org.spark_project.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
	at org.spark_project.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
	at org.spark_project.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.spark_project.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:493)
	at org.spark_project.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
	at org.spark_project.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.spark_project.jetty.server.Server.handle(Server.java:534)
	at org.spark_project.jetty.server.HttpChannel.handle(HttpChannel.java:320)
	at org.spark_project.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
	at org.spark_project.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
	at org.spark_project.jetty.io.FillInterest.fillable(FillInterest.java:108)
	at org.spark_project.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
	at org.spark_project.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
	at org.spark_project.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
	at org.spark_project.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
	at java.lang.Thread.run(Thread.java:748)

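These errors appear to stem from a conflict between JAX-RS 1 and JAX-RS 2. Spark uses Jersey 2, but Hadoop 2.x uses Jersey 1. The NoSuchMethodError fits that picture: Application.getProperties() only exists in the JAX-RS 2.0 API, so Jersey 2 fails if an older JAX-RS 1.x copy of javax.ws.rs.core.Application is loaded first. There is some discussion of the conflict at SPARK-15343, but it doesn't reach a definitive resolution (assuming that is the underlying issue in this case).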
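One way to check which JAX-RS API actually won is to ask the JVM where it loaded javax.ws.rs.core.Application from, and whether that copy has getProperties(). The helper below is a hypothetical sketch (ClassOriginCheck and its methods are not part of Spark, Jersey, or BeakerX). It assumes the thread context class loader is the one that sees the notebook's classpath, which may not hold in every kernel setup.

```java
import java.lang.reflect.Method;
import java.net.URL;
import java.security.CodeSource;
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical diagnostic: where did a class come from, and what does that copy declare? */
final class ClassOriginCheck {

    /** Jar or directory the class was loaded from; empty if not found or a JDK (bootstrap) class. */
    static Optional<URL> codeSourceOf(String className) {
        try {
            Class<?> cls = Class.forName(className, false,
                    Thread.currentThread().getContextClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            return Optional.ofNullable(source).map(CodeSource::getLocation);
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    /** True if the loaded copy of the class has a public method with this name. */
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className, false,
                    Thread.currentThread().getContextClassLoader());
            return Arrays.stream(cls.getMethods())
                    .map(Method::getName)
                    .anyMatch(methodName::equals);
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String api = "javax.ws.rs.core.Application";
        System.out.println(api + " loaded from: "
                + codeSourceOf(api).map(URL::toString).orElse("<not found or bootstrap>"));
        // getProperties() is a JAX-RS 2.x method; false here points at a JAX-RS 1.x copy.
        System.out.println("has getProperties(): " + hasPublicMethod(api, "getProperties"));
    }
}
```

Run in the same JVM as Spark, the first line should name the jar that shadowed the other.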

It does appear that the relative order of jersey-core 1.9 and javax.ws.rs-api 2.0.1 on the classpath determines whether I see the errors (though I have to admit I'm not sure that's the only factor).
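Why order matters: a URLClassLoader searches its entries in sequence and returns the first match, so whichever of two jars containing the same name comes first shadows the other. The sketch below demonstrates that with two temporary directories holding a resource of the same name, standing in for the two jars; it is an illustration under those assumptions (Java 11 or later), not code from this project.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class ShadowingDemo {

    /** Creates a temporary classpath entry (a directory) containing marker.txt. */
    static Path entryWith(String content) throws IOException {
        Path dir = Files.createTempDirectory("cp-entry");
        Path marker = Files.writeString(dir.resolve("marker.txt"), content);
        // deleteOnExit runs in reverse registration order: file first, then directory.
        dir.toFile().deleteOnExit();
        marker.toFile().deleteOnExit();
        return dir;
    }

    /** Reads marker.txt through a loader that searches the entries in the given order. */
    static String firstVisible(Path... entries) throws IOException {
        URL[] urls = new URL[entries.length];
        for (int i = 0; i < entries.length; i++) {
            // File.toURI adds a trailing '/' for directories, which URLClassLoader requires.
            urls[i] = entries[i].toFile().toURI().toURL();
        }
        // Parent null: only the bootstrap loader is consulted before our entries.
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            URL hit = loader.getResource("marker.txt");
            if (hit == null) {
                throw new IOException("marker.txt not found");
            }
            try (InputStream in = hit.openStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path v1 = entryWith("JAX-RS 1 (stands in for jersey-core 1.9)");
        Path v2 = entryWith("JAX-RS 2 (stands in for javax.ws.rs-api 2.0.1)");
        System.out.println(firstVisible(v1, v2)); // prints the JAX-RS 1 marker
        System.out.println(firstVisible(v2, v1)); // prints the JAX-RS 2 marker
    }
}
```

The same rule applies to .class files, which is how a shuffled classpath can swap which javax.ws.rs.core.Application Jersey sees.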

MNG-1412 is where Maven declares that classpath ordering should be consistent.

@jpallas
Contributor Author

jpallas commented Jun 6, 2018

I've verified that the classpath constructed by Maven (using dependency:build-classpath) yields a working Spark UI.
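A minimal sketch of getting the classpath in Maven's own order: run the dependency:build-classpath goal with its mdep.outputFile property and split the result on the platform path separator. The class and method names are hypothetical, and it assumes mvn is on the PATH of a Unix-like system (on Windows the executable is mvn.cmd) and that the given POM declares the dependency being added.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class MavenClasspath {

    /** Runs dependency:build-classpath and returns the entries in Maven's order. */
    static List<String> resolve(Path pom) throws IOException, InterruptedException {
        Path out = Files.createTempFile("classpath", ".txt");
        try {
            Process p = new ProcessBuilder(
                    "mvn", "-q", "-f", pom.toString(),
                    "dependency:build-classpath",
                    "-Dmdep.outputFile=" + out)
                    .inheritIO()
                    .start();
            if (p.waitFor() != 0) {
                throw new IOException("mvn dependency:build-classpath failed");
            }
            return parse(Files.readString(out));
        } finally {
            Files.deleteIfExists(out);
        }
    }

    /** Splits a platform classpath string, keeping order and dropping empty entries. */
    static List<String> parse(String classpath) {
        List<String> entries = new ArrayList<>();
        // File.pathSeparator is ":" or ";", neither of which is a regex metacharacter.
        for (String entry : classpath.trim().split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }
}
```

Keeping the list instead of a directory glob is what preserves the order Maven resolved.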

The current implementation uses dependency:copy-dependencies to copy all the dependencies into a per-kernel-instance jar directory (ResolverParams.pathToNotebookJars) and then adds that directory to the classpath with a wildcard. Wildcard expansion produces an unpredictable order.
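The JDK documentation on class path wildcards states that the order in which jars in a directory are enumerated is not specified. The hypothetical helper below lists a jar directory (roughly what the wildcard does) and sorts it by name. That makes the order reproducible, but alphabetical order is not Maven's dependency order, so it would only make a conflict consistent rather than resolve it the way Maven does.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class JarDirectory {

    /** Jars in dir, sorted by path: deterministic, but not Maven's resolution order. */
    static List<Path> jarsSortedByName(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) { // Files.list gives no ordering guarantee
            return files
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .sorted() // Path comparison is lexicographic and platform-dependent
                    .collect(Collectors.toList());
        }
    }
}
```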

Some refactoring will be required, because the performance problem from #5701 reappears when an explicit list of jar names is used instead of a wildcard.

Also, by default the classpath Maven builds points into the Maven local repo. It's not clear whether copying all the jar files into a per-kernel-instance directory is still needed if wildcards aren't used. On the other hand, clearing the shared Maven local repo would pull the rug out from under other running kernels, so the copying may still make sense.
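If the per-kernel copies are kept, the copied jars can still be listed explicitly in Maven's order. A minimal sketch, assuming copy-dependencies kept each jar's original file name (its default when versions are not stripped), so a local-repo path maps to the same file name inside the per-kernel directory; KernelClasspath and fromMavenOrder are hypothetical names.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

final class KernelClasspath {

    /** Rewrites Maven-ordered repo paths to the per-kernel copies, preserving order. */
    static String fromMavenOrder(List<String> mavenOrder, Path jarDir) {
        return mavenOrder.stream()
                .map(entry -> jarDir.resolve(Paths.get(entry).getFileName()))
                .map(Path::toString)
                .collect(Collectors.joining(File.pathSeparator));
    }
}
```

Two different artifacts with the same file name would already collide when copied into one flat directory, so this mapping inherits that limitation.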

@scottdraves
Contributor

Thank you for researching this, Joe. I don't understand it yet; I will look further after 0.19 is done.

@jaroslawmalekcodete
Contributor

Hi @jpallas,
I could not reproduce your error, but I followed your suggestion and implemented a deterministic classpath order based on dependency:build-classpath.

LeeTZ pushed a commit that referenced this issue Aug 30, 2018
…ndency:build-classpath' goal (#7774)

* #7472: ensure deterministic classpath order based on maven 'dependency:build-classpath' goal

* #7472: revert unintentional deletion
@LeeTZ LeeTZ closed this as completed Aug 31, 2018

5 participants