PySpark Submission fails without --jars #409
Comments
It makes sense to me that you need to explicitly specify jar deps via --jars.
@erikerlandson can you elaborate on why this makes sense? I just ran into this and don't understand why the jar needs to be provided when I only intend to execute a Python app.
I don't think this is correct. @ifilonenko can you take another look at this?
@sahilprasad in general python jobs may execute jvm code, and in particular they may need additional jar deps that have to be supplied using --jars (see the illustration below).

@mccheah do you mean the spark-examples jar shouldn't be needed to run pi.py?
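(Illustrative sketch, not from the original thread.) The general point is that a Python application can still exercise JVM code paths that need extra jars; the driver jar name and application path below are made-up placeholders:

```bash
# Hypothetical PySpark job that reads over JDBC: the Python code itself is
# fine, but the JVM side needs the JDBC driver jar supplied explicitly.
bin/spark-submit \
  --jars local:///opt/spark/jars/postgresql-driver.jar \
  local:///opt/app/etl_job.py
```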
I think in this particular example the error message is either incorrect or we're not passing arguments along properly: if it was just a classpath failure then I would expect a ClassNotFoundException.
@mccheah when I replicated this problem, the error that @ifilonenko provided is the only thing that I see.
What is quite interesting is that I didn't hit the missing-jar exception when I ran this with #364. But as @erikerlandson mentioned, this seems to be attributed to the spark-examples jar being needed when running the PySpark examples. I would assume that a better test would be to run a PySpark task outside of spark-examples and see whether the error persists (see the sketch below).
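A sketch of such a test, with a hypothetical file path and placeholder cluster address; the only point is that the job has no dependency on spark-examples:

```bash
# Minimal PySpark file with no spark-examples dependency.
cat > /tmp/trivial.py <<'EOF'
from pyspark import SparkContext
sc = SparkContext(appName="trivial")
print(sc.parallelize(range(100)).sum())
sc.stop()
EOF

# Submit it without --jars and watch whether the same error appears.
# (For a local:// URI the file must already be present in the driver image;
# this is a sketch, not a turnkey reproduction.)
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://<api-server-host>:<port> \
  local:///tmp/trivial.py
```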
My hypothesis is that the Python Dockerfile is running an ill-formatted Java command (see the sketch below for the suspected failure mode). @sahilprasad @ifilonenko - if either of you can track down how the command isn't being formed properly, that would be helpful. We had to fix a similar problem with #444, for example.
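For illustration (not from the original thread): if the assembled command drops or misplaces the main-class token, java interprets the next argument, here a classpath entry, as the class name. Because java accepts '/'-separated class names and normalizes them to '.', this reproduces the exact error text from the report:

```bash
# Hypothetical reproduction of the error shape: MAIN_CLASS is assumed to be
# lost during command assembly, so java treats the jar path as the class name.
MAIN_CLASS=""
java $MAIN_CLASS /opt/spark/jars/activation-1.1.1.jar
# => Error: Could not find or load main class .opt.spark.jars.activation-1.1.1.jar
```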
@mccheah I can take a look. I also think that it's an ill-formatted Java command that's at the root of the issue, but I'll update this issue with what I find.
w.r.t. making it easier to observe command formatting, we might tweak the Docker images to log the exact command they execute (e.g. something like the sketch below).
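A hypothetical illustration of that idea; the entrypoint layout and the SPARK_CLASSPATH / SPARK_DRIVER_CLASS variable names are assumptions, not the images' actual contents:

```bash
#!/usr/bin/env bash
# Hypothetical driver entrypoint tweak: print the assembled command before
# exec'ing it, so a malformed invocation is visible in the pod logs.
CMD=(java -cp "$SPARK_CLASSPATH" "$SPARK_DRIVER_CLASS" "$@")
echo "Running driver command: ${CMD[*]}" >&2
exec "${CMD[@]}"
```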
I was able to get @ifilonenko's first example working without --jars. See changes here: branch-2.2-kubernetes...sahilprasad:python-jars
@sahilprasad you should submit that as a PR and we can discuss - are we over-writing existing entries on the classpath?
Original issue description:
An interesting problem arises when submitting the example PySpark jobs without --jars.
Here is an example submission:
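The command itself was not preserved in this copy of the issue. What follows is a representative reconstruction, assuming the stock pi.py example; the API server address, namespace, and image names are placeholders:

```bash
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://<api-server-host>:<port> \
  --conf spark.app.name=spark-pi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.namespace=default \
  --conf spark.kubernetes.driver.docker.image=<driver-py-image> \
  --conf spark.kubernetes.executor.docker.image=<executor-py-image> \
  local:///opt/spark/examples/src/main/python/pi.py 10
```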
This causes an error:
Error: Could not find or load main class .opt.spark.jars.activation-1.1.1.jar
This error is solved by passing, via --jars, the examples jar that supplies the needed classes:
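Again reconstructed rather than copied from the report; the jar name, versions, and image names are placeholders:

```bash
bin/spark-submit \
  --deploy-mode cluster \
  --master k8s://https://<api-server-host>:<port> \
  --jars local:///opt/spark/examples/jars/spark-examples_<scala-version>-<spark-version>.jar \
  --conf spark.kubernetes.driver.docker.image=<driver-py-image> \
  --conf spark.kubernetes.executor.docker.image=<executor-py-image> \
  local:///opt/spark/examples/src/main/python/pi.py 10
```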
Is this behavior expected? In the integration environment I specify jars for the second PySpark test but not for the first test (as I launch the RSS). However, both seem to pass, making me think that it isn't necessary to specify the jars.