Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-2678][Core][SQL] A workaround for SPARK-2678 #1801

Closed
wants to merge 5 commits into from

Conversation

liancheng
Copy link
Contributor

JIRA issues:

Related PR:

This PR is both a fix for SPARK-2874 and a workaround for SPARK-2678. Fixing SPARK-2678 completely requires some API level changes that need further discussion, and we decided not to include it in Spark 1.1 release. As currently SPARK-2678 only affects Spark SQL scripts, this workaround is enough for Spark 1.1. Command line option handling logic in bash scripts looks somewhat dirty and duplicated, but it helps to provide a cleaner user interface as well as retain full downward compatibility for now.

@chenghao-intel
Copy link
Contributor

@liancheng thanks for doing this. I don't have too much comments about it. But I have some other questions from the user perspective:

  1. Shall we set the default log level as WARN instead of INFO while running the ./bin/spark-sql, the output seems too verbose for normal user currently.
  2. Shall we provide the default hive-site.xml.template and hive-log4j.properties.template under the conf/ too?
  3. I also noticed the package made by make-distribution.sh, only contains the assembly jars under the folder lib, actually in my cluster, I'd would like to use the SparkSQL with Hive metastore database installed on MySQL, hence I need to put additional jars (mysql driver and datanucleaus) into the folder lib manually, but currently, seems I have to change the scripts of bin/compute-classpath.sh as well to make those new jars effective in classpath. Not sure if there any better idea for the configuration.

Sorry, those questions are not related to this PR.

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA tests have started for PR 1801. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18003/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA tests have started for PR 1801. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18006/consoleFull

@pwendell
Copy link
Contributor

pwendell commented Aug 6, 2014

Thanks cheng. This LGTM pending tests.

@liancheng liancheng changed the title [SPARK-2874][SQL] Fixed usage messages of all Spark SQL related scripts [SPARK-2678][Core][SQL] A workaround for SPARK-2678 Aug 6, 2014
@liancheng
Copy link
Contributor Author

Hey @chenghao-intel, answers for your questions:

  1. Actually bin/spark-sql doesn't output lots of logs by default. Did you set something like hive.root.looger=INFO,console in your hive-site.xml or elsewhere (similar to what shark-withinfo does in Shark)?
  2. Hmm... I'm not very sure about this. Usually people just copy their hive-site.xml and hive-log4j.properties from their existing Hive installation. Maybe we can do it in another PR. One thing to note, hive-default.xml.template in Hive 0.12 isn't a valid XML, needs some minor tweak before being added.
  3. Need some more time to investigate this one. We can discuss this offline.

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA results for PR 1801:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18006/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA results for PR 1801:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test ouptut:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18003/consoleFull

@SparkQA
Copy link

SparkQA commented Aug 6, 2014

QA tests have started for PR 1801. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18028/consoleFull

@pwendell
Copy link
Contributor

pwendell commented Aug 6, 2014

Thanks Cheng! I'm gonna merge this.

asfgit pushed a commit that referenced this pull request Aug 6, 2014
JIRA issues:

- Main: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
- Related: [SPARK-2874](https://issues.apache.org/jira/browse/SPARK-2874)

Related PR:

- #1715

This PR is both a fix for SPARK-2874 and a workaround for SPARK-2678. Fixing SPARK-2678 completely requires some API level changes that need further discussion, and we decided not to include it in Spark 1.1 release. As currently SPARK-2678 only affects Spark SQL scripts, this workaround is enough for Spark 1.1. Command line option handling logic in bash scripts looks somewhat dirty and duplicated, but it helps to provide a cleaner user interface as well as retain full downward compatibility for now.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1801 from liancheng/spark-2874 and squashes the following commits:

8045d7a [Cheng Lian] Make sure test suites pass
8493a9e [Cheng Lian] Using eval to retain quoted arguments
aed523f [Cheng Lian] Fixed typo in bin/spark-sql
f12a0b1 [Cheng Lian] Worked arount SPARK-2678
daee105 [Cheng Lian] Fixed usage messages of all Spark SQL related scripts
(cherry picked from commit a6cd311)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
@asfgit asfgit closed this in a6cd311 Aug 6, 2014
@concretevitamin
Copy link
Contributor

@liancheng @andrewor14 @pwendell With this patch things like ./bin/spark-shell --master local[2] errors out ("bad options: --master"). I had to workaround this with an extra "--" before the flag. Is this intended?

@concretevitamin
Copy link
Contributor

Oh, it's been reported by #1825.

asfgit pushed a commit that referenced this pull request Aug 7, 2014
Backport  SPARK-2678 fix in PR #1801 to branch-1.0

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1831 from liancheng/spark-2678-for-1.0 and squashes the following commits:

cc59929 [Cheng Lian] Backported SPARK-2678 fix
@vanzin
Copy link
Contributor

vanzin commented Aug 7, 2014

This change broke backwards compatibility with existing spark-submit commands. e.g., this used to work fine and doesn't anymore:

spark-submit myapp.jar --master yarn-client --class my.App

You have to change it to:

spark-submit --master yarn-client --class my.App myapp.jar 

I don't think that's acceptable. Lot's of users will be tripped by that, especially since the error message is less than helpful.

@liancheng
Copy link
Contributor Author

@vanzin The point here is that, the only valid way to call spark-submit according both our documentation and the usage message spark-submit prints is

spark-submit [options] <app jar | python file> [app options]

And we decided to be only compatible with this spec we have at hand. Existing scripts that rely on this bug (i.e., putting --master after primary resource) should be fixed, and we're not going to be compatible with them. And we do plan to provide a cleaner spark-submit interface in future releases.

@vanzin
Copy link
Contributor

vanzin commented Aug 7, 2014

I see what you're saying, but what the docs say and what the code does are different things... well, let's see how many people are tripped by this once this change goes out.

A better error message, at least, would go a long way. Currently it says "Cannot load main class" and tells you to add "--verbose". But if you add "--verbose" at the end, you get the same error message... so nothing is clarified.

@liancheng
Copy link
Contributor Author

@vanzin I understand what you're worrying about :) Actually in #1715 we've roughly reached a cleaner solution (see the commends) than this one, but some details haven't been quite nailed down yet, and it's already close to 1.1 release. So, this PR is just a not so elegant fix of SPARK-2678 that aims to be compatible with the spec and sacrificing some clarity in bash scripts. We'll do it again in the right way as what Matei, Patrick, Andrew and I discussed in #1715 once details are nailed down.

@andrewor14
Copy link
Contributor

Yeah, I agree that this break in backward compatibility is pretty bad. At the same time we don't want to introduce some new config (e.g. the proposals in #1715) until we settle on a reasonably intuitive one. Otherwise we may have to support a clumsy way of passing arguments forever. Fortunately the documented way is not affected. It just so happens that our existing implementation is much more flexible than we intended it to be.

But yes, at the very least we should have an error message that points the users of the now deprecated way to the right direction.

chutium added a commit to chutium/spark that referenced this pull request Aug 8, 2014
asfgit pushed a commit that referenced this pull request Aug 10, 2014
As sryza reported, spark-shell doesn't accept any flags.
The root cause is wrong usage of spark-submit in spark-shell and it come to the surface by #1801

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1715, Closes #1864, and Closes #1861

Closes #1825 from sarutak/SPARK-2894 and squashes the following commits:

47f3510 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2894
2c899ed [Kousuke Saruta] Removed useless code from java_gateway.py
98287ed [Kousuke Saruta] Removed useless code from java_gateway.py
513ad2e [Kousuke Saruta] Modified util.sh to enable to use option including white spaces
28a374e [Kousuke Saruta] Modified java_gateway.py to recognize arguments
5afc584 [Cheng Lian] Filter out spark-submit options when starting Python gateway
e630d19 [Cheng Lian] Fixing pyspark and spark-shell CLI options
asfgit pushed a commit that referenced this pull request Aug 10, 2014
As sryza reported, spark-shell doesn't accept any flags.
The root cause is wrong usage of spark-submit in spark-shell and it come to the surface by #1801

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes #1715, Closes #1864, and Closes #1861

Closes #1825 from sarutak/SPARK-2894 and squashes the following commits:

47f3510 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2894
2c899ed [Kousuke Saruta] Removed useless code from java_gateway.py
98287ed [Kousuke Saruta] Removed useless code from java_gateway.py
513ad2e [Kousuke Saruta] Modified util.sh to enable to use option including white spaces
28a374e [Kousuke Saruta] Modified java_gateway.py to recognize arguments
5afc584 [Cheng Lian] Filter out spark-submit options when starting Python gateway
e630d19 [Cheng Lian] Fixing pyspark and spark-shell CLI options
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
JIRA issues:

- Main: [SPARK-2678](https://issues.apache.org/jira/browse/SPARK-2678)
- Related: [SPARK-2874](https://issues.apache.org/jira/browse/SPARK-2874)

Related PR:

- apache#1715

This PR is both a fix for SPARK-2874 and a workaround for SPARK-2678. Fixing SPARK-2678 completely requires some API level changes that need further discussion, and we decided not to include it in Spark 1.1 release. As currently SPARK-2678 only affects Spark SQL scripts, this workaround is enough for Spark 1.1. Command line option handling logic in bash scripts looks somewhat dirty and duplicated, but it helps to provide a cleaner user interface as well as retain full downward compatibility for now.

Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#1801 from liancheng/spark-2874 and squashes the following commits:

8045d7a [Cheng Lian] Make sure test suites pass
8493a9e [Cheng Lian] Using eval to retain quoted arguments
aed523f [Cheng Lian] Fixed typo in bin/spark-sql
f12a0b1 [Cheng Lian] Worked arount SPARK-2678
daee105 [Cheng Lian] Fixed usage messages of all Spark SQL related scripts
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
As sryza reported, spark-shell doesn't accept any flags.
The root cause is wrong usage of spark-submit in spark-shell and it come to the surface by apache#1801

Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
Author: Cheng Lian <lian.cs.zju@gmail.com>

Closes apache#1715, Closes apache#1864, and Closes apache#1861

Closes apache#1825 from sarutak/SPARK-2894 and squashes the following commits:

47f3510 [Kousuke Saruta] Merge branch 'master' of git://git.apache.org/spark into SPARK-2894
2c899ed [Kousuke Saruta] Removed useless code from java_gateway.py
98287ed [Kousuke Saruta] Removed useless code from java_gateway.py
513ad2e [Kousuke Saruta] Modified util.sh to enable to use option including white spaces
28a374e [Kousuke Saruta] Modified java_gateway.py to recognize arguments
5afc584 [Cheng Lian] Filter out spark-submit options when starting Python gateway
e630d19 [Cheng Lian] Fixing pyspark and spark-shell CLI options
@liancheng liancheng deleted the spark-2874 branch September 24, 2014 00:08
viirya pushed a commit to viirya/spark-1 that referenced this pull request Oct 19, 2023
* rdar://84031512 Add stored procedure API

* Try to fix scala 2.13 build

* Trigger build

Co-authored-by: Anton Okolnychyi <aokolnychyi@apple.com>
Co-authored-by: Szehon Ho <szehon.apache@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants