
Including sparkDeployMode parameter #3946

Merged: 2 commits merged into broadinstitute:master on Jan 16, 2018
Conversation

@AxVE (Contributor) commented Dec 12, 2017

A solution for issue #3933.

This Spark parameter is experimental, is used only in cluster deploy mode, and was causing issues.
@codecov-io commented Dec 12, 2017

Codecov Report

Merging #3946 into master will decrease coverage by 0.422%.
The diff coverage is n/a.

@@              Coverage Diff               @@
##              master    #3946       +/-   ##
==============================================
- Coverage     78.483%   78.06%   -0.422%     
+ Complexity     16560    16490       -70     
==============================================
  Files           1058     1058               
  Lines          59682    59682               
  Branches        9712     9712               
==============================================
- Hits           46840    46588      -252     
- Misses          9084     9346      +262     
+ Partials        3758     3748       -10
Impacted Files Coverage Δ Complexity Δ
...s/spark/ParallelCopyGCSDirectoryIntoHDFSSpark.java 0% <0%> (-75.51%) 0% <0%> (-17%)
...nder/tools/spark/pipelines/PrintVariantsSpark.java 0% <0%> (-66.667%) 0% <0%> (-2%)
...oadinstitute/hellbender/utils/test/XorWrapper.java 13.043% <0%> (-60.87%) 2% <0%> (-6%)
...lbender/engine/datasources/ReferenceAPISource.java 25.735% <0%> (-60.294%) 8% <0%> (-25%)
...oadinstitute/hellbender/utils/gcs/BucketUtils.java 54.194% <0%> (-25.806%) 30% <0%> (-9%)
...nder/tools/spark/BaseRecalibratorSparkSharded.java 0% <0%> (-22.807%) 0% <0%> (-2%)
...ender/engine/datasources/ReferenceMultiSource.java 61.538% <0%> (-11.538%) 9% <0%> (ø)
...titute/hellbender/utils/test/MiniClusterUtils.java 78.947% <0%> (-10.526%) 6% <0%> (-1%)
...institute/hellbender/exceptions/UserException.java 66.406% <0%> (-3.125%) 3% <0%> (ø)
...adinstitute/hellbender/engine/ReadsDataSource.java 89.394% <0%> (-3.03%) 61% <0%> (-2%)
... and 9 more

@@ -20,6 +20,9 @@
@Argument(fullName = "sparkMaster", doc="URL of the Spark Master to submit jobs to when using the Spark pipeline runner.", optional = true)
private String sparkMaster = SparkContextFactory.DEFAULT_SPARK_MASTER;

@Argument(fullName = "sparkDeployMode", doc="Mode of the spark driver. Default value: client. Possible values: {client, cluster}.", optional = true)
private String sparkDeployMode = "client";
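Since Spark accepts only two values for the deploy mode, a free-form string argument like this is easy to mis-set. As an illustration only (this helper is hypothetical and not part of the diff), the value could be validated before being handed to spark-submit:

```java
// Hypothetical sketch, not part of this PR: reject anything other than
// the two deploy modes Spark accepts for spark.submit.deployMode.
public class DeployModeCheck {
    static String checkDeployMode(final String mode) {
        if (!"client".equals(mode) && !"cluster".equals(mode)) {
            throw new IllegalArgumentException(
                "sparkDeployMode must be 'client' or 'cluster', got: " + mode);
        }
        return mode;
    }

    public static void main(String[] args) {
        // prints "cluster"
        System.out.println(checkDeployMode("cluster"));
    }
}
```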
A reviewer (Contributor) commented on the diff:
I guess that this should be set in the SparkContext for the SparkCommandLineProgram using the SparkContextFactory. But as a suggestion for downstream projects: would it be possible to make SparkCommandLineArgumentCollection abstract (or an interface), and pass the specified sparkMaster to the map in getSparkProperties()? I guess that using propertyMap.put("spark.master", sparkMaster) should work as expected.
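The suggestion above might be sketched roughly as follows. The class and field names here are simplified stand-ins for the real SparkCommandLineArgumentCollection (in GATK the fields would be @Argument-annotated), so treat this as an assumption about the shape of the idea, not the actual GATK code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Rough sketch with simplified names, not the real GATK classes: an
// abstract argument collection whose property map carries the chosen
// master and deploy mode, so downstream projects can feed the entries
// straight into a SparkConf.
abstract class AbstractSparkArgs {
    String sparkMaster = "local[*]";
    String sparkDeployMode = "client";

    Map<String, String> getSparkProperties() {
        final Map<String, String> propertyMap = new LinkedHashMap<>();
        // The reviewer's suggestion: propagate the master via the map.
        propertyMap.put("spark.master", sparkMaster);
        propertyMap.put("spark.submit.deployMode", sparkDeployMode);
        return propertyMap;
    }
}

class DefaultSparkArgs extends AbstractSparkArgs { }
```

Each entry of the returned map could then be applied with SparkConf.set(key, value) when the context is built.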

@AxVE (Contributor, Author) replied:

As the deploy mode is related to the Spark driver (i.e. the SparkContext), I put it next to sparkMaster rather than with the more job-related Spark parameters (such as programName). However, I could and should add an accessor (getSparkDeployMode) and try to use it.
At first I checked whether SparkContext could provide the default deploy mode (which is 'client' most of the time), so I could write something like private String sparkDeployMode = SparkContextFactory.DEFAULT_SPARK_DEPLOYMODE;, but I failed. I could invest more time and add it myself, but I didn't have enough.

I don't think I really understand your second point. Do you mean getSparkProperties() should include the sparkMaster property too? Isn't that already the case? (I didn't check that part.)

Note: I'm not a GATK developer; I'm just someone who needs the ability to set the deploy mode. Since I had to edit the code to remove a "bug" (the userClassPathFirst property set to true instead of the default false), I took the opportunity to implement the deploy-mode parameter so users can avoid passing --conf spark.submit.deployMode=cluster. I'm not sure I have done it in the GATK way...

@lbergelson (Member) commented Dec 13, 2017

@AxVE Thanks for this PR. We really appreciate your interest and work on resolving this issue! It might take a little while for me to review it properly; we're currently preparing for our release and we're a bit swamped with various issues. I'm worried about changing the userClassPathFirst property. We added it a long time ago because it fixed some issues we were running into at the time. It's entirely possible that we no longer have that issue and it's a harmful remnant from an earlier time, but I'm afraid that changing it might have unanticipated consequences in our own Spark environment. Unfortunately, we don't have good automated tests that would necessarily identify any issue.

@cwhelan Would you be able to test your pipeline with "spark.driver.userClassPathFirst": "false" and see if you run into any issues?

I'm also a bit confused about why the change to the arguments is necessary. Clearly in your environment it is, but it goes against my understanding of how we set the arguments to spark-submit, so I want to properly understand why the existing --deploy-mode arguments aren't working for you before adding an additional hardcoded argument to the launch script. (As I'm sure you've seen, the launch script is a pretty crufty and brittle piece of code that was really meant to be replaced with a more robust solution by now, so any additional complexity in it would be great to avoid...)

@AxVE (Contributor, Author) commented Dec 13, 2017

@lbergelson I understand the change to userClassPathFirst can cause problems.
About the new parameter, I shouldn't have opened a pull request for it. It's absolutely unnecessary, since --conf 'spark.submit.deployMode=cluster' works. Plus, as you said, it's hardcoded...
About #3933, I didn't retry the -- --deploy-mode solution after my userClassPathFirst modification.

I added the parameter in my fork to test it. I still think it's worth having among GATK's parameters because it's simpler for users, but it's not urgent.

For me, the userClassPathFirst change is the important one, or at least a parameter to control it. Without it, I can't get my jobs to work in cluster mode.

@AxVE (Contributor, Author) commented Dec 13, 2017

@lbergelson I rechecked with -- --deploy-mode cluster and it works.
I will remove the commits that add the sparkDeployMode parameter.

@lbergelson (Member) commented:

@cwhelan Is there any chance you could run an SV pipeline with this change and see if it works? We added the classpath setting a long time ago for mysterious reasons, and have been afraid to remove it because we don't have good automated tests that run on the actual dataproc environment. I ran our very simple tests with this change but I want to check that it doesn't have negative consequences for your tools.

I would really like to merge this if we can because it's recommended that you don't use this option unless you absolutely have to.

@cwhelan (Member) commented Jan 12, 2018

I just ran our pipeline with --conf spark.driver.userClassPathFirst=false. It failed near the end with an error that I think is unrelated to the parameter (looks like a regression bug in our logic introduced recently), so I'm inclined to believe changing this setting is fine.

@lbergelson lbergelson self-assigned this Jan 16, 2018
@lbergelson (Member) left a review comment:

We haven't been able to find any justification for keeping this set to true. It seems likely we've outgrown whatever issue this was a workaround for. We'll have to monitor and see if we hit any new issues after changing this, but tests, automated and manual, haven't found anything.

@lbergelson (Member) commented:

@AxVE Thanks for this! Sorry for the long wait. We'll have to monitor to see if changing it introduces some obscure issues for us that we haven't been able to figure out yet, but it seems like it's probably safe.

@lbergelson lbergelson merged commit e03ead9 into broadinstitute:master Jan 16, 2018
@AxVE (Contributor, Author) commented Jan 17, 2018

@lbergelson No problem, and thank you. If I find any other issues on my side (I'm going to test some Spark tools), I'll let you know.
