
Including sparkDeployMode parameter #3946

Merged: 2 commits merged into broadinstitute:master on Jan 16, 2018
Conversation

@AxVE (Contributor) commented Dec 12, 2017

A solution for issue #3933.

This Spark parameter is experimental, is used only in cluster deploy mode, and was causing issues.
@codecov-io commented Dec 12, 2017

Codecov Report

Merging #3946 into master will decrease coverage by 0.422%.
The diff coverage is n/a.

@@              Coverage Diff               @@
##              master    #3946       +/-   ##
==============================================
- Coverage     78.483%   78.06%   -0.422%     
+ Complexity     16560    16490       -70     
==============================================
  Files           1058     1058               
  Lines          59682    59682               
  Branches        9712     9712               
==============================================
- Hits           46840    46588      -252     
- Misses          9084     9346      +262     
+ Partials        3758     3748       -10
Impacted Files Coverage Δ Complexity Δ
...s/spark/ParallelCopyGCSDirectoryIntoHDFSSpark.java 0% <0%> (-75.51%) 0% <0%> (-17%)
...nder/tools/spark/pipelines/PrintVariantsSpark.java 0% <0%> (-66.667%) 0% <0%> (-2%)
...oadinstitute/hellbender/utils/test/XorWrapper.java 13.043% <0%> (-60.87%) 2% <0%> (-6%)
...lbender/engine/datasources/ReferenceAPISource.java 25.735% <0%> (-60.294%) 8% <0%> (-25%)
...oadinstitute/hellbender/utils/gcs/BucketUtils.java 54.194% <0%> (-25.806%) 30% <0%> (-9%)
...nder/tools/spark/BaseRecalibratorSparkSharded.java 0% <0%> (-22.807%) 0% <0%> (-2%)
...ender/engine/datasources/ReferenceMultiSource.java 61.538% <0%> (-11.538%) 9% <0%> (ø)
...titute/hellbender/utils/test/MiniClusterUtils.java 78.947% <0%> (-10.526%) 6% <0%> (-1%)
...institute/hellbender/exceptions/UserException.java 66.406% <0%> (-3.125%) 3% <0%> (ø)
...adinstitute/hellbender/engine/ReadsDataSource.java 89.394% <0%> (-3.03%) 61% <0%> (-2%)
... and 9 more

@@ -20,6 +20,9 @@
@Argument(fullName = "sparkMaster", doc="URL of the Spark Master to submit jobs to when using the Spark pipeline runner.", optional = true)
private String sparkMaster = SparkContextFactory.DEFAULT_SPARK_MASTER;

@Argument(fullName = "sparkDeployMode", doc="Mode of the spark driver. Default value: client. Possible values: {client, cluster}.", optional = true)
private String sparkDeployMode = "client";
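Since Spark accepts only two values for the deploy mode, a free-form string argument like this is easy to mis-set. As an illustration only (this helper is hypothetical and not part of the diff), the value could be validated before being handed to spark-submit:

```java
// Hypothetical sketch, not part of this PR: reject anything other than
// the two deploy modes Spark accepts for spark.submit.deployMode.
public class DeployModeCheck {
    static String checkDeployMode(final String mode) {
        if (!"client".equals(mode) && !"cluster".equals(mode)) {
            throw new IllegalArgumentException(
                "sparkDeployMode must be 'client' or 'cluster', got: " + mode);
        }
        return mode;
    }

    public static void main(String[] args) {
        // prints "cluster"
        System.out.println(checkDeployMode("cluster"));
    }
}
```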
A reviewer (Contributor) commented on the diff:
I guess that this should be set in the SparkContext for the SparkCommandLineProgram using the SparkContextFactory. But as a suggestion for downstream projects: would it be possible to make SparkCommandLineArgumentCollection abstract (or an interface), and pass the specified sparkMaster to the map in getSparkProperties()? I guess that using propertyMap.put("spark.master", sparkMaster) should work as expected.
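The suggestion above might be sketched roughly as follows. The class and field names here are simplified stand-ins for the real SparkCommandLineArgumentCollection (in GATK the fields would be @Argument-annotated), so treat this as an assumption about the shape of the idea, not the actual GATK code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Rough sketch with simplified names, not the real GATK classes: an
// abstract argument collection whose property map carries the chosen
// master and deploy mode, so downstream projects can feed the entries
// straight into a SparkConf.
abstract class AbstractSparkArgs {
    String sparkMaster = "local[*]";
    String sparkDeployMode = "client";

    Map<String, String> getSparkProperties() {
        final Map<String, String> propertyMap = new LinkedHashMap<>();
        // The reviewer's suggestion: propagate the master via the map.
        propertyMap.put("spark.master", sparkMaster);
        propertyMap.put("spark.submit.deployMode", sparkDeployMode);
        return propertyMap;
    }
}

class DefaultSparkArgs extends AbstractSparkArgs { }
```

Each entry of the returned map could then be applied with SparkConf.set(key, value) when the context is built.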

@AxVE (Contributor, Author) replied:

As the deploy mode is related to the Spark driver (i.e. the SparkContext), I put it next to sparkMaster rather than with the more job-related Spark parameters (such as programName). However, I could and should add an accessor (getSparkDeployMode) and try to use it.
At first I checked whether SparkContext could provide the default deploy mode (which is 'client' most of the time), so I could write something like private String sparkDeployMode = SparkContextFactory.DEFAULT_SPARK_DEPLOYMODE;, but I failed. I could invest more time and add it myself, but I didn't have enough.

I don't think I really understand your second point. Do you mean getSparkProperties() should include the sparkMaster property too? Isn't that already the case? (I didn't check that part.)

Note: I'm not a GATK developer; I'm just someone who needs the ability to set the deploy mode. Since I had to edit the code to remove a "bug" (the userClassPathFirst property set to true instead of the default false), I took the opportunity to implement the deploy-mode parameter so users can avoid passing --conf spark.submit.deployMode=cluster. I'm not sure I have done it in the GATK way...

@lbergelson (Member) commented Dec 13, 2017

@AxVE Thanks for this PR. We really appreciate your interest and work on resolving this issue! It might take a little while for me to review it properly; we're currently preparing for our release and we're a bit swamped with various issues. I'm worried about changing the userClassPathFirst property. We added it a long time ago because it fixed some issues we were running into at the time. It's entirely possible that we no longer have that issue and it's a harmful remnant from an earlier time, but I'm afraid that changing it might have unanticipated consequences in our own Spark environment. Unfortunately, we don't have good automated tests that would necessarily identify any issue.

@cwhelan Would you be able to test your pipeline with "spark.driver.userClassPathFirst": "false" and see if you run into any issues?

I'm also a bit confused about why the change to the arguments is necessary. Clearly in your environment it is, but it goes against my understanding of how we set the arguments to spark-submit, so I want to properly understand why the existing --deploy-mode arguments aren't working for you before adding an additional hardcoded argument to the launch script. (As I'm sure you've seen, the launch script is a pretty crufty and brittle piece of code that was really meant to be replaced with a more robust solution by now, so any additional complexity in it would be great to avoid...)

@AxVE (Contributor, Author) commented Dec 13, 2017

@lbergelson I understand the change to userClassPathFirst can cause problems.
About the new parameter, I shouldn't have opened a pull request for it. It's absolutely unnecessary, since --conf 'spark.submit.deployMode=cluster' works. Plus, as you said, it's hardcoded...
About #3933, I didn't retry the -- --deploy-mode solution after my userClassPathFirst modification.

I added the parameter in my fork to test it. I still think it's worth having among GATK's parameters because it's simpler for users, but it's not urgent.

For me, the userClassPathFirst change is the important one, or at least a parameter to control it. Without it, I can't get my jobs to work in cluster mode.

@AxVE (Contributor, Author) commented Dec 13, 2017

@lbergelson I rechecked with -- --deploy-mode cluster and it works.
I will remove the commits that add the sparkDeployMode parameter.

@lbergelson (Member) commented:

@cwhelan Is there any chance you could run an SV pipeline with this change and see if it works? We added the classpath setting a long time ago for mysterious reasons, and have been afraid to remove it because we don't have good automated tests that run on the actual dataproc environment. I ran our very simple tests with this change but I want to check that it doesn't have negative consequences for your tools.

I would really like to merge this if we can because it's recommended that you don't use this option unless you absolutely have to.

@cwhelan (Member) commented Jan 12, 2018

I just ran our pipeline with --conf spark.driver.userClassPathFirst=false. It failed near the end with an error that I think is unrelated to the parameter (looks like a regression bug in our logic introduced recently), so I'm inclined to believe changing this setting is fine.

@lbergelson lbergelson self-assigned this Jan 16, 2018
@lbergelson (Member) left a review comment:

We haven't been able to find any justification for keeping this set to true. It seems likely we've outgrown whatever issue this was a workaround for. We'll have to monitor and see if we hit any new issues after changing this, but tests, automated and manual, haven't found anything.

@lbergelson (Member) commented:

@AxVE Thanks for this! Sorry for the long wait. We'll have to monitor to see if changing it introduces some obscure issues for us that we haven't been able to figure out yet, but it seems like it's probably safe.

@lbergelson lbergelson merged commit e03ead9 into broadinstitute:master Jan 16, 2018
@AxVE (Contributor, Author) commented Jan 17, 2018

@lbergelson No problem, and thank you. If I find any other issues on my side (I'm going to test some Spark tools), I'll let you know.
