Upgrading to Spark 1.0 #256

tdanford · 2014-06-04T01:18:19Z

Upgrading the dependency on Spark to version 1.0.0.

The major changes here are:

- all the RDD.groupBy calls return Iterables rather than Seqs, and
- we now need an explicit dependency on fastutil.

tdanford · 2014-06-04T01:20:30Z

Fixes #253

AmplabJenkins · 2014-06-04T01:28:15Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/345/

massie · 2014-06-04T15:48:55Z

adam-core/pom.xml

@@ -102,6 +102,10 @@
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.artifact.suffix}</artifactId>
        </dependency>
+		<dependency>


Can you (or IntelliJ) format this for better readability?

Will do. I blame vim for this.

Also adding the fastutil dependency back in. Spark 0.9 had included the dependency on fastutil, which we also depended on! But Spark 1.0 apparently removes that dependency. So this commit adds that back in, for our own use.

Apparently, the signature for the RDD.groupBy method has changed (in 1.0) to return an Iterable rather than a Seq. This commit includes all the changes that are needed to account for this downstream in our code, mostly updating types to Iterable and inserting a few calls to toSeq in cases where that's not sufficient.
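
For readers following along, the two kinds of downstream fix described above might look roughly like the sketch below. It does not use Spark at all: plain Scala collections stand in for the grouped values, and the object and method names (`GroupByMigration`, `total`, `median`) are hypothetical, not taken from the ADAM code base.

```scala
// Sketch only: plain Scala collections stand in for the values of a grouped RDD.
// All names here are hypothetical and do not come from ADAM.
object GroupByMigration {

  // Fix 1: widen the parameter type from Seq to Iterable.
  // This is enough when the code only traverses the grouped values.
  def total(values: Iterable[Int]): Int = values.sum

  // Fix 2: call toSeq when Iterable is not sufficient,
  // for example when sorting and indexed access are needed.
  def median(values: Iterable[Int]): Int = {
    val sorted = values.toSeq.sorted
    sorted(sorted.size / 2)
  }
}
```

Widening to Iterable avoids an extra copy, so toSeq is probably best kept for the places that really need sequence operations.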

As suggested by Matt and Frank, updated two things: 1. the spark.kryo.referenceTracking value, set to 'true', which fixes a StackOverflowError, and 2. updated the target (test) values for the IndelRealignmentTargetSuite tests, which Frank says are apparently going to change soon anyway.
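
As an illustration of the first item, here is one way the Kryo setting could be applied. This is a sketch under assumptions, not necessarily how this commit does it: it relies on the fact that a default SparkConf picks up JVM system properties whose names start with `spark.`, and the `KryoSettings` object is a hypothetical name.

```scala
// Sketch only: KryoSettings is a hypothetical helper, not part of ADAM.
// Assumption: the properties are set before the SparkContext is created,
// so that a default SparkConf loads them from the JVM system properties.
object KryoSettings {
  def apply(): Unit = {
    // Use Kryo for serialization.
    System.setProperty("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // The value the commit message says fixes the StackOverflowError.
    System.setProperty("spark.kryo.referenceTracking", "true")
  }
}
```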

See the thread here: https://issues.apache.org/jira/browse/SPARK-1851

tdanford · 2014-06-05T12:09:14Z

Matt, I think this rebase should address your comments. Let me know if you see any other details to be fixed!

AmplabJenkins · 2014-06-05T12:33:49Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/347/

massie · 2014-06-06T04:44:46Z

Thanks, Timothy!

tdanford mentioned this pull request Jun 4, 2014

Upgrade dependency to Spark 1.0.0 #253

Closed

massie reviewed Jun 4, 2014

tdanford added 5 commits June 5, 2014 08:02

Updated Spark version to 1.0.0

03e8593

Also adding the fastutil dependency back in. Spark 0.9 had included the dependency on fastutil, which we also depended on! But Spark 1.0 apparently removes that dependency. So this commit adds that back in, for our own use.

Upgrade to Avro version 1.7.6

fb3d31f

See the thread here: https://issues.apache.org/jira/browse/SPARK-1851

Removed an unused import in AvailableComparisons

bc5e98c

massie added a commit that referenced this pull request Jun 6, 2014

Merge pull request #256 from genomebridge/spark-1.0

8a93aed

Upgrading to Spark 1.0

massie merged commit 8a93aed into bigdatagenomics:master Jun 6, 2014

tdanford deleted the spark-1.0 branch June 6, 2014 15:52
