Upgrading to Spark 1.0 #256

Merged 5 commits on Jun 6, 2014

tdanford (Contributor) commented Jun 4, 2014

Upgrading the dependency on Spark to version 1.0.0.

The major changes here are:

  1. all the RDD.groupBy calls return Iterables rather than Seqs, and
  2. we now need an explicit dependency on fastutil.

tdanford (Contributor Author) commented Jun 4, 2014

Fixes #253

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/345/

@@ -102,6 +102,10 @@
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_${scala.artifact.suffix}</artifactId>
</dependency>
<dependency>
Member

Can you (or Intellij) format this for better readability?

Contributor Author

Will do. I blame vim for this.

Also adding the fastutil dependency back in. Spark 0.9 included a dependency on
fastutil, which we also relied on, but Spark 1.0 apparently removes that dependency.
So this commit adds it back in as an explicit dependency, for our own use.
Apparently, the signature of the RDD.groupBy method has changed (in 1.0) to return an Iterable
rather than a Seq. This commit includes all the changes that are needed to account for this
downstream in our code, mostly updating types to Iterable and inserting a few calls to toSeq in cases
where an Iterable is not sufficient.
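
The kind of downstream change described above can be sketched roughly as follows. This is a minimal illustration, not code from this pull request: the `Read` case class, the object name, and both helper functions are hypothetical, and it assumes Spark 1.0 or later on the classpath with a local master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object GroupBySketch {
  // Hypothetical record type, for illustration only (not an ADAM class).
  case class Read(contig: String, start: Long)

  // In Spark 1.0 groupBy yields RDD[(K, Iterable[T])]; in 0.9 the values were a Seq[T],
  // so declared types downstream have to change from Seq to Iterable.
  def groupByContig(reads: RDD[Read]): RDD[(String, Iterable[Read])] =
    reads.groupBy(_.contig)

  // Where Seq operations such as sorting are needed, convert explicitly with toSeq.
  def sortedStarts(grouped: RDD[(String, Iterable[Read])]): RDD[(String, Seq[Long])] =
    grouped.map { case (contig, rs) => (contig, rs.map(_.start).toSeq.sorted) }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("groupby-sketch")
    val sc = new SparkContext(conf)
    try {
      val reads = sc.parallelize(Seq(Read("chr1", 10L), Read("chr1", 5L), Read("chr2", 7L)))
      sortedStarts(groupByContig(reads)).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Most call sites only iterate over the grouped values, so widening the declared type to Iterable is enough; toSeq is reserved for the places that need sorting or indexed access.
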
As suggested by Matt and Frank, updated two things:
1. set the spark.kryo.referenceTracking value to 'true', which fixes a StackOverflowError, and
2. updated the target (test) values for the IndelRealignmentTargetSuite tests, which Frank says are
   going to change soon anyway.
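
For reference, one way to set that property is through SparkConf, sketched below. How this pull request actually sets it is not shown in the conversation, so the object name, the helper function, and the choice of SparkConf over system properties are assumptions. Kryo's reference tracking lets it serialize object graphs that contain cycles or repeated references; without it, a cyclic graph can recurse until the stack overflows, which is consistent with the StackOverflowError mentioned above.

```scala
import org.apache.spark.SparkConf

object KryoConfSketch {
  // Hypothetical helper: builds a SparkConf that uses Kryo with reference tracking on.
  def kryoConf(appName: String): SparkConf =
    new SparkConf()
      .setAppName(appName)
      // Use Kryo instead of the default Java serializer.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Track references so shared or cyclic object graphs serialize safely.
      .set("spark.kryo.referenceTracking", "true")
}
```

Reference tracking costs some serialization overhead, which is the usual reason for turning it off when the data is known to contain no shared or cyclic references.
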
tdanford (Contributor Author) commented Jun 5, 2014

Matt, I think this rebase should address your comments. Let me know if you see any other details to be fixed!

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/347/

massie added a commit that referenced this pull request Jun 6, 2014
massie merged commit 8a93aed into bigdatagenomics:master Jun 6, 2014
massie (Member) commented Jun 6, 2014

Thanks, Timothy!

tdanford deleted the spark-1.0 branch June 6, 2014 15:52