[ADAM-1533] Set Theory #1561

devin-petersohn · 2017-06-09T17:06:44Z

WIP. Looking for quick feedback on architecture changes. The idea is to move the prepare code into the individual set theory primitive classes, which will unbloat GenomicRDD a bit.

Most of the primitives can be reduced to post-processing on the ShuffleRegionJoin implementations, now that I have generalized the joins to also accept a distance.

TODO:

Implement one-to-self set theory primitives
Create Test cases
Better/More complete Docs

coveralls · 2017-06-09T17:17:21Z

Coverage decreased (-0.3%) to 82.842% when pulling 966be93 on devin-petersohn:issue#1533setTheory into ad5ae6d on bigdatagenomics:master.

AmplabJenkins · 2017-06-09T17:53:34Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2090/

Build result: ABORTED

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1561/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 1daa69788cb362d5664b22a495ca180ca1b99871 # timeout=10
Checking out Revision 1daa69788cb362d5664b22a495ca180ca1b99871 (origin/pr/1561/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f 1daa69788cb362d5664b22a495ca180ca1b99871
First time build. Skipping changelog.
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result ABORTED
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result ABORTED
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

AmplabJenkins · 2017-06-09T18:02:38Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2091/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1561/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains c9ef99a590e6f63b0763b6a25b9df9f8dbf90d0e # timeout=10
Checking out Revision c9ef99a590e6f63b0763b6a25b9df9f8dbf90d0e (origin/pr/1561/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f c9ef99a590e6f63b0763b6a25b9df9f8dbf90d0e
First time build. Skipping changelog.
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

coveralls · 2017-06-09T20:21:54Z

Coverage increased (+0.2%) to 83.336% when pulling 348d150 on devin-petersohn:issue#1533setTheory into ad5ae6d on bigdatagenomics:master.

AmplabJenkins · 2017-06-09T20:28:42Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2094/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1561/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains 866341412bcb1847e609c4971aeb0eb21ad60026 # timeout=10
Checking out Revision 866341412bcb1847e609c4971aeb0eb21ad60026 (origin/pr/1561/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f 866341412bcb1847e609c4971aeb0eb21ad60026
First time build. Skipping changelog.
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

fnothaft

I've left a few detailed comments within. I'm hesitant to move forward with this. My main architectural objection is that I think we're inverting an abstraction. Is there any reason we can't build the set theory primitives on top of the join code, instead of building the set theory operators using the guts of the join code? See my comment on closest for a concrete example.

Additionally, I'd like to avoid spreading the partitionMap data structure outside of the GenomicRDD hierarchy. We have to do a full scan of the data to compute the partitionMap, and I believe that we can implement a lighter weight alternative that is cheaper to compute and that is compatible with legacy formats.

fnothaft · 2017-06-09T19:56:35Z

adam-core/src/main/scala/org/bdgenomics/adam/models/ReferenceRegion.scala

+   *
+   * @return An empty ReferenceRegion.
+   */
+  private[adam] val empty: ReferenceRegion = ReferenceRegion("", 0L, 0L)


-1 on adding this. What's the use case? Also, this violates our width invariant (end - start > 0) and should throw an illegal argument exception.

I agree. This was in here temporarily for some tests.
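The width invariant mentioned above can be enforced at construction time. The following is a minimal sketch, assuming a simplified stand-in `Region` case class (hypothetical, not ADAM's ReferenceRegion) and relying on Scala's `require`, which throws an IllegalArgumentException when its condition is false.

```scala
object RegionInvariantSketch {
  // Hypothetical stand-in for ReferenceRegion; not the ADAM class.
  final case class Region(referenceName: String, start: Long, end: Long) {
    // require throws IllegalArgumentException when the condition is false,
    // so a zero-width "empty" region cannot be constructed.
    require(end - start > 0L,
      s"Region must have positive width, got start=$start end=$end")

    def width: Long = end - start
  }
}
```

Under this sketch, `Region("", 0L, 0L)` fails at construction, which is why an `empty` value of this shape could not coexist with the invariant.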

fnothaft · 2017-06-09T19:56:54Z

adam-core/src/main/scala/org/bdgenomics/adam/rdd/GenomicRDD.scala

@@ -66,7 +67,6 @@ private[rdd] object GenomicRDD {
   * Replaces file references in a command.
   *
   * @see pipe
-   *


Please revert comment spacing changes throughout this file.

I will revert these on a future push. I want to avoid reverting and re-reverting repeatedly.

fnothaft · 2017-06-09T20:04:52Z

adam-core/src/main/scala/org/bdgenomics/adam/rdd/settheory/Closest.scala

+ * @tparam RT The resulting type of the left after the join.
+ * @tparam RU The resulting type of the right after the join.
+ */
+sealed abstract class Closest[T: ClassTag, U: ClassTag, RT, RU]


Please add inline docs to this class.

Also, I feel like we're inverting an abstraction here. IMO, the correct way to implement this would be to run a self joinAndGroupByLeft and to then:

val (left, rightIter) = kv
(left, rightIter.minBy(_.distance(left)))

Closest doesn't necessarily overlap, and is performed between two RDDs. In your proposed architecture, in order to be truly exhaustive we'd end up joining each record on the left with all records with the same referenceName on the right.
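The post-processing idea in this comment can be sketched with plain Scala collections. This is an illustration only: `Region`, its `distance` definition, and `closest` are hypothetical stand-ins, and the `grouped` argument plays the role of the output of a join that groups right-hand records by their left-hand record.

```scala
object ClosestSketch {
  // Hypothetical stand-in for ReferenceRegion, using half-open [start, end) coordinates.
  final case class Region(referenceName: String, start: Long, end: Long) {
    // Assumed metric: 0 when the regions overlap, otherwise the gap plus one;
    // None when the regions sit on different reference sequences.
    def distance(other: Region): Option[Long] =
      if (referenceName != other.referenceName) None
      else if (start < other.end && other.start < end) Some(0L)
      else if (other.start >= end) Some(other.start - end + 1L)
      else Some(start - other.end + 1L)
  }

  // For each left region, keep the right region with the smallest distance.
  // Left regions with no comparable candidate map to None.
  def closest(grouped: Seq[(Region, Iterable[Region])]): Seq[(Region, Option[Region])] =
    grouped.map { case (left, rights) =>
      val candidates = for {
        r <- rights.toSeq
        d <- left.distance(r).toSeq
      } yield (d, r)
      val best = if (candidates.isEmpty) None else Some(candidates.minBy(_._1)._2)
      (left, best)
    }
}
```

The selection step never touches the join internals; it only consumes grouped pairs, which is the layering the comment argues for.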

fnothaft · 2017-06-09T20:07:40Z

adam-core/src/main/scala/org/bdgenomics/adam/rdd/settheory/SetTheory.scala

+  protected val optPartitionMap: Option[Array[Option[(ReferenceRegion, ReferenceRegion)]]]
+
+  /**
+   * The condition that should be met in order for the primitive to be


This would be clearer with an example.

Also, if this package is set theoretic, then we should describe it through set theoretic notation, right? I think this method would be "This method evaluates whether a given pair of regions should be members of the set output by this set theoretic operation." Is that correct?
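As one possible example of the kind requested here: in set notation, an operation over collections A and B outputs { (a, b) ∈ A × B : condition(a, b) }. The sketch below assumes a simplified `Region` type and two hypothetical operations, `Overlaps` and `Within`; none of these names come from the PR.

```scala
object ConditionSketch {
  // Hypothetical stand-in for ReferenceRegion, half-open [start, end).
  final case class Region(referenceName: String, start: Long, end: Long)

  // Assumed shape of the predicate: decides whether a pair of regions
  // is a member of the set output by the operation.
  trait SetOperation {
    def condition(first: Region, second: Region): Boolean
  }

  // Overlap: the pair is a member when the regions share at least one base.
  object Overlaps extends SetOperation {
    def condition(first: Region, second: Region): Boolean =
      first.referenceName == second.referenceName &&
        first.start < second.end && second.start < first.end
  }

  // Distance variant: widen the first region by `threshold` bases on each
  // side, then apply the overlap test.
  final case class Within(threshold: Long) extends SetOperation {
    def condition(first: Region, second: Region): Boolean =
      Overlaps.condition(
        first.copy(
          start = math.max(0L, first.start - threshold),
          end = first.end + threshold),
        second)
  }
}
```

For instance, with a = [100, 200) and c = [205, 300) on the same reference, the pair (a, c) is not in the overlap output but is in the output of `Within(10)`.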

"SetTheory" doesn't work for me as a class and package name.
Perhaps for package rdd.sets, and SetOperation for the top level abstract class?

+1 on package and class rename

fnothaft · 2017-06-09T20:15:52Z

adam-core/src/main/scala/org/bdgenomics/adam/rdd/settheory/SetTheory.scala

+ * @tparam RT The return type for the left side row data.
+ * @tparam RU The return type for the right side row data.
+ */
+private[settheory] abstract class SetTheoryBetweenCollections[T, U, RT, RU] extends SetTheory {


OOC, is there a reason to prefer an abstract class to a trait here? I know that ShuffleRegionJoin is an abstract class, but I was looking through the history and couldn't figure out what was the impetus for the change. CCing in @ryan-williams who made the change in 9ff0fa5.

I went ahead and put this back to a trait. I believe it makes more sense, especially now that we are using the GenomicRDD.

fnothaft · 2017-06-09T20:23:05Z

adam-core/src/main/scala/org/bdgenomics/adam/rdd/settheory/ShuffleRegionJoin.scala

+sealed abstract class ShuffleRegionJoin[T: ClassTag, U: ClassTag, RT, RU]
+    extends SetTheoryBetweenCollections[T, U, RT, RU] with SetTheoryPrimitive {
+
+  override protected def condition(firstRegion: ReferenceRegion,


Isn't condition abstract in the superclass? If so, we shouldn't be using the override modifier.

Is that proper style? Is it still useful for inheriting docs, as @Override does in Java?
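For reference, a small sketch of the language rule behind this exchange, using hypothetical class names. In Scala the `override` modifier is optional when a member implements an abstract declaration, so both subclasses below compile. Scaladoc inherits a member's comment from the supertype when the subclass leaves it undocumented, with or without the modifier, so the choice is mainly one of style.

```scala
object OverrideSketch {
  abstract class SetOperationBase {
    /** Decides whether the pair belongs in the output. */
    protected def condition(a: Long, b: Long): Boolean

    def keep(a: Long, b: Long): Boolean = condition(a, b)
  }

  // Implementing an abstract member without the modifier compiles.
  class WithoutModifier extends SetOperationBase {
    protected def condition(a: Long, b: Long): Boolean = a < b
  }

  // With the modifier it also compiles; the compiler would reject this
  // definition if no supertype declared a matching member.
  class WithModifier extends SetOperationBase {
    override protected def condition(a: Long, b: Long): Boolean = a < b
  }
}
```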

fnothaft · 2017-06-09T20:25:19Z

adam-core/src/main/scala/org/bdgenomics/adam/rdd/settheory/ShuffleRegionJoin.scala

+        val sortedLeft = leftRdd.sortByKey()
+        val partitionMap =
+          sortedLeft.mapPartitions(getRegionBoundsFromPartition).collect
+        (sortedLeft, partitionMap.map(_.get))


If the left RDD is not already sorted, this will lead to an execution DAG that does two full sorts of the left RDD.

fnothaft · 2017-06-09T20:26:29Z

adam-core/src/main/scala/org/bdgenomics/adam/rdd/settheory/ShuffleRegionJoin.scala

+    // convert to an IntervalArray for fast range query
+    val partitionMapIntervals = IntervalArray(
+      adjustedPartitionMapWithIndex,
+      adjustedPartitionMapWithIndex.maxBy(_._1.width)._1.width,


Prefer adjustedPartitionMapWithIndex.map(_._1.width).max.

This was an attempted optimization. Rather than iterating over the entire list twice, we do it once with maxBy. There is certainly little effect because the number of partitions will presumably be relatively small. I'll go ahead and change it.
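The two spellings discussed here return the same value. A minimal sketch, assuming a hypothetical `Region` type and a small hand-written array standing in for adjustedPartitionMapWithIndex:

```scala
object MaxWidthSketch {
  // Hypothetical stand-in for ReferenceRegion.
  final case class Region(referenceName: String, start: Long, end: Long) {
    def width: Long = end - start
  }

  // Stand-in for adjustedPartitionMapWithIndex: (partition bounds, partition index).
  val partitions: Array[(Region, Int)] = Array(
    (Region("chr1", 0L, 1000L), 0),
    (Region("chr1", 1000L, 4000L), 1),
    (Region("chr2", 0L, 500L), 2))

  // One traversal: select the widest element, then read its width back out.
  val viaMaxBy: Long = partitions.maxBy(_._1.width)._1.width

  // Builds an intermediate array of widths, then takes the maximum.
  val viaMapMax: Long = partitions.map(_._1.width).max
}
```

Both forms throw on an empty array, so the choice between them is about readability rather than behavior.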

fnothaft · 2017-06-09T20:34:21Z

adam-core/src/main/scala/org/bdgenomics/adam/rdd/settheory/Closest.scala

+    extends SetTheoryBetweenCollections[T, U, RT, RU]
+    with SetTheoryPrimitive {
+
+  var currentClosest: ReferenceRegion = ReferenceRegion.empty


Thread safety...

It hasn't been a problem yet, but I am working on a fix for this in the case that the Spark Serialization rules change. I don't like this here either.
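One way to sidestep the shared mutable field is to carry the running result in a fold accumulator. The sketch below is not the fix planned in this thread; `Region`, `startGap`, and `closestTo` are hypothetical, and the distance metric is deliberately simplistic.

```scala
object FoldClosestSketch {
  // Hypothetical stand-in for ReferenceRegion.
  final case class Region(referenceName: String, start: Long, end: Long)

  // Illustrative metric only: absolute difference between start positions.
  private def startGap(a: Region, b: Region): Long = math.abs(a.start - b.start)

  // The "closest so far" lives in the accumulator, so no state is shared
  // between calls or threads, and Option replaces an "empty" sentinel region.
  def closestTo(query: Region, candidates: Iterator[Region]): Option[Region] =
    candidates.foldLeft(Option.empty[Region]) {
      case (None, next) => Some(next)
      case (Some(best), next) =>
        if (startGap(query, next) < startGap(query, best)) Some(next) else Some(best)
    }
}
```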

fnothaft · 2017-06-09T20:39:08Z

adam-core/src/main/scala/org/bdgenomics/adam/rdd/GenomicRDD.scala

      implicit tTag: ClassTag[T],
      xTag: ClassTag[X],
      txTag: ClassTag[(T, X)]): GenomicRDD[(T, X), Z] = InnerShuffleJoin.time {

-    val (leftRddToJoin, rightRddToJoin) =
-      prepareForShuffleRegionJoin(genomicRdd, optPartitions)
+    val preparedLeft =


If we're moving the preparation logic into the join primitive, this code should move with it.

coveralls · 2017-06-12T19:07:16Z

Coverage increased (+0.05%) to 83.17% when pulling a4d196e on devin-petersohn:issue#1533setTheory into ad5ae6d on bigdatagenomics:master.

AmplabJenkins · 2017-06-12T19:10:18Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2095/
Test PASSed.

AmplabJenkins · 2017-06-13T20:49:43Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2096/
Test PASSed.

coveralls · 2017-06-13T21:44:51Z

Coverage increased (+0.2%) to 83.333% when pulling 564bc7d on devin-petersohn:issue#1533setTheory into ad5ae6d on bigdatagenomics:master.

coveralls · 2017-06-13T21:44:51Z

Coverage decreased (-0.3%) to 82.843% when pulling 564bc7d on devin-petersohn:issue#1533setTheory into ad5ae6d on bigdatagenomics:master.

coveralls · 2017-06-13T23:37:56Z

Coverage increased (+0.03%) to 83.155% when pulling a271b45 on devin-petersohn:issue#1533setTheory into ad5ae6d on bigdatagenomics:master.

AmplabJenkins · 2017-06-13T23:41:23Z

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2097/
Test PASSed.

fnothaft

Hi @devin-petersohn,

This is related to the comments I was making in our meeting on Monday. I don't like this architectural shift, because all of the set theoretic operations we are implementing should be able to be implemented on top of the region join primitive. I believe that all the primitives should map into a join [ -> aggregate ] -> predicate flow. There are several advantages to this:

This will allow us to support both join strategies (shuffle and broadcast) in most cases.
This approach requires substantially less code.
This approach should further isolate bugs.
This approach would allow us to minimize the openness of the interfaces we build the join primitives from.
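The join [ -> aggregate ] -> predicate flow described above can be sketched on local collections. Everything here is illustrative: `Region`, `join`, `groupByLeft`, and `coveredAtLeast` are hypothetical names, and the nested-loop join stands in for either a shuffle or a broadcast region join.

```scala
object JoinFlowSketch {
  // Hypothetical stand-in for ReferenceRegion, half-open [start, end).
  final case class Region(referenceName: String, start: Long, end: Long) {
    def overlaps(o: Region): Boolean =
      referenceName == o.referenceName && start < o.end && o.start < end
  }

  // Step 1, join: a naive nested loop standing in for a region join.
  def join(left: Seq[Region], right: Seq[Region]): Seq[(Region, Region)] =
    for (l <- left; r <- right if l.overlaps(r)) yield (l, r)

  // Step 2, optional aggregate: group the joined pairs by their left region.
  def groupByLeft(pairs: Seq[(Region, Region)]): Map[Region, Seq[Region]] =
    pairs.groupBy(_._1).map { case (l, ps) => (l, ps.map(_._2)) }

  // Step 3, predicate: keep left regions overlapped by at least `minHits`
  // right regions.
  def coveredAtLeast(left: Seq[Region], right: Seq[Region], minHits: Int): Set[Region] =
    groupByLeft(join(left, right))
      .filter { case (_, rs) => rs.size >= minHits }
      .keySet
}
```

Because steps 2 and 3 only see joined pairs, swapping the join strategy in step 1 would not change them.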

Additionally, I'm not a big fan of making the partition map visible outside of GenomicRDD. Again, there are several reasons for this:

I think that if the entrypoint to ShuffleRegionJoin clearly assumes the contract that the two RDDs are copartitioned, then we can require that all callers enforce that contract, whether they are starting with a GenomicRDD or a plain ReferenceRegion-keyed RDD. If we have an entrypoint that does the prepwork and sets up the contract, then the contract is a bit less clear if you are calling with a ReferenceRegion-keyed RDD. I'm word vomiting a bit here, so let me know if this makes sense or not.
Additionally, this change means that we need to open up various protections on the partition map.
I would like to avoid this, in part because I think that we can refactor the partition map in a later PR to simplify the data structure and make it easier to compute.

Let me know your thoughts.

fnothaft · 2017-06-14T16:41:46Z

adam-core/src/main/scala/org/bdgenomics/adam/rdd/GenomicRDD.scala

@@ -154,14 +154,16 @@ trait GenomicRDD[T, U <: GenomicRDD[T, U]] extends Logging {
  // The (ReferenceRegion, ReferenceRegion) tuple contains the bounds of the 
  //   partition, such that the lowest start is first and the highest end is
  //   second.
-  private[rdd] val optPartitionMap: Option[Array[Option[(ReferenceRegion, ReferenceRegion)]]]


I'm a strong -1 on opening up protections on the partition map.

fnothaft · 2017-06-14T16:43:02Z

adam-core/src/main/scala/org/bdgenomics/adam/rdd/GenomicRDD.scala

@@ -1307,6 +1309,54 @@ abstract class AvroGenomicRDD[T <% IndexedRecord: Manifest, U <: AvroGenomicRDD[
  }
 }

+private[rdd] case class PartitionMap(private val optPartitionMap: Option[Array[Option[(ReferenceRegion, ReferenceRegion)]]]) {


As you may have guessed, I am also a strong -1 on making a PartitionMap class, esp. if it is not private to GenomicRDD.

devin-petersohn · 2017-06-15T18:29:10Z

> all of the set theoretic operations we are implementing should be able to be implemented on top of the region join primitive

Except for unbounded closest, this is true. Unbounded closest is one of the reasons the SetOperation class is abstracted this way: it uses much of the same architecture as the joins, and differs only in how the closest is computed and how the data is copartitioned. There are also SetOperation abstractions we need when we are dealing with a single collection, e.g. Merge. Merge can be implemented with a series of joins, but it would be much more expensive due to the increased cost and number of shuffle phases. I still have the SetOperationWithSingleCollection (and all the one-to-self primitives) to implement.
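To illustrate why a one-to-self operation such as Merge does not need a join, here is a minimal local sketch of merging in a single pass over sorted regions. It is not the implementation planned for this PR: `Region` and `merge` are hypothetical, treating adjacent regions as mergeable is an assumption, and a distributed version would additionally have to handle regions that span partition boundaries.

```scala
object MergeSketch {
  // Hypothetical stand-in for ReferenceRegion, half-open [start, end).
  final case class Region(referenceName: String, start: Long, end: Long)

  // Merge overlapping or adjacent regions with one sweep over sorted input.
  def merge(regions: Seq[Region]): List[Region] = {
    val sorted = regions.sortBy(r => (r.referenceName, r.start, r.end))
    sorted.foldLeft(List.empty[Region]) {
      // The next region touches the most recent merged region: extend it.
      case (last :: done, next)
          if last.referenceName == next.referenceName && next.start <= last.end =>
        last.copy(end = math.max(last.end, next.end)) :: done
      // Otherwise start a new merged region.
      case (acc, next) => next :: acc
    }.reverse
  }
}
```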

> I believe that all the primitives should map into a join [ -> aggregate ] -> predicate flow

I agree 100% that the majority of the operations should be performed with a join as the first phase, independent of the type of join. However, this will not work, or be efficient, for UnboundedClosest and many one-to-self set theory operations, which are not yet in this PR. I can start laying the groundwork for allowing this, but fully implementing it will probably require a minor refactor of the broadcastRegionJoin code, and perhaps belongs in a separate PR.

> I think that if the entrypoint to ShuffleRegionJoin clearly assumes the contract that the two RDDs are copartitioned

The way that it is currently architected, it does not assume this. The goal for pulling the prepareForShuffleRegionJoin code out was to reduce the bloat in GenomicRDD. It seems reasonable to me for SetOperations to prepare the data themselves, particularly if they have a different partitioning requirement than shuffleRegionJoin. This also gives us strong guarantees around not keeping RDDs that have duplicated records from copartitioning. If we do want to allow both SetOperation(GenomicRDD) and SetOperation(RDD[(ReferenceRegion, T)]), it seems that having the prepare code in the SetOperation class would be easiest. In the case that users want to call ShuffleRegionJoin(...).compute() themselves, they can do so without having to worry about how their data is partitioned. I agree that ideally, users would just call GenomicRDD.shuffleRegionJoin(), but there are cases where they cannot. I personally believe there is a stronger case for moving it into the SetOperations and letting each class define the optimal partition scheme for its respective operation, but I am happy to hear the opposing arguments.

> I'm a strong -1 on opening up protections on the partition map.

I am not sure what you mean by opening up protections on the partition map since I have moved from private[rdd] -> protected. I do understand what you are saying about setting the optPartitionMap access to private[GenomicRDD], but I disagree with making it that strict. Part of the reason I switched to passing in GenomicRDDs for the SetOperations was because we have the prepare() code in the SetOperations, so they also need to know how the data is partitioned. One-to-self operations need access to the PartitionMap structure to avoid extra shuffle phases and reduce skew.

> As you may have guessed, I am also a strong -1 on making a PartitionMap class, esp. if it is not private to GenomicRDD.

The PartitionMap class is a part of a (hopefully) better way of handling sorted data. I think we would rather know that the data is sorted, independent of whether or not the data has a partition map (which would begin to solve our issues of sorted legacy file formats). I plan to have an accompanying object that builds the PartitionMap when given a GenomicRDD or RDD[(ReferenceRegion, T)]. Making the partitionMap a lazy val will also reduce the amount of code and avoid computing it when it isn't needed. GenomicRDDs will take in a Boolean value for sorted rather than an optional partitionMap. There are also a lot of common operations performed on the PartitionMap (toIntervalArray). The PartitionMap class would live in its own file and be private[rdd]. Having a separate PartitionMap class would also clean out a lot of code in GenomicRDD related to computing the optPartitionMap.
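A rough sketch of the lazy partition map idea, under heavy assumptions: `Region` and this `PartitionMap` are hypothetical, `partitions` stands in for the already-sorted partitions of an RDD, and the `scans` counter exists only to show that the bounds are computed once and on demand.

```scala
object PartitionMapSketch {
  // Hypothetical stand-in for ReferenceRegion.
  final case class Region(referenceName: String, start: Long, end: Long)

  // Hypothetical wrapper; not the class from this PR.
  final class PartitionMap(partitions: Seq[Seq[Region]]) {
    // Demonstration-only counter of how often the bounds were computed.
    var scans: Int = 0

    // Computed on first access and cached afterwards. Each entry pairs the
    // region with the lowest start and the region with the highest end;
    // None marks an empty partition.
    lazy val bounds: Array[Option[(Region, Region)]] = {
      scans += 1
      partitions.map { p =>
        if (p.isEmpty) None
        else Some((
          p.minBy(r => (r.referenceName, r.start)),
          p.maxBy(r => (r.referenceName, r.end))))
      }.toArray
    }
  }
}
```

Constructing the wrapper costs nothing until `bounds` is read, which matches the stated goal of not computing the map when it isn't needed.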

Sorry for the wall of text. Feel free to address anything I've said.

devin-petersohn · 2017-06-21T23:06:39Z

adam-core/src/main/scala/org/bdgenomics/adam/rdd/sets/ShuffleRegionJoin.scala

+ * Perform an Inner ShuffleRegionJoin. This is publicly accessible to be
+ * compatible with legacy code.
+ */
+object InnerShuffleRegionJoin {


Pinging @heuermh and @fnothaft to get your thoughts on this architecture. This was what I came up with to guarantee consistency with our previous implementation. From the user point of view, it looks exactly the same as it did before, despite looking different under the hood.

coveralls · 2017-06-21T23:16:47Z

Coverage increased (+0.4%) to 83.527% when pulling 1f73378 on devin-petersohn:issue#1533setTheory into ad5ae6d on bigdatagenomics:master.

AmplabJenkins · 2017-06-21T23:21:04Z

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/ADAM-prb/2111/

Build result: FAILURE

[...truncated 15 lines...]
> /home/jenkins/git2/bin/git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
> /home/jenkins/git2/bin/git rev-parse origin/pr/1561/merge^{commit} # timeout=10
> /home/jenkins/git2/bin/git branch -a -v --no-abbrev --contains f08e10f43371f4280767a3e0c8b22fc4bc6de9f8 # timeout=10
Checking out Revision f08e10f43371f4280767a3e0c8b22fc4bc6de9f8 (origin/pr/1561/merge)
> /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
> /home/jenkins/git2/bin/git checkout -f f08e10f43371f4280767a3e0c8b22fc4bc6de9f8
First time build. Skipping changelog.
Triggering ADAM-prb » 2.6.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.11,2.0.0,centos
Triggering ADAM-prb » 2.6.0,2.10,2.0.0,centos
Triggering ADAM-prb » 2.3.0,2.11,1.6.1,centos
Triggering ADAM-prb » 2.6.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.10,1.6.1,centos
Triggering ADAM-prb » 2.3.0,2.10,2.0.0,centos
ADAM-prb » 2.6.0,2.11,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.3.0,2.11,2.0.0,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.11,2.0.0,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,2.0.0,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.11,1.6.1,centos completed with result SUCCESS
ADAM-prb » 2.6.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,1.6.1,centos completed with result FAILURE
ADAM-prb » 2.3.0,2.10,2.0.0,centos completed with result FAILURE
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

devin-petersohn · 2017-09-18T22:48:56Z

Closing as won't merge. This work belongs in a downstream app.

devin-petersohn added 2 commits June 6, 2017 12:06

Creating set theory package

c0e07d8

Added Closest implementation

966be93

devin-petersohn added 2 commits June 9, 2017 10:45

Moving copartitioning into ShuffleRegionJoin

d9fff68

Fixing an issue where data was sorted twice

af9b6d8

devin-petersohn changed the title from "Issue#1533set theory" to "[ADAM-1533] Set Theory" on Jun 9, 2017

Fixing some formatting issues

348d150

fnothaft reviewed Jun 9, 2017

devin-petersohn added 2 commits June 9, 2017 14:23

Formatting source files

a548111

Set theory package now accepts GenomicRDDs as input

a4d196e

More complete docs, adding edge test case

564bc7d

devin-petersohn added 3 commits June 13, 2017 14:58

Created PartitionMap class to interact with partitionmap outside GenomicRDD

a271b45

Package rename settheory -> sets, class rename SetTheory -> SetOperation

1fc2e74

Moving join tests into sets/tests

2a03241

devin-petersohn mentioned this pull request Jun 13, 2017

How to filter genotype RDD with FeatureRDD #890

Closed

fnothaft requested changes Jun 14, 2017

devin-petersohn added 2 commits June 21, 2017 14:33

Changing the way PartitionMap works, allowing RDDs for inner join

3914092

Adding documentation and formatting

1f73378

devin-petersohn commented Jun 21, 2017

devin-petersohn mentioned this pull request Jun 22, 2017

[CANNOLI-33] Use ADAM tab5 formatter for bowtie bigdatagenomics/cannoli#42

Closed

devin-petersohn closed this Sep 18, 2017

heuermh added this to the 0.23.0 milestone Dec 7, 2017
