-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding several simple normalizations. #311
Adding several simple normalizations. #311
Conversation
* | ||
* @see pkn | ||
*/ | ||
def apply[I <: Interval, T](rdd: RDD[(Double, I, T)]): RDD[(Double, I, T)] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, why the (Double, I, T) ordering here? I would have expected most of these RDDs to have a form keyed off the interval (i.e. (I, Double)) and if you wanted to carry along extra information, maybe a key-value where the value was a pair (i.e. (I, (Double, T))).
I'm still reviewing this @fnothaft, sorry for the delay... |
Ping @tdanford. |
Ping @fnothaft -- see my question, above, about the |
Ping @fnothaft :-) |
@tdanford just addressed your comment, and rebased the change on master. |
Jenkins, retest this please. |
* @tparam T Type of data passed along. | ||
*/ | ||
def apply[T](rdd: RDD[(Double, T)]): RDD[(Double, T)] = { | ||
val cachedRdd = rdd.cache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, Frank, why (here, and above as well) the explicit call to cache
and unpersist
? Given that these RDDs appear to only be used once...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdanford cachedRdd is used in 1 count and 3 separate map calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No description provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed; all your base are belong to me now.
Jenkins, retest this please. |
I'm going to merge this anyway, since it builds in my hands -- and the tests that are failing aren't the ones you've added here. |
Adding several simple normalizations.
Thanks, @fnothaft! |
Added two normalizations:
@tdanford @carlyeks and I discussed the best place for this, and thought that it made the most sense to have it inside of ADAM, as downstream tools like RNAdam will depend on it, and the normalizations are useful primitives across many omics algorithms. Also, I would entertain any comments about whether org.bdgenomics.adam.rdd.normalization is the best package for this.