-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clear diffuse high frequency kmers #3604
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3604 +/- ##
===============================================
- Coverage 79.738% 79.725% -0.014%
- Complexity 18153 18161 +8
===============================================
Files 1221 1223 +2
Lines 66623 66647 +24
Branches 10411 10409 -2
===============================================
+ Hits 53124 53134 +10
- Misses 9290 9307 +17
+ Partials 4209 4206 -3
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly trivial suggestions, except one question about a possible typo about <=
vs >
.
build.gradle
Outdated
@@ -123,7 +123,7 @@ dependencies { | |||
// Using the shaded version to avoid conflicts between its protobuf dependency | |||
// and that of Hadoop/Spark (either the one we reference explicitly, or the one | |||
// provided by dataproc). | |||
compile 'com.google.cloud:google-cloud-nio:0.23.1-alpha:shaded' | |||
compile 'com.google.cloud:google-cloud-nio:0.20.4-alpha-20170727.190814-1:shaded' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intended? and not sure why @lbergelson shows up on the author list for this PR. Github has been doing random things....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm assuming this was change was unintentional. It, along with the corresponding change in CommandLineProgram, should both be removed from this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shows this way because I based my PR on Louis' reversion.
@cmnbroad -- I included Louis' reversion so that I had code that worked on Spark. If I eliminate these changes and rebase on master will I have something that works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tedsharpe We just talked about this in the engine team meeting and we've merged Louis' reversion, so it should be in master now.
this.kmersPerPartitionGuess = kmersPerPartitionGuess; | ||
} | ||
|
||
public Iterator<KmerAndCount> apply( Iterator<GATKRead> readItr ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the class final.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry for the confusion, but I meant the iterator to be final.
|
||
/** | ||
* Iterates over reads, kmerizing them, and counting kmers that appear in a passed-in set. | ||
* It returns a KmerAndCount iterator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doc seems redundant
/** | ||
* Iterates over reads, kmerizing them, and checking the kmers against a set of KmerAndIntervals | ||
* to figure out which intervals a read belongs in. | ||
* Returns a QNameAndInterval iterator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line of doc seems redundant
} | ||
|
||
@Override | ||
public Iterator<QNameAndInterval> apply( GATKRead read ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
please
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Entire class is now final.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here, the read to be final
|
||
@Override | ||
public Iterator<QNameAndInterval> apply( GATKRead read ) { | ||
List<Integer> intervals = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
please
public Iterator<QNameAndInterval> apply( GATKRead read ) { | ||
List<Integer> intervals = new ArrayList<>(); | ||
SVKmerizer.stream(read.getBases(), kSize, new SVKmerLong()) | ||
.map(kmer -> kmer.canonical(kSize)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm seeing this " stream followed by canonicalize " pattern repeated, would it make sense to extract a method like streamAndCanonicalize
in SVKmerizer
?
SVKmerizer.stream(read.getBases(), kSize, new SVKmerLong()) | ||
.map(kmer -> kmer.canonical(kSize)) | ||
.forEach(kmer -> { | ||
Iterator<KmerAndInterval> kmerAndIntervalIterator = kmerMap.findEach(kmer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
please
new KmerCounter(kSize,kmersPerPartition,broadcastKmersAndIntervals.getValue()).apply(readItr)) | ||
.mapToPair(kmerAndCount -> new Tuple2<>(kmerAndCount.getKey(), kmerAndCount.getValue())) | ||
.reduceByKey(Integer::sum) | ||
.filter(pair -> pair._2() > maxQNamesPerKmer) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean <=
instead of >
for the per kmer qname count filter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is right. I'm producing a list of ubiquitous kmers which I eliminate from the list of interesting kmers a few lines below here.
.collect(); | ||
|
||
for ( final SVKmer kmer : ubiquitousKmers ) { | ||
Iterator<KmerAndInterval> entryItr = kmersAndIntervals.findEach(kmer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final
@@ -304,7 +304,7 @@ protected void printSettings() { | |||
logger.info("Inflater: " + (usingIntelInflater ? "IntelInflater": "JdkInflater")); | |||
|
|||
logger.info("GCS max retries/reopens: " + BucketUtils.getCloudStorageConfiguration(NIO_MAX_REOPENS).maxChannelReopens()); | |||
logger.info("Using google-cloud-java: 0.23.1-alpha"); | |||
logger.info("Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change should also be reverted.
Comments addressed. |
a6bb540
to
ca2973d
Compare
No description provided.