Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clear diffuse high frequency kmers #3604

Merged
merged 1 commit into from
Sep 25, 2017
Merged

Conversation

tedsharpe
Copy link
Contributor

No description provided.

@codecov-io
Copy link

codecov-io commented Sep 23, 2017

Codecov Report

Merging #3604 into master will decrease coverage by 0.014%.
The diff coverage is 72%.

@@               Coverage Diff               @@
##              master     #3604       +/-   ##
===============================================
- Coverage     79.738%   79.725%   -0.014%     
- Complexity     18153     18161        +8     
===============================================
  Files           1221      1223        +2     
  Lines          66623     66647       +24     
  Branches       10411     10409        -2     
===============================================
+ Hits           53124     53134       +10     
- Misses          9290      9307       +17     
+ Partials        4209      4206        -3
Impacted Files Coverage Δ Complexity Δ
...tructuralVariationDiscoveryArgumentCollection.java 96.429% <ø> (-0.123%) 0 <0> (ø)
...ls/spark/sv/evidence/FindBadGenomicKmersSpark.java 51.389% <ø> (+2.453%) 13 <0> (ø) ⬇️
.../tools/spark/sv/evidence/QNamesForKmersFinder.java 0% <0%> (-100%) 0 <0> (-7)
...r/tools/spark/sv/evidence/QNameIntervalFinder.java 100% <100%> (ø) 6 <6> (?)
...ellbender/tools/spark/sv/evidence/KmerCounter.java 100% <100%> (ø) 6 <6> (?)
...te/hellbender/tools/spark/sv/utils/SVKmerizer.java 83.673% <100%> (-3.827%) 18 <1> (ø)
...llbender/tools/spark/sv/evidence/KmerAndCount.java 40.909% <40.909%> (ø) 7 <7> (?)
...spark/sv/evidence/FindBreakpointEvidenceSpark.java 66.427% <70.732%> (-0.837%) 50 <6> (+3)
...nder/tools/spark/sv/evidence/QNameAndInterval.java 63.158% <0%> (-2.632%) 11% <0%> (-1%)
... and 8 more

@SHuang-Broad SHuang-Broad self-assigned this Sep 24, 2017
Copy link
Contributor

@SHuang-Broad SHuang-Broad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly trivial suggestions, except one question about a possible typo about <= vs >.

build.gradle Outdated
@@ -123,7 +123,7 @@ dependencies {
// Using the shaded version to avoid conflicts between its protobuf dependency
// and that of Hadoop/Spark (either the one we reference explicitly, or the one
// provided by dataproc).
compile 'com.google.cloud:google-cloud-nio:0.23.1-alpha:shaded'
compile 'com.google.cloud:google-cloud-nio:0.20.4-alpha-20170727.190814-1:shaded'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intended? and not sure why @lbergelson shows up on the author list for this PR. Github has been doing random things....

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm assuming this was change was unintentional. It, along with the corresponding change in CommandLineProgram, should both be removed from this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shows this way because I based my PR on Louis' reversion.
@cmnbroad -- I included Louis' reversion so that I had code that worked on Spark. If I eliminate these changes and rebase on master will I have something that works?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tedsharpe We just talked about this in the engine team meeting and we've merged Louis' reversion, so it should be in master now.

this.kmersPerPartitionGuess = kmersPerPartitionGuess;
}

public Iterator<KmerAndCount> apply( Iterator<GATKRead> readItr ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made the class final.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the confusion, but I meant the iterator to be final.


/**
* Iterates over reads, kmerizing them, and counting kmers that appear in a passed-in set.
* It returns a KmerAndCount iterator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doc seems redundant

/**
* Iterates over reads, kmerizing them, and checking the kmers against a set of KmerAndIntervals
* to figure out which intervals a read belongs in.
* Returns a QNameAndInterval iterator.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line of doc seems redundant

}

@Override
public Iterator<QNameAndInterval> apply( GATKRead read ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entire class is now final.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, the read to be final


@Override
public Iterator<QNameAndInterval> apply( GATKRead read ) {
List<Integer> intervals = new ArrayList<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final please

public Iterator<QNameAndInterval> apply( GATKRead read ) {
List<Integer> intervals = new ArrayList<>();
SVKmerizer.stream(read.getBases(), kSize, new SVKmerLong())
.map(kmer -> kmer.canonical(kSize))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm seeing this " stream followed by canonicalize " pattern repeated, would it make sense to extract a method like streamAndCanonicalize in SVKmerizer?

SVKmerizer.stream(read.getBases(), kSize, new SVKmerLong())
.map(kmer -> kmer.canonical(kSize))
.forEach(kmer -> {
Iterator<KmerAndInterval> kmerAndIntervalIterator = kmerMap.findEach(kmer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final please

new KmerCounter(kSize,kmersPerPartition,broadcastKmersAndIntervals.getValue()).apply(readItr))
.mapToPair(kmerAndCount -> new Tuple2<>(kmerAndCount.getKey(), kmerAndCount.getValue()))
.reduceByKey(Integer::sum)
.filter(pair -> pair._2() > maxQNamesPerKmer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean <= instead of > for the per kmer qname count filter?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is right. I'm producing a list of ubiquitous kmers which I eliminate from the list of interesting kmers a few lines below here.

.collect();

for ( final SVKmer kmer : ubiquitousKmers ) {
Iterator<KmerAndInterval> entryItr = kmersAndIntervals.findEach(kmer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final

@@ -304,7 +304,7 @@ protected void printSettings() {
logger.info("Inflater: " + (usingIntelInflater ? "IntelInflater": "JdkInflater"));

logger.info("GCS max retries/reopens: " + BucketUtils.getCloudStorageConfiguration(NIO_MAX_REOPENS).maxChannelReopens());
logger.info("Using google-cloud-java: 0.23.1-alpha");
logger.info("Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should also be reverted.

@tedsharpe
Copy link
Contributor Author

Comments addressed.

@tedsharpe tedsharpe merged commit 01b1160 into master Sep 25, 2017
@tedsharpe tedsharpe deleted the tws_clear_diffuse_hifreq_kmers branch September 25, 2017 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants