Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAM-843] Aggressively project out metadata fields. #844

Merged
merged 1 commit into from
Oct 6, 2015

Conversation

fnothaft
Copy link
Member

@fnothaft fnothaft commented Oct 2, 2015

Resolves #843. I have tested this on our cluster and seen a 2.5x performance improvement for INDEL realignment and MarkDups, and a 1.5x performance improvement for BQSR. Additionally, we see approximately a 3-4x reduction in shuffle size.

@fnothaft fnothaft added this to the 0.18.0 milestone Oct 2, 2015
@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/954/
Test PASSed.

sc.loadParquetAlignments(args.inputPath)
} else if (args.forceLoadParquet ||
args.limitProjection ||
args.useAlignedReadPredicate) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if either of these are present along with -force_load_[something other than parquet], should there be a warning?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it weird to have args.limitProjection || args.useAlignedReadPredicate cause interpretation of a file as parquet? would we rather throw an error if someone tries to load a .bam file with one of these flags? I'm not sure, just wondering

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to throw an IAE.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/962/

Build result: FAILURE

GitHub pull request #844 of commit b878311 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/844/merge^{commit} # timeout=10 > git branch -a --contains 02db391edc474258ceb81bcab9913d4a5b6f9bfd # timeout=10 > git rev-parse remotes/origin/pr/844/merge^{commit} # timeout=10Checking out Revision 02db391edc474258ceb81bcab9913d4a5b6f9bfd (origin/pr/844/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 02db391edc474258ceb81bcab9913d4a5b6f9bfdFirst time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/963/
Test PASSed.

@@ -28,5 +28,5 @@ import org.bdgenomics.formats.avro.AlignmentRecord
*/
object AlignmentRecordField extends FieldEnumeration(AlignmentRecord.SCHEMA$) {

val contig, start, end, mapq, readName, sequence, mateAlignmentStart, cigar, qual, recordGroupId, recordGroupName, readPaired, properPair, readMapped, mateMapped, readNegativeStrand, mateNegativeStrand, firstOfPair, secondOfPair, primaryAlignment, failedVendorQualityChecks, duplicateRead, mismatchingPositions, attributes, recordGroupSequencingCenter, recordGroupDescription, recordGroupRunDateEpoch, recordGroupFlowOrder, recordGroupKeySequence, recordGroupLibrary, recordGroupPredictedMedianInsertSize, recordGroupPlatform, recordGroupPlatformUnit, recordGroupSample, mateContig, origQual, supplmentaryAlignment = SchemaValue
val contig, start, end, mapq, readName, sequence, mateAlignmentStart, cigar, qual, recordGroupId, recordGroupName, readPaired, properPair, readMapped, mateMapped, readNegativeStrand, mateNegativeStrand, firstOfPair, secondOfPair, primaryAlignment, failedVendorQualityChecks, duplicateRead, mismatchingPositions, attributes, recordGroupSequencingCenter, recordGroupDescription, recordGroupRunDateEpoch, recordGroupFlowOrder, recordGroupKeySequence, recordGroupLibrary, recordGroupPredictedMedianInsertSize, recordGroupPlatform, recordGroupPlatformUnit, recordGroupSample, mateContig, origQual, supplementaryAlignment, secondaryAlignment = SchemaValue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth wrapping this line? hard to tell what changed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wrap the line, the scripts/format-source script helpfully unwraps the line. Sigh! I agree though.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/ADAM-prb/971/
Test PASSed.

@heuermh
Copy link
Member

heuermh commented Oct 6, 2015

Thanks!

heuermh added a commit that referenced this pull request Oct 6, 2015
[ADAM-843] Aggressively project out metadata fields.
@heuermh heuermh merged commit 9d78117 into bigdatagenomics:master Oct 6, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants