[ADAM-952] Expose sorting by reference index. #1045
Test FAILed.
Build result: FAILURE
GitHub pull request #1045 of commit c059505 automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1045/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 4392844e06cd8809682105f9cff03a3ad79b5acd # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1045/merge^{commit} # timeout=10
Checking out Revision 4392844e06cd8809682105f9cff03a3ad79b5acd (origin/pr/1045/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 4392844e06cd8809682105f9cff03a3ad79b5acd
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.
Fixed build issue! Should be good to go now.
Test FAILed.
Build result: FAILURE
GitHub pull request #1045 of commit e112a83 automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1045/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains 1af3d16611f2df6a33845d53b0fde6f7978b51d6 # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1045/merge^{commit} # timeout=10
Checking out Revision 1af3d16611f2df6a33845d53b0fde6f7978b51d6 (origin/pr/1045/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f 1af3d16611f2df6a33845d53b0fde6f7978b51d6
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.
Sorry, was missing the updated file in the push! Sigh. Retesting now.
Test FAILed.
Build result: FAILURE
GitHub pull request #1045 of commit a8c5dba automatically merged.
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
[EnvInject] - Loading node environment variables.
Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb
 > /home/jenkins/git2/bin/git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > /home/jenkins/git2/bin/git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10
Fetching upstream changes from https://github.com/bigdatagenomics/adam.git
 > /home/jenkins/git2/bin/git --version # timeout=10
 > /home/jenkins/git2/bin/git -c core.askpass=true fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ # timeout=15
 > /home/jenkins/git2/bin/git rev-parse origin/pr/1045/merge^{commit} # timeout=10
 > /home/jenkins/git2/bin/git branch -a --contains e26696b # timeout=10
 > /home/jenkins/git2/bin/git rev-parse remotes/origin/pr/1045/merge^{commit} # timeout=10
Checking out Revision e26696b (origin/pr/1045/merge)
 > /home/jenkins/git2/bin/git config core.sparsecheckout # timeout=10
 > /home/jenkins/git2/bin/git checkout -f e26696b8d31578c80366a989deb1b9fcf3b5e67f
First time build. Skipping changelog.
Triggering ADAM-prb ? 2.6.0,2.10,1.5.2,centos
Triggering ADAM-prb ? 2.6.0,2.11,1.5.2,centos
Touchstone configurations resulted in FAILURE, so aborting...
Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'
Test FAILed.
Fixed a unit test issue and rebased.
Test PASSed.
@@ -207,7 +210,11 @@ class Transform(protected val args: TransformArgs) extends BDGSparkCommand[Trans
     }

     log.info("Sorting reads")
-    adamRecords = oldRdd.sortReadsByReferencePosition()
+    if (args.sortLexicographically) {
+      adamRecords = oldRdd.sortReadsByReferencePosition()
The pattern here of all these `if` blocks overwriting `adamRecords` is confusing. I know it predates this PR, so it's not necessarily a blocker here, but any thoughts on how to do all of this more clearly?
E.g. there are a bunch of lines manipulating `adamRecords` above this that are all seemingly rendered moot by these lines.
Opened #1053 for this. I agree that it needs a good cleaning, but I'd like to hold off until after 0.20.0 due to bandwidth/schedule limitations.
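For reference while #1053 is open, here is one possible shape for that cleanup, sketched on toy data. It is not what `Transform` does today and not necessarily what #1053 will do: `Flags`, `Stage`, `when`, and `run` are hypothetical names, and a plain `Seq[Int]` stands in for the RDD. Each optional step becomes a function that is either applied or skipped, and the steps are folded over the input instead of repeatedly reassigning a `var`.

```scala
object PipelineSketch {
  // Hypothetical flags standing in for command-line arguments.
  final case class Flags(dedupe: Boolean, sort: Boolean)

  // A stage transforms the data and returns the new value.
  type Stage = Seq[Int] => Seq[Int]

  // Apply `stage` only when `enabled` is true; otherwise pass the data through.
  def when(enabled: Boolean)(stage: Stage): Stage =
    if (enabled) stage else (xs => xs)

  // Thread the input through every stage in order, with no reassignment.
  def run(flags: Flags, input: Seq[Int]): Seq[Int] = {
    val stages: Seq[Stage] = Seq(
      when(flags.dedupe)(_.distinct),
      when(flags.sort)(_.sorted)
    )
    stages.foldLeft(input)((acc, stage) => stage(acc))
  }
}
```

With this shape, each stage always receives the output of the previous one, so a later step cannot silently discard earlier work by reading from a stale variable.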
Resolves bigdatagenomics#952. Adds function `sortByReferenceIndexAndPosition` on RDDs of `AlignmentRecord`. This sorts reads by their position on a contig, where contigs are ordered by contig index. This conforms to the SAM/BAM sort order.
2e46382 to 05f061f
Just patched this up with @ryan-williams's review comments and rebased.
Test PASSed.
+1
merged e51bd90
yup indeed, thanks @fnothaft
Resolves #952. Adds function `sortByReferenceIndexAndPosition` on RDDs of `AlignmentRecord`. This sorts reads by their position on a contig, where contigs are ordered by contig index. This conforms to the SAM/BAM sort order.
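To illustrate the difference between the two orderings, here is a minimal sketch on plain Scala collections. It does not use ADAM's API: `Read`, `contigOrder`, and both sort helpers are hypothetical stand-ins, and the contig order is an assumed example of what a sequence dictionary might list.

```scala
// Hypothetical stand-in for an aligned read; not ADAM's AlignmentRecord.
final case class Read(contigName: String, start: Long)

object SortOrderSketch {
  // Assumed contig order, as it might appear in a sequence dictionary.
  val contigOrder: Seq[String] = Seq("chr1", "chr2", "chr10")
  val contigIndex: Map[String, Int] = contigOrder.zipWithIndex.toMap

  val reads: Seq[Read] = Seq(
    Read("chr10", 5L), Read("chr2", 100L), Read("chr1", 20L), Read("chr2", 7L))

  // Lexicographic order: compare contig names as strings, then positions.
  def sortLexicographically(rs: Seq[Read]): Seq[Read] =
    rs.sortBy(r => (r.contigName, r.start))

  // Index order: compare contig indices, then positions.
  def sortByIndex(rs: Seq[Read]): Seq[Read] =
    rs.sortBy(r => (contigIndex(r.contigName), r.start))

  def main(args: Array[String]): Unit = {
    println(sortLexicographically(reads).map(_.contigName)) // chr1, chr10, chr2, chr2
    println(sortByIndex(reads).map(_.contigName))           // chr1, chr2, chr2, chr10
  }
}
```

In the lexicographic sort, `chr10` lands before `chr2` because the names are compared as strings; ordering by index keeps the contigs in dictionary order.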