Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve sorter. #81

Merged
merged 3 commits into from
Jun 2, 2017
Merged

Improve sorter. #81

merged 3 commits into from
Jun 2, 2017

Conversation

alumi
Copy link
Member

@alumi alumi commented Jun 2, 2017

Summary

This PR improves functions and performance of cljam.sorter.
Also resolves task 7 in #61.

Changes

  • Improve performance of sort-by-pos (about 2x faster).
  • Implemented sort-by-qname.
  • Configurable chunk size.
  • Sort SAM and directly output to BAM and vice versa.
  • Add tests.

Affected modules

  • cljam.sorter
  • cljam.bam
  • cljam.sam

Notes

  • sort-by-pos does not check qname, the same as samtools.

@codecov
Copy link

codecov bot commented Jun 2, 2017

Codecov Report

Merging #81 into master will increase coverage by 0.6%.
The diff coverage is 92.77%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master      #81     +/-   ##
=========================================
+ Coverage   82.42%   83.02%   +0.6%     
=========================================
  Files          62       62             
  Lines        4113     4160     +47     
  Branches      439      430      -9     
=========================================
+ Hits         3390     3454     +64     
+ Misses        284      276      -8     
+ Partials      439      430      -9
Impacted Files Coverage Δ
src/cljam/sam/writer.clj 70% <0%> (ø) ⬆️
src/cljam/cli.clj 54.44% <100%> (+0.92%) ⬆️
src/cljam/bam/reader.clj 74% <91.3%> (+2.03%) ⬆️
src/cljam/sorter.clj 93.1% <92.52%> (+16.37%) ⬆️
src/cljam/sam/reader.clj 96.55% <96.66%> (-0.33%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 74087ab...ee6ff77. Read the comment docs.

@totakke
Copy link
Member

totakke commented Jun 2, 2017

lein test :all fails in my environment. Please check it.

@alumi
Copy link
Member Author

alumi commented Jun 2, 2017

Oh, I'd totally forgotten about that! 😲
Updated test cases to pass :all tests. (it takes > 5min)

Copy link
Member

@totakke totakke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirmed performance improvement in sorting. Great work.

Please add description of a new option to usage string for approval.

@@ -131,6 +131,9 @@
(def ^:private sort-cli-options
[["-o" "--order ORDER" "Sorting order of alignments <coordinate|queryname>"
:default "coordinate"]
["-c" "--chunk CHUNK" "Maximum number of alignments sorted per thread."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add [-c CHUNK] to usage line (L142).

"Usage: cljam sort [-o ORDER] [-c CHUNK] <in.bam|sam> <out.bam|sam>"

@alumi
Copy link
Member Author

alumi commented Jun 2, 2017

@totakke Thank you for the reviewing. I updated usage string in cli.

@totakke totakke merged commit 1ac3617 into master Jun 2, 2017
@totakke totakke deleted the feature/improve-sorter branch June 2, 2017 09:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants