Remove nth's linear search overhead in BCF reader #294

Merged
merged 1 commit into from
Dec 8, 2023

Member

@athos athos commented Dec 5, 2023

When I profiled the BCF reader earlier, I found that most of the time was spent in calls to nth, which performs a linear search for each sample.

This PR removes that linear-search overhead and improves the performance of the BCF reader by making read-typed-value return a vector instead of a lazy sequence.
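
To illustrate why the collection type matters here, the following is a minimal sketch, not the actual cljam code. `read-values-lazy`, `read-values-vec`, and `sum-by-index` are hypothetical names made up for this example. It assumes the reader does one `nth` lookup per sample index: on a lazy sequence each lookup walks from the head, while on a vector each lookup is an indexed access.

```clojure
;; Hypothetical stand-ins for a function that decodes n values;
;; these are NOT cljam's read-typed-value.
(defn read-values-lazy [n]
  ;; `map` returns a lazy seq, so (nth coll i) has to walk i elements
  (map identity (range n)))

(defn read-values-vec [n]
  ;; `mapv` returns a vector, so (nth coll i) is an indexed lookup
  (mapv identity (range n)))

(defn sum-by-index
  "Mimics a per-sample access pattern: one nth call per index in [0, n)."
  [coll n]
  (reduce + (map #(nth coll %) (range n))))

;; Rough comparison; absolute timings depend on the machine and JVM warm-up.
(let [n 5000
      lazy-vals (read-values-lazy n)
      vec-vals  (read-values-vec n)]
  (time (sum-by-index lazy-vals n))  ; total work grows quadratically with n
  (time (sum-by-index vec-vals n)))  ; total work grows roughly linearly with n
```

Both calls return the same sum; only the cost of each `nth` call differs.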

Here are the profiling results before and after the change:

[Images: profiling results before the change and after the change]

With this fix, the BCF reader is now roughly 7x faster than before:

(time
 (with-open [r (vcf/reader ".cavia/large.bcf")]
   (run! (constantly nil) (vcf/read-variants-randomly r {:chr "chr1" :end 30000000} {}))))

;; before change
"Elapsed time: 7973.314958 msecs"

;; after change
"Elapsed time: 1139.505833 msecs"

@athos athos self-assigned this Dec 5, 2023

codecov bot commented Dec 5, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (d68c01d) 88.33% compared to head (50870bd) 88.75%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #294      +/-   ##
==========================================
+ Coverage   88.33%   88.75%   +0.42%     
==========================================
  Files          81       83       +2     
  Lines        7028     7255     +227     
  Branches      495      515      +20     
==========================================
+ Hits         6208     6439     +231     
+ Misses        325      324       -1     
+ Partials      495      492       -3     


@athos athos changed the base branch from feature/lsb-revamp to master December 5, 2023 03:00
@athos athos marked this pull request as ready for review December 5, 2023 03:03
@athos athos requested review from alumi and a team as code owners December 5, 2023 03:03
@athos athos requested review from matsutomo81 and removed request for a team December 5, 2023 03:03
Contributor

@matsutomo81 matsutomo81 left a comment

Thank you for working on this!
LGTM👍

Member

@alumi alumi left a comment

LGTM 👍 Thanks!

@alumi alumi merged commit d6d1af8 into master Dec 8, 2023
17 checks passed
@alumi alumi deleted the fix/bcf-nth-overhead branch December 8, 2023 01:48