Add support for random reading of bgzipped FASTA files. #174
Codecov Report
@@ Coverage Diff @@
## master #174 +/- ##
==========================================
- Coverage 86.9% 86.86% -0.04%
==========================================
Files 73 74 +1
Lines 5683 5750 +67
Branches 482 492 +10
==========================================
+ Hits 4939 4995 +56
- Misses 262 263 +1
- Partials 482 492 +10
LGTM! 👍
Thank you for the enhancement!
Functionally, the changes LGTM👍 Just added a trivial comment on a typo.
src/cljam/io/util/bgzf/gzi.clj
Outdated
(bit-and diff 0xffff))))

(defn comp->uncomp
  "Returns a uncompressed offset for a given virtual file offset"
- "Returns a uncompressed offset for a given virtual file offset"
+ "Returns an uncompressed offset for a given virtual file offset."
Thanks! Fixed in c08fa83.
src/cljam/io/fasta/reader.clj
Outdated
(:import [java.io RandomAccessFile InputStream]
[cljam.io.fasta-index.core :as fasta-index]
[cljam.io.util.bgzf.gzi :as gzi]
[cljam.io.util.bgzf :as bgzf])
`cljam.io.util.bgzf` is never used.
Good catch! Thanks 🙏
src/cljam/io/util/bgzf/gzi.clj
Outdated
(let [off (unsigned-bit-shift-right compressed-offset 16)
[uncompressed] (->> gzi
rseq
(filter (fn [[u c]] (= (long c) off)))
`u` is never used. https://guide.clojure.style/#underscore-for-unused-bindings
LGTM!
Summary
This PR adds support for random reading of bgzip-compressed FASTA files. `cljam.io.sequence/read-sequence` now works on `fa.gz` files if `fa.gz.fai` and `fa.gz.gzi` exist. `.gzi` is a file format for storing BGZF indices as a list of pairs (compressed offset, uncompressed offset).
Note that this implementation does not support indexing FASTA without a `.gzi` file.
Tests
lein check ✅
lein test :all ✅
lein cloverage ✅
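
For readers unfamiliar with `.gzi`, here is a minimal sketch of how a list of (compressed offset, uncompressed offset) pairs can map a position in the uncompressed FASTA to a BGZF virtual file offset. It is not the code from this PR: the function name `uncompressed->virtual-offset`, the in-memory vector layout, and the sample values are all assumptions made for illustration. It relies only on the BGZF convention that a virtual offset holds the block's compressed start in the upper bits and the position inside the decompressed block in the lower 16 bits.

```clojure
;; Hypothetical index: [compressed-offset uncompressed-offset] pairs,
;; sorted ascending, one per BGZF block start. The values are made up.
(def sample-index [[0 0] [18000 65280] [36500 130560]])

(defn uncompressed->virtual-offset
  "Sketch only (not cljam's API): maps an uncompressed offset to a BGZF
  virtual file offset using a sorted vector of index pairs."
  [index ^long uncompressed-offset]
  (let [;; Pick the last block that starts at or before the target offset.
        [c u] (->> index
                   (take-while (fn [[_ u]] (<= (long u) uncompressed-offset)))
                   last)]
    ;; Upper bits: compressed block start. Lower 16 bits: offset within block.
    (bit-or (bit-shift-left (long c) 16)
            (- uncompressed-offset (long u)))))

(uncompressed->virtual-offset sample-index 65290)
;; => 1179648010, i.e. block at compressed offset 18000, 10 bytes into it
```

A linear scan keeps the sketch short; with a large index, a sorted map or a binary search over the pairs would be the more natural choice.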