RdRp trimming with hacked usearch v12
Binary is here: https://drive5.com/downloads/usearch12_trim
Command line is like this:
usearch -usearch_global extended_input.fa \
-id 0.01 \
-fulldp \
-maxaccepts 8 \
-maxrejects 32 \
-top_hit_only \
-db trimmed_reference.fa \
-userfields query+target+id+qtrimlo+qtrimhi \
-userout results.tsv \
-trimout trimmed_output.fa
FASTA output is written to -trimout, tsv with coordinates is -userout.
Tsv fields are: 1. query label, 2. reference label of top hit, 3. %id of semi-global alignment, 4. one-based start coordinate of alignment in query, 5. one-based end coordinate of alignment in query.
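For anyone post-processing the -userout file, here is a minimal Python sketch of reading the five fields and applying the coordinates to a query sequence. It is not part of usearch. The names `TrimHit`, `read_trim_hits` and `trim_sequence` are hypothetical, the example row is made up, and the end coordinate is assumed to be inclusive, which the field description does not state explicitly.

```python
import csv
import io
from typing import NamedTuple


class TrimHit(NamedTuple):
    """One row of the -userout TSV (hypothetical helper type, not a usearch API)."""
    query: str      # field 1: query label
    target: str     # field 2: reference label of top hit
    pct_id: float   # field 3: %id of semi-global alignment
    qtrimlo: int    # field 4: one-based start of alignment in query
    qtrimhi: int    # field 5: one-based end of alignment in query


def read_trim_hits(handle):
    """Yield one TrimHit per non-empty tab-separated line."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        query, target, pct_id, lo, hi = row[:5]
        yield TrimHit(query, target, float(pct_id), int(lo), int(hi))


def trim_sequence(seq, hit):
    """Cut the aligned region out of seq.

    Assumption: qtrimhi is inclusive, so the one-based pair (lo, hi)
    maps to the Python slice [lo - 1 : hi].
    """
    return seq[hit.qtrimlo - 1 : hit.qtrimhi]


# Made-up example row, for illustration only.
example = io.StringIO("q1\tref7\t42.5\t11\t20\n")
hits = list(read_trim_hits(example))
trimmed = trim_sequence("A" * 10 + "C" * 10 + "G" * 5, hits[0])
```

With a real run, `open("results.tsv", newline="")` would replace the `io.StringIO` example.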
A result is considered correct (a true positive) if the overlap between the gold standard trim and the tested trim is long enough. Results below are for minimum overlaps of 50%, 75% and 90% (minpctov). There are four test+reference pairs made by CVI at 20%id, 50%id, 75%id and 90%id. N=number of test sequences, TP=number of trims with good overlap.
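The overlap criterion can be sketched in Python as below. This is an illustration under assumptions, not the script used to produce the numbers. The text above does not say how the overlap percentage is normalized, so the sketch assumes it is the shared length divided by the gold standard trim length. Under that assumption a test trim that is much longer than the gold trim still scores 100%. The function names and example intervals are hypothetical.

```python
def overlap_pct(gold_lo, gold_hi, test_lo, test_hi):
    """Percent of the gold interval covered by the test interval.

    Coordinates are one-based and inclusive. Assumption: the denominator
    is the gold interval length; the source does not specify this.
    """
    shared = min(gold_hi, test_hi) - max(gold_lo, test_lo) + 1
    if shared <= 0:
        return 0.0
    return 100.0 * shared / (gold_hi - gold_lo + 1)


def count_true_positives(gold, test, minpctov):
    """Return (N, TP) for dicts mapping label -> (lo, hi).

    N is the number of gold entries; a query missing from test
    counts as not a true positive.
    """
    tp = 0
    for label, (glo, ghi) in gold.items():
        if label in test:
            tlo, thi = test[label]
            if overlap_pct(glo, ghi, tlo, thi) >= minpctov:
                tp += 1
    return len(gold), tp


# Made-up intervals: "a" overlaps 80% of its gold trim, "b" overlaps 40%.
gold = {"a": (101, 200), "b": (1, 100)}
test = {"a": (121, 220), "b": (61, 160)}
for minpctov in (50, 75, 90):
    n, tp = count_true_positives(gold, test, minpctov)
    print(f"minpctov={minpctov} N={n}, TP={tp}, TP={100.0 * tp / n:.1f}%")
```

A symmetric measure such as the shared length divided by the longer of the two intervals would penalize over-long trims as well; which one matches the reported numbers is not stated here.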
=== minpctov=50 ===
20%id N=141, TP=129, TP=91.5%
50%id N=517, TP=517, TP=100.0%
75%id N=389, TP=389, TP=100.0%
90%id N=138, TP=138, TP=100.0%
=== minpctov=75 ===
20%id N=141, TP=123, TP=87.2%
50%id N=517, TP=508, TP=98.3%
75%id N=389, TP=385, TP=99.0%
90%id N=138, TP=138, TP=100.0%
=== minpctov=90 ===
20%id N=141, TP=108, TP=76.6%
50%id N=517, TP=494, TP=95.6%
75%id N=389, TP=383, TP=98.5%
90%id N=138, TP=138, TP=100.0%