RdRp trimming with hacked usearch v12

Robert Edgar edited this page Mar 17, 2021 · 6 revisions

Binary is here: https://drive5.com/downloads/usearch12_trim

Usage

Command line is like this:

usearch -usearch_global extended_input.fa \
    -id 0.01 \
    -fulldp \
    -maxaccepts 8 \
    -maxrejects 32 \
    -top_hit_only \
    -db trimmed_reference.fa \
    -userfields query+target+id+qtrimlo+qtrimhi \
    -userout results.tsv \
    -trimout trimmed_output.fa

FASTA output is written to the -trimout file; a TSV file with the coordinates is written to the -userout file.

The TSV fields are: 1. query label, 2. reference label of top hit, 3. %id of semi-global alignment, 4. one-based start coordinate of alignment in query, 5. one-based end coordinate of alignment in query.
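As a rough illustration of how the five fields could be consumed downstream, here is a minimal Python sketch. It assumes the -userout file is tab-separated with no header line and that the end coordinate is inclusive; the page does not state either point explicitly. The names `TrimHit`, `read_trim_hits` and `trim_sequence` are made up for this example, and the sample row is invented, not real usearch output.

```python
import csv
import io
from typing import NamedTuple


class TrimHit(NamedTuple):
    """One row of the -userout file (field order as listed above)."""
    query: str      # query label
    target: str     # reference label of top hit
    pct_id: float   # %id of semi-global alignment
    qtrimlo: int    # one-based start coordinate in query
    qtrimhi: int    # one-based end coordinate in query (assumed inclusive)


def read_trim_hits(handle):
    """Parse tab-separated rows into TrimHit records, skipping blank lines."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue
        query, target, pct_id, lo, hi = row[:5]
        hits.append(TrimHit(query, target, float(pct_id), int(lo), int(hi)))
    return hits


def trim_sequence(seq, hit):
    """Cut the query sequence using one-based, inclusive coordinates."""
    # Python slices are zero-based and end-exclusive, so only the start shifts.
    return seq[hit.qtrimlo - 1:hit.qtrimhi]


# Invented example row, standing in for a line of results.tsv.
example = io.StringIO("query1\tref7\t42.5\t11\t20\n")
hits = read_trim_hits(example)
seq = "A" * 10 + "CDEFGHIKLM" + "W" * 5
trimmed = trim_sequence(seq, hits[0])
print(trimmed)
```

For a real run, `read_trim_hits(open("results.tsv"))` would replace the in-memory example. The sequences in the -trimout file are already trimmed by usearch, so a slice like this is mainly useful for cross-checking coordinates against the original input.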

CVI benchmark results

A result is considered correct (a true positive) if the overlap between the gold standard trim and the tested trim is long enough. Results below are for minimum overlaps (minpctov) of 50%, 75% and 90%. There are four test+reference pairs made by CVI at 20%id, 50%id, 75%id and 90%id. N=number of test sequences, TP=number of trims with good overlap; the second TP value on each line is TP as a percentage of N.
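The overlap criterion can be sketched in Python as follows. The page does not say how the overlap percentage is normalized, so this sketch assumes it is the shared length divided by the length of the gold standard trim, with one-based inclusive coordinates. The function names and the example coordinates are hypothetical and are not taken from the benchmark scripts.

```python
def overlap_pct(gold_lo, gold_hi, test_lo, test_hi):
    """Percent of the gold standard trim covered by the tested trim.

    Coordinates are one-based and inclusive. Normalizing by the gold
    standard length is an assumption made for this sketch.
    """
    gold_len = gold_hi - gold_lo + 1
    if gold_len <= 0:
        raise ValueError("gold standard interval is empty")
    # Length of the intersection of the two intervals (zero if disjoint).
    shared = min(gold_hi, test_hi) - max(gold_lo, test_lo) + 1
    return 100.0 * max(shared, 0) / gold_len


def is_true_positive(gold, test, minpctov):
    """True if the tested trim overlaps the gold trim by at least minpctov percent."""
    return overlap_pct(gold[0], gold[1], test[0], test[1]) >= minpctov


# Invented coordinates: the gold trim is 101..200, the tested trim is 121..210.
gold = (101, 200)
test = (121, 210)
for minpctov in (50, 75, 90):
    print(minpctov, is_true_positive(gold, test, minpctov))
```

With these invented coordinates the shared region is 121..200, which is 80 of the 100 gold standard positions, so the trim would count as a true positive at minpctov=50 and minpctov=75 but not at minpctov=90.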

=== minpctov=50 ===
20%id N=141, TP=129, TP=91.5%
50%id N=517, TP=517, TP=100.0%
75%id N=389, TP=389, TP=100.0%
90%id N=138, TP=138, TP=100.0%

=== minpctov=75 ===
20%id N=141, TP=123, TP=87.2%
50%id N=517, TP=508, TP=98.3%
75%id N=389, TP=385, TP=99.0%
90%id N=138, TP=138, TP=100.0%

=== minpctov=90 ===
20%id N=141, TP=108, TP=76.6%
50%id N=517, TP=494, TP=95.6%
75%id N=389, TP=383, TP=98.5%
90%id N=138, TP=138, TP=100.0%
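The percentage column can be reproduced from the N and TP counts listed above. The short Python sketch below recomputes it as 100*TP/N rounded to one decimal place; the dictionary layout and the helper name `tp_rate` are chosen for this example only.

```python
# (N, TP) counts copied from the tables above, keyed by minpctov and test set.
results = {
    50: {"20%id": (141, 129), "50%id": (517, 517), "75%id": (389, 389), "90%id": (138, 138)},
    75: {"20%id": (141, 123), "50%id": (517, 508), "75%id": (389, 385), "90%id": (138, 138)},
    90: {"20%id": (141, 108), "50%id": (517, 494), "75%id": (389, 383), "90%id": (138, 138)},
}


def tp_rate(n, tp):
    """True positives as a percentage of test sequences, as a one-decimal string."""
    return f"{100.0 * tp / n:.1f}"


for minpctov, by_set in results.items():
    print(f"=== minpctov={minpctov} ===")
    for name, (n, tp) in by_set.items():
        print(f"{name} N={n}, TP={tp}, TP={tp_rate(n, tp)}%")
```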