Motifator, a new tool for classifying RdRPs and close homologs

Robert Edgar edited this page Jan 27, 2021 · 16 revisions

Downloads

Binary: s3://serratus-public/rce/motifator/bin/motifator1.1.1114

Usage

motifator -search_rdrp input.fasta [options]

The query may be amino acid or nucleotide sequence; the type is detected automatically. Output files are:

-report report.txt
-fevout output.fev
-trim_fastaout trim.faa
-trim_fastaout_nt trim.fna
-bedout hits.bed
-motifs_fastaout motifs.fa

The trim_fastaout_nt file reports the nucleotide (nt) palm sequence for translated searches; it is not supported if the query is amino acid (aa).

All output options are optional. Fev is "field equals value" format, which is tabbed text with fields such as qlen=10. Trimmed output is the segment from the beginning of motif A to the end of motif C (or C..A if the domain is permuted). Motifs output is the three motifs in canonical A, B, C order separated by xxx. By default, FASTA output is written if the query is predicted to be RdRP.
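The two text formats described above are easy to post-process with standard tools. The sketch below is illustrative only: fev_get and split_motifs are hypothetical helper names, not part of motifator, and the sample fev record is made up (only the qlen=10 field comes from this page). It assumes each fev record is one line of tab-separated name=value fields, and that each motifs sequence sits on a single line with a literal lowercase xxx separator.

```shell
# Print the value of one named field from each fev record on stdin.
# Assumes tab-separated name=value fields, one record per line.
fev_get() {
  awk -F'\t' -v key="$1" '{
    for (i = 1; i <= NF; i++) {
      eq = index($i, "=")
      if (eq > 0 && substr($i, 1, eq - 1) == key) {
        print substr($i, eq + 1)
      }
    }
  }'
}

# Split motifs lines of the form AxxxBxxxC into three tab-separated columns.
# FASTA header lines (starting with ">") are passed through unchanged.
split_motifs() {
  awk -F'xxx' '/^>/ { print; next } { print $1 "\t" $2 "\t" $3 }'
}

# Made-up fev record; "query" is a hypothetical field name.
printf 'query=seq1\tqlen=10\n' | fev_get qlen

# Motifs taken from the example report on this page, joined with xxx.
printf 'VAGDFKNFDKRVxxxSGCFFTSIVNNIVNxxxVLGDDHIY\n' | split_motifs
```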

If -hionly is specified, only high-confidence predictions are written to the trim_fastaout, trim_fastaout_nt and motifs_fastaout files.

By default, 10 threads will be started, or one thread per CPU core, whichever is smaller. The -threads option can be used to specify the number of threads, e.g. -threads 8.
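To see what the default would be on a given machine, the rule above can be written out as follows. This is only a restatement of the documented rule, not motifator's own code; default_threads is a hypothetical helper, and getconf _NPROCESSORS_ONLN is assumed to be available for counting online cores (it is on typical Linux and macOS systems).

```shell
# Default thread count as documented: the smaller of 10 and the core count.
# The core count may be passed as an argument; otherwise it is queried.
default_threads() {
  cores=${1:-$(getconf _NPROCESSORS_ONLN)}
  if [ "$cores" -lt 10 ]; then
    echo "$cores"
  else
    echo 10
  fi
}

# Report the expected default for this machine.
default_threads
```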

Description

This is typical report output for a valid RdRP, illustrating what motifator is designed to do.

>A0A1L3KJH1_9VIRU/1426-1810
Length 385aa ABC 173-282(110)

   A:173-184(17.3)   B:237-250(22.6)   C:275-282(12.3)
   VAGDFKNFDKRV      SGCFFTSIVNNIVN    VLGDDHIY
   +||||+|||+++      |||++|||.|.|||    |.|||.|+
   iagDySkFDssl      SGsplTsidNSivN    vyGDDnii

Score 52.3, high-confidence-RdRP: good-ABC-order.good-motif-spacing.high-PSSM-score.
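The coordinates in the report above can be checked by hand. Judging from the arithmetic, the number in parentheses after the ABC range appears to be the span length in residues, while the numbers in parentheses after each motif range appear to be per-motif scores. The sketch below assumes 1-based inclusive coordinates; span_len is a hypothetical helper, not a motifator feature.

```shell
# Length of a 1-based inclusive range written as "start-end".
span_len() {
  start=${1%-*}
  end=${1#*-}
  echo $((end - start + 1))
}

span_len 173-282   # ABC span, start of motif A to end of motif C
span_len 173-184   # motif A (VAGDFKNFDKRV)
span_len 237-250   # motif B (SGCFFTSIVNNIVN)
span_len 275-282   # motif C (VLGDDHIY)
```

The first value matches the 110 reported for the ABC span, which is also the length of the trimmed sequence for this query, and the other three match the lengths of the motif strings shown in the alignment.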

Motifator looks for the characteristic functional motifs called A, B and C in the catalytic "palm" of the RdRP domain. Position Specific Scoring Matrices (PSSMs) are used to search for the motifs. Additional evidence comes from the distances between the motifs. There are PSSMs for RdRP and for RT (reverse transcriptase). RT is an RdRP homolog which also has A, B and C motifs.

Motifator reports a score which is the sum of the PSSM log-odds scores for the three motifs, minus a penalty if the A-B-C spacings are outside the typical range. Each query is assigned to a category such as high-confidence-RdRP based on this score and other heuristics.
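As a rough check of the scoring rule, the per-motif values from the example report can be summed. The sketch assumes no spacing penalty applies to that query (it is labelled good-motif-spacing); motif_score is a hypothetical helper, and the actual penalty function is not described here, so the penalty is simply an optional number.

```shell
# Sum of three motif scores minus an optional spacing penalty (default 0).
motif_score() {
  awk -v a="$1" -v b="$2" -v c="$3" -v p="${4:-0}" \
    'BEGIN { printf "%.1f\n", a + b + c - p }'
}

# Motif A, B and C scores from the example report.
motif_score 17.3 22.6 12.3
```

The sum of the displayed values is 52.2, against a reported score of 52.3; the small difference is presumably because the per-motif values are rounded for display.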

Validation

Results updated for v1.1.1114

                Name        N     Nhi   Nlo                     Desc
             uniprot      838     785     8             UniProt RdRP
      PF00680_RdRP_1      795     757     6              PFAM RdRP_1
      PF00978_RdRP_2      397     379     1              PFAM RdRP_2
      PF00998_RdRP_3      205     194     2              PFAM RdRP_3
      PF02123_RdRP_4      216     191     1              PFAM RdRP_4
   PF04197_Birn_RdRP       11       1     0          RdRP Birna_RdRP
PF05919_MitoVir_RdRP      181      55     2     PF05919_MitoVir_RdRP
PF17501_Viral_RdRP_C        4       0     0        PFAM Viral_RdRP_C
   PF00972_Flavi_NS5       14       8     0          PFAM Flavi RdRP
      quenya.protref       50       6    17          Quenya proteins
            permuted      117      75    28    Curated permuted RdRP
               rdrp1    14680   12455   198      Serratus RdRP query
            complete      826     817     0  Complete Cov nt genomes
               decoy   296536       4    17         Curated non-RdRP
         PF00078_RT1    46876       0     0          PFAM RVT_1 (RT)
         PF07727_RT2    12037       0     0          PFAM RVT_2 (RT)
           gb241_orf   360114  218687  1358         GB241 viral ORFs
              vgb241  3261824  228655  1592           GB241 viral nt

N = number of sequences; Nhi, Nlo = number classified as high- and low-confidence RdRP, respectively, by motifator.
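The counts in the table can be turned into rates for easier comparison between sets. The sketch below computes Nhi as a percentage of N; pct is a hypothetical helper. Reading this as a sensitivity estimate assumes every sequence in a set is a true RdRP containing the palm motifs, which may not hold for every set.

```shell
# Percentage of n out of d, to one decimal place.
pct() {
  awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", 100 * n / d }'
}

pct 785 838       # uniprot: high-confidence calls as a percentage of N
pct 0 46876       # PF00078_RT1: high-confidence calls among RT sequences
```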

Example

# Download binary (x86)
wget https://serratus-public.s3.amazonaws.com/rce/motifator/bin/motifator1.1.1109
mv motifator* motifator; chmod 755 motifator

# Run Motifator with outputs
INPUT='ERR2756788.cs.fa'
OUTNAME='frank'

./motifator -search_rdrp "$INPUT" -hionly \
  -report "$OUTNAME.txt" \
  -fevout "$OUTNAME.fev" \
  -trim_fastaout "$OUTNAME.trim.fa" \
  -motifs_fastaout "$OUTNAME.motifs.fa"