lukepereira edited this page Mar 29, 2023 · 3 revisions

rdrp1

Query for the translated-nucleotide search of viral RNA-dependent RNA polymerase (RdRP). Also referred to as the "RdRP search" in the main Serratus manuscript.

Description

The database was compiled as follows:

    1. The wolf18 collection is a curated snapshot (ca. 2018) of RdRP from GenBank. link
    1. The wolf20 collection consists of RdRPs assembled from marine metagenomes. link
    1. All viral GenBank protein sequences (release version 241) were aligned with diamond --ultra-sensitive against the combined wolf18 and wolf20 sequences (E-value < 1e-6). These searches produced local alignments that contained truncated RdRP, so each RdRP-containing GenBank sequence was then re-aligned to the wolf18 and wolf20 collections to "trim" it to the wolf RdRP boundaries.
    1. The above algorithm was also applied to all viral GenBank nucleotide records to capture additional RdRP not annotated as such by GenBank. Because a region of the HCV capsid protein shares similarity with HCV RdRP, sequences annotated as HCV-capsid were removed. Eight novel coronavirus RdRP sequences identified in a pilot experiment were added manually. The combined RdRP sequences from the above collections were clustered (uclust) at 90% amino acid identity, and the resulting representative sequences (centroids, N = 14,653) were used as the rdrp1 search query.
    1. Deltavirus antigen protein sequences were added manually from NC_001653, M21012, X60193, L22063, AF018077, AJ584848, AJ584847, AJ584844, AJ584849, MT649207, MT649208, MT649206, NC_040845, NC_040729, MN031240, MN031239, MK962760, MK962759, and eight additional homologs we identified in a pilot experiment.
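
The "trim" step described above can be illustrated with a minimal Python sketch. This is not the script used to build rdrp1; it only shows one way such trimming could work for protein queries. It assumes the alignments were written in the standard 12-column BLAST tabular layout (what diamond produces with `--outfmt 6`), and the function names `best_hits` and `trim_to_hits`, the sequence IDs, and the toy data are all hypothetical.

```python
import csv
import io

# Standard 12-column BLAST tabular layout (assumed input format).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(tabular_text, max_evalue=1e-6):
    """Return {query_id: (qstart, qend)} for the top-scoring hit per query.

    Hits with an E-value at or above max_evalue are discarded.
    """
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) >= max_evalue:
            continue
        score = float(hit["bitscore"])
        qid = hit["qseqid"]
        if qid not in best or score > best[qid][0]:
            best[qid] = (score, int(hit["qstart"]), int(hit["qend"]))
    return {qid: (start, end) for qid, (_, start, end) in best.items()}


def trim_to_hits(sequences, spans):
    """Cut each protein sequence to its 1-based, inclusive aligned span."""
    trimmed = {}
    for qid, (start, end) in spans.items():
        lo, hi = min(start, end), max(start, end)
        trimmed[qid] = sequences[qid][lo - 1:hi]
    return trimmed


if __name__ == "__main__":
    # Toy data: two hits for protA (the higher bitscore wins) and one
    # weak hit for protB that fails the E-value cutoff.
    hits = (
        "protA\twolf18_x\t95.0\t8\t0\t0\t3\t10\t1\t8\t1e-50\t200\n"
        "protA\twolf20_y\t80.0\t5\t1\t0\t4\t8\t2\t6\t1e-10\t50\n"
        "protB\twolf18_x\t40.0\t6\t3\t0\t1\t6\t1\t6\t1e-3\t20\n"
    )
    proteins = {"protA": "MKTAGDDLVRSW", "protB": "MAAAAAK"}
    print(trim_to_hits(proteins, best_hits(hits)))
    # prints {'protA': 'TAGDDLVR'}
```

Keeping only the single best hit per sequence is a simplification chosen for brevity. Nucleotide records would additionally need translation, because their query coordinates are in nucleotides rather than amino acids.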