TransposableELMT

Wrapper script for TE identification and genome masking

Summary

This script follows some of the main procedures set forth in Coghlan, A., Tsai, I.J., Berriman, M. 2018. Creation of a comprehensive repeat library for a newly sequenced parasitic worm genome. Protocol Exchange. DOI: 10.1038/protex.2018.054

This is a simple wrapper script that runs several repeat-finding programs: RepeatModeler, TransposonPSI, LTR_finder, and LTR_harvest. LTR_harvest is coupled with LTR_digest and an HMMsearch against Pfam domains associated with LTRs to limit false-positive identifications. The constructed libraries are run through RepeatClassifier to classify the LTRs. USEARCH is then run on the concatenated library to remove redundant LTRs at an 80% identity cutoff. The non-redundant library is then used with RepeatMasker to soft-mask the assembly.
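Soft masking keeps the sequence intact and marks repeat regions by converting them to lowercase, rather than replacing them with N. The sketch below only illustrates that idea; it is not how RepeatMasker works internally, and the function name `soft_mask` and the 0-based, half-open interval convention are assumptions made for this example.

```python
def soft_mask(sequence, intervals):
    """Return sequence with the given 0-based, half-open intervals lowercased."""
    bases = list(sequence.upper())
    for start, end in intervals:
        # Clamp coordinates so out-of-range values are ignored instead of raising.
        start = max(start, 0)
        end = min(end, len(bases))
        for i in range(start, end):
            bases[i] = bases[i].lower()
    return "".join(bases)


if __name__ == "__main__":
    # Mask positions 2, 3 and 4 of a toy sequence.
    print(soft_mask("ACGTACGTAC", [(2, 5)]))  # prints ACgtaCGTAC
```

Because the bases themselves are preserved, downstream tools that understand soft masking can still read through the repeats when they need to.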

Currently, all programs are run with their default settings, and few of those settings can be changed through flags. Additional options may be added in future versions if there is a need.

It is recommended to provide additional curated libraries, such as those from RepBase. Select an appropriate taxonomic level, download the file in FASTA format, and pass it with the -rb flag on the command line.

Dependencies

Basic programs

TE programs

Additional

Dependencies should be callable from the command line. If they are not, add the parent directory of each executable to $PATH. If all else fails, paths to the executables can be passed to the script through flags.
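A quick way to see which tools are already reachable is to look them up on $PATH before starting a run. The following is a minimal sketch using Python's standard `shutil.which`; the helper name `find_missing` is hypothetical, and the executable names in the example list are assumptions that may differ on your installation.

```python
import shutil


def find_missing(executables):
    """Return the names from executables that cannot be located on $PATH."""
    return [name for name in executables if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; substitute whatever your installation provides.
    tools = ["RepeatModeler", "RepeatMasker", "BuildDatabase", "RepeatClassifier"]
    missing = find_missing(tools)
    if missing:
        print("Not found on $PATH:", ", ".join(missing))
    else:
        print("All listed tools were found on $PATH.")
```

Anything reported as missing can either be added to $PATH or supplied through the corresponding path flag listed under Usage.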

Usage

usage: ./TransposableELMT.py [options] -in genome_assembly.fasta -o output_basename

optional arguments:
  -h, --help                  show this help message and exit
  -in , --input               Genome assembly in FASTA format
  -o , --out                  Basename of output directory and file
  --cpus                      Number of cores to use [default: 2]
  -id , --identity            Cutoff value for percent identity in USEARCH [default: 0.80]
  -en , --engine              Search engine used in RepeatModeler [abblast|wublast|ncbi] [default: ncbi]
  -rb , --repbase_lib         RepBase library of TEs or additional curated library in FASTA format
  -rl , --repeatmodeler_lib   Pre-computed RepeatModeler library
  --hmms                      Path to directory of TE pfam domain files in HMMER3 format [Default: TransposableELMT/te_hmms]
  --REPEATMODELER_PATH        Path to RepeatModeler exe if not set in $PATH
  --REPEATMASKER_PATH         Path to RepeatMasker exe if not set in $PATH
  --BUILDDATABASE_PATH        Path to BuildDatabase exe if not set in $PATH
  --REPEATCLASSIFIER_PATH     Path to RepeatClassifier exe if not set in $PATH
  --LTRFINDER_PATH            Path to LTR_Finder exe if not set in $PATH
  --GENOMETOOLS_PATH          Path to genometools exe if not set in $PATH
  --USEARCH_PATH              Path to USEARCH exe if not set in $PATH
  --TRANSPOSONPSI_PATH        Path to transposonPSI.pl if not set in $PATH
  --CNV_LTRFINDER2GFF_PATH    Path to cnv_ltrfinder2gff.pl if not set in $PATH
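When the script is driven from another Python program, the command line can be assembled as a list and handed to `subprocess.run`. This is a sketch only: `build_command` is a hypothetical helper, the input file names are placeholders, and only flags shown in the usage text above are used.

```python
import subprocess


def build_command(genome, out_basename, cpus=2, repbase_lib=None):
    """Assemble the argument list for TransposableELMT.py from flags in the usage text."""
    cmd = ["./TransposableELMT.py", "-in", genome, "-o", out_basename, "--cpus", str(cpus)]
    if repbase_lib is not None:
        # -rb supplies a RepBase or other curated library in FASTA format.
        cmd += ["-rb", repbase_lib]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; replace with your own assembly and library.
    cmd = build_command("genome_assembly.fasta", "output_basename",
                        cpus=8, repbase_lib="repbase_subset.fasta")
    print(" ".join(cmd))
    # Uncomment to launch the pipeline; check=True raises if the script exits non-zero.
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.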

Output files

Soft-masked genome assembly in FASTA format
RepeatMasker Table file
RepeatMasker Out file
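As a sanity check on the soft-masked assembly, the proportion of lowercase bases can be counted directly from the FASTA file. The sketch below assumes plain, uncompressed FASTA text; `masked_fraction` is a hypothetical helper and is not part of the script. Note that any lowercase character counts as masked, including lowercase `n`.

```python
def masked_fraction(fasta_lines):
    """Return the fraction of bases that are lowercase across all records."""
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    return masked / total if total else 0.0


if __name__ == "__main__":
    example = [">contig_1", "ACGTacgtAC", "ggTTAA"]
    print(f"{masked_fraction(example):.2%} of bases are soft-masked")
```

For a real run, pass an open file handle for the masked assembly instead of the example list. The result should roughly agree with the total reported in the RepeatMasker Table file.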
