forked from fdarthen/taXaminer

Interactive exploration of biodiverse genome assemblies

BIONF/taXaminer

taXaminer

taXaminer - examine the taxonomic diversity in genome assemblies. Designed to detect and differentiate contamination and horizontal gene transfer.

taXaminer combines a reference-free and an alignment-based approach to detect and differentiate contamination and horizontal gene transfer in genome assemblies. It uses a total of 16 intrinsic features to describe the gene set, among them read coverage, sequence composition, gene length and the size of the scaffold each gene is annotated on (see details here). To identify genes that deviate from the average, a Principal Component Analysis (PCA) is applied to these features, so that genes with similar feature profiles cluster together. The taxonomic assignment aims to identify the true taxon of origin of each gene. It is based on the genes' protein sequences, which reduces the need for the exact reference to be present in the database.
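
To illustrate the idea behind the PCA step, here is a minimal NumPy sketch. It is not taXaminer's implementation: the four features, the invented toy values, the z-score standardisation and the distance-from-origin criterion are all assumptions made for illustration (taXaminer itself uses 16 features and its own downstream analysis). The helper name pca_distances is hypothetical.

```python
import numpy as np


def pca_distances(features: np.ndarray, n_components: int = 2):
    """Project genes onto the first principal components of their features.

    Hypothetical helper. features has shape (n_genes, n_features), one row
    per gene. Returns (scores, dist): the PC coordinates of each gene and its
    Euclidean distance from the origin, i.e. from the average gene.
    """
    # Standardise every feature to zero mean and unit variance so that
    # features on very different scales (coverage vs. GC content) compare.
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # constant features would otherwise divide by zero
    z = (features - mean) / std

    # PCA via SVD of the centred matrix: rows of vt are the principal axes,
    # ordered by decreasing explained variance.
    _, _, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T

    # Genes whose feature profile deviates from the average lie far from 0.
    dist = np.linalg.norm(scores, axis=1)
    return scores, dist


# Invented toy data: 200 "native" genes and 5 "foreign" genes with low
# coverage, high GC content and short scaffolds.
rng = np.random.default_rng(0)
n_native, n_foreign = 200, 5
native = np.column_stack([
    rng.normal(50, 5, n_native),        # read coverage
    rng.normal(0.40, 0.02, n_native),   # GC content
    rng.normal(1500, 300, n_native),    # gene length (bp)
    rng.normal(1e6, 1e5, n_native),     # scaffold length (bp)
])
foreign = np.column_stack([
    rng.normal(10, 2, n_foreign),
    rng.normal(0.60, 0.02, n_foreign),
    rng.normal(1500, 300, n_foreign),
    rng.normal(2e4, 5e3, n_foreign),
])
features = np.vstack([native, foreign])

scores, dist = pca_distances(features, n_components=2)
most_deviating = np.argsort(dist)[-n_foreign:]
print(sorted(most_deviating.tolist()))
```

With data this well separated, the five planted genes (rows 200 to 204) should be the ones farthest from the average gene in PC space. Real assemblies are far less tidy, which is why taXaminer pairs this reference-free view with the protein-based taxonomic assignment.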

The results can be interactively explored in the accompanying dashboard.

Installation

To install taXaminer, use the Python package installer pip. Note: taXaminer is not yet published on PyPI, so you need to clone this repository and pass pip the path to the local directory.

git clone https://github.com/BIONF/taXaminer.git
pip install ./taXaminer

To install the additional dependencies, use the setup command included in taXaminer. You can install these tools either via conda or locally in a specified directory.

Using conda (installs into the currently active environment):

taxaminer.setup --conda

In a local directory:

taxaminer.setup -o </path/to/tool/directory/>

To download and build the database, use:

taxaminer.setup --db -d </path/to/database/directory/>

To use an existing database instead, run:

taxaminer.setup -d </path/to/existing_database/directory/>

Usage

  1. Create a configuration file using the following template and adapt it to fit your data.
fasta_path: "path/to/assembly.fasta" # path to assembly FASTA
gff_path: "path/to/assembly.gff" # path to annotation in GFF3 format
output_path: "path/to/output_directory/" # directory to save results in
taxon_id: "<NCBI taxon ID>" # NCBI Taxon ID of query species
  2. Optionally, to include coverage information, add the path to a sorted BAM file. Otherwise, omit this parameter from the configuration file.
bam_path_1: "path/to/mapping.bam" # path to BAM file
  • Note: When using multiple coverage sets, duplicate the parameter and increment the number in the suffix (e.g. bam_path_2 for a second BAM file).
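
Since each coverage set needs its own numbered key, generating the configuration file programmatically can avoid numbering mistakes when there are several BAM files. The helper below is a hypothetical sketch (build_config is not part of taXaminer): it emits only the keys shown in the template and quotes every value with json.dumps, which produces a double-quoted string that YAML also accepts.

```python
import json


def build_config(fasta_path, gff_path, output_path, taxon_id, bam_paths=()):
    """Return taXaminer config text using the keys from the README template.

    Hypothetical helper. Each optional BAM file gets its own numbered key:
    bam_path_1, bam_path_2, ...
    """
    entries = {
        "fasta_path": fasta_path,
        "gff_path": gff_path,
        "output_path": output_path,
        "taxon_id": taxon_id,
    }
    for i, bam in enumerate(bam_paths, start=1):
        entries[f"bam_path_{i}"] = bam
    # json.dumps yields a double-quoted, escaped string, which is also a
    # valid YAML scalar; every value is written as a string, as in the template.
    return "".join(
        f"{key}: {json.dumps(str(value))}\n" for key, value in entries.items()
    )


config = build_config(
    "path/to/assembly.fasta",
    "path/to/assembly.gff",
    "path/to/output_directory/",
    7227,  # example NCBI Taxonomy ID (Drosophila melanogaster)
    bam_paths=["path/to/mapping.bam", "path/to/mapping2.bam"],
)
print(config, end="")
```

Write the returned text to a file such as config.yml and pass that file to taxaminer.run. Leaving bam_paths empty produces a configuration without coverage information.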

To run taXaminer, call it with the path to the config file, like so:

taxaminer.run <config.yml>
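
If several assemblies are processed from a Python script, the same command can be called through subprocess. This sketch assumes only that the taxaminer.run command from the installation step is on PATH; the helper names taxaminer_command and run_taxaminer are hypothetical.

```python
import shutil
import subprocess


def taxaminer_command(config_path):
    """Build the command line for one run: taxaminer.run <config.yml>."""
    return ["taxaminer.run", str(config_path)]


def run_taxaminer(config_paths):
    """Run taXaminer once per config file, stopping at the first failure."""
    if shutil.which("taxaminer.run") is None:
        raise FileNotFoundError("taxaminer.run is not on PATH; install taXaminer first")
    for config_path in config_paths:
        # check=True raises CalledProcessError if a run exits with an error
        subprocess.run(taxaminer_command(config_path), check=True)
```

For example, run_taxaminer(["assembly_A.yml", "assembly_B.yml"]) runs both configurations one after the other (the file names are placeholders).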

For details on additional options, see Configuration parameters.

Bugs

Any bug reports, comments or suggestions are highly appreciated. Please open an issue on GitHub or reach out via email.

Contributors

License

taXaminer is released under the MIT license.

Contact

Please contact us via email.
