VIB-UGent Center for Plant Systems Biology—Evolutionary Systems Biology Lab
ksrates is a tool to position whole-genome duplications* (WGDs) relative to speciation events using substitution-rate-adjusted mixed paralog–ortholog distributions of synonymous substitutions per synonymous site (KS).
* or, more generally, whole-genome multiplications (WGMs), but we will simply use the more common WGD to refer to any multiplication
To position ancient WGD events with respect to speciation events in a phylogeny, the KS values of WGD paralog pairs in a species of interest are often compared with the KS values of ortholog pairs between this species and other species. For example, it is common practice to superimpose ortholog and paralog KS distributions in a mixed plot. However, if the lineages involved exhibit different substitution rates, such a direct, naive comparison of paralog and ortholog KS estimates can be misleading and result in phylogenetic misinterpretation of WGD signatures.
ksrates is a user-friendly command-line tool and Nextflow pipeline to compare paralog and ortholog KS distributions derived from genomic or transcriptomic sequences. ksrates estimates differences in synonymous substitution rates among the lineages involved and generates an adjusted mixed plot of paralog and ortholog KS distributions that makes it possible to assess the relative phylogenetic positioning of presumed WGD and speciation events.
For more details, see the related publication and the documentation below.
ksrates can be executed using either a Nextflow pipeline (recommended) or a manual command-line interface. The latter is available via Docker and Singularity containers, and as a Python package to integrate into existing genomics toolsets and workflows.
In the following sections we briefly describe how to install, configure and run the Nextflow pipeline and the basic usage of the command-line interface for the Docker or Singularity containers. For detailed usage information, a full tutorial and additional installation options, please see the full documentation.
To illustrate how to use ksrates, two example datasets are provided for a simple example use case analyzing WGD signatures in monocot plants with oil palm (Elaeis guineensis) as the focal species.
-
example: a full dataset that contains the complete sequence data for the focal species and two other species. It may require hours of computation, depending on the available computing resources. We advise running this dataset on a compute cluster; the ksrates Nextflow pipeline should make it fairly easy to configure this for a variety of HPC schedulers.
-
test: a small test dataset that contains only a small subset of the sequence data for each of the species and takes only a few minutes to run. It is intended only for a quick check of the tool and can be run locally, e.g. on a laptop. The results are not very meaningful.
See the Usage sections below and the Tutorial for more detail.
-
Install Nextflow. The official instructions are here; in brief:
-
If you do not have Java installed, install Java (8 or later, up to 15); on Linux you can use:
sudo apt-get install default-jdk
-
Install Nextflow using either:
wget -qO- https://get.nextflow.io | bash
or:
curl -fsSL https://get.nextflow.io | bash
It creates the nextflow executable file in the current directory. You may want to move it to a folder accessible from your $PATH, for example:
mv nextflow /usr/local/bin
-
-
Install either Singularity (recommended, but see here) or Docker. This is needed to run the ksrates Singularity or Docker container, which contains all other required software dependencies, so nothing else needs to be installed.
-
Install ksrates: When using Nextflow, ksrates and the ksrates Singularity or Docker container are downloaded automatically the first time you launch the ksrates pipeline, and they are stored and reused for any further executions (see Nextflow pipeline sharing). It is therefore not necessary to install ksrates manually; simply continue with the Usage section below.
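Before installing Nextflow, it can help to confirm that the installed Java falls within the range stated above (8 or later, up to 15). The following is a minimal sketch and not part of ksrates or Nextflow: the helper names java_major_version and java_supported are hypothetical, and the sketch assumes that the first line of `java -version` contains a quoted version string such as "1.8.0_292" or "11.0.11".

```shell
# Hypothetical helper: extract the major Java version from a version string.
# Handles the legacy "1.8.0_292" scheme (major = 8) and the newer
# "11.0.11" scheme (major = 11).
java_major_version() {
    local version first rest
    version="$1"
    first="${version%%.*}"
    if [ "$first" = "1" ]; then
        # Legacy scheme: the major version is the second field.
        rest="${version#*.}"
        first="${rest%%.*}"
    fi
    # Drop any non-numeric suffix such as "-ea".
    echo "${first%%[!0-9]*}"
}

# Hypothetical helper: succeed if the major version is between 8 and 15.
java_supported() {
    local major
    major="$(java_major_version "$1")"
    [ -n "$major" ] && [ "$major" -ge 8 ] && [ "$major" -le 15 ]
}

# Check the installed Java, if any (assumes a quoted version on the first line).
if command -v java >/dev/null 2>&1; then
    installed="$(java -version 2>&1 | head -n 1 | sed -E 's/.*"([^"]+)".*/\1/')"
    if java_supported "$installed"; then
        echo "Java $installed is within the supported range (8 to 15)."
    else
        echo "Java $installed is outside the supported range (8 to 15)."
    fi
else
    echo "Java not found; install it before installing Nextflow."
fi
```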
We briefly illustrate here how to run the ksrates Nextflow pipeline on the test dataset.
-
Get the example datasets.
-
Clone the repository to get the test datasets:
git clone https://github.com/VIB-PSB/ksrates
-
You may want to copy the dataset folder you want to use to another location, for example your home folder, and then change to that folder:
cp -r ksrates/test ~
cd ~/test
-
-
Prepare the configuration files.
The test directory already contains:
-
A pre-filled ksrates configuration file (config_elaeis.txt) for the oil palm use case.
-
A Nextflow configuration file template (nextflow.config) to configure the executor to be used (i.e., a local computer or a compute cluster) and the resources made available to Nextflow, such as the number of CPUs. It also configures whether to use the ksrates Singularity or Docker container. The configuration file may need to be adapted to your available resources.
See the full documentation and the Nextflow documentation for more detail on Nextflow configuration, e.g. for different HPC schedulers. We also provide additional, more general template Nextflow configuration files in the doc directory in the repository.
-
-
Launch the ksrates Nextflow pipeline.
Note: If this is the first time you launch the pipeline, Nextflow will first download the ksrates Nextflow pipeline and the ksrates Singularity or Docker container.
nextflow run VIB-PSB/ksrates --config ./config_elaeis.txt
The path to the ksrates configuration file is specified through the --config parameter. If the Nextflow configuration file is named nextflow.config and located in the launching folder, the file is automatically detected. Alternatively, the user can specify a custom file by using the -C option (see Nextflow documentation).
Note: To generate a new ksrates configuration file template for a new analysis, use the --config option to specify its file name or file path. If the specified file does not exist (at the given path), the pipeline will generate the template and then exit. Edit and fill in this generated configuration file (see the full documentation for more detail) and then rerun the same command above to relaunch the pipeline.
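To make the role of the Nextflow configuration file described in step 2 more concrete, the sketch below writes a small custom configuration for a local run. It is an illustration under stated assumptions, not the template shipped in the test directory: the file name nextflow.local.config, the helper name write_nextflow_config and the values (local executor, 4 CPUs, Singularity enabled) are all assumptions, and the shipped nextflow.config remains the authoritative starting point and may set further options.

```shell
# Hypothetical helper: write a minimal Nextflow configuration to the given path.
# All values below are assumptions for a laptop run; adapt them as needed.
write_nextflow_config() {
    cat > "$1" <<'EOF'
// Executor and resources made available to Nextflow.
process {
    executor = 'local'   // e.g. 'slurm' or 'sge' on a compute cluster
    cpus = 4             // assumed number of CPUs per process
}

// Container engine: enable exactly one of the two.
singularity {
    enabled = true
}
docker {
    enabled = false
}
EOF
}

# Write the sketch next to the ksrates configuration file.
write_nextflow_config ./nextflow.local.config
echo "Wrote ./nextflow.local.config"
```

A file written this way is not named nextflow.config, so it is not detected automatically; it would be passed to Nextflow through the -C option mentioned above.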
Install either Singularity (recommended, but see here) or Docker. This is needed to run the ksrates Singularity or Docker container, which contains ksrates and all other required software dependencies, so nothing else needs to be installed. The publicly accessible ksrates Singularity or Docker container is downloaded automatically the first time you execute a ksrates command with it, and it is stored and reused for any further command executions.
We briefly illustrate here how to run ksrates using the Singularity or Docker container.
-
ksrates comes with a command-line interface. Its basic syntax is:
ksrates [OPTIONS] COMMAND [ARGS]...
-
To execute a ksrates command using the Singularity container the syntax is:
singularity exec docker://vibpsb/ksrates ksrates [OPTIONS] COMMAND [ARGS]...
-
Or to execute a ksrates command using the Docker container the syntax is:
docker run --rm -v $PWD:/temp -w /temp vibpsb/ksrates ksrates [OPTIONS] COMMAND [ARGS]...
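Because the container prefix is the same for every ksrates command, it can be convenient to wrap it once. The following is a minimal sketch, not something provided by ksrates: the function name ksrates_container is hypothetical, and it simply reuses the two container invocations shown above, preferring Singularity when both engines are installed.

```shell
# Hypothetical wrapper: run a ksrates command through whichever container
# engine is available, using the invocations shown above.
ksrates_container() {
    if command -v singularity >/dev/null 2>&1; then
        singularity exec docker://vibpsb/ksrates ksrates "$@"
    elif command -v docker >/dev/null 2>&1; then
        # Mount the current directory so ksrates can read and write files in it.
        docker run --rm -v "$PWD":/temp -w /temp vibpsb/ksrates ksrates "$@"
    else
        echo "Neither Singularity nor Docker was found in PATH." >&2
        return 1
    fi
}
```

With this in place, `ksrates_container -h` would stand in for the longer container command line; any ksrates options and arguments are passed through unchanged.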
Some example ksrates commands are:
Show usage and all available COMMANDs and OPTIONS:
ksrates -h
Generate a template configuration file for the focal species:
ksrates generate-config config_elaeis.txt
Show usage and ARGS for a specific COMMAND:
ksrates orthologs-ks -h
Run the ortholog KS analysis between two species using four threads/CPU cores:
ksrates orthologs-ks config_elaeis.txt elaeis oryza --n-threads 4
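The thread count in the last command is fixed at four; on a machine with fewer cores it may be preferable to cap it at what is available. The sketch below is one way to do that and is not part of ksrates: the helper names available_cores and pick_threads are hypothetical, and it assumes that `getconf _NPROCESSORS_ONLN` reports the number of online cores (it falls back to 1 otherwise). It only prints the resulting command rather than running it.

```shell
# Hypothetical helper: number of online CPU cores, or 1 if it cannot be determined.
available_cores() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Hypothetical helper: the smaller of the requested cap and the available cores.
pick_threads() {
    local cap cores
    cap="$1"
    cores="$(available_cores)"
    if [ "$cores" -lt "$cap" ]; then
        echo "$cores"
    else
        echo "$cap"
    fi
}

# Print the ortholog KS command with at most four threads.
echo "ksrates orthologs-ks config_elaeis.txt elaeis oryza --n-threads $(pick_threads 4)"
```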
Please see the full documentation for more details and the complete set of commands.
If you come across a bug or have any question or suggestion, please open an issue.
If you publish results generated using ksrates, please cite:
Sensalari C., Maere S. and Lohaus R. (2021) ksrates: positioning whole-genome duplications relative to speciation events in KS distributions. Bioinformatics, btab602, doi: https://doi.org/10.1093/bioinformatics/btab602