diff --git a/.travis.yml b/.travis.yml index f518bf0..42cd4ed 100644 --- a/.travis.yml +++ b/.travis.yml @@ -2,9 +2,15 @@ dist: trusty sudo: required services: - docker - + +env: + global: + - ROOTDIR="$TRAVIS_BUILD_DIR" + before_install: - - docker build -t companion_image . + - # docker build -t companion_image . + - docker pull $DOCKER_IMAGE + - docker tag $DOCKER_IMAGE companion_image - docker run -dit -v $TRAVIS_BUILD_DIR:/tmp/companion --name companion_container companion_image /bin/bash - sleep 5 - docker container ls @@ -19,9 +25,7 @@ before_install: cache: directories: - work -env: - global: - - ROOTDIR="$TRAVIS_BUILD_DIR" + script: - docker exec companion_container /bin/bash -c 'cd /tmp/companion/test/testsuite; ./testsuite.rb -threads 2' - travis_wait docker exec companion_container /bin/bash -c 'cd /tmp/companion; ./nextflow -c loc_travis.config -c params_default.config run annot.nf --do_circos=false' diff --git a/README.md b/README.md index 8bf6e17..50e18d4 100644 --- a/README.md +++ b/README.md @@ -5,15 +5,20 @@ A portable, scalable eukaryotic genome annotation pipeline implemented in Nextfl [![License: ISC](https://img.shields.io/badge/License-ISC-brightgreen.svg)](https://github.com/sanger-pathogens/companion/blob/master/LICENSE) [![status](https://img.shields.io/badge/NAR-10.1093%2Fnar.gkw292-brightgreen.svg)](https://doi.org/10.1093/nar/gkw292) + ## Content - * [Introduction](#introduction) - * [Installation](#installation) - * [Required dependencies](#required-dependencies) - * [Usage](#usage) - * [Preparing reference annotations](#preparing-reference-annotations) - * [License](#license) - * [Feedback/Issues](#feedbackissues) - * [Citation](#citation) + * [Introduction](#introduction) + * [Dependencies](#dependencies) + * [Docker](#docker) + * [Installation](#installation) + * [Usage](#usage) + * [Local copy of Companion](#local-copy-of-companion) + * [Running Companion direct from a repository](#running-companion-direct-from-a-repository) + * 
[Preparing reference annotations](#preparing-reference-annotations) + * [License](#license) + * [Feedback/Issues](#feedbackissues) + * [Citation](#citation) + ## Introduction This software is a comprehensive computational pipeline for the annotation of eukaryotic genomes (like protozoan parasites). It performs the following tasks: @@ -93,7 +98,7 @@ docker pull sangerpathogens/companion To create a local copy of companion, you can download this repo from github (if you are familiar with github, you may of course prefer to _clone_ or _fork_ it). ``` -curl https://github.com/trstickland/companion/archive/master.zip # or click the green button on the guthub web page +curl -L -o companion-master.zip https://github.com/sanger-pathogens/companion/archive/master.zip # or click the green button on the GitHub web page unzip companion-master.zip mv companion-master my-companion-project # renaming it to something meaningful to you is a good idea ``` @@ -106,9 +111,19 @@ nextflow run my-companion-project -profile docker The argument `-profile docker` instructs nextflow to run the sangerpathogens/companion docker image for the dependencies. Have a look at the `nextflow.config` file to see the definition of the docker profile, and how the docker image is specified. -You will also find file names, paths, parameters, etc. that you can edit to perform your own runs. +You will also find file names, paths, parameters, etc. that you can edit to perform your own runs. 
The following warrant +a special mention: + +*inseq* The input FASTA file (`${baseDir}/example-data/L_donovani.1.fasta` in the example parameter file included with the distribution) + +*ref_dir* The directory containing reference genomes (`${baseDir}/example-data/references` in the example file) + +*dist_dir* The directory that will contain the newly created output files (`${baseDir}/example-data-output` in the example file) -### running Companion direct from a repository +*run_snap* We recommend SNAP is disabled, as it has not provided useful results in this pipeline (`false` in the example file) + + +### Running Companion direct from a repository If you run nextflow with the name of a github repository, it will pull the contents of the repository and run with those. This command will do the same as the "local copy" example above: @@ -123,10 +138,50 @@ edit the files in your local copy, and nextflow will work from your (now differe If you are familiar with repositories, and the workflow appropriate to using them, this can be a very convenient way of working; otherwise it can become quite confusing, and you may find it easier to work with a simple local copy. - ### Preparing reference annotations -The reference annotations used in the pipeline need to be pre-processed before they can be used. See the the [GitHub wiki](https://github.com/sanger-pathogens/companion/wiki/Preparing-reference-data-sets) for more details. There are also pre-generated reference sets for various parasite species/families. +The reference annotations used in the pipeline need to be pre-processed before they can be used. Only a few pre-generated +reference sets for various parasite species/families are included in the distribution as examples. 
+ +To add a reference organism, you will need: + +- a descriptive name of the organism +- a short abbreviation for the organism +- the genome sequence in a single FASTA file +- a structural gene annotation in GFF3 format (see below for details) +- functional GO annotation in GAF 1.0 format, on the gene level +- a pattern matching chromosome headers, describing how to extract chromosome numbers from them +- an [AUGUSTUS](http://bioinf.uni-greifswald.de/augustus/) model, trained on reference genes + +Insert these file names, etc., where `` appear in the steps below: + +1. Create a new data directory (i.e. the equivalent of the `example-data` directory included in the distribution) +1. Edit `nextflow.config` (and any config files that are referenced) and change parameters such as +`inseq` and `ref_dir` to your new data directory. +1. Copy the new reference genome (FASTA) into `/genomes` +1. Copy GFF3 and GAF files into `/genomes` +1. Copy Augustus model files into `data/augustus/species//` +1. Create new directory `/references//` +1. Add new section to `amber-test-data/references/references-in.json`, using the +short name (same as the directory name in the previous step); in this section add +the names/paths of the files copied (above), a descriptive name, and +a pattern for matching chromosomes in the FASTA files (in this example, _, where _n_ is any integer). +``` +"" : { "gff" : "../genomes/.gff3", + "genome" : "../genomes/.fasta", + "gaf" : "../genomes/.gaf", + "name" : "", + "augustus_model" : "../../data/augustus/species//", + "chromosome_pattern" : "_(%d+)" + } +``` +8. Finally, change directory to `/references` (you _must_ execute the following command in this directory) +and run `../../bin/update_references.lua`. This writes the file `/references/references.json`. + +You can now run _Companion_, and the new reference will be included. 
+ +Further documentation on preparing reference data can be found in the [GitHub wiki](https://github.com/sanger-pathogens/companion/wiki/Preparing-reference-data-sets). + ## License Companion is free software, licensed under [ISC](https://github.com/sanger-pathogens/companion/blob/master/LICENSE). diff --git a/params_default.config b/params_default.config index 2a29fe2..d87aa75 100644 --- a/params_default.config +++ b/params_default.config @@ -7,7 +7,7 @@ params { ref_species = "LmjF.1" // output directory, if desired - // dist_dir = "${baseDir}" + dist_dir = "${baseDir}/example-output" // enable/disable parts of the pipeline run_exonerate = false
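
Note on the README hunk above: a new reference entry can be checked for completeness before `update_references.lua` is run. The following is a minimal Python sketch, not part of Companion. `REQUIRED_KEYS` and `missing_keys` are hypothetical names, the key list is taken only from the example entry shown in the README hunk, and all values in `example_entry` are placeholders.

```python
# Hypothetical helper, not part of Companion: checks one reference entry
# (a dict parsed from JSON) against the keys shown in the README example.
REQUIRED_KEYS = (
    "gff",
    "genome",
    "gaf",
    "name",
    "augustus_model",
    "chromosome_pattern",
)


def missing_keys(entry):
    """Return the required keys that are absent or empty in a reference entry."""
    return [key for key in REQUIRED_KEYS if not entry.get(key)]


# Placeholder values for illustration only; substitute your own names and paths.
example_entry = {
    "gff": "../genomes/example.gff3",
    "genome": "../genomes/example.fasta",
    "gaf": "../genomes/example.gaf",
    "name": "Example organism",
    "augustus_model": "../../data/augustus/species/example/",
    "chromosome_pattern": "example_(%d+)",
}

if __name__ == "__main__":
    # An empty list means every key from the README example is present.
    print(missing_keys(example_entry))
```

The check only looks at one entry at a time, so it makes no assumption about how entries are nested inside `references-in.json`; it also does not verify that the referenced files exist.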