This repository has been archived by the owner on Nov 7, 2021. It is now read-only.

Merge pull request #88 from trstickland/doc_rewrite
Doc rewrite
ssjunnebo authored Oct 3, 2018
2 parents 32fd793 + c614628 commit e26f4c6
Showing 2 changed files with 69 additions and 14 deletions.
81 changes: 68 additions & 13 deletions README.md
@@ -5,15 +5,20 @@ A portable, scalable eukaryotic genome annotation pipeline implemented in Nextfl
[![License: ISC](https://img.shields.io/badge/License-ISC-brightgreen.svg)](https://github.com/sanger-pathogens/companion/blob/master/LICENSE)
[![status](https://img.shields.io/badge/NAR-10.1093%2Fnar.gkw292-brightgreen.svg)](https://doi.org/10.1093/nar/gkw292)


## Content
* [Introduction](#introduction)
* [Installation](#installation)
* [Required dependencies](#required-dependencies)
* [Usage](#usage)
* [Preparing reference annotations](#preparing-reference-annotations)
* [License](#license)
* [Feedback/Issues](#feedbackissues)
* [Citation](#citation)
* [Introduction](#introduction)
* [Dependencies](#dependencies)
* [Docker](#docker)
* [Installation](#installation)
* [Usage](#usage)
* [Local copy of Companion](#local-copy-of-companion)
* [Running Companion direct from a repository](#running-companion-direct-from-a-repository)
* [Preparing reference annotations](#preparing-reference-annotations)
* [License](#license)
* [Feedback/Issues](#feedbackissues)
* [Citation](#citation)


## Introduction
This software is a comprehensive computational pipeline for the annotation of eukaryotic genomes (like protozoan parasites). It performs the following tasks:
@@ -93,7 +98,7 @@ docker pull sangerpathogens/companion
To create a local copy of companion, you can download this repo from github (if you are familiar with github, you may
of course prefer to _clone_ or _fork_ it).
```
curl https://github.com/trstickland/companion/archive/master.zip # or click the green button on the GitHub web page
curl -L -o companion-master.zip https://github.com/sanger-pathogens/companion/archive/master.zip # or click the green button on the GitHub web page
unzip companion-master.zip
mv companion-master my-companion-project # renaming it to something meaningful to you is a good idea
```
@@ -106,9 +111,19 @@ nextflow run my-companion-project -profile docker
The argument `-profile docker` instructs nextflow to run the sangerpathogens/companion docker image for the dependencies.

Have a look at the `nextflow.config` file to see the definition of the docker profile, and how the docker image is specified.
You will also find file names, paths, parameters, etc. that you can edit to perform your own runs.
You will also find file names, paths, parameters, etc. that you can edit to perform your own runs. The following warrant
a special mention:

*inseq* The input FASTA file (`${baseDir}/example-data/L_donovani.1.fasta` in the example parameter file included with the distribution)

*ref_dir* The directory containing reference genomes (`${baseDir}/example-data/references` in the example file)

*dist_dir* The directory that will contain the newly created output files (`${baseDir}/example-data-output` in the example file)

### running Companion direct from a repository
*run_snap* We recommend SNAP is disabled, as it has not provided useful results in this pipeline (`false` in the example file)
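
As a hedged illustration of the parameters above: Nextflow allows any `params` value to be overridden on the command line with a double-dash option, so the same settings can be supplied without editing the config file. The sketch below only prints the command so it can be inspected first. The paths and the helper name `companion_cmd` are hypothetical, and it assumes the project directory is called `my-companion-project`, as in the earlier example.

```shell
# Sketch only: these paths are hypothetical placeholders, not files shipped with Companion.
PROJECT="my-companion-project"
INSEQ="$PWD/my-data/my_genome.fasta"
REF_DIR="$PWD/my-data/references"
DIST_DIR="$PWD/my-output"

# Build the nextflow command line; each --name overrides params.name from the config.
companion_cmd() {
    printf 'nextflow run %s -profile docker --inseq %s --ref_dir %s --dist_dir %s\n' \
        "$PROJECT" "$INSEQ" "$REF_DIR" "$DIST_DIR"
}

# Print the command for inspection; copy and run it once the paths look right.
companion_cmd
```

Values given on the command line take precedence over those in the config files, which can be handy for one-off runs.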


### Running Companion direct from a repository

If you run nextflow with the name of a github repository, it will pull the contents of the repository and run with those.
This command will do the same as the "local copy" example above:
@@ -123,10 +138,50 @@ edit the files in your local copy, and nextflow will work from your (now differe
If you are familiar with repositories, and the workflow appropriate to using them, this can be a very convenient way of
working; otherwise it can become quite confusing, and you may find it easier to work with a simple local copy.


### Preparing reference annotations

The reference annotations used in the pipeline need to be pre-processed before they can be used. See the [GitHub wiki](https://github.com/sanger-pathogens/companion/wiki/Preparing-reference-data-sets) for more details. There are also pre-generated reference sets for various parasite species/families.
The reference annotations used in the pipeline need to be pre-processed before they can be used. Only a few pre-generated
reference sets for various parasite species/families are included in the distribution as examples.

To add a reference organism, you will need:

- a descriptive name of the organism
- a short abbreviation for the organism
- the genome sequence in a single FASTA file
- a structural gene annotation in GFF3 format (see below for details)
- functional GO annotation in GAF 1.0 format, on the gene level
- a pattern matching chromosome headers, describing how to extract chromosome numbers from them
- an [AUGUSTUS](http://bioinf.uni-greifswald.de/augustus/) model, trained on reference genes

Insert these file names, etc., where `<placeholders>` appear in the steps below:

1. Create a new data directory (i.e. the equivalent of the `example-data` directory included in the distribution)
1. Edit `nextflow.config` (and any config files that are referenced) and change parameters such as
`inseq` and `ref_dir` to your new data directory.
1. Copy the new reference genome (FASTA) into `<new_data_dir>/genomes`
1. Copy GFF3 and GAF files into `<new_data_dir>/genomes`
1. Copy Augustus model files into `data/augustus/species/<species_name>/`
1. Create new directory `<new_data_dir>/references/<short_name>/`
1. Add new section to `amber-test-data/references/references-in.json`, using the
short name (same as the directory name in the previous step); in this section add
the names/paths of the files copied (above), a descriptive name, and
a pattern for matching chromosomes in the FASTA files (in this example, `<short_name>_<n>`, where _n_ is any integer).
```
"<short_name>" : { "gff" : "../genomes/<gff3_filename>.gff3",
"genome" : "../genomes/<ref_genome_name>.fasta",
"gaf" : "../genomes/<ref_annot_filename>.gaf",
"name" : "<Descriptive Name of Reference Genome>",
"augustus_model" : "../../data/augustus/species/<species_name>/",
"chromosome_pattern" : "<short_name>_(%d+)"
}
```
8. Finally, change directory to `<new_data_dir>/references` (you _must_ execute the following command in this directory)
and run `../../bin/update_references.lua`. This writes the file `<new_data_dir>/references/references.json`.
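
The directory-creating steps above can be sketched in shell as follows. This is a minimal sketch, not part of the distribution: `my-data` and `Xyz` are hypothetical stand-ins for `<new_data_dir>` and `<short_name>`, the helper `make_ref_layout` is invented for illustration, and the copy and update commands are left as comments because they depend on your own files.

```shell
# Sketch only: "my-data" and "Xyz" are hypothetical names for <new_data_dir> and <short_name>.
make_ref_layout() {
    new_data_dir="$1"
    short_name="$2"
    # Steps 1, 3 and 4: data directory with a genomes/ subdirectory for the FASTA, GFF3 and GAF files
    mkdir -p "$new_data_dir/genomes"
    # Step 6: per-reference directory named after the short name
    mkdir -p "$new_data_dir/references/$short_name"
}

make_ref_layout my-data Xyz

# Copy your own files into place (file names are placeholders):
#   cp <ref_genome_name>.fasta <gff3_filename>.gff3 <ref_annot_filename>.gaf my-data/genomes/
#   cp -r <augustus_model_files> data/augustus/species/<species_name>/
# Step 8, after editing references-in.json by hand:
#   (cd my-data/references && ../../bin/update_references.lua)
```

Editing `nextflow.config` and `references-in.json` (steps 2 and 7) is deliberately not automated here, since both require choices specific to your reference.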

You can now run _Companion_, and the new reference will be included.

Further documentation on preparing reference data can be found in the [GitHub wiki](https://github.com/sanger-pathogens/companion/wiki/Preparing-reference-data-sets).


## License
Companion is free software, licensed under [ISC](https://github.com/sanger-pathogens/companion/blob/master/LICENSE).
2 changes: 1 addition & 1 deletion params_default.config
@@ -7,7 +7,7 @@ params {
ref_species = "LmjF.1"

// output directory, if desired
// dist_dir = "${baseDir}"
dist_dir = "${baseDir}/example-output"

// enable/disable parts of the pipeline
run_exonerate = false