Merge branch 'dev' into master

SciLifeLab · Sep 13, 2018 · a227630
2 parents 72c0c2c + 7a10f5d

Showing 11 changed files with 653 additions and 378 deletions.
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -31,5 +31,8 @@ A clear and concise description of what you expected to happen.
 **Container (please complete the following information):**
  - tag: [e.g. 1.0.0]
 
+**Sarek (please complete the following information):**
+ - version: [e.g. 2.1.0]
+
 **Additional context**
 Add any other context about the problem here.
diff --git a/.github/RELEASE_CHECKLIST.md b/.github/RELEASE_CHECKLIST.md
@@ -1,24 +1,34 @@
 # Release checklist
-This checklist is for our own reference
-
-1. Check that everything is up to date and ready to go
-    - Travis tests are passing
-    - Manual tests on Bianca are passing
-2. Increase version numbers
-3. Update version numbers in code: `configuration/base.config`
-4. Build, and get the containers.
-    - `./scripts/do_all.sh --push --tag <VERSION>`
-    - `./scripts/do_all.sh --pull --tag <VERSION>`
-5. Test against sample data.
-    - Check for any command line errors
-    - Check version numbers are printed correctly
-    - `./scripts/test.sh -p docker --tag <VERSION>`
-    - `./scripts/test.sh -p singularity --tag <VERSION>`
-    - `./scripts/test.sh -p singularityPath --tag <VERSION>`
-6. Commit and push version updates
-7. Make a [release](https://github.com/SciLifeLab/Sarek/releases) on GitHub
-8. Choose an appropriate codename for the release
-9. Update [bio.tools](https://bio.tools/Sarek) with the new release
-10. Tweet that new version is released
-11. Commit and push. Continue making more awesome :metal:
-12. Have fika :cake:
+
+> This checklist is for our own reference, to help us prepare a new release
+
+1.  Check that everything is ready to go
+
+    -   [PRs](https://github.com/SciLifeLab/Sarek/pulls) are merged
+    -   [Travis tests](https://travis-ci.org/SciLifeLab/Sarek/branches) are passing on `dev`
+
+2.  Increase version number following [semantic versioning](http://semver.org/spec/v2.0.0.html)
+3.  Choose an appropriate codename for the release
+    -   i.e. one of the peaks in [Sarek National Park](https://en.wikipedia.org/wiki/Sarek_National_Park#Topography)
+4.  Build docker containers.
+
+    -   `./scripts/do_all.sh --tag <VERSION>`
+
+5.  Test against sample data.
+
+    -   `./scripts/test.sh -p docker --tag <VERSION>`
+    -   Check for any command line errors
+
+6.  Use the helper script to update the version number in files:
+
+    -   `./scripts/do_release.sh -r "<VERSION>" -c "<CODENAME>"`
+
+7.  Push latest updates
+8.  Make a PR against `dev`
+9.  Merge said PR
+10. Make a [release](https://github.com/SciLifeLab/Sarek/releases) on GitHub
+11. Update [bio.tools](https://bio.tools/Sarek) with the new release details
+12. Tweet that a new version is released
+13. Add a new `Unreleased` section in `CHANGELOG.md` for the `dev` version
+14. Commit and push. Continue making more awesome :metal:
+15. Have fika :cake:
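Steps 4, 5 and 6 all reuse the same `<VERSION>` string, so a malformed value ends up in container tags and in the updated files. A minimal pre-flight sketch, assuming a plain `MAJOR.MINOR.PATCH` version (the pre-release and build suffixes that semantic versioning also permits are rejected on purpose); `check_version` is a hypothetical helper, not one of the Sarek scripts:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the release checklist (not part of Sarek).
# Accepts only plain MAJOR.MINOR.PATCH versions; SemVer pre-release and
# build suffixes (e.g. 2.2.0-rc1) are deliberately rejected here.
check_version() {
  local version="$1"
  # No leading zeros in numeric identifiers, as SemVer 2.0.0 requires
  local re='^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
  if [[ "$version" =~ $re ]]; then
    return 0
  fi
  printf "Invalid version: '%s' (expected MAJOR.MINOR.PATCH)\n" "$version" >&2
  return 1
}

# Usage: stop before building containers if the version is malformed
# check_version "2.2.0" && ./scripts/do_all.sh --tag "2.2.0"
```

Rejecting suffixes is a simplification; the regular expression would need relaxing if release candidates ever get their own tags.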
diff --git a/CHANGELOG.md b/CHANGELOG.md
diff --git a/Dockerfile b/Dockerfile
@@ -6,5 +6,5 @@ LABEL \
 	maintainer="Maxime Garcia <maxime.garcia@scilifelab.se>, Szilveszter Juhos <Szilveszter.Juhos@scilifelab.se>"
 
 COPY environment.yml /
-RUN conda env update -n root -f /environment.yml && conda clean -a
-ENV PATH /opt/conda/bin:$PATH
+RUN conda env create -f /environment.yml && conda clean -a
+ENV PATH /opt/conda/envs/sarek-2.1.0/bin:$PATH
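With `conda env create`, the environment is created in a named directory under `/opt/conda/envs/` rather than in the base environment, so the `PATH` line only works if its directory matches the `name:` field of `environment.yml` (presumably `sarek-2.1.0`, since that file is not part of this diff). A hedged sketch of a consistency check that could be run after a version bump; `check_env_path` is hypothetical and assumes a simple, unquoted `name:` value on its own line:

```bash
#!/usr/bin/env bash
# Hypothetical consistency check (not part of Sarek): the environment created by
# `conda env create -f /environment.yml` is named after the `name:` field of
# environment.yml, so the directory hard-coded in the Dockerfile PATH must match it.
check_env_path() {
  local env_file="${1:-environment.yml}" dockerfile="${2:-Dockerfile}"
  local env_name
  # Take the value of the first top-level `name:` key
  env_name=$(sed -n 's/^name:[[:space:]]*//p' "$env_file" | head -n 1)
  if [[ -z "$env_name" ]]; then
    printf "No 'name:' field found in %s\n" "$env_file" >&2
    return 1
  fi
  # Fixed-string search for the expected bin directory
  if grep -qF "/opt/conda/envs/${env_name}/bin" "$dockerfile"; then
    printf "OK: %s PATH matches conda env '%s'\n" "$dockerfile" "$env_name"
  else
    printf "Mismatch: %s PATH does not use /opt/conda/envs/%s/bin\n" "$dockerfile" "$env_name" >&2
    return 1
  fi
}
```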
diff --git a/README.md b/README.md
@@ -82,12 +82,13 @@ The Sarek pipeline comes with documentation in the `docs/` directory:
 06. [Configuration and profiles documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONFIG.md)
 07. [Intervals documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/INTERVALS.md)
 08. [Running the pipeline](https://github.com/SciLifeLab/Sarek/blob/master/docs/USAGE.md)
-09. [Examples](https://github.com/SciLifeLab/Sarek/blob/master/docs/USE_CASES.md)
-10. [TSV file documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/TSV.md)
-11. [Processes documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/PROCESS.md)
-12. [Documentation about containers](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONTAINERS.md)
-13. [More information about ASCAT](https://github.com/SciLifeLab/Sarek/blob/master/docs/ASCAT.md)
-14. [Output documentation structure](https://github.com/SciLifeLab/Sarek/blob/master/docs/OUTPUT.md)
+09. [Command line parameters](https://github.com/SciLifeLab/Sarek/blob/master/docs/PARAMETERS.md)
+10. [Examples](https://github.com/SciLifeLab/Sarek/blob/master/docs/USE_CASES.md)
+11. [Input files documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/INPUT.md)
+12. [Processes documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/PROCESS.md)
+13. [Documentation about containers](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONTAINERS.md)
+14. [More information about ASCAT](https://github.com/SciLifeLab/Sarek/blob/master/docs/ASCAT.md)
+15. [Output documentation structure](https://github.com/SciLifeLab/Sarek/blob/master/docs/OUTPUT.md)
 
 ## Contributions & Support
 

diff --git a/docs/CONFIG.md b/docs/CONFIG.md
@@ -5,7 +5,8 @@ For more informations on how to use configuration files, have a look at the [Nex
 For more informations about profiles, have a look at the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html#config-profiles)
 
 We provides several configuration files and profiles for Sarek.
-The standard ones are designed to work on a Swedish UPPMAX clusters, and can be modified and tailored to your own need.
+The standard ones are designed to work on a Swedish UPPMAX cluster, but can be modified and tailored to your own needs.
+
 
 ## Configuration files
 
@@ -51,10 +52,14 @@ To be used for Travis (2 cpus) or on small computer for testing purpose
 Slurm configuration for a UPPMAX cluster
 Will run the workflow on `/scratch` using the Nextflow [`scratch`](https://www.nextflow.io/docs/latest/process.html#scratch) directive
 
-## profiles
+## Profiles
+A profile is a convenient way of specifying which set of configuration files to use.
+The default profile is `standard`, but Sarek has multiple predefined profiles, listed below, which can be selected with `-profile <profile>`:
+
+```bash
+nextflow run SciLifeLab/Sarek --sample mysample.tsv -profile myprofile
+```
 
-Every profile can be modified for your own use.
-To use a profile, you'll need to specify `-profile <profile>`
 
 ### `docker`
 
@@ -82,3 +87,14 @@ Singularity images will be pulled automatically.
 
 This is the profile for Singularity testing on a small machine, or on Travis CI.
 Singularity images will be pulled automatically.
+
+## Customisation
+The recommended way to use custom settings is to supply Sarek with an additional configuration file. You can use the files in the [`conf/`](https://github.com/SciLifeLab/Sarek/tree/master/conf) directory as inspiration to create this new `.config` file and specify it using the `-c` flag:
+
+```bash
+nextflow run SciLifeLab/Sarek --sample mysample.tsv -c conf/personal.config
+```
+
+Any configuration field specified in this file takes precedence over the predefined configurations, while any field left out of the file is set by the configuration files included in the specified (or `standard`) profile.
+
+To find out which configuration files each profile uses, see the profile definitions in [`nextflow.config`](https://github.com/SciLifeLab/Sarek/blob/master/nextflow.config).
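A minimal sketch of such a file, written with a shell heredoc to stay with the bash examples used here. The `params` names mirror the `--outDir` and `--project` options listed in `docs/PARAMETERS.md`; every value, and the choice of Nextflow's `local` executor, is a placeholder assumption to adapt:

```bash
#!/usr/bin/env bash
# Write a hypothetical conf/personal.config; all values below are placeholders.
mkdir -p conf
cat > conf/personal.config <<'EOF'
// Hypothetical personal configuration for Sarek; adjust to your own setup.
params {
  outDir = "/path/to/my/results"   // same as --outDir on the command line
  project = "myProjectID"          // same as --project (UPPMAX clusters only)
}

process {
  executor = 'local'               // Nextflow executor used for all processes
}
EOF

# Then pass it with -c:
# nextflow run SciLifeLab/Sarek --sample mysample.tsv -c conf/personal.config
```

Every field not set in this file keeps the value from the selected profile, so the file can stay as small as the settings you actually want to change.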
diff --git a/docs/TSV.md b/docs/INPUT.md
@@ -3,7 +3,7 @@
 Input files for Sarek can be specified using a tsv file given to the `--sample` parameter. The tsv file is a Tab Separated Value file with columns: `subject gender status sample lane fastq1 fastq2` or `subject gender status sample bam bai`.
 The content of these columns should be quite straight-forward:
 
-- `subject` designate the subject, it should be the ID of the Patient, or if you don't have one, il could be the Normal ID Sample.
+- `subject` designates the subject; it should be the ID of the Patient or, if you don't have one, the ID of the Normal Sample.
 - `gender` is the gender of the Patient, (XX or XY)
 - `status` is the status of the Patient, (0 for Normal or 1 for Tumor)
 - `sample` designate the Sample, it should be the ID of the Sample (it is possible to have more than one tumor sample for each patient)
@@ -57,3 +57,44 @@ All the files will be in he Preprocessing/Recalibrated/ directory, and by defaul
 ```bash
 nextflow run SciLifeLab/Sarek/somaticVC.nf --sample Preprocessing/Recalibrated/mysample.tsv --tools Mutect2,Strelka
 ```
+
+## Input FASTQ file name best practices
+
+The input folder, containing the FASTQ files for one individual (ID), should be organized into one subfolder for every sample.
+All FASTQ files for that sample should be collected there.
+
+```
+ID
++--sample1
++------sample1_lib_flowcell-index_lane_R1_1000.fastq.gz
++------sample1_lib_flowcell-index_lane_R2_1000.fastq.gz
++------sample1_lib_flowcell-index_lane_R1_1000.fastq.gz
++------sample1_lib_flowcell-index_lane_R2_1000.fastq.gz
++--sample2
++------sample2_lib_flowcell-index_lane_R1_1000.fastq.gz
++------sample2_lib_flowcell-index_lane_R2_1000.fastq.gz
++--sample3
++------sample3_lib_flowcell-index_lane_R1_1000.fastq.gz
++------sample3_lib_flowcell-index_lane_R2_1000.fastq.gz
++------sample3_lib_flowcell-index_lane_R1_1000.fastq.gz
++------sample3_lib_flowcell-index_lane_R2_1000.fastq.gz
+```
+
+FASTQ filename structure:
+
+- `sample_lib_flowcell-index_lane_R1_1000.fastq.gz` and
+- `sample_lib_flowcell-index_lane_R2_1000.fastq.gz`
+
+Where:
+
+- `sample` = sample ID
+- `lib` = identifier of the library preparation
+- `flowcell` = identifier of the flow cell for the sequencing run
+- `lane` = identifier of the lane of the sequencing run
+
+Read group information is parsed from the FASTQ file names as follows:
+
+- `RGID` = "sample_lib_flowcell_index_lane"
+- `RGPL` = "Illumina"
+- `PU` = sample
+- `RGLB` = lib
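The mapping above can be checked by hand with a small sketch. `parse_fastq_name` is a hypothetical helper, not Sarek's own parsing code; it assumes that no field contains an underscore and that flowcell and index are joined by a single `-`, exactly as in the naming scheme above:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not Sarek's own code: derive read group fields from a
# FASTQ name of the form sample_lib_flowcell-index_lane_R1_1000.fastq.gz.
# Assumes no field contains '_' and flowcell/index are joined by one '-'.
parse_fastq_name() {
  local base sample lib fc_index lane mate chunk
  base=$(basename "$1" .fastq.gz)
  # Split on underscores: sample, lib, flowcell-index, lane, R1/R2, chunk
  IFS=_ read -r sample lib fc_index lane mate chunk <<< "$base"
  if [[ "$fc_index" != *-* || -z "$lane" || "$mate" != R[12] ]]; then
    printf 'Unexpected FASTQ name: %s\n' "$1" >&2
    return 1
  fi
  # RGID joins flowcell and index with '_' as listed above
  printf 'RGID=%s\n' "${sample}_${lib}_${fc_index%%-*}_${fc_index#*-}_${lane}"
  printf 'RGPL=%s\n' "Illumina"
  printf 'PU=%s\n' "$sample"
  printf 'RGLB=%s\n' "$lib"
}
```

For `ID/sample1/sample1_lib_flowcell-index_lane_R1_1000.fastq.gz` it prints `RGID=sample1_lib_flowcell_index_lane`, `RGPL=Illumina`, `PU=sample1` and `RGLB=lib`, one per line.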
diff --git a/docs/PARAMETERS.md b/docs/PARAMETERS.md
@@ -0,0 +1,139 @@
+# Parameters
+
+A list of all possible parameters that can be used for the different scripts included in Sarek.
+
+## Common for all scripts
+
+### --help
+
+Display help
+
+### --noReports
+
+Disable all QC tools and MultiQC.
+
+### --outDir
+
+Choose an output directory
+
+### --project `ProjectID`
+
+Specify a project number ID on a UPPMAX cluster.
+(optional if not on such a cluster)
+
+### --sample `file.tsv`
+
+Use the given TSV file as sample (cf [input files documentation](INPUT.md)).
+Not used by `annotate.nf` and `runMultiQC.nf`.
+
+### --tools `tool1[,tool2,tool3...]`
+
+Choose which tools will be used in the workflow.
+Different tools to be separated by commas.
+Possible values are:
+
+- haplotypecaller (use `HaplotypeCaller` for VC) (germlineVC.nf)
+- manta (use `Manta` for SV) (germlineVC.nf,somaticVC.nf)
+- strelka (use `Strelka` for VC) (germlineVC.nf,somaticVC.nf)
+- ascat (use `ASCAT` for CNV) (somaticVC.nf)
+- mutect2 (use `MuTect2` for VC) (somaticVC.nf)
+- snpeff (use `snpEff` for Annotation) (annotate.nf)
+- vep (use `VEP` for Annotation) (annotate.nf)
+
+The `--tools` option is case-insensitive, to avoid introducing errors when choosing tools.
+So you can write `--tools mutect2,ascat` or `--tools MuTect2,ASCAT` without worrying about case sensitivity.
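To illustrate what case-insensitive handling means in practice, a sketch that lowercases a `--tools` value, splits it on commas and rejects names that the list above does not give for `germlineVC.nf`. `validate_germline_tools` is hypothetical and is not how Sarek implements the option; it is only a way to sanity-check a value before launching a run:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not Sarek's implementation: normalise a --tools value
# to lowercase and accept only the tools documented for germlineVC.nf.
validate_germline_tools() {
  local lowered tool
  local -a tools
  # tr keeps this working on bash 3.2, which lacks the ${var,,} expansion
  lowered=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  # Split the comma-separated list into an array
  IFS=, read -r -a tools <<< "$lowered"
  for tool in "${tools[@]}"; do
    case "$tool" in
      haplotypecaller|manta|strelka) printf '%s\n' "$tool" ;;
      *) printf 'Not a germlineVC.nf tool: %s\n' "$tool" >&2; return 1 ;;
    esac
  done
}

# Usage: validate_germline_tools "HaplotypeCaller,Strelka"
```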
+
+### --verbose
+
+Display more information about files being processed.
+
+## Preprocessing script (`main.nf`)
+
+### --step `step`
+
+Choose from which step the workflow will start.
+Choose only one step.
+Possible values are:
+
+- mapping (default, will start workflow with FASTQ files)
+- recalibrate (will start workflow with BAM files and Recalibration Tables)
+
+The `--step` option is case-insensitive, to avoid introducing errors when choosing a step.
+
+### --test
+
+Run Sarek on a small test dataset, so that you don't have to specify `--sample Sarek-data/testdata/tsv/tiny.tsv`.
+
+### --onlyQC
+
+Run only QC tools and MultiQC to generate an HTML report.
+
+
+## Annotate script (`annotate.nf`)
+
+### --annotateTools `tool1[,tool2,tool3...]`
+
+Choose which tools' output to annotate.
+Different tools to be separated by commas.
+Possible values are:
+
+- haplotypecaller (Annotate `HaplotypeCaller` output)
+- manta (Annotate `Manta` output)
+- mutect2 (Annotate `MuTect2` output)
+- strelka (Annotate `Strelka` output)
+
+### --annotateVCF `file1[,file2,file3...]`
+
+Choose which VCF files to annotate.
+Separate multiple VCF files with commas.
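When there are many VCF files, the comma-separated value can be built from a shell glob. A minimal sketch, assuming none of the paths contains a comma; `join_vcfs` and the example glob are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Sarek): join VCF paths with commas to build
# the value expected by --annotateVCF. Assumes no path contains a comma.
join_vcfs() {
  # "$*" joins the arguments with the first character of IFS
  local IFS=,
  printf '%s\n' "$*"
}

# Example (placeholder paths):
# vcfs=$(join_vcfs /path/to/vcfs/*.vcf.gz)
# nextflow run SciLifeLab/Sarek/annotate.nf --tools snpeff --annotateVCF "$vcfs"
```

If the glob matches nothing, bash passes the pattern through literally unless `nullglob` is enabled, so it is worth checking `$vcfs` before starting the run.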
+
+
+## MultiQC script (`runMultiQC.nf`)
+
+### --callName `Name`
+
+Specify a name for MultiQC report (optional)
+
+### --contactMail `email`
+
+Specify an email for MultiQC report (optional)
+
+
+## References
+
+For most use cases, the reference information is already in the configuration file [`conf/genomes.config`](https://github.com/SciLifeLab/Sarek/blob/master/conf/genomes.config).
+However, if needed, you can specify any reference file at the command line.
+
+### --acLoci `acLoci file`
+
+### --bwaIndex `bwaIndex file`
+
+### --cosmic `cosmic file`
+
+### --cosmicIndex `cosmicIndex file`
+
+### --dbsnp `dbsnp file`
+
+### --dbsnpIndex `dbsnpIndex file`
+
+### --genomeDict `genomeDict file`
+
+### --genomeFile `genomeFile file`
+
+### --genomeIndex `genomeIndex file`
+
+### --intervals `intervals file`
+
+### --knownIndels `knownIndels file`
+
+### --knownIndelsIndex `knownIndelsIndex file`
+
+### --snpeffDb `snpeffDb file`
+
+## Hardware parameters
+
+For most use cases, the hardware parameters are already set in the appropriate [configuration files](https://github.com/SciLifeLab/Sarek/blob/master/conf/).
+However, it is still possible to specify them at the command line as well.
+
+### --runTime `time`
+
+### --singleCPUMem `memory`
+
+### --totalMemory `memory`
diff --git a/docs/RELEASE.md b/docs/RELEASE.md
@@ -0,0 +1,38 @@
+# RELEASE
+
+> This document is for helping Sarek core developers and anyone joining the team to prepare a new release
+
+## [CHECKLIST](https://github.com/SciLifeLab/Sarek/blob/master/.github/RELEASE_CHECKLIST.md)
+
+This checklist is for our own reference, to help us prepare a new release.
+Just follow it and be sure to check every item on the list.
+
+## [Helper script](https://github.com/SciLifeLab/Sarek/blob/master/scripts/do_release.sh)
+
+This script will update the version number in the following files:
+
+-   [CHANGELOG.md](https://github.com/SciLifeLab/Sarek/blob/master/CHANGELOG.md)
+    -   Will replace `Unreleased` with the new version number and add the codename and date
+-   [Dockerfile](https://github.com/SciLifeLab/Sarek/blob/master/Dockerfile)
+    -   Will update to correct version number
+-   [Singularity](https://github.com/SciLifeLab/Sarek/blob/master/Singularity)
+    -   Will update to correct version number
+-   [conf/base.config](https://github.com/SciLifeLab/Sarek/blob/master/conf/base.config)
+    -   Will update to correct version number
+
+### Usage
+
+```bash
+./scripts/do_release.sh -r "<RELEASE>" -c "<CODENAME>"
+```
+
+-   `-r|--release` specify the new version number
+-   `-c|--codename` specify the codename
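The two options above can be parsed with a plain `while`/`case` loop. A hypothetical sketch of that parsing step only, not the actual contents of `scripts/do_release.sh`; it sets the global variables `RELEASE` and `CODENAME` and fails if either option is missing or has no value:

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the real scripts/do_release.sh): parse the documented
# -r|--release and -c|--codename options and refuse to continue without both.
parse_release_args() {
  RELEASE="" CODENAME=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      # `shift 2` fails (and shifts nothing) when the option has no value
      -r|--release)  RELEASE="${2:-}";  shift 2 || return 1 ;;
      -c|--codename) CODENAME="${2:-}"; shift 2 || return 1 ;;
      *) printf 'Unknown option: %s\n' "$1" >&2; return 1 ;;
    esac
  done
  if [[ -z "$RELEASE" || -z "$CODENAME" ]]; then
    printf 'Usage: do_release.sh -r <RELEASE> -c <CODENAME>\n' >&2
    return 1
  fi
}
```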
+
+### Example
+
+```bash
+./scripts/do_release.sh -r "2.2.0" -c "Skårki"
+```