SciLifeLab · maxulysse · Sep 13, 2018 · Sep 10, 2018 · Sep 10, 2018 · Sep 10, 2018
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 - [#615](https://github.com/SciLifeLab/Sarek/pull/615) - Update documentation
 - [#620](https://github.com/SciLifeLab/Sarek/pull/620) - Add `tmp/` to `.gitignore`
 - [#625](https://github.com/SciLifeLab/Sarek/pull/625) - Add [`pathfindr`](https://github.com/NBISweden/pathfindr) as a submodule
+- [#639](https://github.com/SciLifeLab/Sarek/pull/639) - Add a complete example analysis to docs
 
 ### `Changed`
 - [#608](https://github.com/SciLifeLab/Sarek/pull/608) - Update Nextflow required version
@@ -24,6 +25,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 - [#632](https://github.com/SciLifeLab/Sarek/pull/632) - Use 2 threads and 2 cpus FastQC processes
 - [#637](https://github.com/SciLifeLab/Sarek/pull/637) - Update tool version gathering
 - [#638](https://github.com/SciLifeLab/Sarek/pull/638) - Use correct `.simg` extension for Singularity images
+- [#639](https://github.com/SciLifeLab/Sarek/pull/639) - Smaller refactoring of the docs
 
 ### `Removed`
 - [#616](https://github.com/SciLifeLab/Sarek/pull/616) - Remove old Issue Template

diff --git a/README.md b/README.md
@@ -82,12 +82,13 @@ The Sarek pipeline comes with documentation in the `docs/` directory:
 06. [Configuration and profiles documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONFIG.md)
 07. [Intervals documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/INTERVALS.md)
 08. [Running the pipeline](https://github.com/SciLifeLab/Sarek/blob/master/docs/USAGE.md)
-09. [Examples](https://github.com/SciLifeLab/Sarek/blob/master/docs/USE_CASES.md)
-10. [TSV file documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/TSV.md)
-11. [Processes documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/PROCESS.md)
-12. [Documentation about containers](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONTAINERS.md)
-13. [More information about ASCAT](https://github.com/SciLifeLab/Sarek/blob/master/docs/ASCAT.md)
-14. [Output documentation structure](https://github.com/SciLifeLab/Sarek/blob/master/docs/OUTPUT.md)
+09. [Command line parameters](https://github.com/SciLifeLab/Sarek/blob/master/docs/PARAMETERS.md)
+10. [Examples](https://github.com/SciLifeLab/Sarek/blob/master/docs/USE_CASES.md)
+11. [Input files documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/INPUT.md)
+12. [Processes documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/PROCESS.md)
+13. [Documentation about containers](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONTAINERS.md)
+14. [More information about ASCAT](https://github.com/SciLifeLab/Sarek/blob/master/docs/ASCAT.md)
+15. [Output documentation structure](https://github.com/SciLifeLab/Sarek/blob/master/docs/OUTPUT.md)
 
 ## Contributions & Support
 

diff --git a/docs/CONFIG.md b/docs/CONFIG.md
@@ -5,7 +5,8 @@ For more informations on how to use configuration files, have a look at the [Nex
 For more informations about profiles, have a look at the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html#config-profiles)
 
 We provides several configuration files and profiles for Sarek.
-The standard ones are designed to work on a Swedish UPPMAX clusters, and can be modified and tailored to your own need.
+The standard ones are designed to work on a Swedish UPPMAX cluster, but can be modified and tailored to your own need.
+
 
 ## Configuration files
 
@@ -51,10 +52,14 @@ To be used for Travis (2 cpus) or on small computer for testing purpose
 Slurm configuration for a UPPMAX cluster
 Will run the workflow on `/scratch` using the Nextflow [`scratch`](https://www.nextflow.io/docs/latest/process.html#scratch) directive
 
-## profiles
+## Profiles
+A profile is a convenient way of specifying which set of configuration files to use.
+The default profile is `standard`, but Sarek has multiple predefined profiles, listed below, that can be selected with `-profile <profile>`:
+
+```bash
+nextflow run SciLifeLab/Sarek --sample mysample.tsv -profile myprofile
+```
 
-Every profile can be modified for your own use.
-To use a profile, you'll need to specify `-profile <profile>`
 
 ### `docker`
 
@@ -82,3 +87,14 @@ Singularity images will be pulled automatically.
 
 This is the profile for Singularity testing on a small machine, or on Travis CI.
 Singularity images will be pulled automatically.
+
+## Customisation
+The recommended way to use custom settings is to supply Sarek with an additional configuration file. You can use the files in the [`conf/`](https://github.com/SciLifeLab/Sarek/tree/master/conf) directory as inspiration for this new `.config` file, and specify it using the `-c` flag:
+
+```bash
+nextflow run SciLifeLab/Sarek --sample mysample.tsv -c conf/personal.config
+```
+
+Any configuration field specified in this file takes precedence over the predefined configurations; any field left out of the file is set by the normal configuration files included in the specified (or `standard`) profile.
+
+To find out which configuration files are used by each profile, see the profile definitions in [`nextflow.config`](https://github.com/SciLifeLab/Sarek/blob/master/nextflow.config).
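
Review note on the Customisation section of `docs/CONFIG.md` above: a short example of what such a file could contain might help readers. The following is only a minimal sketch. The file name `conf/personal.config` comes from the command in the section, and the parameter names `runTime` and `singleCPUMem` are taken from the Hardware Parameters list in `docs/PARAMETERS.md`. The values are illustrative assumptions, not recommended settings.

```bash
# Sketch: write a small override file and pass it to Nextflow with -c.
# The values below are illustrative assumptions only.
mkdir -p conf
cat > conf/personal.config <<'EOF'
// Hypothetical overrides: any field not listed here keeps the value
// from the selected (or `standard`) profile.
params {
  runTime      = 48.h
  singleCPUMem = 8.GB
}
EOF

# Running the pipeline with this file requires Nextflow, so the command
# is shown as a comment:
# nextflow run SciLifeLab/Sarek --sample mysample.tsv -c conf/personal.config
```

Keeping overrides in a separate file leaves the predefined files in `conf/` untouched, so updating Sarek should not overwrite local settings.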
diff --git a/docs/TSV.md → docs/INPUT.md b/docs/TSV.md → docs/INPUT.md
@@ -3,7 +3,7 @@
 Input files for Sarek can be specified using a tsv file given to the `--sample` parameter. The tsv file is a Tab Separated Value file with columns: `subject gender status sample lane fastq1 fastq2` or `subject gender status sample bam bai`.
 The content of these columns should be quite straight-forward:
 
-- `subject` designate the subject, it should be the ID of the Patient, or if you don't have one, il could be the Normal ID Sample.
+- `subject` designates the subject, it should be the ID of the Patient, or if you don't have one, it could be the Normal ID Sample.
 - `gender` is the gender of the Patient, (XX or XY)
 - `status` is the status of the Patient, (0 for Normal or 1 for Tumor)
 - `sample` designate the Sample, it should be the ID of the Sample (it is possible to have more than one tumor sample for each patient)
@@ -57,3 +57,44 @@ All the files will be in he Preprocessing/Recalibrated/ directory, and by defaul
 ```bash
 nextflow run SciLifeLab/Sarek/somaticVC.nf --sample Preprocessing/Recalibrated/mysample.tsv --tools Mutect2,Strelka
 ```
+
+## Input FASTQ file name best practices
+
+The input folder containing the FASTQ files for one individual (ID) should be organized into one subfolder per sample.
+All FASTQ files for a sample should be collected in that sample's subfolder.
+
+```
+ID
++--sample1
++------sample1_lib_flowcell-index_lane_R1_1000.fastq.gz
++------sample1_lib_flowcell-index_lane_R2_1000.fastq.gz
++------sample1_lib_flowcell-index_lane_R1_1000.fastq.gz
++------sample1_lib_flowcell-index_lane_R2_1000.fastq.gz
++--sample2
++------sample2_lib_flowcell-index_lane_R1_1000.fastq.gz
++------sample2_lib_flowcell-index_lane_R2_1000.fastq.gz
++--sample3
++------sample3_lib_flowcell-index_lane_R1_1000.fastq.gz
++------sample3_lib_flowcell-index_lane_R2_1000.fastq.gz
++------sample3_lib_flowcell-index_lane_R1_1000.fastq.gz
++------sample3_lib_flowcell-index_lane_R2_1000.fastq.gz
+```
+
+Fastq filename structure:
+
+- `sample_lib_flowcell-index_lane_R1_1000.fastq.gz` and
+- `sample_lib_flowcell-index_lane_R2_1000.fastq.gz`
+
+Where:
+
+- `sample` = sample id
+- `lib` = identifier of the library preparation
+- `flowcell` = identifier of the flow cell for the sequencing run
+- `lane` = identifier of the lane of the sequencing run
+
+Read group information will be parsed from fastq file names according to this:
+
+- `RGID` = "sample_lib_flowcell_index_lane"
+- `RGPL` = "Illumina"
+- `PU` = sample
+- `RGLB` = lib
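
Review note on the read group rules in `docs/INPUT.md` above: the mapping from file name to read group fields can be sketched in a few lines of shell. This is a hedged illustration of the naming convention, not the parsing code Sarek itself uses. `parse_read_group` is a hypothetical helper and the example file name is made up. It assumes that neither the sample ID nor the library ID contains an underscore.

```bash
# Sketch: derive read group fields from a FASTQ file name of the form
#   sample_lib_flowcell-index_lane_R1_1000.fastq.gz
# parse_read_group is a hypothetical helper, not part of Sarek.
# The function body is a subshell, so its variables do not leak out.
parse_read_group() (
  name=$(basename "$1")
  # Split on underscores; everything after the lane ends up in "rest".
  IFS=_ read -r sample lib flowcell_index lane rest <<EOF
$name
EOF
  if [ -z "$lane" ]; then
    echo "unexpected file name: $name" >&2
    exit 1
  fi
  # The hyphen between flowcell and index becomes an underscore in RGID.
  index_part=$(printf '%s' "$flowcell_index" | tr '-' '_')
  printf 'RGID=%s_%s_%s_%s\n' "$sample" "$lib" "$index_part" "$lane"
  printf 'RGPL=Illumina\n'
  printf 'PU=%s\n' "$sample"
  printf 'RGLB=%s\n' "$lib"
)

parse_read_group "ID/sample1/sample1_libA_FC01-ACGT_L001_R1_1000.fastq.gz"
```

For the made-up file name above, this prints `RGID=sample1_libA_FC01_ACGT_L001`, `RGPL=Illumina`, `PU=sample1` and `RGLB=libA`, one per line.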
diff --git a/docs/PARAMETERS.md b/docs/PARAMETERS.md
@@ -0,0 +1,139 @@
+# Parameters
+
+A list of all the parameters that can be used with the different scripts included in Sarek.
+
+## Common for all scripts
+
+### --help
+
+Display help
+
+### --noReports
+
+Disable all QC tools and MultiQC.
+
+### --outDir
+
+Choose an output directory
+
+### --project `ProjectID`
+
+Specify a project number ID on a UPPMAX cluster.
+(optional if not on such a cluster)
+
+### --sample `file.tsv`
+
+Use the given TSV file as sample input (cf. [input files documentation](INPUT.md)).
+Not used by `annotate.nf` and `runMultiQC.nf`.
+
+### --tools `tool1[,tool2,tool3...]`
+
+Choose which tools will be used in the workflow.
+Separate multiple tools with commas.
+Possible values are:
+
+- haplotypecaller (use `HaplotypeCaller` for VC) (germlineVC.nf)
+- manta (use `Manta` for SV) (germlineVC.nf,somaticVC.nf)
+- strelka (use `Strelka` for VC) (germlineVC.nf,somaticVC.nf)
+- ascat (use `ASCAT` for CNV) (somaticVC.nf)
+- mutect2 (use `MuTect2` for VC) (somaticVC.nf)
+- snpeff (use `snpEff` for Annotation) (annotate.nf)
+- vep (use `VEP` for Annotation) (annotate.nf)
+
+The `--tools` option is case-insensitive, to avoid errors when choosing tools.
+You can write `--tools mutect2,ascat` or `--tools MuTect2,ASCAT` interchangeably.
+
+### --verbose
+
+Display more information about files being processed.
+
+## Preprocessing script (`main.nf`)
+### --step `step`
+
+Choose from which step the workflow will start.
+Choose only one step.
+Possible values are:
+
+- mapping (default, will start workflow with FASTQ files)
+- recalibrate (will start workflow with BAM files and Recalibration Tables)
+
+The `--step` option is case-insensitive, to avoid errors when choosing a step.
+
+### --test
+
+Test-run Sarek on a small dataset, so that you don't have to specify `--sample Sarek-data/testdata/tsv/tiny.tsv`
+
+### --onlyQC
+
+Run only QC tools and MultiQC to generate an HTML report.
+
+
+## Annotate script (`annotate.nf`)
+
+### --annotateTools `tool1[,tool2,tool3...]`
+
+Choose which tools' output to annotate.
+Separate multiple tools with commas.
+Possible values are:
+- haplotypecaller (Annotate `HaplotypeCaller` output)
+- manta (Annotate `Manta` output)
+- mutect2 (Annotate `MuTect2` output)
+- strelka (Annotate `Strelka` output)
+
+### --annotateVCF `file1[,file2,file3...]`
+
+Choose the VCF files to annotate.
+Separate multiple VCF files with commas.
+
+
+## MultiQC script (`runMultiQC.nf`)
+### --callName `Name`
+
+Specify a name for MultiQC report (optional)
+
+### --contactMail `email`
+
+Specify an email for MultiQC report (optional)
+
+
+## References
+
+For most use cases, the reference information is already in the configuration file [`conf/genomes.config`](https://github.com/SciLifeLab/Sarek/blob/master/conf/genomes.config).
+However, if needed, you can specify any reference file at the command line.
+
+### --acLoci `acLoci file`
+
+### --bwaIndex `bwaIndex file`
+
+### --cosmic `cosmic file`
+
+### --cosmicIndex `cosmicIndex file`
+
+### --dbsnp `dbsnp file`
+
+### --dbsnpIndex `dbsnpIndex file`
+
+### --genomeDict `genomeDict file`
+
+### --genomeFile `genomeFile file`
+
+### --genomeIndex `genomeIndex file`
+
+### --intervals `intervals file`
+
+### --knownIndels `knownIndels file`
+
+### --knownIndelsIndex `knownIndelsIndex file`
+
+### --snpeffDb `snpeffDb file`
+
+## Hardware Parameters
+
+For most use cases, the hardware settings are already in the appropriate [configuration files](https://github.com/SciLifeLab/Sarek/blob/master/conf/).
+However, it is still possible to specify these parameters at the command line as well.
+
+### --runTime `time`
+
+### --singleCPUMem `memory`
+
+### --totalMemory `memory`
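
Review note on the `--tools` section of `docs/PARAMETERS.md` above: the case-insensitive handling could be illustrated with a small sketch. The shell function below lower-cases a comma-separated tool list and checks each entry against the values listed in the section. It only illustrates the documented behaviour and is not Sarek's own implementation; `normalise_tools` is a hypothetical helper.

```bash
# Sketch: lower-case a --tools value and check it against the documented list.
# normalise_tools is a hypothetical helper, not part of Sarek.
# The function body is a subshell, so its variables do not leak out.
normalise_tools() (
  valid="haplotypecaller manta strelka ascat mutect2 snpeff vep"
  # Lower-case the input and turn commas into spaces for the loop.
  tools=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr ',' ' ')
  for t in $tools; do
    case " $valid " in
      *" $t "*) ;;                                  # known tool
      *) echo "unknown tool: $t" >&2; exit 1 ;;
    esac
  done
  # Print the normalised list, comma-separated again.
  printf '%s\n' "$tools" | tr ' ' ','
)

normalise_tools "MuTect2,ASCAT"
```

Called with `MuTect2,ASCAT`, the function prints `mutect2,ascat`. An unlisted name makes it exit with a non-zero status.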