This repository has been archived by the owner on Jan 27, 2020. It is now read-only.

Beginners usage docs [skip ci] #639

Merged
merged 9 commits into from
Sep 13, 2018
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- [#615](https://github.com/SciLifeLab/Sarek/pull/615) - Update documentation
- [#620](https://github.com/SciLifeLab/Sarek/pull/620) - Add `tmp/` to `.gitignore`
- [#625](https://github.com/SciLifeLab/Sarek/pull/625) - Add [`pathfindr`](https://github.com/NBISweden/pathfindr) as a submodule
- [#639](https://github.com/SciLifeLab/Sarek/pull/639) - Add a complete example analysis to docs

### `Changed`
- [#608](https://github.com/SciLifeLab/Sarek/pull/608) - Update Nextflow required version
Expand All @@ -24,6 +25,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
- [#632](https://github.com/SciLifeLab/Sarek/pull/632) - Use 2 threads and 2 cpus FastQC processes
- [#637](https://github.com/SciLifeLab/Sarek/pull/637) - Update tool version gathering
- [#638](https://github.com/SciLifeLab/Sarek/pull/638) - Use correct `.simg` extension for Singularity images
- [#639](https://github.com/SciLifeLab/Sarek/pull/639) - Smaller refactoring of the docs

### `Removed`
- [#616](https://github.com/SciLifeLab/Sarek/pull/616) - Remove old Issue Template
Expand Down
13 changes: 7 additions & 6 deletions README.md
Expand Up @@ -82,12 +82,13 @@ The Sarek pipeline comes with documentation in the `docs/` directory:
06. [Configuration and profiles documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONFIG.md)
07. [Intervals documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/INTERVALS.md)
08. [Running the pipeline](https://github.com/SciLifeLab/Sarek/blob/master/docs/USAGE.md)
09. [Examples](https://github.com/SciLifeLab/Sarek/blob/master/docs/USE_CASES.md)
10. [TSV file documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/TSV.md)
11. [Processes documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/PROCESS.md)
12. [Documentation about containers](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONTAINERS.md)
13. [More information about ASCAT](https://github.com/SciLifeLab/Sarek/blob/master/docs/ASCAT.md)
14. [Output documentation structure](https://github.com/SciLifeLab/Sarek/blob/master/docs/OUTPUT.md)
09. [Command line parameters](https://github.com/SciLifeLab/Sarek/blob/master/docs/PARAMETERS.md)
10. [Examples](https://github.com/SciLifeLab/Sarek/blob/master/docs/USE_CASES.md)
11. [Input files documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/INPUT.md)
12. [Processes documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/PROCESS.md)
13. [Documentation about containers](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONTAINERS.md)
14. [More information about ASCAT](https://github.com/SciLifeLab/Sarek/blob/master/docs/ASCAT.md)
15. [Output documentation structure](https://github.com/SciLifeLab/Sarek/blob/master/docs/OUTPUT.md)

## Contributions & Support

Expand Down
24 changes: 20 additions & 4 deletions docs/CONFIG.md
Expand Up @@ -5,7 +5,8 @@ For more informations on how to use configuration files, have a look at the [Nex
For more information about profiles, have a look at the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html#config-profiles)

We provide several configuration files and profiles for Sarek.
The standard ones are designed to work on a Swedish UPPMAX clusters, and can be modified and tailored to your own need.
The standard ones are designed to work on a Swedish UPPMAX cluster, but can be modified and tailored to your own needs.


## Configuration files

Expand Down Expand Up @@ -51,10 +52,14 @@ To be used for Travis (2 cpus) or on small computer for testing purpose
Slurm configuration for a UPPMAX cluster
Will run the workflow on `/scratch` using the Nextflow [`scratch`](https://www.nextflow.io/docs/latest/process.html#scratch) directive

## profiles
## Profiles
A profile is a convenient way of specifying which set of configuration files to use.
The default profile is `standard`, but Sarek has several predefined profiles, listed below, that can be selected with `-profile <profile>`:

```bash
nextflow run SciLifeLab/Sarek --sample mysample.tsv -profile myprofile
```

Every profile can be modified for your own use.
To use a profile, you'll need to specify `-profile <profile>`

### `docker`

Expand Down Expand Up @@ -82,3 +87,14 @@ Singularity images will be pulled automatically.

This is the profile for Singularity testing on a small machine, or on Travis CI.
Singularity images will be pulled automatically.

## Customisation
The recommended way to use custom settings is to supply Sarek with an additional configuration file. You can use the files in the [`conf/`](https://github.com/SciLifeLab/Sarek/tree/master/conf) directory as inspiration for this new `.config` file, and specify it using the `-c` flag:

```bash
nextflow run SciLifeLab/Sarek --sample mysample.tsv -c conf/personal.config
```

Any configuration field specified in this file takes precedence over the predefined configurations, while any field left out of the file is set by the normal configuration files included in the specified (or `standard`) profile.
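As a minimal sketch of such a file, the snippet below writes a hypothetical `conf/personal.config` that overrides only three fields. It assumes that the command line options `--outDir`, `--runTime` and `--singleCPUMem` correspond to entries of the same name in the Nextflow `params` scope; the values are purely illustrative, not recommended settings.

```bash
# Write a hypothetical conf/personal.config that overrides only a few fields.
# Assumption: parameter names mirror the documented command line options;
# the values are illustrative only.
mkdir -p conf
cat > conf/personal.config <<'EOF'
params {
  outDir       = 'results'   // same meaning as --outDir
  runTime      = 24.h        // same meaning as --runTime
  singleCPUMem = 8.GB        // same meaning as --singleCPUMem
}
EOF
```

Everything not listed in this file would still come from the configuration files of the selected profile.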

To find out which configuration files each profile uses, see the profile definitions in the file [`nextflow.config`](https://github.com/SciLifeLab/Sarek/blob/master/nextflow.config).
43 changes: 42 additions & 1 deletion docs/TSV.md → docs/INPUT.md
Expand Up @@ -3,7 +3,7 @@
Input files for Sarek can be specified using a tsv file given to the `--sample` parameter. The tsv file is a Tab Separated Value file with columns: `subject gender status sample lane fastq1 fastq2` or `subject gender status sample bam bai`.
The content of these columns should be quite straightforward:

- `subject` designate the subject, it should be the ID of the Patient, or if you don't have one, il could be the Normal ID Sample.
- `subject` designates the subject: it should be the ID of the Patient or, if you don't have one, it could be the ID of the Normal Sample.
- `gender` is the gender of the Patient, (XX or XY)
- `status` is the status of the Patient, (0 for Normal or 1 for Tumor)
- `sample` designates the Sample, it should be the ID of the Sample (it is possible to have more than one tumor sample for each patient)
Expand Down Expand Up @@ -57,3 +57,44 @@ All the files will be in he Preprocessing/Recalibrated/ directory, and by defaul
```bash
nextflow run SciLifeLab/Sarek/somaticVC.nf --sample Preprocessing/Recalibrated/mysample.tsv --tools Mutect2,Strelka
```

## Input FASTQ file name best practices

The input folder containing the FASTQ files for one individual (ID) should be organized into one subfolder for every sample.
All FASTQ files for a sample should be collected in that sample's subfolder.

```
ID
+--sample1
+------sample1_lib_flowcell-index_lane_R1_1000.fastq.gz
+------sample1_lib_flowcell-index_lane_R2_1000.fastq.gz
+------sample1_lib_flowcell-index_lane_R1_1000.fastq.gz
+------sample1_lib_flowcell-index_lane_R2_1000.fastq.gz
+--sample2
+------sample2_lib_flowcell-index_lane_R1_1000.fastq.gz
+------sample2_lib_flowcell-index_lane_R2_1000.fastq.gz
+--sample3
+------sample3_lib_flowcell-index_lane_R1_1000.fastq.gz
+------sample3_lib_flowcell-index_lane_R2_1000.fastq.gz
+------sample3_lib_flowcell-index_lane_R1_1000.fastq.gz
+------sample3_lib_flowcell-index_lane_R2_1000.fastq.gz
```

FASTQ file name structure:

- `sample_lib_flowcell-index_lane_R1_1000.fastq.gz` and
- `sample_lib_flowcell-index_lane_R2_1000.fastq.gz`

Where:

- `sample` = sample id
- `lib` = identifier of library preparation
- `flowcell` = identifier of flow cell for the sequencing run
- `lane` = identifier of the lane of the sequencing run
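The folder layout and file name structure above can be tied to the `subject gender status sample lane fastq1 fastq2` TSV columns with a small helper. This is only a sketch under stated assumptions: `make_tsv_rows` is a hypothetical function, not part of Sarek; it takes the `lane` column from the fourth underscore-separated field of the file name, and it expects sample IDs and library IDs that contain no underscores.

```bash
# Hypothetical helper: print one TSV row per R1 file in a sample folder.
# Usage: make_tsv_rows <subject> <gender> <status> <sample_dir>
# Gender and status are arguments because file names do not carry them.
make_tsv_rows() {
    subject=$1
    gender=$2
    status=$3
    sample_dir=$4
    sample=$(basename "$sample_dir")
    for r1 in "$sample_dir"/*_R1_*.fastq.gz; do
        # Skip the unexpanded pattern when the folder has no R1 files
        [ -e "$r1" ] || continue
        r1_name=$(basename "$r1")
        # Assumption: the lane is the 4th underscore-separated field
        lane=$(printf '%s\n' "$r1_name" | cut -d_ -f4)
        # The mate file differs only by R2 in place of R1
        r2="$sample_dir/$(printf '%s\n' "$r1_name" | sed 's/_R1_/_R2_/')"
        printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
            "$subject" "$gender" "$status" "$sample" "$lane" "$r1" "$r2"
    done
}
```

Calling `make_tsv_rows ID XX 0 ID/sample1 > mysample.tsv` would then write one row per lane of `sample1`; rows for tumor samples would be appended with status `1`.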

Read group information will be parsed from FASTQ file names as follows:

- `RGID` = "sample_lib_flowcell_index_lane"
- `RGPL` = "Illumina"
- `PU` = sample
- `RGLB` = lib
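The mapping above can be sketched in a few lines of shell. `read_group_from_fastq` is a hypothetical helper written for illustration, not the parsing code Sarek itself uses. It assumes that the hyphen in `flowcell-index` becomes an underscore in `RGID`, as the list above suggests, and that sample and library IDs contain no underscores.

```bash
# Hypothetical helper: derive read group fields from a FASTQ file name of the
# form sample_lib_flowcell-index_lane_R1_1000.fastq.gz
read_group_from_fastq() {
    name=$(basename "$1")
    sample=$(printf '%s\n' "$name" | cut -d_ -f1)
    lib=$(printf '%s\n' "$name" | cut -d_ -f2)
    # Assumption: "flowcell-index" is written "flowcell_index" in RGID
    flowcell_index=$(printf '%s\n' "$name" | cut -d_ -f3 | sed 's/-/_/g')
    lane=$(printf '%s\n' "$name" | cut -d_ -f4)
    printf 'RGID=%s_%s_%s_%s\n' "$sample" "$lib" "$flowcell_index" "$lane"
    printf 'RGPL=Illumina\n'
    printf 'PU=%s\n' "$sample"
    printf 'RGLB=%s\n' "$lib"
}

read_group_from_fastq ID/sample1/sample1_libA_FC1-ACGT_3_R1_1000.fastq.gz
```

The example file name is made up; only the first four underscore-separated fields are used, so the R1 and R2 files of a pair yield the same read group.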
139 changes: 139 additions & 0 deletions docs/PARAMETERS.md
@@ -0,0 +1,139 @@
# Parameters

A list of all possible parameters that can be used with the different scripts included in Sarek.

## Common for all scripts

### --help

Display help

### --noReports

Disable all QC tools and MultiQC.

### --outDir

Choose an output directory

### --project `ProjectID`

Specify a project number ID on a UPPMAX cluster.
(optional if not on such a cluster)

### --sample `file.tsv`

Use the given TSV file as sample input (cf. [input files documentation](INPUT.md)).
Not used for `annotate.nf` and `runMultiQC.nf`.

### --tools `tool1[,tool2,tool3...]`

Choose which tools will be used in the workflow.
Different tools to be separated by commas.
Possible values are:

- haplotypecaller (use `HaplotypeCaller` for VC) (germlineVC.nf)
- manta (use `Manta` for SV) (germlineVC.nf,somaticVC.nf)
- strelka (use `Strelka` for VC) (germlineVC.nf,somaticVC.nf)
- ascat (use `ASCAT` for CNV) (somaticVC.nf)
- mutect2 (use `MuTect2` for VC) (somaticVC.nf)
- snpeff (use `snpEff` for Annotation) (annotate.nf)
- vep (use `VEP` for Annotation) (annotate.nf)

The `--tools` option is case-insensitive, to avoid easy mistakes when choosing tools.
So you can write `--tools mutect2,ascat` or `--tools MuTect2,ASCAT` without worrying about case sensitivity.
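The case-insensitive handling can be pictured as lowercasing the list before comparing it with the known tool names. The sketch below is an illustration only: `normalize_tools` and `check_tools` are hypothetical helpers, not Sarek's own implementation, and the list of known tools is copied from the values above.

```bash
# Hypothetical helpers sketching a case-insensitive --tools check.
KNOWN_TOOLS="haplotypecaller manta strelka ascat mutect2 snpeff vep"

# Lowercase a comma-separated tool list.
normalize_tools() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Return 0 if every tool is in KNOWN_TOOLS; otherwise report the first
# unknown tool on stderr and return 1.
check_tools() {
    for tool in $(normalize_tools "$1" | tr ',' ' '); do
        case " $KNOWN_TOOLS " in
            *" $tool "*) ;;
            *) printf 'unknown tool: %s\n' "$tool" >&2; return 1 ;;
        esac
    done
}

normalize_tools 'MuTect2,ASCAT'   # prints mutect2,ascat
```

A wrapper script could call `check_tools "$tools"` before launching the workflow, so that a misspelled tool name is caught early.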

### --verbose

Display more information about files being processed.

## Preprocessing script (`main.nf`)
### --step `step`

Choose from which step the workflow will start.
Choose only one step.
Possible values are:

- mapping (default, will start workflow with FASTQ files)
- recalibrate (will start workflow with BAM files and Recalibration Tables)

The `--step` option is case-insensitive, to avoid easy mistakes when choosing a step.

### --test

Test run Sarek on a smaller dataset, so that you don't have to specify `--sample Sarek-data/testdata/tsv/tiny.tsv`

### --onlyQC

Run only QC tools and MultiQC to generate an HTML report.


## Annotate script (`annotate.nf`)

### --annotateTools `tool1[,tool2,tool3...]`

Choose which tools' output to annotate.
Different tools to be separated by commas.
Possible values are:
- haplotypecaller (Annotate `HaplotypeCaller` output)
- manta (Annotate `Manta` output)
- mutect2 (Annotate `MuTect2` output)
- strelka (Annotate `Strelka` output)

### --annotateVCF `file1[,file2,file3...]`

Choose the VCF files to annotate.
Different VCF files to be separated by commas.


## MultiQC script (`runMultiQC.nf`)
### --callName `Name`

Specify a name for MultiQC report (optional)

### --contactMail `email`

Specify an email for MultiQC report (optional)


## References

For most use cases, the reference information is already in the configuration file [`conf/genomes.config`](https://github.com/SciLifeLab/Sarek/blob/master/conf/genomes.config).
However, if needed, you can specify any reference file at the command line.

### --acLoci `acLoci file`

### --bwaIndex `bwaIndex file`

### --cosmic `cosmic file`

### --cosmicIndex `cosmicIndex file`

### --dbsnp `dbsnp file`

### --dbsnpIndex `dbsnpIndex file`

### --genomeDict `genomeDict file`

### --genomeFile `genomeFile file`

### --genomeIndex `genomeIndex file`

### --intervals `intervals file`

### --knownIndels `knownIndels file`

### --knownIndelsIndex `knownIndelsIndex file`

### --snpeffDb `snpeffDb file`

## Hardware Parameters

For most use cases, the hardware parameters are already set in the appropriate [configuration files](https://github.com/SciLifeLab/Sarek/blob/master/conf/).
However, it is still possible to specify these parameters at the command line as well.

### --runTime `time`

### --singleCPUMem `memory`

### --totalMemory `memory`