This repository has been archived by the owner on Jan 27, 2020. It is now read-only.

Merge branch 'dev' into master
Szilveszter Juhos authored Sep 13, 2018
2 parents 72c0c2c + 7a10f5d commit a227630
Showing 11 changed files with 653 additions and 378 deletions.
3 changes: 3 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
Original file line number Diff line number Diff line change
@@ -31,5 +31,8 @@ A clear and concise description of what you expected to happen.
**Container (please complete the following information):**
- tag: [e.g. 1.0.0]

**Sarek (please complete the following information):**
- version: [e.g. 2.1.0]

**Additional context**
Add any other context about the problem here.
56 changes: 33 additions & 23 deletions .github/RELEASE_CHECKLIST.md
@@ -1,24 +1,34 @@
# Release checklist
This checklist is for our own reference

1. Check that everything is up to date and ready to go
- Travis tests are passing
- Manual tests on Bianca are passing
2. Increase version numbers
3. Update version numbers in code: `configuration/base.config`
4. Build, and get the containers.
- `./scripts/do_all.sh --push --tag <VERSION>`
- `./scripts/do_all.sh --pull --tag <VERSION>`
5. Test against sample data.
- Check for any command line errors
- Check version numbers are printed correctly
- `./scripts/test.sh -p docker --tag <VERSION>`
- `./scripts/test.sh -p singularity --tag <VERSION>`
- `./scripts/test.sh -p singularityPath --tag <VERSION>`
6. Commit and push version updates
7. Make a [release](https://github.com/SciLifeLab/Sarek/releases) on GitHub
8. Choose an appropriate codename for the release
9. Update [bio.tools](https://bio.tools/Sarek) with the new release
10. Tweet that new version is released
11. Commit and push. Continue making more awesome :metal:
12. Have fika :cake:

> This checklist is for our own reference, to help us prepare a new release
1. Check that everything is ready to go

- [PRs](https://github.com/SciLifeLab/Sarek/pulls) are merged
- [Travis tests](https://travis-ci.org/SciLifeLab/Sarek/branches) are passing on `dev`

2. Increase version number following [semantic versioning](http://semver.org/spec/v2.0.0.html)
3. Choose an appropriate codename for the release
- e.g. peaks in [Sarek National Park](https://en.wikipedia.org/wiki/Sarek_National_Park#Topography)
4. Build docker containers.

- `./scripts/do_all.sh --tag <VERSION>`

5. Test against sample data.

- `./scripts/test.sh -p docker --tag <VERSION>`
- Check for any command line errors

6. Use script to update version in files:

- `./scripts/do_release.sh -r "<VERSION>" -c "<CODENAME>"`

7. Push latest updates
8. Make a PR against `dev`
9. Merge said PR
10. Make a [release](https://github.com/SciLifeLab/Sarek/releases) on GitHub
11. Update [bio.tools](https://bio.tools/Sarek) with the new release details
12. Tweet that a new version is released
13. Add a new `Unreleased` section in `CHANGELOG.md` for the `dev` version
14. Commit and push. Continue making more awesome :metal:
15. Have fika :cake:
320 changes: 177 additions & 143 deletions CHANGELOG.md

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions Dockerfile
@@ -6,5 +6,5 @@ LABEL \
maintainer="Maxime Garcia <maxime.garcia@scilifelab.se>, Szilveszter Juhos <Szilveszter.Juhos@scilifelab.se>"

COPY environment.yml /
RUN conda env update -n root -f /environment.yml && conda clean -a
ENV PATH /opt/conda/bin:$PATH
RUN conda env create -f /environment.yml && conda clean -a
ENV PATH /opt/conda/envs/sarek-2.1.0/bin:$PATH
13 changes: 7 additions & 6 deletions README.md
@@ -82,12 +82,13 @@ The Sarek pipeline comes with documentation in the `docs/` directory:
06. [Configuration and profiles documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONFIG.md)
07. [Intervals documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/INTERVALS.md)
08. [Running the pipeline](https://github.com/SciLifeLab/Sarek/blob/master/docs/USAGE.md)
09. [Examples](https://github.com/SciLifeLab/Sarek/blob/master/docs/USE_CASES.md)
10. [TSV file documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/TSV.md)
11. [Processes documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/PROCESS.md)
12. [Documentation about containers](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONTAINERS.md)
13. [More information about ASCAT](https://github.com/SciLifeLab/Sarek/blob/master/docs/ASCAT.md)
14. [Output documentation structure](https://github.com/SciLifeLab/Sarek/blob/master/docs/OUTPUT.md)
09. [Command line parameters](https://github.com/SciLifeLab/Sarek/blob/master/docs/PARAMETERS.md)
10. [Examples](https://github.com/SciLifeLab/Sarek/blob/master/docs/USE_CASES.md)
11. [Input files documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/INPUT.md)
12. [Processes documentation](https://github.com/SciLifeLab/Sarek/blob/master/docs/PROCESS.md)
13. [Documentation about containers](https://github.com/SciLifeLab/Sarek/blob/master/docs/CONTAINERS.md)
14. [More information about ASCAT](https://github.com/SciLifeLab/Sarek/blob/master/docs/ASCAT.md)
15. [Output documentation structure](https://github.com/SciLifeLab/Sarek/blob/master/docs/OUTPUT.md)

## Contributions & Support

24 changes: 20 additions & 4 deletions docs/CONFIG.md
@@ -5,7 +5,8 @@ For more informations on how to use configuration files, have a look at the [Nex
For more information about profiles, have a look at the [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html#config-profiles)

We provide several configuration files and profiles for Sarek.
The standard ones are designed to work on a Swedish UPPMAX clusters, and can be modified and tailored to your own need.
The standard ones are designed to work on a Swedish UPPMAX cluster, but can be modified and tailored to your own need.


## Configuration files

@@ -51,10 +52,14 @@ To be used for Travis (2 cpus) or on small computer for testing purpose
Slurm configuration for a UPPMAX cluster
Will run the workflow on `/scratch` using the Nextflow [`scratch`](https://www.nextflow.io/docs/latest/process.html#scratch) directive

## profiles
## Profiles
A profile is a convenient way of specifying which set of configuration files to use.
The default profile is `standard`, but Sarek has several predefined profiles, listed below, that can be selected with `-profile <profile>`:

```bash
nextflow run SciLifeLab/Sarek --sample mysample.tsv -profile myprofile
```

Every profile can be modified for your own use.
To use a profile, you'll need to specify `-profile <profile>`

### `docker`

@@ -82,3 +87,14 @@ Singularity images will be pulled automatically.

This is the profile for Singularity testing on a small machine, or on Travis CI.
Singularity images will be pulled automatically.

## Customisation
The recommended way to use custom settings is to supply Sarek with an additional configuration file. You can use the files in the [`conf/`](https://github.com/SciLifeLab/Sarek/tree/master/conf) directory as inspiration for this new `.config` file, and specify it using the `-c` flag:

```bash
nextflow run SciLifeLab/Sarek --sample mysample.tsv -c conf/personal.config
```

Any configuration field specified in this file takes precedence over the predefined configurations, but any field left out of the file will be set by the normal configuration files included in the specified (or `standard`) profile.

To find out which configuration files each profile uses, see the profile definitions in [`nextflow.config`](https://github.com/SciLifeLab/Sarek/blob/master/nextflow.config).
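
As a rough illustration of the `-c` mechanism, the sketch below writes a small personal configuration file and prints the matching command. The file name `conf/personal.config` is taken from the example above. The values are placeholders, and it is an assumption that `runTime` and `singleCPUMem` (documented as command line parameters) can be set in a `params` block in this form, so check the files in `conf/` before relying on it.

```bash
# Sketch only: create a minimal custom configuration file.
# The values below are placeholders, not recommended settings.
mkdir -p conf
cat > conf/personal.config <<'EOF'
// Override only the fields you need; everything else keeps the value
// set by the configuration files of the selected profile.
params {
  runTime      = 24.h   // assumed format, check conf/ for the real defaults
  singleCPUMem = 8.GB   // assumed format, check conf/ for the real defaults
}
EOF

# Print the command that would use the file (it is not executed here).
echo "nextflow run SciLifeLab/Sarek --sample mysample.tsv -c conf/personal.config"
```

Keeping the overrides in a separate file leaves the predefined configuration files untouched, which makes it easier to follow later changes to them.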
43 changes: 42 additions & 1 deletion docs/TSV.md → docs/INPUT.md
@@ -3,7 +3,7 @@
Input files for Sarek can be specified using a tsv file given to the `--sample` parameter. The tsv file is a Tab Separated Value file with columns: `subject gender status sample lane fastq1 fastq2` or `subject gender status sample bam bai`.
The content of these columns should be quite straight-forward:

- `subject` designate the subject, it should be the ID of the Patient, or if you don't have one, il could be the Normal ID Sample.
- `subject` designates the subject, it should be the ID of the Patient, or if you don't have one, it could be the Normal ID Sample.
- `gender` is the gender of the Patient, (XX or XY)
- `status` is the status of the Patient, (0 for Normal or 1 for Tumor)
- `sample` designates the Sample, it should be the ID of the Sample (it is possible to have more than one tumor sample for each patient)
@@ -57,3 +57,44 @@ All the files will be in the Preprocessing/Recalibrated/ directory, and by defaul
```bash
nextflow run SciLifeLab/Sarek/somaticVC.nf --sample Preprocessing/Recalibrated/mysample.tsv --tools Mutect2,Strelka
```

## Input FASTQ file name best practices

The input folder, containing the FASTQ files for one individual (ID), should be organized into one subfolder per sample.
All FASTQ files for a sample should be collected in its subfolder.

```
ID
+--sample1
+------sample1_lib_flowcell-index_lane_R1_1000.fastq.gz
+------sample1_lib_flowcell-index_lane_R2_1000.fastq.gz
+------sample1_lib_flowcell-index_lane_R1_1000.fastq.gz
+------sample1_lib_flowcell-index_lane_R2_1000.fastq.gz
+--sample2
+------sample2_lib_flowcell-index_lane_R1_1000.fastq.gz
+------sample2_lib_flowcell-index_lane_R2_1000.fastq.gz
+--sample3
+------sample3_lib_flowcell-index_lane_R1_1000.fastq.gz
+------sample3_lib_flowcell-index_lane_R2_1000.fastq.gz
+------sample3_lib_flowcell-index_lane_R1_1000.fastq.gz
+------sample3_lib_flowcell-index_lane_R2_1000.fastq.gz
```

Fastq filename structure:

- `sample_lib_flowcell-index_lane_R1_1000.fastq.gz` and
- `sample_lib_flowcell-index_lane_R2_1000.fastq.gz`

Where:

- `sample` = sample id
- `lib` = identifier of the library preparation
- `flowcell` = identifier of the flow cell for the sequencing run
- `lane` = identifier of the lane of the sequencing run

Read group information will be parsed from the FASTQ file names as follows:

- `RGID` = "sample_lib_flowcell_index_lane"
- `RGPL` = "Illumina"
- `PU` = sample
- `RGLB` = lib
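
The mapping above can be sketched in shell as follows. This is not the code Sarek uses; `read_group_from_fastq` is a hypothetical helper written for illustration. It assumes that none of the fields contains an underscore, and that every hyphen in the `flowcell-index` part becomes an underscore in `RGID`.

```bash
# Hypothetical helper: derive read group fields from a FASTQ file name of the
# form sample_lib_flowcell-index_lane_R1_1000.fastq.gz
read_group_from_fastq() {
    # Drop the directory and the .fastq.gz suffix.
    name=$(basename "$1" .fastq.gz)

    # Split on underscores; everything after the lane ends up in _rest.
    IFS=_ read -r sample lib flowcell_index lane _rest <<EOF
$name
EOF

    if [ -z "$lane" ]; then
        echo "Unexpected FASTQ file name: $1" >&2
        return 1
    fi

    # flowcell-index is written with underscores in the RGID (assumption:
    # all hyphens in this field are replaced).
    fc=$(printf '%s' "$flowcell_index" | sed 's/-/_/g')

    printf 'RGID=%s_%s_%s_%s\n' "$sample" "$lib" "$fc" "$lane"
    printf 'RGPL=Illumina\n'
    printf 'PU=%s\n' "$sample"
    printf 'RGLB=%s\n' "$lib"
}

# Example call with a made-up file name.
read_group_from_fastq "ID/sample1/sample1_libA_FC1-ACGT_L001_R1_1000.fastq.gz"
```

For the made-up file name in the example, the first line printed is `RGID=sample1_libA_FC1_ACGT_L001`.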
139 changes: 139 additions & 0 deletions docs/PARAMETERS.md
@@ -0,0 +1,139 @@
# Parameters

A list of all parameters that can be used with the different scripts included in Sarek.

## Common for all scripts

### --help

Display help

### --noReports

Disable all QC tools and MultiQC.

### --outDir

Choose an output directory

### --project `ProjectID`

Specify a project number ID on a UPPMAX cluster.
(optional if not on such a cluster)

### --sample `file.tsv`

Use the given TSV file as sample (cf. [input documentation](INPUT.md)).
Not used for `annotate.nf` and `runMultiQC.nf`.

### --tools `tool1[,tool2,tool3...]`

Choose which tools will be used in the workflow.
Separate multiple tools with commas.
Possible values are:

- haplotypecaller (use `HaplotypeCaller` for VC) (germlineVC.nf)
- manta (use `Manta` for SV) (germlineVC.nf,somaticVC.nf)
- strelka (use `Strelka` for VC) (germlineVC.nf,somaticVC.nf)
- ascat (use `ASCAT` for CNV) (somaticVC.nf)
- mutect2 (use `MuTect2` for VC) (somaticVC.nf)
- snpeff (use `snpEff` for Annotation) (annotate.nf)
- vep (use `VEP` for Annotation) (annotate.nf)

The `--tools` option is case-insensitive, to reduce the chance of errors when choosing tools.
So you can write `--tools mutect2,ascat` or `--tools MuTect2,ASCAT` without worrying about case sensitivity.
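
The effect of case-insensitive matching can be illustrated with a small shell sketch. Sarek itself is a Nextflow pipeline, so this is not its implementation; `normalize_tools` and `KNOWN_TOOLS` are hypothetical names, and the list of tools is copied from the values above.

```bash
# Tool names accepted by --tools, as listed in this document.
KNOWN_TOOLS="haplotypecaller manta strelka ascat mutect2 snpeff vep"

# Hypothetical helper: lowercase a comma-separated tool list and reject
# any name that is not in KNOWN_TOOLS.
normalize_tools() {
    lowered=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    for tool in $(printf '%s' "$lowered" | tr ',' ' '); do
        case " $KNOWN_TOOLS " in
            *" $tool "*) ;;  # known tool, keep going
            *) echo "Unknown tool: $tool" >&2; return 1 ;;
        esac
    done
    printf '%s\n' "$lowered"
}

normalize_tools "MuTect2,ASCAT"
```

With this sketch, `MuTect2,ASCAT` and `mutect2,ascat` both normalise to `mutect2,ascat`, while a misspelled tool name is rejected with an error instead of being silently ignored.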

### --verbose

Display more information about files being processed.

## Preprocessing script (`main.nf`)
### --step `step`

Choose from which step the workflow will start.
Choose only one step.
Possible values are:

- mapping (default, will start workflow with FASTQ files)
- recalibrate (will start workflow with BAM files and Recalibration Tables)

The `--step` option is case-insensitive, to reduce the chance of errors when choosing a step.

### --test

Test-run Sarek on a small dataset, so that you don't have to specify `--sample Sarek-data/testdata/tsv/tiny.tsv`

### --onlyQC

Run only QC tools and MultiQC to generate an HTML report.


## Annotate script (`annotate.nf`)

### --annotateTools `tool1[,tool2,tool3...]`

Choose which tools to annotate.
Different tools to be separated by commas.
Possible values are:
- haplotypecaller (Annotate `HaplotypeCaller` output)
- manta (Annotate `Manta` output)
- mutect2 (Annotate `MuTect2` output)
- strelka (Annotate `Strelka` output)

### --annotateVCF `file1[,file2,file3...]`

Choose the VCF files to annotate.
Separate multiple files with commas.


## MultiQC script (`runMultiQC.nf`)
### --callName `Name`

Specify a name for MultiQC report (optional)

### --contactMail `email`

Specify an email for MultiQC report (optional)


## References

For most use cases, the reference information is already in the configuration file [`conf/genomes.config`](https://github.com/SciLifeLab/Sarek/blob/master/conf/genomes.config).
However, if needed, you can specify any reference file at the command line.

### --acLoci `acLoci file`

### --bwaIndex `bwaIndex file`

### --cosmic `cosmic file`

### --cosmicIndex `cosmicIndex file`

### --dbsnp `dbsnp file`

### --dbsnpIndex `dbsnpIndex file`

### --genomeDict `genomeDict file`

### --genomeFile `genomeFile file`

### --genomeIndex `genomeIndex file`

### --intervals `intervals file`

### --knownIndels `knownIndels file`

### --knownIndelsIndex `knownIndelsIndex file`

### --snpeffDb `snpeffDb file`

## Hardware Parameters

For most use cases, the hardware settings are already defined in the appropriate [configuration files](https://github.com/SciLifeLab/Sarek/blob/master/conf/).
However, it is still possible to specify these parameters at the command line.

### --runTime `time`

### --singleCPUMem `memory`

### --totalMemory `memory`
38 changes: 38 additions & 0 deletions docs/RELEASE.md
@@ -0,0 +1,38 @@
# RELEASE

> This document is for helping Sarek core developers and anyone joining the team to prepare a new release
## [CHECKLIST](https://github.com/SciLifeLab/Sarek/blob/master/.github/RELEASE_CHECKLIST.md)

This checklist is for our own reference, to help us prepare a new release.
Just follow it and be sure to check every item on the list.

## [Helper script](https://github.com/SciLifeLab/Sarek/blob/master/scripts/do_release.sh)

This script will update the version number in the following files:

- [CHANGELOG.md](https://github.com/SciLifeLab/Sarek/blob/master/CHANGELOG.md)
  - Will change Unreleased to correct version number and add codename and date
- [Dockerfile](https://github.com/SciLifeLab/Sarek/blob/master/Dockerfile)
  - Will update to correct version number
- [Singularity](https://github.com/SciLifeLab/Sarek/blob/master/Singularity)
  - Will update to correct version number
- [conf/base.config](https://github.com/SciLifeLab/Sarek/blob/master/conf/base.config)
  - Will update to correct version number

### Usage

```bash
./scripts/do_release.sh -r "<RELEASE>" -c "<CODENAME>"
```

- `-r|--release` specify the new version number
- `-c|--codename` specify the codename

### Example

```bash
./scripts/do_release.sh -r "2.2.0" -c "Skårki"
```
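
Since the release checklist asks for a version number that follows semantic versioning, a quick format check before calling the helper script can catch typos. The sketch below is an assumption about how such a check might look, not part of `do_release.sh`; `is_semver` is a hypothetical name, and the pattern only covers plain `MAJOR.MINOR.PATCH` versions without pre-release or build suffixes.

```bash
# Hypothetical helper: succeed only for versions of the form MAJOR.MINOR.PATCH
# (no leading zeros, no pre-release or build metadata).
is_semver() {
    printf '%s\n' "$1" |
        grep -Eq '^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)$'
}

version="2.2.0"
if is_semver "$version"; then
    # Print the release command instead of running it.
    echo "./scripts/do_release.sh -r \"$version\" -c \"<CODENAME>\""
else
    echo "Not a MAJOR.MINOR.PATCH version: $version" >&2
fi
```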