Update comments merged on the 3.0 PR back to dev #664

Merged Jul 20, 2022 (41 commits; changes shown from 39 commits)

Commits
546ccd3
feat: nf-core bump-version . 2.7.1
maxulysse May 13, 2021
168b918
feat: manual bump to 2.7.1
maxulysse May 13, 2021
3230b99
Merge remote-tracking branch 'upstream/dev' into dev_2.7.1
maxulysse May 18, 2021
b3f1fa5
Merge remote-tracking branch 'upstream/dev' into dev_2.7.1
maxulysse May 27, 2021
ec59f4f
Merge remote-tracking branch 'upstream/dev' into dev_2.7.1
maxulysse May 27, 2021
4952831
Merge remote-tracking branch 'upstream/dev' into dev_2.7.1
maxulysse May 27, 2021
ba8eed5
Merge remote-tracking branch 'upstream/dev' into dev_2.7.1
maxulysse Jun 1, 2021
3c98945
Merge remote-tracking branch 'upstream/dev' into dev_2.7.1
maxulysse Jun 11, 2021
68b9930
Merge pull request #374 from maxulysse/dev_2.7.1
maxulysse Jun 14, 2021
48960d4
Fix `MapReads` caching bug (#555)
May 26, 2022
613193e
Bump version to 2.7.2
Jun 8, 2022
0842a51
Ensure DSL1 is used
Jun 8, 2022
1e71ee3
Update Sarek workflow PNG image
Jun 9, 2022
e8f56e5
Merge pull request #566 from Sage-Bionetworks-Workflows/bgrande/issue…
FriederikeHanssen Jun 10, 2022
784ea35
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 13, 2022
3b68cb2
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 14, 2022
ccf33f6
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 17, 2022
1dbb717
prettier
maxulysse Jul 17, 2022
22d3b44
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 17, 2022
8b9183d
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 17, 2022
f4055c8
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 17, 2022
0261cd0
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 18, 2022
813a5cb
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 18, 2022
b1068e3
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 18, 2022
7c3bcfc
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 19, 2022
980f76b
Update workflows/sarek.nf
FriederikeHanssen Jul 19, 2022
5556e57
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 19, 2022
433ef65
Merge branch '3.0' of github.com:maxulysse/nf-core_sarek into 3.0
maxulysse Jul 19, 2022
a958da4
Update README.md
FriederikeHanssen Jul 19, 2022
0c0a83e
Update README.md
FriederikeHanssen Jul 19, 2022
bb9a013
Update README.md
FriederikeHanssen Jul 19, 2022
3142264
Update docs/usage.md
FriederikeHanssen Jul 19, 2022
0264d84
Apply suggestions from code review
FriederikeHanssen Jul 19, 2022
5271e35
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 20, 2022
b58ca44
Merge branch '3.0' of github.com:maxulysse/nf-core_sarek into 3.0
maxulysse Jul 20, 2022
11c1842
Merge branch '3.0' into dev
maxulysse Jul 20, 2022
25d2c10
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 20, 2022
c064aee
Merge remote-tracking branch 'upstream/dev' into 3.0
maxulysse Jul 20, 2022
db7024e
Merge branch '3.0' into dev_3.0_PR
maxulysse Jul 20, 2022
fc50739
remove duplication
maxulysse Jul 20, 2022
fc09504
Apply suggestions from code review
maxulysse Jul 20, 2022
12 changes: 11 additions & 1 deletion CHANGELOG.md
@@ -5,7 +5,17 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [dev](https://github.com/nf-core/sarek/tree/dev)
## [2.7.2](https://github.com/nf-core/sarek/releases/tag/2.7.2) - Áhkká

Áhkká is one of the massifs just outside of the Sarek National Park.

### Fixed

- [#566](https://github.com/nf-core/sarek/pull/566) - Fix caching bug affecting a variable number of `MapReads` jobs due to non-deterministic state of `statusMap` during caching evaluation

## [2.7.1](https://github.com/nf-core/sarek/releases/tag/2.7.1) - Pårtejekna

Pårtejekna is one of the glaciers of the Pårte Massif.

### Added

4 changes: 2 additions & 2 deletions README.md
@@ -85,7 +85,6 @@ Friederike Hanssen and Gisela Gabernet at [QBiC](https://www.qbic.uni-tuebingen.

Main authors:

- [Gisela Gabernet](https://github.com/ggabernet)
- [Maxime Garcia](https://github.com/maxulysse)
- [Friederike Hanssen](https://github.com/FriederikeHanssen)
- [Szilveszter Juhos](https://github.com/szilvajuhos)
@@ -99,6 +98,7 @@ We thank the following people for their extensive assistance in the development
- [Chela James](https://github.com/chelauk)
- [David Mas-Ponte](https://github.com/davidmasp)
- [Francesco L](https://github.com/nibscles)
- [Gisela Gabernet](https://github.com/ggabernet)
- [Harshil Patel](https://github.com/drpatelh)
- [James A. Fellows Yates](https://github.com/jfy133)
- [Jesper Eisfeldt](https://github.com/J35P312)
@@ -108,7 +108,7 @@ We thank the following people for their extensive assistance in the development
- [Lucia Conde](https://github.com/lconde-ucl)
- [Malin Larsson](https://github.com/malinlarsson)
- [Marcel Martin](https://github.com/marcelm)
- [Nick Smith](https://github,com/nickhsmith)
- [Nick Smith](https://github.com/nickhsmith)
- [Nilesh Tawari](https://github.com/nilesh-tawari)
- [Olga Botvinnik](https://github.com/olgabot)
- [Oskar Wacker](https://github.com/WackerO)
112 changes: 111 additions & 1 deletion docs/usage.md
@@ -744,6 +744,36 @@ GATK.GRCh38:
| vep_genome | | 'GRCh38' | |
| chr_dir | | "${params.igenomes_base}/Homo_sapiens/GATK/GRCh38/Sequence/Chromosomes" | |

## How to run sarek when no(t all) reference files are in igenomes

For common genomes, such as GRCh38 and GRCh37, the pipeline is shipped with (almost) all necessary reference files. However, sometimes it is necessary to use custom references for some or all files:

### No igenomes reference files are used

If none of your required genome files are in igenomes, set `--igenomes_ignore` to ignore any igenomes input and set `--genome null`. The `fasta` file is the only required reference file and must be provided to run the pipeline. All other reference files can be provided in addition. For details, see the parameter documentation.

Minimal example for custom genomes:

```
nextflow run nf-core/sarek --genome null --igenomes_ignore --fasta <custom.fasta>
```

### Overwrite specific reference files

If you don't want to use some of the provided reference files, you can override them by either providing a new file or, if a file should be ignored, setting the respective file parameter to `false`:

Example for using a custom known indels file:

```
nextflow run nf-core/sarek --known_indels <my_known_indels.vcf.gz> --genome GATK.GRCh38
```

Example for not using known indels, but all other provided reference files:

```
nextflow run nf-core/sarek --known_indels false --genome GATK.GRCh38
```

## How to customise SnpEff and VEP annotation

_under construction help needed_
@@ -784,7 +814,7 @@ Based on [nfcore/base:1.12.1](https://hub.docker.com/r/nfcore/base/tags), it con
### Using downloaded cache

Both `snpEff` and `VEP` can use a cache if no pre-built container is available.
The cache needs to made available on the machine where Sarek is run.
The cache needs to be made available on the machine where Sarek is run.
You need to specify the cache directory using `--snpeff_cache` and `--vep_cache` in the command lines or within configuration files.

Example:
@@ -844,6 +874,86 @@ nextflow run download_cache.nf --cadd_cache </path/to/CADD/cache> --cadd_version
Resource requests are difficult to generalize and often depend on input data size. Currently, the default number of CPUs and amount of memory requested were adapted from tests on 5 ICGC paired whole-genome sequencing samples with approximately 40X and 80X depth.
For targeted data analysis, this is overshooting by a lot. In this case, resources for each process can be limited by either setting `--max_memory` and `--max_cpus` or tailoring the request by process name as described [here](#resource-requests). If you use sarek for a certain data type regularly and would like to make these requests available to others on your system, an institution-specific, pipeline-specific config file can be added [here](https://github.com/nf-core/configs/tree/master/conf/pipeline/sarek).
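
As an illustration of capping resources for a small targeted run, the sketch below assembles a command line with both limits and prints it for inspection instead of running it. The values `8` and `32.GB` and the file name `samplesheet.csv` are placeholders chosen for this example, not recommendations.

```bash
# Hypothetical caps for a small targeted run; tune them to your own data.
MAX_CPUS=8
MAX_MEMORY='32.GB'

# Assemble the command as a string so it can be inspected before running.
# "samplesheet.csv" is a placeholder input file.
SAREK_CMD="nextflow run nf-core/sarek --input samplesheet.csv --max_cpus ${MAX_CPUS} --max_memory ${MAX_MEMORY}"

# Print the command; run it yourself once the values look right.
echo "${SAREK_CMD}"
```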

For mapping, sarek follows the parameter suggestions provided in this [paper](https://www.nature.com/articles/s41467-018-06159-4):

`-K 100000000`: for deterministic pipeline results; for more information, see [here](https://github.com/CCDG/Pipeline-Standardization/issues/2)

`-Y`: force soft-clipping rather than default hard-clipping of supplementary alignments

In addition, currently, reads with tumor status in the sample sheet are mapped with a mismatch penalty of `-B 3`.

## Spark related issues

If you have problems running processes that make use of Spark, such as `MarkDuplicates`,
you are probably experiencing issues with the limit of open files in your system.
You can check your current limit by typing the following:

```bash
ulimit -n
```

The default limit is usually 1024, which is quite low for running Spark jobs.
To increase the limit permanently, you can:

Edit the file `/etc/security/limits.conf` and add the lines:

```bash
* soft nofile 65535
* hard nofile 65535
```

Edit the file `/etc/sysctl.conf` and add the line:

```bash
fs.file-max = 65535
```

Edit the file `/etc/sysconfig/docker` and add the new limits to OPTIONS like this:

```bash
OPTIONS="--default-ulimit nofile=65535:65535"
```

Re-start your session.

Note that the way to increase the open file limit in your system may be slightly different or require additional steps.
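
After restarting, a quick check along the following lines can confirm whether the new limits are in effect. This is only a sketch: the helper name `check_nofile` is hypothetical, and the threshold simply mirrors the `65535` used above.

```bash
# Minimum number of open files we want; mirrors the value configured above.
REQUIRED_NOFILE=65535

# check_nofile CURRENT REQUIRED
# Prints "ok" if CURRENT is "unlimited" or at least REQUIRED, otherwise "too low".
check_nofile() {
    if [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]; then
        echo "ok"
    else
        echo "too low"
    fi
}

# Report the soft and hard limits of the current shell session.
echo "soft limit: $(ulimit -S -n) -> $(check_nofile "$(ulimit -S -n)" "$REQUIRED_NOFILE")"
echo "hard limit: $(ulimit -H -n) -> $(check_nofile "$(ulimit -H -n)" "$REQUIRED_NOFILE")"
```

If the soft limit is reported as too low while the hard limit is fine, the session probably has not picked up the new settings yet.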

### Cannot delete work folder when using docker + Spark

Currently, when running Spark-based tools in combination with Docker, it is required to set `docker.userEmulation = false`. This can unfortunately cause permission issues when `work/` is written with root permissions. In case this happens, you might need to configure Docker to run without `userEmulation` (see [here](https://github.com/Midnighter/nf-core-adr/blob/main/docs/adr/0008-refrain-from-using-docker-useremulation-in-nextflow.md)).
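
One possible way to apply the `docker.userEmulation` setting without editing the pipeline is a small custom config file passed to Nextflow with `-c`. The file name `spark_docker.config` is a placeholder, and whether the value takes precedence over a profile's own setting depends on Nextflow's configuration precedence, so it is worth checking the resolved configuration with `nextflow config`.

```bash
# Write a minimal custom config (the file name is hypothetical).
cat > spark_docker.config <<'EOF'
// Setting described above for Spark-based tools run with Docker
docker.userEmulation = false
EOF

# Then pass it to the run, for example:
#   nextflow run nf-core/sarek -profile docker -c spark_docker.config <other options>
```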

## How to handle UMIs

Sarek can process UMI-reads, using [fgbio](http://fulcrumgenomics.github.io/fgbio/tools/latest/) tools.

In order to use reads containing UMI tags as your initial input, you need to include `--umi_read_structure [structure]` in your parameters.

This will enable pre-processing of the reads and UMI consensus read calling, which will then be used to continue the workflow from the mapping steps. Depending on the experimental setup, duplicate marking and base quality recalibration can be skipped after UMI processing with `--skip_tools`.

### UMI Read Structure

This parameter is a string that follows a [convention](https://github.com/fulcrumgenomics/fgbio/wiki/Read-Structures) to describe the structure of the UMI.
If your reads contain a UMI on only one end, the string should represent a single structure (e.g. "2M11S+T"); if your reads contain a UMI on both ends, the string should contain two structures separated by a blank space (e.g. "2M11S+T 2M11S+T").
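
As a rough sanity check before launching a run, the string could be validated with a small shell helper. The sketch below is a deliberate simplification of the fgbio convention and not part of Sarek: it assumes only the operators `T`, `B`, `M` and `S`, allows `+` only in the last segment, and does not reject zero lengths. Both function names are hypothetical.

```bash
# is_read_structure STRUCTURE
# Succeeds if STRUCTURE looks like a single read structure such as "2M11S+T".
# Simplified pattern: <length><operator> segments, with "+" allowed only
# as the length of the final segment.
is_read_structure() {
    printf '%s\n' "$1" | grep -Eq '^([0-9]+[TBMS])*([0-9]+|\+)[TBMS]$'
}

# validate_umi_read_structure "STRUCTURE [STRUCTURE]"
# Checks each blank-separated structure and prints "valid" or the first
# structure that fails the check.
validate_umi_read_structure() {
    for rs in $1; do
        if ! is_read_structure "$rs"; then
            echo "invalid: $rs"
            return 1
        fi
    done
    echo "valid"
}

validate_umi_read_structure "2M11S+T 2M11S+T"
```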

### Limitations and future updates

Recent Samtools updates can speed up the fgbio tools used in this workflow.
The current workflow does not handle duplex UMIs (i.e. where opposite strands of a duplex molecule have been tagged with a different UMI), although best practices have been proposed for processing this type of data.
Both changes will be implemented in a future release.

## MultiQC related issues

### Plots for SnpEff are missing

When plots are missing, the fasta and the custom SnpEff database may not match (see the [SnpEff FAQ](https://pcingola.github.io/SnpEff/se_faq/#error_chromosome_not_found-details)).
SnpEff completes without throwing an error, so Nextflow also completes successfully. The error is indicated by these lines in the `.command` files:

```
ERRORS: Some errors were detected
Error type Number of errors
ERROR_CHROMOSOME_NOT_FOUND 17522411
```
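
Because the run itself succeeds, it can help to search the work directory for this error after the pipeline finishes. This is a minimal sketch, assuming the default `work/` directory and that the message ends up in one of the task's `.command.*` files; the function name is hypothetical.

```bash
# find_snpeff_chrom_errors [WORK_DIR]
# Lists the .command.* files under WORK_DIR (default: ./work) that contain
# the SnpEff chromosome mismatch error.
find_snpeff_chrom_errors() {
    find "${1:-work}" -type f -name '.command.*' \
        -exec grep -l 'ERROR_CHROMOSOME_NOT_FOUND' {} +
}

# Example: find_snpeff_chrom_errors work
```

Each reported path sits inside the work directory of the affected task, which makes it easier to see which sample and annotation step used the mismatching database.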

## How to set sarek up to use sentieon

Sarek 3.0 does not currently support Sentieon. Support is planned for the upcoming release 3.1. In the meantime, please revert to the last release, 2.7.2.
2 changes: 1 addition & 1 deletion workflows/sarek.nf
@@ -1234,7 +1234,7 @@ def extract_csv(csv_file) {
System.exit(1)
}
} else {
log.warn "Missing or unknown field in csv file header. Please check your samplesheet"
log.error "Missing or unknown field in csv file header. Please check your samplesheet"
System.exit(1)
}
}