FEAT: Improve documentation for annotation cache #1248

Merged (12 commits) on Sep 28, 2023
6 changes: 4 additions & 2 deletions CHANGELOG.md
@@ -7,12 +7,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## dev

- [#1246](https://github.com/nf-core/sarek/pull/1246) - Back to dev

### Added

- [#1246](https://github.com/nf-core/sarek/pull/1246) - Back to dev

### Changed

- [#1248](https://github.com/nf-core/sarek/pull/1248) - Improve annotation-cache docs

### Fixed

- [#1247](https://github.com/nf-core/sarek/pull/1247) - FIX: Result paths for full size test to be correctly displayed on the website
26 changes: 14 additions & 12 deletions docs/usage.md
@@ -840,17 +840,17 @@ For GATK.GRCh38 the links for each reference file and the corresponding processe

## How to customise SnpEff and VEP annotation

SNPeff and VEP require a large resource of files known as a cache.
SNPeff and VEP both require a large resource of files known as a cache.
These are folders composed of multiple gigabytes of files which need to be available for the software to properly function.
To use these, supply the parameters `--vep_cache` and/or `--snpeff_cache` with the locations to the root of the annotation cache folder for each tool.

### Specify the cache location

Params `--snpeff_cache` and `--vep_cache` are used to specify the locations to the root of the annotation cache folder.
The cache will be located within a subfolder with the path `${vep_species}/${vep_genome}_${vep_cache_version}` for VEP and `${snpeff_species}.${snpeff_version}` for SnpEff.
The cache will be located within a subfolder with the path `${snpeff_species}.${snpeff_version}` for SnpEff and `${vep_species}/${vep_genome}_${vep_cache_version}` for VEP.
If this directory is missing, Sarek will raise an error.

For example this is a typical folder structure for GRCh38 and WBCel235, with SNPeff cache version 105 and VEP cache version 110:
For example this is a typical folder structure for `GRCh38` and `WBCel235`, with SNPeff cache version 105 and VEP cache version 110:

```text
/data/
@@ -872,20 +872,20 @@ Both SnpEff and VEP will figure out internally the path towards the specific cac
By default all is specified in the [igenomes.config](https://github.com/nf-core/sarek/blob/master/conf/igenomes.config) file.
Explanation can be found for all params in the documentation:

- [snpeff_db](https://nf-co.re/sarek/latest/parameters#snpeff_db)
- [snpeff_genome](https://nf-co.re/sarek/latest/parameters#snpeff_genome)
- [vep_genome](https://nf-co.re/sarek/latest/parameters#vep_genome)
- [vep_species](https://nf-co.re/sarek/latest/parameters#vep_species)
- [vep_cache_version](https://nf-co.re/sarek/latest/parameters#vep_cache_version)
- [snpeff_db](https://nf-co.re/sarek/parameters#snpeff_db)
- [snpeff_genome](https://nf-co.re/sarek/parameters#snpeff_genome)
- [vep_genome](https://nf-co.re/sarek/parameters#vep_genome)
- [vep_species](https://nf-co.re/sarek/parameters#vep_species)
- [vep_cache_version](https://nf-co.re/sarek/parameters#vep_cache_version)

With the previous example of `GRCh38`, these are the values that were used for these params:

```bash
snpeff_db = '105'
snpeff_genome = 'GRCh38'
vep_cache_version = '110'
vep_genome = 'GRCh38'
vep_species = 'homo_sapiens'
vep_cache_version = '110'
```
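
As a rough illustration of how these values map to the subfolders that get checked, here is a minimal Python sketch. It follows the directory check in the `workflows/sarek.nf` hunk of this PR (`${snpeff_genome}.${snpeff_db}` for SnpEff and `${vep_species}/${vep_cache_version}_${vep_genome}` for VEP). The function name `expected_cache_dirs` and the `/data/snpeff_cache` and `/data/vep_cache` roots are hypothetical, chosen only for the example; this is not Sarek's own implementation.

```python
from pathlib import PurePosixPath


def expected_cache_dirs(params: dict) -> dict:
    """Return the cache subfolder expected under each cache root."""
    # SnpEff subfolder: <snpeff_genome>.<snpeff_db>
    snpeff_dir = f"{params['snpeff_genome']}.{params['snpeff_db']}"
    # VEP subfolder: <vep_species>/<vep_cache_version>_<vep_genome>
    vep_dir = (
        f"{params['vep_species']}/"
        f"{params['vep_cache_version']}_{params['vep_genome']}"
    )
    return {
        "snpeff": str(PurePosixPath(params["snpeff_cache"]) / snpeff_dir),
        "vep": str(PurePosixPath(params["vep_cache"]) / vep_dir),
    }


# Values from the GRCh38 example; the two cache roots are assumptions.
params = {
    "snpeff_cache": "/data/snpeff_cache",
    "vep_cache": "/data/vep_cache",
    "snpeff_db": "105",
    "snpeff_genome": "GRCh38",
    "vep_cache_version": "110",
    "vep_genome": "GRCh38",
    "vep_species": "homo_sapiens",
}

dirs = expected_cache_dirs(params)
print(dirs["snpeff"])
print(dirs["vep"])
```

Checking that both printed directories exist before launching a run is one way to anticipate the "Files within --snpeff_cache invalid" and "Files within --vep_cache invalid" errors.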

### Usage recommendation with AWS iGenomes
@@ -931,11 +931,11 @@ nextflow run nf-core/sarek \
These params can be specified in a config file or in a profile using the params scope, or even in a json or a yaml file using the `-params-file` nextflow option.

Note: we recommend storing each annotation cache in a separate directory so each cache version is handled differently.
This may mean you have many similar directories but will dramatically reduce the storage burden on machines running the VEP or snpEff process.
This may mean you have many similar directories but will dramatically reduce the storage burden on machines running the SnpEff or VEP process.

### Use annotation-cache for SnpEff and VEP

[Annotation-cache](https://github.com/annotation-cache) is an open AWS registry resource that stores a mirror of some cache files on AWS S3 which can be used with Sarek.
[Annotation-cache](https://annotation-cache.github.io) is an open AWS registry resource that stores a mirror of some cache files on AWS S3 which can be used with Sarek.
It contains some genome builds which can be found by checking the contents of the S3 bucket.

SNPeff and VEP cache are stored at the following location on S3:
@@ -954,7 +954,9 @@ aws s3 --no-sign-request ls s3://annotation-cache/vep_cache/

Since both Snpeff and VEP are internally figuring the path towards the specific cache version / species, `annotation-cache` is using an extra set of keys to specify the species and genome build.

So if you are using this resource, please either use the `--use_annotation_cache_keys`, or point towards the specific species, genome and build matches the directory structure within the cache.
So if you are using this resource, please either set `--use_annotation_cache` to use the AWS annotation cache, or point towards your own cache folder structure matching the expected structure.

Please refer to the [annotation-cache documentation](https://annotation-cache.github.io) for more details.

### Use Sarek to download cache and annotate in one go

24 changes: 8 additions & 16 deletions nextflow_schema.json
@@ -406,16 +406,14 @@
"fa_icon": "fas fa-file",
"default": "s3://annotation-cache/vep_cache/",
"description": "Path to VEP cache.",
"help_text": "Path to VEP cache which should contain the relevant species, genome and build directories at the path ${vep_species}/${vep_genome}_${vep_cache_version}",
"hidden": true
"help_text": "Path to VEP cache which should contain the relevant species, genome and build directories at the path ${vep_species}/${vep_genome}_${vep_cache_version}"
},
"snpeff_cache": {
"type": "string",
"fa_icon": "fas fa-file",
"default": "s3://annotation-cache/snpeff_cache/",
"description": "Path to snpEff cache.",
"help_text": "Path to snpEff cache which should contain the relevant genome and build directory in the path ${snpeff_species}.${snpeff_version}",
"hidden": true
"help_text": "Path to snpEff cache which should contain the relevant genome and build directory in the path ${snpeff_species}.${snpeff_version}"
},
"vep_include_fasta": {
"type": "boolean",
@@ -514,13 +512,12 @@
"default": "--everything --filter_common --per_gene --total_length --offline --format vcf",
"fa_icon": "fas fa-toolbox",
"description": "Add an extra custom argument to VEP.",
"hidden": true,
"help_text": "Using this params you can add custom args to VEP."
},
"use_annotation_cache_keys": {
"type": "boolean",
"fa_icon": "fas fa-toolbox",
"description": "Use annotation cache keys for snpeff_cache and vep_cache.",
"description": "Use annotation cache keys for snpeff_cache and vep_cache.\nOnly when using annotation-cache or a similar structure.\nSee [here](https://annotation-cache.github.io/) for more information.",
"hidden": true
},
"outdir_cache": {
@@ -720,36 +717,31 @@
"type": "string",
"fa_icon": "fas fa-database",
"description": "snpEff DB version.",
"help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nThis is used to specify the database to be use to annotate with.\nAlternatively databases' names can be listed with the `snpEff databases`.",
"hidden": true
"help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nThis is used to specify the database to be use to annotate with.\nAlternatively databases' names can be listed with the `snpEff databases`."
},
"snpeff_genome": {
"type": "string",
"fa_icon": "fas fa-microscope",
"description": "snpEff genome.",
"help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nThis is used to specify the genome when using the container with pre-downloaded cache.",
"hidden": true
"help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nThis is used to specify the genome when using the container with pre-downloaded cache."
},
"vep_genome": {
"type": "string",
"fa_icon": "fas fa-microscope",
"description": "VEP genome.",
"help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nThis is used to specify the genome when using the container with pre-downloaded cache.",
"hidden": true
"help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nThis is used to specify the genome when using the container with pre-downloaded cache."
},
"vep_species": {
"type": "string",
"fa_icon": "fas fa-microscope",
"description": "VEP species.",
"help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nAlternatively species listed in Ensembl Genomes caches can be used.",
"hidden": true
"help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nAlternatively species listed in Ensembl Genomes caches can be used."
},
"vep_cache_version": {
"type": "number",
"fa_icon": "fas fa-tag",
"description": "VEP cache version.",
"help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nAlternatively cache version can be use to specify the correct Ensembl Genomes version number as these differ from the concurrent Ensembl/VEP version numbers",
"hidden": true
"help_text": "If you use AWS iGenomes, this has already been set for you appropriately.\nAlternatively cache version can be use to specify the correct Ensembl Genomes version number as these differ from the concurrent Ensembl/VEP version numbers"
},
"save_reference": {
"type": "boolean",
32 changes: 26 additions & 6 deletions workflows/sarek.nf
@@ -324,25 +324,45 @@ vep_species = params.vep_species ?: Channel.empty()

// Initialize files channels based on params, not defined within the params.genomes[params.genome] scope
if (params.snpeff_cache && params.tools && params.tools.contains("snpeff")) {
def snpeff_annotation_cache_key = params.use_annotation_cache_keys ? "${params.snpeff_genome}.${params.snpeff_db}/" : ""
if (params.snpeff_cache == "s3://annotation-cache/snpeff_cache") {
def snpeff_annotation_cache_key = "${params.snpeff_genome}.${params.snpeff_db}/"
} else {
def snpeff_annotation_cache_key = params.use_annotation_cache_keys ? "${params.snpeff_genome}.${params.snpeff_db}/" : ""
}
def snpeff_cache_dir = "${snpeff_annotation_cache_key}${params.snpeff_genome}.${params.snpeff_db}"
def snpeff_cache_path_full = file("$params.snpeff_cache/$snpeff_cache_dir", type: 'dir')
if ( !snpeff_cache_path_full.exists() || !snpeff_cache_path_full.isDirectory() ) {
error("Files within --snpeff_cache invalid. Make sure there is a directory named ${snpeff_cache_dir} in ${params.snpeff_cache}.\nhttps://nf-co.re/sarek/dev/usage#how-to-customise-snpeff-and-vep-annotation")
if (params.snpeff_cache == "s3://annotation-cache/snpeff_cache") {
error("This path is not available within annotation-cache. Please check https://annotation-cache.github.io/ to create a request for it.")
} else {
error("Files within --snpeff_cache invalid. Make sure there is a directory named ${snpeff_cache_dir} in ${params.snpeff_cache}.\nhttps://nf-co.re/sarek/dev/usage#how-to-customise-snpeff-and-vep-annotation")
Review comment (Contributor): For the release, we need to check that all the error messages point to the release docs. I think there are also a couple left in other error messages.

}
}
snpeff_cache = Channel.fromPath(file("${params.snpeff_cache}/${snpeff_annotation_cache_key}"), checkIfExists: true).collect()
.map{ cache -> [ [ id:"${params.snpeff_genome}.${params.snpeff_db}" ], cache ] }
} else snpeff_cache = []
} else if (params.tools && params.tools.contains("snpeff") && !params.download_cache) {
error("No cache for SnpEff or automatic download of said cache has been detected.\nPlease refer to https://nf-co.re/sarek/docs/usage/#how-to-customise-snpeff-and-vep-annotation for more information.")
} else snpeff_cache = []

if (params.vep_cache && params.tools && params.tools.contains("vep")) {
def vep_annotation_cache_key = params.use_annotation_cache_keys ? "${params.vep_cache_version}_${params.vep_genome}/" : ""
if (params.vep_cache == "s3://annotation-cache/vep_cache") {
def vep_annotation_cache_key = "${params.vep_cache_version}_${params.vep_genome}/"
} else {
def vep_annotation_cache_key = params.use_annotation_cache_keys ? "${params.vep_cache_version}_${params.vep_genome}/" : ""
}
def vep_cache_dir = "${vep_annotation_cache_key}${params.vep_species}/${params.vep_cache_version}_${params.vep_genome}"
def vep_cache_path_full = file("$params.vep_cache/$vep_cache_dir", type: 'dir')
if ( !vep_cache_path_full.exists() || !vep_cache_path_full.isDirectory() ) {
error("Files within --vep_cache invalid. Make sure there is a directory named ${vep_cache_dir} in ${params.vep_cache}.\nhttps://nf-co.re/sarek/dev/usage#how-to-customise-snpeff-and-vep-annotation")
if (params.vep_cache == "s3://annotation-cache/vep_cache") {
error("This path is not available within annotation-cache. Please check https://annotation-cache.github.io/ to create a request for it.")
} else {
error("Files within --vep_cache invalid. Make sure there is a directory named ${vep_cache_dir} in ${params.vep_cache}.\nhttps://nf-co.re/sarek/dev/usage#how-to-customise-snpeff-and-vep-annotation")
}
}
vep_cache = Channel.fromPath(file("${params.vep_cache}/${vep_annotation_cache_key}"), checkIfExists: true).collect()
} else vep_cache = []
} else if (params.tools && params.tools.contains("vep") && !params.download_cache) {
error("No cache for VEP or automatic download of said cache has been detected.\nPlease refer to https://nf-co.re/sarek/docs/usage/#how-to-customise-snpeff-and-vep-annotation for more information.")
} else vep_cache = []

vep_extra_files = []
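
The key-selection logic in this hunk can be sketched in Python as follows. Two details are worth noting when reading the hunk: the defaults in `nextflow_schema.json` end with a trailing slash (`s3://annotation-cache/vep_cache/`), while the comparison strings here do not, and the key is declared with `def` inside the `if`/`else` branches but used after them. The sketch below strips trailing slashes before comparing and computes the key in a single expression, which sidesteps both questions. The names `ANNOTATION_CACHE_ROOTS` and `resolve_cache_dir` are hypothetical, and this is an illustration of the intended behaviour under those assumptions, not the pipeline's code.

```python
# Roots of the public annotation-cache bucket, written without a trailing slash.
ANNOTATION_CACHE_ROOTS = {
    "snpeff": "s3://annotation-cache/snpeff_cache",
    "vep": "s3://annotation-cache/vep_cache",
}


def resolve_cache_dir(tool, cache_root, key, subdir, use_annotation_cache_keys=False):
    """Return (key_prefix, full_path) for the directory to check.

    key    -- extra annotation-cache key, e.g. "110_GRCh38"
    subdir -- tool-specific subfolder, e.g. "homo_sapiens/110_GRCh38"
    """
    # Normalise so "…/vep_cache" and "…/vep_cache/" compare equal.
    root = cache_root.rstrip("/")
    # Keys apply when pointing at annotation-cache, or when explicitly requested.
    uses_keys = root == ANNOTATION_CACHE_ROOTS[tool] or use_annotation_cache_keys
    key_prefix = f"{key}/" if uses_keys else ""
    return key_prefix, f"{root}/{key_prefix}{subdir}"


# Default annotation-cache location (with trailing slash, as in the schema default).
s3_prefix, s3_path = resolve_cache_dir(
    "vep", "s3://annotation-cache/vep_cache/", "110_GRCh38", "homo_sapiens/110_GRCh38"
)
# A local cache without the extra key level; "/data/vep_cache" is an assumed path.
local_prefix, local_path = resolve_cache_dir(
    "vep", "/data/vep_cache", "110_GRCh38", "homo_sapiens/110_GRCh38"
)
print(s3_path)
print(local_path)
```

With the default S3 root the extra `110_GRCh38/` level is inserted before the species folder, while a local cache resolves directly to `homo_sapiens/110_GRCh38` unless `use_annotation_cache_keys` is set.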
