Refactor references #505

Merged
merged 55 commits into from
Dec 20, 2024
3332ebb
WIP: restructure reference building
rannick May 24, 2024
bf0c33e
refactor downloading of references
rannick May 28, 2024
a19abbd
fix params
rannick May 28, 2024
93fbfb8
fix
rannick May 28, 2024
b7dd137
update refs generation
rannick Jun 13, 2024
89be7c1
fix small issues
rannick Jun 13, 2024
d12312a
Merge branch 'dev' of https://github.com/nf-core/rnafusion into refac…
rannick Sep 12, 2024
0c6fe82
updates
rannick Sep 12, 2024
eb6e354
change strategy
rannick Sep 23, 2024
40fbd83
intermediate state
rannick Oct 7, 2024
7a88971
ensembl to gencode, removing chrgtf
rannick Oct 16, 2024
0b1a067
syntax
rannick Nov 13, 2024
590b638
merging
rannick Nov 13, 2024
f134b84
merging
rannick Nov 13, 2024
7a5a917
cleanup
rannick Nov 13, 2024
d18b303
update modules, remove chrgtf, adapt with meta/no meta
rannick Nov 14, 2024
786b9bf
remove unnecessary file
rannick Nov 14, 2024
460b645
add parameter interpretation in modules.config
rannick Nov 29, 2024
0c90acb
Merge branch 'dev' into refactor-references
rannick Nov 29, 2024
d290fcd
update fusioncatcher container
rannick Dec 6, 2024
19757b6
Merge branch 'refactor-references' of https://github.com/nf-core/rnaf…
rannick Dec 6, 2024
1a1314e
fix merge conflicts
rannick Dec 9, 2024
be6e102
fix linting issues
rannick Dec 9, 2024
41523f9
prettier
rannick Dec 9, 2024
86c299d
update comments
rannick Dec 9, 2024
4c3bc68
Merge branch 'dev' into refactor-references
rannick Dec 9, 2024
35b5d9c
fix i/o in channels
rannick Dec 10, 2024
cdea091
Merge branch 'refactor-references' of https://github.com/nf-core/rnaf…
rannick Dec 10, 2024
24a4610
test build removed from ci as build_references happens before the run
rannick Dec 10, 2024
76cf2aa
update snapshots, add meta
rannick Dec 10, 2024
56ed001
remove trash files
rannick Dec 10, 2024
23c4014
use stubs
rannick Dec 10, 2024
d52f0a0
update changelog, first step
rannick Dec 10, 2024
5409f8c
update GTF_TO_REFFLAT options, update snapshots
rannick Dec 10, 2024
ef5ab96
update star in snapshot
rannick Dec 10, 2024
117884c
add human_gencode_filter to starfusion build
rannick Dec 11, 2024
4fab8b6
Merge branch 'dev' into refactor-references
rannick Dec 11, 2024
3788788
remove test.xml
rannick Dec 11, 2024
dcd9e21
Merge branch 'refactor-references' of https://github.com/nf-core/rnaf…
rannick Dec 11, 2024
8e9be7c
update snap
rannick Dec 11, 2024
5cdf95a
try updating snapshot again
rannick Dec 11, 2024
fa128b6
avoid using meta for remapping
rannick Dec 17, 2024
f2bc314
fix merge conflicts
rannick Dec 17, 2024
17ef233
use custom container for fusioncatcher, fix typo
rannick Dec 17, 2024
f3f5714
fix merge conflicts
rannick Dec 19, 2024
f8dd516
update changelog
rannick Dec 19, 2024
d62c4b5
update changelog, retrofit trim_workflowo
rannick Dec 19, 2024
38de48b
fix merge conflicts
rannick Dec 20, 2024
de5d6d5
add species as parameter
rannick Dec 20, 2024
bdae293
fix some issues with channels
rannick Dec 20, 2024
509a792
add species to schema
rannick Dec 20, 2024
8776835
fix erroneous addition of meta
rannick Dec 20, 2024
60f0da6
Merge branch 'dev' into refactor-references
rannick Dec 20, 2024
ab65540
fixes
rannick Dec 20, 2024
31b400a
Merge branch 'dev' into refactor-references
rannick Dec 20, 2024
17 changes: 0 additions & 17 deletions .github/workflows/awsfulltest.yml
@@ -46,7 +46,6 @@ jobs:
"cosmic_username": "${{ secrets.cosmic_username }}",
"cosmic_passwd": "${{ secrets.cosmic_passwd }}",
"all": true,
"build_references": true
}
profiles: test_full,aws_tower
- uses: actions/upload-artifact@v4
@@ -55,19 +54,3 @@ jobs:
path: |
seqera_platform_action_*.log
seqera_platform_action_*.json
- name: Launch run workflow via tower
uses: seqeralabs/action-tower-launch@v2
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/rnafusion/work-${{ github.sha }}
parameters: |
{
"outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/rnafusion/results-${{ github.sha }}",
"genomes_base": "s3://${{ secrets.AWS_S3_BUCKET }}/rnafusion/results-${{ github.sha }}/references",
"cosmic_username": "${{ secrets.cosmic_username }}",
"cosmic_passwd": "${{ secrets.cosmic_passwd }}",
"all": true,
}
profiles: test_full,aws_tower
27 changes: 1 addition & 26 deletions .github/workflows/awstest.yml
@@ -26,8 +26,7 @@ jobs:
"cosmic_username": "${{ secrets.cosmic_username }}",
"cosmic_passwd": "${{ secrets.cosmic_passwd }}",
"all": true,
"stub": true,
"build_references": true
"stub": true
}
profiles: test,aws_tower
- uses: actions/upload-artifact@v4
@@ -36,27 +35,3 @@ jobs:
path: |
tower_action_*.log
tower_action_*.json

- name: Launch workflow via tower
uses: seqeralabs/action-tower-launch@v2
with:
workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
workdir: s3://${{ secrets.AWS_S3_BUCKET }}/work/rnafusion/work-${{ github.sha }}
parameters: |
{
"outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/rnafusion/results-${{ github.sha }}",
"genomes_base": "s3://${{ secrets.AWS_S3_BUCKET }}/rnafusion/results-${{ github.sha }}/references",
"cosmic_username": "${{ secrets.cosmic_username }}",
"cosmic_passwd": "${{ secrets.cosmic_passwd }}",
"all": true,
"stub": true
}
profiles: test,aws_tower
- uses: actions/upload-artifact@v4
with:
name: Seqera Platform debug log file
path: |
seqera_platform_action_*.log
seqera_platform_action_*.json
1 change: 0 additions & 1 deletion .github/workflows/ci.yml
@@ -39,7 +39,6 @@ jobs:
- "latest-stable"
test_profile:
- "test_stub"
- "test_build"
compute_profile:
- "docker"
- "singularity"
14 changes: 11 additions & 3 deletions CHANGELOG.md
@@ -24,6 +24,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Add nf-test to local module: `STARFUSION_BUILD`. [#585](https://github.com/nf-core/rnafusion/pull/585)
- Add nf-test to local module: `STARFUSION_DETECT`. [#586](https://github.com/nf-core/rnafusion/pull/586)
- Added a new module `CTATSPLICING_STARTOCANCERINTRONS` and a new parameter `--ctatsplicing`. This option creates reports on cancer splicing aberrations and requires one or both of `--arriba` and `--starfusion` to be given. [#587](https://github.com/nf-core/rnafusion/pull/587)
- Add parameter `--references_only` when no data should be analysed, but only the references should be built [#505](https://github.com/nf-core/rnafusion/pull/505)

### Changed

@@ -34,6 +35,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Remove double nested folder introduced in [#577](https://github.com/nf-core/rnafusion/pull/577), [#581](https://github.com/nf-core/rnafusion/pull/581)
- Use docker.io and galaxy containers for fusioncatcher and starfusion (incl. fusioninspector) instead of wave as they are not functional on wave [#588](https://github.com/nf-core/rnafusion/pull/588)
- Update STAR-Fusion to 1.14 [#588](https://github.com/nf-core/rnafusion/pull/588)
- Use "-genePredExt -geneNameAsName2 -ignoreGroupsWithoutExons" (to mimic gms/tomte) for GTF_TO_REFFLAT [#505](https://github.com/nf-core/rnafusion/pull/505)
- Integrate reference building in the main workflow [#505](https://github.com/nf-core/rnafusion/pull/505)
- Move from ensembl to gencode base [#505](https://github.com/nf-core/rnafusion/pull/505)
- Update from ensembl 102 to gencode 46 default references [#505](https://github.com/nf-core/rnafusion/pull/505)
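
The options now passed to `GTF_TO_REFFLAT` are flags of UCSC `gtfToGenePred`: `-genePredExt` writes the extended genePred format, `-geneNameAsName2` fills the `name2` column with `gene_name` instead of `gene_id`, and `-ignoreGroupsWithoutExons` skips groups without exons instead of raising an error. The sketch below assumes that a refFlat file is then obtained by moving the `name2` column in front of the first ten genePred columns. `genepred_to_refflat` and the sample record are hypothetical and are not taken from the module itself.

```shell
# Hypothetical helper: reorder genePredExt columns into the refFlat layout.
# genePredExt: name chrom strand txStart txEnd cdsStart cdsEnd exonCount
#              exonStarts exonEnds score name2 cdsStartStat cdsEndStat exonFrames
# refFlat:     geneName (here: name2) followed by the first ten genePred columns
genepred_to_refflat() {
    awk 'BEGIN { FS = OFS = "\t" }
         { print $12, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 }'
}

# Made-up single-transcript record, for illustration only
printf 'ENST0001\tchr1\t+\t100\t200\t120\t180\t1\t100,\t200,\t0\tGENE1\tcmpl\tcmpl\t0,\n' \
    | genepred_to_refflat
```

With `-geneNameAsName2` set, the first refFlat column carries the gene symbol, which is presumably what downstream metrics tools then report per gene.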

### Fixed

@@ -48,12 +53,15 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Removed

- Remove fusionGDB from documentation and fusion-report download stubs [#503](https://github.com/nf-core/rnafusion/pull/503)
- Removed test-build as reference building gets integrated in the main workflow [#505](https://github.com/nf-core/rnafusion/pull/505)
- Removed parameter `--build_references`

### Parameters

| Old parameter | New parameter |
| ------------- | ------------- |
| | `--no_cosmic` |
| Old parameter | New parameter |
| -------------------- | ------------------- |
| | `--no_cosmic` |
| `--build_references` | `--references_only` |
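
For launch scripts that still pass the removed flag, the rename in the table above can be applied mechanically. This is a minimal sketch: `migrate_flags` is a hypothetical helper and not part of the pipeline. Note that the two flags are not behaviourally identical: per the README change in this pull request, `--references_only` builds references and nothing else, while a run without it builds references as part of the analysis, so a separate build step may no longer be needed at all.

```shell
# Hypothetical helper: rewrite a saved command line so that the removed
# --build_references flag becomes --references_only.
migrate_flags() {
    printf '%s\n' "$1" | sed 's/--build_references/--references_only/g'
}

# Example: an old references build command
migrate_flags 'nextflow run nf-core/rnafusion --build_references --all'
```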

## v3.0.2 - [2024-04-10]

8 changes: 4 additions & 4 deletions README.md
@@ -19,7 +19,7 @@

## Introduction

**nf-core/rnafusion** is a bioinformatics best-practice analysis pipeline for RNA sequencing consisting of several tools designed for detecting and visualizing fusion genes. Results from up to 5 fusion callers tools are created, and are also aggregated, most notably in a pdf visualiation document, a vcf data collection file, and html and tsv reports.
**nf-core/rnafusion** is a bioinformatics best-practice analysis pipeline for RNA sequencing consisting of several tools designed for detecting and visualizing fusion genes. Results from up to 5 fusion callers tools are created, and are also aggregated, most notably in a pdf visualisation document, a vcf data collection file, and html and tsv reports.

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the [nf-core website](https://nf-co.re/rnafusion/results).

@@ -31,9 +31,9 @@ In rnafusion the full-sized test includes reference building and fusion detectio

### Build references

`--build_references` triggers a parallel workflow to build references, which is a prerequisite to running the pipeline:
`--references_only` triggers a workflow to ONLY build references, otherwise the references are built when the analysis is run:

1. Download ensembl fasta and gtf files
1. Download gencode fasta and gtf files
2. Create [STAR](https://github.com/alexdobin/STAR) index
3. Download [Arriba](https://github.com/suhrig/arriba) references
4. Download [FusionCatcher](https://github.com/ndaniel/fusioncatcher) references
@@ -78,7 +78,7 @@ First, build the references:
nextflow run nf-core/rnafusion \
-profile test,<docker/singularity/.../institute> \
--outdir <OUTDIR>\
--build_references \
--references_only \
-stub
```

74 changes: 42 additions & 32 deletions conf/modules.config
@@ -18,10 +18,6 @@ process {
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]

withName: GFFREAD {
ext.args = '-w -S'
}

withName: 'ARRIBA_ARRIBA' {
publishDir = [
path: { "${params.outdir}/arriba" },
@@ -40,7 +36,7 @@ process {
}

withName: 'ARRIBA_VISUALISATION' {
ext.when = { !params.fusioninspector_only && (params.starfusion || params.all) }
ext.when = { {!params.fusioninspector_only} && ({params.starfusion} || {params.all}) }
ext.prefix = { "${meta.id}_combined_fusions_arriba_visualisation" }
publishDir = [
path: { "${params.outdir}/arriba_visualisation" },
@@ -73,9 +69,9 @@ process {
]
}

withName: 'ENSEMBL_DOWNLOAD' {
withName: 'GENCODE_DOWNLOAD' {
publishDir = [
path: { "${params.genomes_base}/ensembl" },
path: { "${params.genomes_base}/gencode" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
@@ -87,7 +83,7 @@ process {

withName: 'FASTQC' {
ext.args = '--quiet'
ext.when = { !params.skip_qc }
ext.when = {!params.skip_qc}
publishDir = [
path: { "${params.outdir}/fastqc" },
mode: params.publish_dir_mode,
@@ -97,6 +93,7 @@ process {

withName: 'FASTQC_FOR_FASTP' {
ext.args = '--quiet'
ext.when = { !params.skip_qc }
ext.prefix = { "${meta.id}_trimmed" }
publishDir = [
path: { "${params.outdir}/fastqc_for_fastp" },
@@ -119,7 +116,7 @@ process {

withName: 'FUSIONINSPECTOR' {
ext.when = { !params.skip_vis }
ext.args = { params.fusioninspector_limitSjdbInsertNsj != 1000000 ? "--STAR_xtra_params \"--limitSjdbInsertNsj ${params.fusioninspector_limitSjdbInsertNsj}\"" : '' }
ext.args = { ${params.fusioninspector_limitSjdbInsertNsj} != 1000000 ? "--STAR_xtra_params \"--limitSjdbInsertNsj ${params.fusioninspector_limitSjdbInsertNsj}\"" : '' }
ext.args2 = '--annotate --examine_coding_effect'
}

@@ -146,12 +143,39 @@ process {

withName: 'GATK4_BEDTOINTERVALLIST' {
publishDir = [
path: { "${params.genomes_base}/ensembl" },
path: { "${params.genomes_base}/gencode" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
]
}

withName: 'GATK4_MARKDUPLICATES' {
ext.when = { {!params.skip_qc} && {!params.fusioninspector_only} && ( {params.starfusion}|| {params.all}) }
publishDir = [
path: { "${params.outdir}/picard" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: 'GFFREAD' {
ext.args = { '-w -S' }
publishDir = [
path: { "${params.genomes_base}/gffread" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: 'GTF_TO_REFFLAT' {
ext.args = "-genePredExt -geneNameAsName2 -ignoreGroupsWithoutExons"
publishDir = [
path: { "${params.genomes_base}/gencode" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: 'HGNC_DOWNLOAD' {
publishDir = [
path: { "${params.genomes_base}/hgnc" },
@@ -161,7 +185,7 @@ process {
}
withName: 'MULTIQC' {
ext.when = { !params.skip_qc }
ext.args = params.multiqc_title ? "--title \"$params.multiqc_title\"" : ''
ext.args = {params.multiqc_title} ? "--title \"$params.multiqc_title\"" : ''
publishDir = [
path: { "${params.outdir}/multiqc" },
mode: params.publish_dir_mode,
@@ -170,21 +194,12 @@ process {
}

withName: 'PICARD_COLLECTRNASEQMETRICS' {
ext.when = { !params.skip_qc && !params.fusioninspector_only && (params.starfusion || params.all) }
ext.when = { {!params.skip_qc} && {!params.fusioninspector_only} && ( {params.starfusion} || {params.all}) }

}

withName: 'GATK4_MARKDUPLICATES' {
ext.when = { !params.skip_qc && !params.fusioninspector_only && (params.starfusion || params.all) }
publishDir = [
path: { "${params.outdir}/picard" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: 'PICARD_COLLECTINSERTSIZEMETRICS' {
ext.when = { !params.skip_qc && !params.fusioninspector_only && (params.starfusion || params.all) }
ext.when = { ${!params.skip_qc} && ${!params.fusioninspector_only} && (${params.starfusion} || ${params.all}) }
ext.prefix = { "${meta.id}_collectinsertsize"}
publishDir = [
path: { "${params.outdir}/picard" },
@@ -215,7 +230,7 @@ process {

withName: 'SAMTOOLS_FAIDX' {
publishDir = [
path: { "${params.genomes_base}/ensembl" },
path: { "${params.genomes_base}/gencode" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
@@ -375,16 +390,11 @@ process {
]
}

withName: 'UCSC_GTFTOGENEPRED' {
ext.args = "-genePredExt -geneNameAsName2"
publishDir = [
path: { "${params.genomes_base}/ensembl" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
]
}

withName: 'VCF_COLLECT' {
ext.when = { {!params.fusioninspector_only} && {!params.skip_vcf} }
}

withName: '.*' {
ext.when = { !params.references_only || task.process.contains('BUILD_REFERENCES') }
}
}
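
The catch-all `withName: '.*'` rule that closes this config gates processes on `params.references_only`. Its condition can be read as a simple predicate, sketched here in shell. `should_run` is a hypothetical helper that mirrors the Groovy expression, and the `FASTQC` process path is an illustrative assumption (only the `FUSIONREPORT_DOWNLOAD` path appears in this pull request).

```shell
# Hypothetical helper mirroring the Groovy predicate
#   !params.references_only || task.process.contains('BUILD_REFERENCES')
# $1: value of references_only ("true" or "false")
# $2: fully qualified process name
should_run() {
    references_only="$1"
    process_name="$2"
    # Without --references_only the predicate is true for every process
    if [ "$references_only" != "true" ]; then
        return 0
    fi
    # With --references_only, only names containing BUILD_REFERENCES pass
    case "$process_name" in
        *BUILD_REFERENCES*) return 0 ;;
        *) return 1 ;;
    esac
}

# Evaluate the predicate for two process names with --references_only set
for p in \
    "NFCORE_RNAFUSION:RNAFUSION:BUILD_REFERENCES:FUSIONREPORT_DOWNLOAD" \
    "NFCORE_RNAFUSION:RNAFUSION:FASTQC"; do
    if should_run true "$p"; then echo "run  $p"; else echo "skip $p"; fi
done
```

Because the check is a substring match on the full process name, it relies on the reference-building processes being nested under a workflow called `BUILD_REFERENCES`.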
2 changes: 2 additions & 0 deletions conf/test.config
@@ -16,6 +16,8 @@ params {

// Input data
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/rnafusion/testdata/human/samplesheet_valid.csv'
all = true
no_cosmic = true
}

// Limit and standardize resources for github actions and reproducibility
2 changes: 1 addition & 1 deletion conf/test_build.config
@@ -15,7 +15,7 @@ params {
config_profile_description = 'Minimal test dataset to check pipeline function'

// Input data
build_references = true
references_only = true
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/rnafusion/testdata/human/samplesheet_valid.csv'
no_cosmic = true
all = true
15 changes: 7 additions & 8 deletions docs/usage.md
@@ -10,7 +10,7 @@ The pipeline is divided into two parts:

1. Download and build references

- specified with `--build_references` parameter
- specified with `--references_only` parameter
- required only once before running the pipeline
- **Important**: has to be run with each new release

@@ -32,7 +32,7 @@ The rnafusion pipeline needs references for the fusion detection tools, so downl
```bash
nextflow run nf-core/rnafusion \
-profile <docker/singularity/.../institute> \
--build_references --all \
--references_only --all \
--cosmic_username <EMAIL> --cosmic_passwd <PASSWORD> \
--genomes_base <PATH/TO/REFERENCES> \
--outdir <PATH/TO/REFERENCES>
@@ -43,7 +43,7 @@ References for each tools can also be downloaded separately with:
```bash
nextflow run nf-core/rnafusion \
-profile <docker/singularity/.../institute> \
--build_references --<tool1> --<tool2> ... \
--references_only --<tool1> --<tool2> ... \
--cosmic_username <EMAIL> --cosmic_passwd <PASSWORD> \
--genomes_base <PATH/TO/REFERENCES> \
--outdir <OUTPUT/PATH>
@@ -64,7 +64,7 @@ Use credentials from QIAGEN and add `--qiagen`
```bash
nextflow run nf-core/rnafusion \
-profile <docker/singularity/.../institute> \
--build_references --<tool1> --<tool2> ... \
--references_only --<tool1> --<tool2> ... \
--cosmic_username <EMAIL> --cosmic_passwd <PASSWORD> \
--genomes_base <PATH/TO/REFERENCES> \
--outdir <OUTPUT/PATH> --qiagen
@@ -81,7 +81,7 @@ If process `FUSIONREPORT_DOWNLOAD` times out, it could be due to network restric
```bash
nextflow run nf-core/rnafusion \
-profile <docker/singularity/.../institute> \
--build_references \
--references_only \
--cosmic_username <EMAIL> --cosmic_passwd <PASSWORD> \
--fusionreport \
--genomes_base <PATH/TO/REFERENCES> \
@@ -93,7 +93,7 @@ Where the custom configuration could look like (adaptation to local machine nece

```text
process {
withName: 'NFCORE_RNAFUSION:BUILD_REFERENCES:FUSIONREPORT_DOWNLOAD' {
withName: 'NFCORE_RNAFUSION:RNAFUSION:BUILD_REFERENCES:FUSIONREPORT_DOWNLOAD' {
memory = '8.GB'
cpus = 4
}
@@ -162,7 +162,7 @@ If you are not covered by the research COSMIC license and want to avoid using CO

> **IMPORTANT: Either `--all` or `--<tool>`** is necessary to run detection tools

`--genomes_base` should be the path to the directory containing the folder `references/` that was built with `--build_references`.
`--genomes_base` should be the path to the directory containing the folder `references/` that was built with `--references_only`.

Note that the pipeline will create the following files in your working directory:

@@ -397,7 +397,6 @@ If `-profile` is not specified, the pipeline will run locally and expect all sof
- `test`
- A profile with a complete configuration for automated testing
- Includes links to test data so needs no other parameters
- Needs to run in two steps: with `--build_references` first and then without `--build_references` to run the analysis
- !!!! Run with `-stub` as all references need to be downloaded otherwise !!!!

### `-resume`