Update pipeline overview #1031

Merged
merged 7 commits into from
May 22, 2023
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -39,6 +39,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#1014](https://github.com/nf-core/sarek/pull/1014) - `snpeff_db` is now only the `db` version and not `genome.db`
- [#1015](https://github.com/nf-core/sarek/pull/1015) - Increase default value for `--nucleotides_per_second` to `200000` resulting in 21 groups for `GATK.GRCh38`
- [#1019](https://github.com/nf-core/sarek/pull/1019) - Set a default registry outside of profile scope
- [#1031](https://github.com/nf-core/sarek/pull/1031) - Update pipeline summary

### Fixed

31 changes: 22 additions & 9 deletions README.md
@@ -31,15 +31,28 @@ It's listed on [Elixir - Tools and Data Services Registry](https://bio.tools/nf-

## Pipeline summary

By default, the pipeline currently performs the following:

- Sequencing quality control (`FastQC`)
- Map Reads to Reference (`BWA mem`)
- Mark Duplicates (`GATK MarkDuplicates`)
- Base (Quality Score) Recalibration (`GATK BaseRecalibrator`, `GATK ApplyBQSR`)
- Preprocessing quality control (`samtools stats`)
- Preprocessing quality control (`mosdepth`)
- Overall pipeline run summaries (`MultiQC`)
Depending on the options and samples provided, the pipeline can currently perform the following:

- Form consensus reads from UMI sequences (`fgbio`)
Contributor

this is not done by default

Contributor Author

Do you want to include the required options here or shall we link to the usage docs section?

Contributor

Don't really have an opinion :D Just because at the top it says: run by default. I think the phrasing should fit together

Contributor Author

Ah I see. I'll update the top line.

- Sequencing quality control and trimming (`FastQC`, `fastp`)
Contributor

trimming is also not enabled by default

- Map Reads to Reference (`BWA-mem` or `BWA-mem2` or `dragmap`)
- Process BAM file (`GATK MarkDuplicates`, `GATK BaseRecalibrator`, `GATK ApplyBQSR`)
Contributor

CRAM files after MD

- Summarise alignment statistics (`samtools stats`, `mosdepth`)
- Variant calling (enabled by `--tools`, see [compatibility](https://github.com/nf-core/sarek/blob/master/docs/usage.md#which-variant-calling-tool-is-implemented-for-which-data-type)):
  - `HaplotypeCaller`
  - `freebayes`
  - `mpileup`
  - `Strelka2`
  - `DeepVariant`
  - `Mutect2`
  - `Manta`
  - `TIDDIT`
  - `ASCAT`
Contributor

maybe also lower case similar to the other names

Contributor Author

I'll update so they all use the official name format.

  - `Control-FREEC`
  - `CNVkit`
  - `MSIsensor-pro`
- Variant filtering and annotation (`SnpEff`, `Ensembl VEP`)
- Summarise and represent QC (`MultiQC`)
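
The summary above says variant calling is enabled by `--tools`. As a rough sketch of what that selection might look like in practice, the helper below assembles a `nextflow run nf-core/sarek` command line with a comma-separated `--tools` value. The lowercase tool identifiers, the `samplesheet.csv` and `results` paths, the `docker` profile, and the helper name `build_sarek_command` are all assumptions made for illustration and are not taken from this PR; the usage docs linked above remain the authority on which values are accepted.

```python
import shlex

# Lowercase identifiers ASSUMED to be accepted by `--tools`.
# These are illustrative; verify them against the Sarek usage docs.
ASSUMED_TOOL_IDS = {
    "haplotypecaller", "freebayes", "mpileup", "strelka", "deepvariant",
    "mutect2", "manta", "tiddit", "ascat", "controlfreec", "cnvkit",
    "msisensorpro", "snpeff", "vep",
}


def build_sarek_command(samplesheet, outdir, tools, profile="docker"):
    """Return the argument list for a Sarek run with the given tools enabled.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    unknown = sorted(set(tools) - ASSUMED_TOOL_IDS)
    if unknown:
        raise ValueError(f"unrecognised tool identifiers: {', '.join(unknown)}")

    command = [
        "nextflow", "run", "nf-core/sarek",
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
    ]
    if tools:
        # The selected tools are passed as one comma-separated value.
        command += ["--tools", ",".join(tools)]
    return command


if __name__ == "__main__":
    cmd = build_sarek_command(
        "samplesheet.csv", "results", ["strelka", "manta", "vep"]
    )
    # Print a shell-quoted command line for inspection (dry run only).
    print(shlex.join(cmd))
```

With an empty tool list the helper omits `--tools` entirely, which mirrors the point raised in review that variant calling and annotation are optional steps and not part of a default run.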

<p align="center">
<img title="Sarek Workflow" src="docs/images/sarek_subway.png" width=60%>