Add placeholder sphinx documentation
Redmar-van-den-Berg committed Oct 30, 2023
1 parent bb997f9 commit 051a36e
Showing 9 changed files with 233 additions and 0 deletions.
20 changes: 20 additions & 0 deletions docs/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?= -W
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
30 changes: 30 additions & 0 deletions docs/fusion.md
@@ -0,0 +1,30 @@
# fusion

The `fusion` module uses [Arriba](https://github.com/suhrig/arriba) to call fusion events.

## Tools
This module uses the bam file from [STAR](https://github.com/alexdobin/STAR) to
call fusion events.

The fusion events are filtered based on the `blacklist` from Arriba itself. Also, only fusions where at least one of the involved genes is in `report_genes` will be included in the final output.

For each fusion event that remains after filtering, we also generate a figure using the `draw_fusions.R` script provided by Arriba.
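
The `report_genes` filter described above can be illustrated with a minimal Python sketch. This is not the pipeline's actual implementation: the `gene1` and `gene2` keys and the `filter_fusions` helper are assumptions made for illustration only.

```python
def filter_fusions(fusions, report_genes):
    """Keep fusions where at least one partner gene is in report_genes.

    `fusions` is an iterable of dicts with hypothetical `gene1`/`gene2` keys.
    """
    wanted = set(report_genes)
    return [f for f in fusions if f["gene1"] in wanted or f["gene2"] in wanted]


# Hypothetical example data, not taken from a real Arriba run.
fusions = [
    {"gene1": "BCR", "gene2": "ABL1"},
    {"gene1": "GENE_A", "gene2": "GENE_B"},
]
kept = filter_fusions(fusions, ["ABL1", "KMT2A"])
```

Here only the first fusion is kept, because `ABL1` is one of the genes to report.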

## Input
The input for this module is a single bam file, generated by STAR per sample, specified in a PEP configuration file, as is shown [here](../test/pep/chrM-bam.csv).

## Output
The output of this module is a JSON file with an overview of the most important results, as well as a number of other output files:
- The final Arriba output file, after filtering.
- One figure per fusion event.

## Configuration
| Option | Description | Required |
| --------------------------- | --------------------------------------- | -------- |
| `genome_fasta` | Reference genome, in FASTA format | yes |
| `gtf` | GTF file with transcript information | yes |
| `blacklist` | File of blacklisted variants | yes |
| `known_fusions` | A file of known fusion events | yes |
| `report_genes` | A file of genes to report fusions for | yes |
| `cytobands` | A file with cytoband information | yes |
| `protein_domains` | A file with protein domains | yes |
28 changes: 28 additions & 0 deletions docs/itd.md
@@ -0,0 +1,28 @@
# itd

The `itd` module is responsible for finding Internal Tandem Duplications in select genes, specifically *FLT3* and *KMT2A*.

## Tools
First, this module uses [bwa]() to align the trimmed reads to a custom reference, which contains the transcript sequences of *FLT3* and *KMT2A*. Next, a custom tool, [rose-dt](https://git.lumc.nl/hem/rose-dt), is used to detect and visualise Internal Tandem Duplications, using evidence from soft-clipped reads.
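
Soft-clipped bases are recorded in an alignment's CIGAR string with the `S` operation. As a rough illustration of the kind of evidence involved, the sketch below extracts the leading and trailing soft-clip lengths from a CIGAR string. It is not rose-dt's code; `soft_clips` is a hypothetical helper written for this example.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clips(cigar):
    """Return (leading, trailing) soft-clip lengths for a SAM CIGAR string."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard clips (H) may sit outside the soft clips; ignore them.
    ops = [(n, op) for n, op in ops if op != "H"]
    if not ops:
        return 0, 0
    leading = ops[0][0] if ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return leading, trailing
```

A read with CIGAR `80M20S`, for example, has 20 soft-clipped bases at its end, which may indicate that the read runs into a duplicated region.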

## Input
The input for this module is a single pair of FastQ files per sample, specified in a PEP configuration file, as is shown [here](../test/pep/itd.csv).

## Output
The output of this module is a JSON file with an overview of the most important results, as well as a number of other output files:
- For both *FLT3* and *KMT2A*, a .csv file with the detected tandem duplications.
- For both *FLT3* and *KMT2A*, a figure to visualise the detected tandem duplications.

## Configuration
The configuration for this module is tailored to the provided reference files; be very careful if you want to modify any of these settings.

| Option | Description | Required |
| --------------------------- | --------------------------------------------- | -------- |
| `fasta` | The fasta file, which contains FLT3 and KMT2A | yes |
| `flt3_name` | The name of the FLT3 sequence | yes |
| `flt3_start` | The start of the FLT3 region to investigate | yes |
| `flt3_end` | The end of the FLT3 region to investigate | yes |
| `kmt2a_name` | The name of the KMT2A sequence | yes |
| `kmt2a_start` | The start of the KMT2A region to investigate | yes |
| `kmt2a_end` | The end of the KMT2A region to investigate | yes |
35 changes: 35 additions & 0 deletions docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
25 changes: 25 additions & 0 deletions docs/qc-seq.md
@@ -0,0 +1,25 @@
# qc-seq

The `qc-seq` module is responsible for removing adapter sequences and low
quality reads, and generating read-level statistics. It also merges the FastQ
files per sample, so they can be used by the other modules. Every set of FastQ
files can be analysed in parallel.

## Tools
This module uses [cutadapt](https://cutadapt.readthedocs.io/en/stable/) to remove adapter sequences and low quality bases.
[FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) is used to generate detailed quality statistics.

## Input
The input for this module is one or more pairs of FastQ files per sample, specified in a PEP configuration file, as is shown [here](../test/pep/chrM-trio-subsamples.csv).

## Output
The output of this module is one set of merged FastQ files per sample, as well as a JSON file with statistics.

## Configuration
The only configurable options for this module are the adapter sequences for
[cutadapt](https://cutadapt.readthedocs.io/en/stable/) to remove.

| Option | Description | Required |
| ----------------- | ---------------------------- | -------- |
| `forward_adapter` | The forward adapter sequence | yes |
| `reverse_adapter` | The reverse adapter sequence | yes |
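
To show how the two adapter options relate to cutadapt's paired-end interface, here is a minimal sketch that builds a cutadapt command line. The `cutadapt_command` helper, the file names and the adapter values are assumptions for illustration; the command the pipeline actually runs may use additional options.

```python
def cutadapt_command(forward_adapter, reverse_adapter, fq1, fq2, out1, out2):
    """Build a paired-end cutadapt command line as an argument list."""
    return [
        "cutadapt",
        "-a", forward_adapter,  # 3' adapter to remove from the forward (R1) reads
        "-A", reverse_adapter,  # 3' adapter to remove from the reverse (R2) reads
        "-o", out1,             # trimmed R1 output
        "-p", out2,             # trimmed R2 output
        fq1, fq2,
    ]


# Example values only; use the adapters that match your library preparation.
cmd = cutadapt_command(
    "AGATCGGAAGAGC", "AGATCGGAAGAGC",
    "sample_R1.fastq.gz", "sample_R2.fastq.gz",
    "trimmed_R1.fastq.gz", "trimmed_R2.fastq.gz",
)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` if cutadapt is installed.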
47 changes: 47 additions & 0 deletions docs/snv-indels.md
@@ -0,0 +1,47 @@
# snv-indels

The `snv-indels` module is responsible for aligning the reads to the reference, and calling SNVs and insertions/deletions.

## Tools
This module uses [STAR](https://github.com/alexdobin/STAR) to align the reads to the reference using two-pass mode. [VarDict](https://github.com/AstraZeneca-NGS/VarDictJava) is used to call variants, which are annotated using [VEP](https://www.ensembl.org/info/docs/tools/vep/index.html).
For each variant, this module determines if it is located inside one of the defined `bed_variant_hotspots`.
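
The hotspot check can be sketched as a simple interval lookup. This is a minimal illustration, not the module's own code: `in_hotspot` is a hypothetical helper, and it assumes variant positions are 1-based (as in VCF) while BED intervals are 0-based and half-open.

```python
def in_hotspot(chrom, pos, hotspots):
    """Return True if a 1-based variant position falls inside any BED interval.

    `hotspots` is an iterable of (chrom, start, end) tuples in BED coordinates,
    i.e. 0-based and half-open.
    """
    return any(
        chrom == h_chrom and start < pos <= end
        for h_chrom, start, end in hotspots
    )


# Hypothetical hotspot covering 1-based positions 101 to 200.
hotspots = [("chr13", 100, 200)]
```

With these coordinates, position 101 is the first base inside the hotspot and position 200 is the last.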

The variants annotated by VEP are then filtered based on a number of different criteria:
1. Variants that are present on the `blacklist` are excluded.
2. Only variants that are present on one of the specified transcripts in
`ref_id_mapping` are included.
3. Only variants that match one of the consequences defined in
`vep_include_consequence` are included.
4. Variants that have a population frequency of more than 1% in the `gnomADe`
population are excluded.
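
The four criteria above can be summarised in a short sketch. All field names (`id`, `transcript`, `consequences`, `gnomADe_AF`) and the `passes_filters` helper are hypothetical, and treating a missing frequency as "keep" is an assumption; the module's real filtering may differ in detail.

```python
def passes_filters(variant, blacklist, transcripts, consequences, max_af=0.01):
    """Apply the four criteria in order; all field names are hypothetical."""
    # 1. Blacklisted variants are excluded.
    if variant["id"] in blacklist:
        return False
    # 2. The variant must be on one of the transcripts of interest.
    if variant["transcript"] not in transcripts:
        return False
    # 3. At least one consequence must be in the include list.
    if not set(variant["consequences"]) & set(consequences):
        return False
    # 4. Exclude variants with a population frequency above 1%.
    #    Assumption: a missing frequency means the variant is kept.
    af = variant.get("gnomADe_AF")
    if af is not None and af > max_af:
        return False
    return True


# Hypothetical variant record for illustration.
variant = {
    "id": "chr13:101:A>T",
    "transcript": "TRANSCRIPT_1",
    "consequences": ["missense_variant"],
    "gnomADe_AF": 0.001,
}
```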

Picard is used to generate various alignment statistics.

## Input
The input for this module is a single pair of FastQ files per sample, specified in a PEP configuration file, as is shown [here](../test/pep/targetted.csv).

## Output
The output of this module is a JSON file with an overview of the most important results, as well as a number of other output files:
- A .bam and .bai per sample, which contain the aligned reads.
- A VEP output file (`vep_high`), which contains the final set of filtered variants.
- A VEP output file (`vep_target`), which contains the variants on the transcripts of interest. These variants have not been filtered on `vep_include_consequence` terms.
- A VCF file that only contains those variants that fall in one of the `bed_variant_hotspots` regions.

## Configuration

| Option | Description | Required |
| --------------------------- | --------------------------------------- | -------- |
| `forward_adapter` | The forward adapter sequence | yes |
| `reverse_adapter` | The reverse adapter sequence | yes |
| `genome_fasta` | Reference genome, in FASTA format | yes |
| `genome_fai` | .fai index for the reference | yes |
| `genome_dict` | .dict index for the reference | yes |
| `star_index` | STAR index database | yes |
| `ref_id_mapping` | File of transcripts of interest | yes |
| `rrna_refflat` | File of rRNA transcripts | yes |
| `bed_variant_hotspots` | BED file of hotspot regions | yes |
| `bed_variant_call_regions` | BED file of regions to call variants | yes |
| `gtf` | GTF file with transcripts, used by STAR | yes |
| `annotation_refflat` | File used to determine exon coverage | yes |
| `blacklist` | File of blacklisted variants | yes |
| `vep_include_consequence` | List of [VEP consequences](http://www.ensembl.org/info/genome/variation/prediction/predicted_data.html) to include | yes |
35 changes: 35 additions & 0 deletions docs/source/conf.py
@@ -0,0 +1,35 @@
# Configuration file for the Sphinx documentation builder.

# -- Project information

project = 'HAMLET'
copyright = '2018, LUMC'
author = 'Wibowo Arindrarto, Redmar van den Berg'

# In Sphinx, `version` is the short X.Y version and `release` the full version string.
release = '2.0.0'
version = '2.0'

# -- General configuration

extensions = [
    'sphinx.ext.duration',
    'sphinx.ext.autodoc',
    'sphinx.ext.autosummary',
    'sphinx.ext.intersphinx',
]

intersphinx_mapping = {
    'python': ('https://docs.python.org/3/', None),
    'sphinx': ('https://www.sphinx-doc.org/en/master/', None),
}
intersphinx_disabled_domains = ['std']

templates_path = ['_templates']
master_doc = 'index'

# -- Options for HTML output

html_theme = 'sphinx_rtd_theme'

# -- Options for EPUB output
epub_show_urls = 'footnote'
5 changes: 5 additions & 0 deletions docs/source/index.rst
@@ -0,0 +1,5 @@
Welcome to the documentation for HAMLET
================================================

This is currently a placeholder. You can see the full documentation
on the `HAMLET GitHub page <https://github.com/LUMC/HAMLET/tree/d75f27ef249b1018fa3a2ad8c513bd8fecf3592b>`_.
8 changes: 8 additions & 0 deletions test/test_docs.yml
@@ -0,0 +1,8 @@
# Test building the documentation in HTML format
- name: test-docs
  tags:
    - hamlet
    - docs
  command: make -C docs/ html
  files:
    - path: docs/build/html/genindex.html
