Nextflow pipeline to detect matched BAMs with NGSCheckMate.
Implementation of NGSCheckMate and its underlying subset calling, distributed per sample.
- Nextflow : for common installation procedures see the IARC-nf repository.
- NGSCheckMate (follow its installation instructions, especially setting up the $NCM_HOME variable)
- samtools
- bcftools
Additionally, the graph output option requires R; see details below about this option.
Type | Description |
---|---|
--input | your input BAM file(s) (do not forget the quotes e.g. --input "test_*.bam" ). Warning : your BAM file(s) must be indexed, and the test_*.bai should be in the same folder. |
--input_folder | Folder with BAM files |
--input_file | Input file (tab-delimited) with 3 columns: ID (individual ID), suffix (suffix for sample names; e.g. RNA), and bam (path to bam file). |
A nextflow.config is also included; please modify it to suit environments other than our pre-configured clusters (see Nextflow configuration).
Note that the input_file is a tab-delimited text file; it is used both to provide the locations of the input bam files and to generate the graphs. The ID field must be unique to a subject (e.g. tumor and normal samples from the same individual must share the same individual identifier). The bam field must be unique to a file name. For example, the following is a valid file:
ID suffix bam
NA06984 _RNA NA06984_T_transcriptome.bam
NA06984 _WGS NA06984_T_genome.bam
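The constraints on this file can be checked before launching the pipeline. The following is a minimal sketch in Python, assuming a tab-delimited file whose header row is exactly `ID`, `suffix`, `bam`; the function name `validate_input_file` is hypothetical and not part of the pipeline.

```python
import csv
import io


def validate_input_file(handle):
    """Return a list of problems found in a tab-delimited ID/suffix/bam table."""
    problems = []
    reader = csv.DictReader(handle, delimiter="\t")
    expected = ["ID", "suffix", "bam"]
    if reader.fieldnames != expected:
        return [f"header should be {expected}, found {reader.fieldnames}"]
    seen_bams = set()
    # Data rows start on line 2, after the header.
    for line_no, row in enumerate(reader, start=2):
        bam = row["bam"]
        if not row["ID"] or not bam:
            problems.append(f"line {line_no}: empty ID or bam field")
        elif bam in seen_bams:
            # The bam field must be unique to a file name.
            problems.append(f"line {line_no}: duplicate bam entry {bam}")
        seen_bams.add(bam)
    return problems


# The example table from this README, written with explicit tabs.
example = (
    "ID\tsuffix\tbam\n"
    "NA06984\t_RNA\tNA06984_T_transcriptome.bam\n"
    "NA06984\t_WGS\tNA06984_T_genome.bam\n"
)
print(validate_input_file(io.StringIO(example)))  # prints []
```

Repeated ID values are deliberately accepted, since several samples from the same individual are expected to share one identifier.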
Name | Example value | Description |
---|---|---|
--output_folder | results | the folder that will contain NGSCheckMate folder with all results in text files. |
--ref | ref.fasta | your reference in FASTA |
--bed | SNP_GRCh38.bed | Panel of SNP bed file from NGSCheckMate |
Note that a bed file SNP_GRCh38.bed is provided, which is a liftOver of the files at https://github.com/parklab/NGSCheckMate/tree/master/SNP. To use other references, you can provide your own bedfile.
Name | Default value | Description |
---|---|---|
--mem | 16 | Memory requested (in GB) for calling and the NGSCheckMate run |
--cpu | 4 | Number of threads for germline calling |
--bai_ext | .bam.bai | Extension of bai files |
nextflow run NGSCheckMate-nf/ -r v1.1 -profile singularity --ref ref.fasta --input_folder BAM/
To run the pipeline without singularity, just remove "-profile singularity". Alternatively, one can run the pipeline using a docker container (-profile docker) or the conda recipe containing all required dependencies (-profile conda).
Type | Description |
---|---|
vcfs | a folder with the vcfs used for the matching |
NCM_output/output*.txt | NGSCheckMate output files with matches between files (see https://github.com/parklab/NGSCheckMate) |
NCM_output/output.pdf | hierarchical clustering plot from https://github.com/parklab/NGSCheckMate |
NCM_output/NCM_graph_wrongmatch.xgmml | graph with only the samples without a match (adapted from https://github.com/parklab/NGSCheckMate/blob/master/graph/ngscheckmate2xgmml.R) |
NCM_output/NCM_graph.xgmml | graph with all samples (adapted from https://github.com/parklab/NGSCheckMate/blob/master/graph/ngscheckmate2xgmml.R) |
Note that we recommend Cytoscape to visualize the .xgmml graphs.
nextflow run iarcbioinfo/NGSCheckMate -profile cobalt --input "/data/test_*.bam" --output_folder /data/cohort_output --ref /ref/Homo_sapiens_assembly38.fasta --bed /home/user/bin/NGSCheckMate/SNP/SNP_GRCh38.bed
Be careful: if the bai files are missing for some bam files, those bam files will be ignored and the workflow will not return an error.
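Because missing indices are skipped silently, it can help to list unindexed BAM files before launching the workflow. The following is a minimal sketch in Python; the function name `bams_without_index` is hypothetical, and it assumes that the index name is the BAM file name with the trailing `.bam` replaced by the value of `--bai_ext`, which may not match how the pipeline pairs the files.

```python
import tempfile
from pathlib import Path


def bams_without_index(folder, bai_ext=".bam.bai"):
    """List BAM files in `folder` that have no index file next to them."""
    missing = []
    for bam in sorted(Path(folder).glob("*.bam")):
        # Assumption: replace the trailing ".bam" with the index extension,
        # e.g. sample.bam -> sample.bam.bai (default) or sample.bai.
        index = bam.with_name(bam.name[: -len(".bam")] + bai_ext)
        if not index.exists():
            missing.append(bam.name)
    return missing


# Small demonstration on a temporary folder with one indexed and one
# unindexed BAM (empty placeholder files, not real alignments).
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.bam", "a.bam.bai", "b.bam"):
        (Path(tmp) / name).touch()
    print(bams_without_index(tmp))  # prints ['b.bam']
```

An empty list means every BAM in the folder has an index under the assumed naming scheme.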
We provide a modified version of the graph/ngscheckmate2xgmml.R script from https://github.com/parklab/NGSCheckMate to output graphs in .xgmml format. The modifications make it possible to represent all samples, even those that match, and fix a small glitch in the color palette.
Name | Email | Description |
---|---|---|
Nicolas Alcala* | AlcalaN@iarc.fr | Developer to contact for support |
Maxime Vallée | | Developer |