WIKI single‐cell‐DNA

reJELIN edited this page Sep 4, 2023 · 2 revisions

Welcome to the single-cell-DNA wiki!

Pipeline Goal:

Run a single-cell DNA-seq analysis of Mission Bio Tapestri data, from FASTQ files to figures.

Steps available:

  • Alignment
  • Preprocessing (filtering out bad quality variants, CNV and cells)
  • SNV_CNV (normalization, dimension reduction and clustering)
  • PROTEIN (normalization, dimension reduction and clustering)
  • ALL (combining DNA-seq analysis & proteomic analysis)
  • Phylogeny (reconstruction of mutation events)

Usage


Usage on Flamingo, the Gustave Roussy (GR) computing cluster

❗ If you have already used the single-cell RNA-seq pipeline, the procedure is identical.

  • Create a parameter file that fits your needs (see the Configuration section below)
  • Set the path_to_configfile variable to the path of this file
  • Run the snakemake command
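# Make Singularity available in the environment
module load singularity

# Paths to your parameter file and to your copy of the pipeline
path_to_configfile="<path/to/your_configfile.yaml>"
path_to_pipeline="<path/to/single-cell-dna-seq>"

# Quoting the expansions keeps paths that contain spaces intact
snakemake --profile "${path_to_pipeline}/profiles/local" -s "${path_to_pipeline}/Snakefile" --configfile "${path_to_configfile}"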

Configuration


1. steps & alignment: choose the steps to run

| name | description | example | default value | possible values |
| --- | --- | --- | --- | --- |
| steps | steps to run | [Aligment,preprocessing,SNV_CNV,ALL,phylogeny] | NA | Aligment,preprocessing,SNV_CNV,PROTEIN,ALL,phylogeny |
| tmp | temporary directory | /tmp | NA | NA |
| sample | sample(s) to run | [sample_1,sample_2] | NA | NA |
| reference_genome_path | path to the reference genome | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/hg19/ucsc_hg19.fa" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/hg19/ucsc_hg19.fa" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/hg19/ucsc_hg19.fa","/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v3/hg19/ucsc_hg19.fa" |
| reference_genome | reference genome release | "hg19" | "hg19" | "hg19" |
| type_analysis | type of analysis to run: dna or dna+protein | "dna+protein" | NA | "dna","dna+protein" |
| panel_path | path to your panel of variants | "</your/path/to/panel/file/location>" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/Myeloid" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/Myeloid","<your/path/panel/location>" |
| panel_protein_path | path to the reference FASTA for protein | "/mnt//beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/protein" | "/mnt//beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/protein" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/protein/","/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v3/panels/protein/" |
| design_file | path to the design file used to create the proper YAML file for alignment | "/your/path/to/panel/file/location" | NA | NA |
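
The allowed values listed above can be checked before launching Snakemake. The following is a minimal sketch and not part of the pipeline: the function name `check_config`, the flat dictionary layout, and the rule that `steps` and `sample` must not be empty are assumptions made for illustration. Only the parameter names and their allowed values come from the table.

```python
# Hypothetical pre-flight check; it is not shipped with the pipeline.
# Allowed values are copied from the table above, spelling included.
ALLOWED_STEPS = {"Aligment", "preprocessing", "SNV_CNV", "PROTEIN", "ALL", "phylogeny"}
ALLOWED_TYPE_ANALYSIS = {"dna", "dna+protein"}


def check_config(config):
    """Return a list of problems found in `config` (empty list if none).

    `config` is assumed to be a flat dict holding the keys of section 1.
    """
    problems = []

    steps = config.get("steps") or []
    if not steps:
        # Assumption: "NA" as default means the value has to be provided.
        problems.append("steps: at least one step is required")
    for step in steps:
        if step not in ALLOWED_STEPS:
            problems.append(f"steps: unknown step {step!r}")

    if not config.get("sample"):
        problems.append("sample: at least one sample is required")

    type_analysis = config.get("type_analysis")
    if type_analysis not in ALLOWED_TYPE_ANALYSIS:
        problems.append(f"type_analysis: unexpected value {type_analysis!r}")

    return problems
```

Calling `check_config` on the dictionary loaded from your parameter file returns an empty list when these three entries look valid, and one message per problem otherwise.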

2. filtering: remove bad quality variants & preprocess your data

| name | description | example | default value | possible values |
| --- | --- | --- | --- | --- |
| filter_na | filter out missing values | True | False | True/False |
| filter_na_percent | remove variants whose percentage of missing values is greater than or equal to this threshold | 35 | 25 | any integer |
| predict_missing_value | predict missing variant values with KNN | True | False | True/False |
| filtering_variants | set of filters used to remove bad quality variants & cells (see 2.1) | NA | NA | NA |
| max_vaf_percent | remove variants whose mean VAF is greater than or equal to this value | 95 | 95 | any integer |
| whitelist | variants that must be kept (even if their quality is poor) | ["chr20:33868702:T/C"] | NA | NA |
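
The effect of filter_na_percent can be illustrated on a small genotype matrix. This is a sketch under assumptions, not the pipeline's code: the function name `variants_to_keep`, the list-of-rows layout (one row per cell), and the choice to apply the whitelist at this step are all hypothetical. The missing-value code NGT = 3 is the one used in section 2.1.

```python
MISSING = 3  # NGT code for a no call, as used in section 2.1


def variants_to_keep(ngt, variant_ids, filter_na_percent=25, whitelist=()):
    """Return the variants whose share of missing calls is below the threshold.

    ngt: list of rows, one per cell; each row holds one NGT code per variant.
    variant_ids: one identifier per column of `ngt`.
    whitelist: variants kept regardless of their missing rate (assumption).
    """
    n_cells = len(ngt)
    kept = []
    for j, variant in enumerate(variant_ids):
        n_missing = sum(1 for row in ngt if row[j] == MISSING)
        missing_percent = 100.0 * n_missing / n_cells
        # A variant at or above the threshold is removed unless whitelisted.
        if variant in whitelist or missing_percent < filter_na_percent:
            kept.append(variant)
    return kept
```

With four cells, a variant that is missing in exactly one cell sits at 25 % and is therefore removed with the default threshold of 25, but kept with the example value of 35.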

2.1. filtering: filtering_variants

| name | description | example | default value | possible values |
| --- | --- | --- | --- | --- |
| min_dp | The minimum depth (DP) for the call to be considered | 10 | 10 | any integer |
| min_gq | The minimum genotype quality (GQ) for the call to be considered | 30 | 30 | any integer |
| vaf_ref | All reference calls (NGT = 0) with VAF > vaf_ref are converted to no calls (NGT = 3) | 5 | 5 | any integer |
| vaf_het | All heterozygous calls (NGT = 1) with VAF < vaf_het are converted to no calls (NGT = 3) | 35 | 35 | any integer |
| vaf_hom | All homozygous calls (NGT = 2) with VAF < vaf_hom are converted to no calls (NGT = 3) | 95 | 95 | any integer |
| min_mut_prct_cells | The minimum percent of the total cells in which the variant should be mutated | 1 | 1 | any integer |
| min_prct_cells | The minimum percent of total cells in which the variant should be present | 50 | 50 | any integer |
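
The rules in this table can be read as two small functions, one applied to each call and one applied to each variant. This is a sketch of that reading rather than the pipeline's implementation. The function names are hypothetical, and three points are assumptions: a call below min_dp or min_gq is turned into a no call, "present" means genotyped (NGT other than 3), and "mutated" means NGT 1 or 2 with inclusive thresholds.

```python
NO_CALL = 3  # NGT code for a no call


def filter_call(ngt, dp, gq, vaf,
                min_dp=10, min_gq=30, vaf_ref=5, vaf_het=35, vaf_hom=95):
    """Return the genotype code of one call after the per-call filters.

    ngt: 0 = reference, 1 = heterozygous, 2 = homozygous, 3 = no call.
    vaf is expressed in percent, like the thresholds.
    """
    if dp < min_dp or gq < min_gq:
        return NO_CALL  # assumption: low-quality calls become no calls
    if ngt == 0 and vaf > vaf_ref:
        return NO_CALL
    if ngt == 1 and vaf < vaf_het:
        return NO_CALL
    if ngt == 2 and vaf < vaf_hom:
        return NO_CALL
    return ngt


def keep_variant(calls, min_mut_prct_cells=1, min_prct_cells=50):
    """Decide whether a variant passes the two per-variant filters.

    calls: filtered NGT codes of this variant, one per cell.
    """
    n_cells = len(calls)
    present = sum(1 for c in calls if c != NO_CALL)
    mutated = sum(1 for c in calls if c in (1, 2))
    return (100.0 * present / n_cells >= min_prct_cells
            and 100.0 * mutated / n_cells >= min_mut_prct_cells)
```

For example, with the default thresholds a heterozygous call with DP 20, GQ 50 and a VAF of 30 % becomes a no call, because 30 is below vaf_het.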

3. SNV: snv_norm_dimred

| name | description | example | default value | possible values |
| --- | --- | --- | --- | --- |
| method_dimred | dimension reduction method for the variants matrix | pca | pca | fa,pca |
| max_dims | maximum number of dimensions for the dimension reduction | 6 | 6 | any integer |
| clustering_method | clustering method to use | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |

3. CNV: cnv_norm_dimred

| name | description | example | default value | possible values |
| --- | --- | --- | --- | --- |
| max_dims | maximum number of dimensions for the dimension reduction | 6 | 6 | any integer |
| clustering_method | clustering method to use | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |

4. PROTEIN: prot_norm_dimred

| name | description | example | default value | possible values |
| --- | --- | --- | --- | --- |
| normalization | normalization method to correct noise | DSB | CLR | CLR,DSB,asinh,NSP |
| clustering_method | clustering method to use | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
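
CLR, the default normalization, stands for centered log-ratio. As a point of reference only, the textbook transform for the antibody counts of one cell can be sketched as follows. The function name `clr` and the pseudocount of 1 are assumptions, and the pipeline's own implementation may differ in its details.

```python
import math


def clr(counts, pseudocount=1.0):
    """Centered log-ratio transform of the protein counts of one cell.

    Each value is the log of the count minus the mean log count of the cell,
    which equals the log of the count divided by the geometric mean.
    The pseudocount avoids taking the log of zero (assumption).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]
```

The transformed values of a cell always sum to zero, so they express each protein relative to the cell's overall signal rather than as an absolute count.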

5. ALL: all_norm_dimred

| name | description | example | default value | possible values |
| --- | --- | --- | --- | --- |
| snv | SNV parameters to keep in order to combine multi-omics data (see 5.1) | NA | NA | NA |
| cnv | CNV parameters to keep in order to combine multi-omics data (see 5.2) | NA | NA | NA |
| prot | Protein parameters to keep in order to combine multi-omics data (see 5.3) | NA | NA | NA |
| variants_of_interest | list of variants of interest used to label the data | ["EIF6:20:33868702:T:C","TP53:17:7577559:G:T"] | NA | |
| chr_of_interest | list of chromosomes to focus on | ["5","17","7"] | NA | any list of chromosome numbers |
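
The examples for variants_of_interest and for whitelist (section 2) appear to describe variants with two different layouts: "EIF6:20:33868702:T:C" here and "chr20:33868702:T/C" there. The sketch below converts one into the other. The field meanings (gene, chromosome, position, reference allele, alternate allele) are inferred from those two examples, and both function names are hypothetical.

```python
def parse_variant_of_interest(label):
    """Split a 'GENE:CHROM:POS:REF:ALT' label into its five fields."""
    gene, chrom, pos, ref, alt = label.split(":")
    return {"gene": gene, "chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt}


def to_whitelist_id(label):
    """Rewrite a variants_of_interest label in the layout of the whitelist example."""
    v = parse_variant_of_interest(label)
    return f"chr{v['chrom']}:{v['pos']}:{v['ref']}/{v['alt']}"
```

A label that does not contain exactly five colon-separated fields raises a `ValueError` at the unpacking step, which is a simple way to catch a variant written in the wrong layout.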

5.1. all_norm_dimred - snv

| name | description | example | default value | possible values |
| --- | --- | --- | --- | --- |
| method_dimred | reduction method to keep | pca | pca | fa,pca |
| dims | number of dimensions to keep | 6 | 6 | any integer |
| clustering_method | clustering method to keep | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
| res | clustering resolution to keep | NA | NA | depends on the algorithm: leiden and dbscan take a float, graph-community and hdbscan take an integer |
| predict_missing_value | whether to predict missing values using the KNN method | True | False | True/False |

5.2. all_norm_dimred - cnv

| name | description | example | default value | possible values |
| --- | --- | --- | --- | --- |
| method_dimred | reduction method to keep | pca | pca | pca |
| dims | number of dimensions to keep | 6 | 6 | any integer |
| clustering_method | clustering method to keep | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
| res | clustering resolution to keep | NA | NA | depends on the algorithm: leiden and dbscan take a float, graph-community and hdbscan take an integer |

5.3. all_norm_dimred - prot

| name | description | example | default value | possible values |
| --- | --- | --- | --- | --- |
| normalization | normalization method to keep | CLR | CLR | CLR,DSB,asinh,NSP |
| method_dimred | reduction method to keep | pca | pca | fa,pca |
| dims | number of dimensions to keep | 6 | 6 | any integer |
| clustering_method | clustering method to keep | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
| res | clustering resolution to keep | NA | NA | depends on the algorithm: leiden and dbscan take a float, graph-community and hdbscan take an integer |
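
The type expected for res in sections 5.1 to 5.3 depends on the clustering method, which is easy to get wrong in a parameter file. The following check is a sketch only: the function name `check_res` is hypothetical, and accepting a whole number such as 1 for leiden or dbscan is an assumption.

```python
# Grouping taken from the "res" rows of sections 5.1 to 5.3.
FLOAT_METHODS = {"leiden", "dbscan"}
INT_METHODS = {"graph-community", "hdbscan"}


def check_res(clustering_method, res):
    """Return True if `res` has a type that suits `clustering_method`."""
    if isinstance(res, bool):
        return False  # True/False are ints in Python but never a valid res
    if clustering_method in FLOAT_METHODS:
        # Assumption: a whole number is also acceptable where a float is expected.
        return isinstance(res, (int, float))
    if clustering_method in INT_METHODS:
        return isinstance(res, int)
    raise ValueError(f"unknown clustering method: {clustering_method!r}")
```

For instance, `check_res("hdbscan", 0.5)` returns False, while `check_res("leiden", 0.5)` returns True.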

6. phylogeny

| name | description | example | default value | possible values |
| --- | --- | --- | --- | --- |
| phylogeny_method | list of methods to use for reconstructing mutation events | ["COMPASS","infSCITE"] | NA | COMPASS,infSCITE,BiTSC2 |

6.1. phylogeny - COMPASS

| name | description | example | default value | possible values |
| --- | --- | --- | --- | --- |
| bool_cnv | include CNV in the reconstruction of mutation events | 1 | 0 | 0,1 |

What's coming next?


  • infSCITE and BiTSC2 are not implemented yet but they will be added soon
  • The pipeline currently uses mosaic 2.4.1; it will be updated to 3.0.1

Questions


Don't hesitate to contact the bioinformatics platform at bigr@gustaveroussy.fr or Remy.JELIN@gustaveroussy.fr if you have any questions or suggestions.