WIKI single‐cell‐DNA
reJELIN edited this page Sep 4, 2023 · 2 revisions
Welcome to the single-cell-DNA wiki!
Perform single-cell DNA-seq analysis of Mission Bio Tapestri data, from FASTQ files to figures. The pipeline steps are:
- Alignment
- Preprocessing (filtering low-quality variants, CNVs, and cells)
- SNV_CNV (normalization, dimension reduction, and clustering)
- PROTEIN (normalization, dimension reduction, and clustering)
- ALL (combining the DNA-seq and protein analyses)
- Phylogeny (reconstruction of mutation events)
❗ If you have already used the single-cell RNA-seq pipeline, the procedure is identical:
- Create the parameter file according to your needs (see below for how to configure it).
- Set the path to this file in the path_to_configfile variable.
- Run the snakemake command:
module load singularity
path_to_configfile="<path/to/your_configfile.yaml>"
path_to_pipeline="<path/to/single-cell-dna-seq>"
snakemake --profile ${path_to_pipeline}/profiles/local -s ${path_to_pipeline}/Snakefile --configfile ${path_to_configfile}
name | description | example | default value | possible value |
---|---|---|---|---|
steps | steps to run | [Aligment,preprocessing,SNV_CNV,ALL,phylogeny] | NA | Aligment,preprocessing,SNV_CNV,PROTEIN,ALL,phylogeny |
tmp | temporary directory | /tmp | NA | NA |
sample | sample(s) to run | [sample_1,sample_2] | NA | NA |
reference_genome_path | path of the reference genome | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/hg19/ucsc_hg19.fa" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/hg19/ucsc_hg19.fa" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/hg19/ucsc_hg19.fa","/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v3/hg19/ucsc_hg19.fa" |
reference_genome | reference genome release | "hg19" | "hg19" | "hg19" |
type_analysis | type of analysis to run: DNA only or DNA plus protein | "dna+protein" | NA | "dna","dna+protein" |
panel_path | path of your panel of variants | "</your/path/to/panel/file/location>" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/Myeloid" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/Myeloid","<your/path/panel/location>" |
panel_protein_path | path of the reference fasta for protein | "/mnt//beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/protein" | "/mnt//beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/protein" | "/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v2/panels/protein/","/mnt/beegfs/pipelines/single-cell_dna/tapestri_database/v3/panels/protein/" |
design_file | path to the design file used to create the YAML file for alignment | "/your/path/to/panel/file/location" | NA | NA |
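
As a small illustration of the `steps` row above, the sketch below checks a `steps` list against the possible values listed in the table before the pipeline is launched. The helper name `validate_steps` is hypothetical and not part of the pipeline; the allowed names are copied from the table exactly as spelled there.

```python
# Allowed step names, copied verbatim from the "possible value" column above.
ALLOWED_STEPS = ["Aligment", "preprocessing", "SNV_CNV", "PROTEIN", "ALL", "phylogeny"]


def validate_steps(steps):
    """Return the entries of `steps` that are not listed as possible values.

    Hypothetical helper: the pipeline itself may validate differently.
    """
    return [step for step in steps if step not in ALLOWED_STEPS]


# The example value from the table passes the check.
unknown = validate_steps(["Aligment", "preprocessing", "SNV_CNV", "ALL", "phylogeny"])
print(unknown)
```

If the pipeline matches step names exactly as written in the table, a check like this catches spelling differences before a run starts.
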
name | description | example | default value | possible value |
---|---|---|---|---|
filter_na | filter missing values | True | False | True/False |
filter_na_percent | remove variants whose percentage of missing values is greater than or equal to this threshold | 35 | 25 | any integer |
predict_missing_value | predict missing variant values using KNN | True | False | True/False |
filtering_variants | set of filters used to remove low-quality variants and cells | NA | NA | NA |
max_vaf_percent | remove variants whose mean VAF is greater than or equal to this threshold | 95 | 95 | any integer |
whitelist | variants that must be kept (even if their quality is poor) | ["chr20:33868702:T/C"] | NA | NA |
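
To make the interaction between `filter_na_percent` and `whitelist` concrete, here is a minimal sketch, assuming the missing-value percentage of each variant is already known. The function `filter_variants_by_missing` and the percentages are hypothetical and for illustration only; the pipeline's own implementation may differ.

```python
def filter_variants_by_missing(missing_percent, filter_na_percent=25, whitelist=()):
    """Keep variants below the missing-value threshold, plus whitelisted ones.

    missing_percent: dict mapping a variant name to its percentage of missing values.
    A variant is removed when its percentage is >= filter_na_percent,
    unless it appears in `whitelist`.
    """
    kept = []
    for variant, percent in missing_percent.items():
        if variant in whitelist or percent < filter_na_percent:
            kept.append(variant)
    return kept


# Made-up percentages for illustration only.
example = {
    "chr20:33868702:T/C": 60.0,  # above the threshold, but whitelisted
    "chr17:7577559:G/T": 10.0,   # below the threshold
    "chr5:100:A/G": 25.0,        # equal to the threshold, so removed
}
kept = filter_variants_by_missing(example, filter_na_percent=25,
                                  whitelist=["chr20:33868702:T/C"])
print(kept)
```
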
name | description | example | default value | possible value |
---|---|---|---|---|
min_dp | The minimum depth (DP) for the call to be considered | 10 | 10 | any integer |
min_gq | The minimum genotype quality (GQ) for the call to be considered | 30 | 30 | any integer |
vaf_ref | All reference calls (NGT = 0) with VAF > vaf_ref are converted to no calls (NGT = 3) | 5 | 5 | any integer |
vaf_het | All heterozygous calls (NGT = 1) with VAF < vaf_het are converted to no calls (NGT = 3) | 35 | 35 | any integer |
vaf_hom | All homozygous calls (NGT = 2) with VAF < vaf_hom are converted to no calls (NGT = 3) | 95 | 95 | any integer |
min_mut_prct_cells | The minimum percent of the total cells in which the variant should be mutated | 1 | 1 | any integer |
min_prct_cells | The minimum percent of total cells in which the variant should be present | 50 | 50 | any integer |
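
The genotype rules in the table above can be sketched as a single function. This is an illustration under stated assumptions, not the pipeline's code: it assumes VAF is expressed in percent, and that a call failing `min_dp` or `min_gq` is turned into a no call (the table only says such calls are not "considered"). The name `filter_call` is hypothetical.

```python
NO_CALL = 3  # NGT value used for "no call" in the table above


def filter_call(ngt, dp, gq, vaf,
                min_dp=10, min_gq=30, vaf_ref=5, vaf_het=35, vaf_hom=95):
    """Return the genotype after applying the thresholds (defaults from the table).

    ngt: 0 = reference, 1 = heterozygous, 2 = homozygous, 3 = no call.
    vaf: variant allele frequency in percent (assumption).
    """
    # Assumption: calls with low depth or low genotype quality become no calls.
    if dp < min_dp or gq < min_gq:
        return NO_CALL
    if ngt == 0 and vaf > vaf_ref:
        return NO_CALL
    if ngt == 1 and vaf < vaf_het:
        return NO_CALL
    if ngt == 2 and vaf < vaf_hom:
        return NO_CALL
    return ngt


print(filter_call(ngt=1, dp=20, gq=50, vaf=40))  # heterozygous call is kept
print(filter_call(ngt=1, dp=20, gq=50, vaf=30))  # VAF below vaf_het, so no call
```
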
name | description | example | default value | possible value |
---|---|---|---|---|
method_dimred | dimension reduction method for the variant matrix | pca | pca | fa,pca |
max_dims | maximum dimensions for the dimension reduction | 6 | 6 | any integer |
clustering_method | clustering method to use | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
name | description | example | default value | possible value |
---|---|---|---|---|
max_dims | maximum dimensions for the dimension reduction | 6 | 6 | any integer |
clustering_method | clustering method to use | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
name | description | example | default value | possible value |
---|---|---|---|---|
normalization | normalization method to correct noise | DSB | CLR | CLR,DSB,asinh,NSP |
clustering_method | clustering method to use | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
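
For readers unfamiliar with the `CLR` option, the sketch below shows a textbook centered log-ratio transform on one cell's protein counts. It is only an illustration: the pseudocount of 1 is an assumption, the function name `clr` is hypothetical, and the pipeline may compute CLR differently.

```python
import math


def clr(counts, pseudocount=1.0):
    """Textbook centered log-ratio: log(count + pseudocount) minus the mean log.

    counts: raw counts for one cell across all proteins.
    The pseudocount avoids log(0) and is an assumption of this sketch.
    """
    logs = [math.log(count + pseudocount) for count in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Made-up counts for three proteins in a single cell.
normalized = clr([10, 100, 1000])
print(normalized)
```

After the transform, the values of a cell sum to approximately zero, so proteins are compared relative to the cell's overall signal rather than by raw counts.
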
name | description | example | default value | possible value |
---|---|---|---|---|
snv | SNV parameters to keep in order to combine multi-omics data | NA | NA | NA |
cnv | CNV parameters to keep in order to combine multi-omics data | NA | NA | NA |
prot | Protein parameters to keep in order to combine multi-omics data | NA | NA | NA |
variants_of_interest | list of variants of interest used to label the data | ["EIF6:20:33868702:T:C","TP53:17:7577559:G:T"] | NA | |
chr_of_interest | list of chromosomes to focus on | ["5","17","7"] | NA | any list of chromosome numbers |
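
The `variants_of_interest` examples use a different notation from the `whitelist` example (`"chr20:33868702:T/C"`). The sketch below parses the first notation and converts it to the second. The field order (gene, chromosome, position, reference allele, alternate allele) is inferred from the examples in this page, and the names `Variant`, `parse_variant_of_interest`, and `to_whitelist_format` are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    gene: str
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant_of_interest(text):
    """Split a 'GENE:CHROM:POS:REF:ALT' string (field order inferred from the examples)."""
    gene, chrom, pos, ref, alt = text.split(":")
    return Variant(gene, chrom, int(pos), ref, alt)


def to_whitelist_format(variant):
    """Render a Variant in the 'chrCHROM:POS:REF/ALT' notation of the whitelist example."""
    return f"chr{variant.chrom}:{variant.pos}:{variant.ref}/{variant.alt}"


eif6 = parse_variant_of_interest("EIF6:20:33868702:T:C")
print(to_whitelist_format(eif6))
```
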
name | description | example | default value | possible value |
---|---|---|---|---|
method_dimred | reduction method to keep | pca | pca | fa,pca |
dims | number of dimensions to keep | 6 | 6 | any integer |
clustering_method | clustering method to keep | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
res | resolution for clustering to keep | NA | NA | depends on the algorithm: leiden and dbscan take a float; graph-community and hdbscan take an integer |
predict_missing_value | boolean to predict missing value using KNN method | True | False | True/False |
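
Because the expected type of `res` depends on `clustering_method`, a quick check such as the following can catch mismatches before a run. This is a sketch only: `check_res` is a hypothetical helper, and accepting whole numbers for the float-based methods is a choice made here, not something the pipeline documents.

```python
FLOAT_METHODS = {"leiden", "dbscan"}            # res is a float
INTEGER_METHODS = {"graph-community", "hdbscan"}  # res is an integer


def check_res(clustering_method, res):
    """Return True if `res` has a type compatible with `clustering_method`."""
    if isinstance(res, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        return False
    if clustering_method in FLOAT_METHODS:
        # Assumption: a whole number such as 1 is also acceptable here.
        return isinstance(res, (int, float))
    if clustering_method in INTEGER_METHODS:
        return isinstance(res, int)
    raise ValueError(f"unknown clustering method: {clustering_method}")


print(check_res("leiden", 0.8))
print(check_res("hdbscan", 0.8))
```
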
name | description | example | default value | possible value |
---|---|---|---|---|
method_dimred | reduction method to keep | pca | pca | pca |
dims | number of dimensions to keep | 6 | 6 | any integer |
clustering_method | clustering method to keep | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
res | resolution for clustering to keep | NA | NA | depends on the algorithm: leiden and dbscan take a float; graph-community and hdbscan take an integer |
name | description | example | default value | possible value |
---|---|---|---|---|
normalization | normalization method to keep | CLR | CLR | CLR,DSB,asinh,NSP |
method_dimred | reduction method to keep | pca | pca | fa,pca |
dims | number of dimensions to keep | 6 | 6 | any integer |
clustering_method | clustering method to keep | leiden | dbscan | graph-community,leiden,dbscan,hdbscan |
res | resolution for clustering to keep | NA | NA | depends on the algorithm: leiden and dbscan take a float; graph-community and hdbscan take an integer |
name | description | example | default value | possible value |
---|---|---|---|---|
phylogeny_method | list of methods to use for reconstructing mutation events | ["COMPASS","infSCITE"] | NA | COMPASS,infSCITE,BiTSC2 |
name | description | example | default value | possible value |
---|---|---|---|---|
bool_cnv | include CNVs in the reconstruction of mutation events | 1 | 0 | 0,1 |
- infSCITE and BiTSC2 are not implemented yet, but they will be added soon.
- The pipeline currently uses mosaic 2.4.1; it will be updated to 3.0.1.
Don't hesitate to contact the bioinformatics platform at bigr@gustaveroussy.fr or Remy.JELIN@gustaveroussy.fr if you have any questions or suggestions.