GitHub - mckellardw/txg_snake: Snakemake workflow for the alignment, QC, and quantification of all types of 10x Genomics data

Flexible preprocessing, alignment, QC, and quantification workflow for 10x Genomics data (Chromium, Visium, & STRS)

The goal of this project is to build a workflow for assessing different alignment parameterizations and for investigating artifacts that arise from modifying 10x's chemistries. Contributions are welcome! A companion workflow is available for Curio's SlideSeq (Seeker).

Dependencies & Sources:

cutadapt v4.1
fastqc v0.11.8
STAR v2.7.10b # Important!
kallisto v1.0.7
bustools v0.1.0.dev2
umi-tools v1.1.2
qualimap v2.2.a
vsearch v2.17.0
BLAST

Format for `sample_sheet`:

sampleID	fastq_R1	fastq_R2	chemistry	STAR_rRNA_ref	STAR_ref	genes_gtf	kb_idx	kb_t2g
sample1	/path/to/sample1_L001_R1.fastq.gz /path/to/sample1_L002_R1.fastq.gz	/path/to/sample1_L001_R2.fastq.gz /path/to/sample1_L002_R2.fastq.gz	Visium	/path/to/STAR_reference_rRNA	/path/to/STAR_reference	/path/to/annotations.gtf	/path/to/kallisto/transcriptome.idx	/path/to/kallisto/transcripts_to_genes.txt
sample2	/path/to/sample2_L001_R1.fastq.gz /path/to/sample2_L002_R1.fastq.gz	/path/to/sample2_L001_R2.fastq.gz /path/to/sample2_L002_R2.fastq.gz	STRS	/path/to/STAR_reference_rRNA	/path/to/STAR_reference	/path/to/annotations.gtf	/path/to/kallisto/transcriptome.idx	/path/to/kallisto/transcripts_to_genes.txt
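Before launching the workflow, it can help to confirm that every row of the sample sheet has the nine columns shown in the header above. The following is a minimal sketch, assuming a tab-delimited sheet as displayed here; `check_sample_sheet` is a hypothetical helper and not part of txg_snake. If your sheet is comma-delimited, the `-F` value would need to change accordingly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of txg_snake): check that each data row of a
# tab-separated sample sheet has the 9 columns listed in the header.
check_sample_sheet() {
  local sheet="$1"
  awk -F'\t' -v expected=9 '
    NR == 1 { next }                 # skip the header row
    NF == 0 { next }                 # ignore empty lines
    NF != expected {
      printf "line %d: expected %d columns, found %d\n", NR, expected, NF
      bad = 1
    }
    END { exit (bad ? 1 : 0) }       # non-zero exit if any row was malformed
  ' "$sheet"
}
```

Usage: `check_sample_sheet /path/to/sample_sheet` prints one line per malformed row and returns a non-zero status if any were found. Multiple FASTQ paths separated by spaces inside one column are counted as a single field.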

Generating references:

rRNA STAR reference for in silico rRNA depletion/quantification

Ribosomal RNA (rRNA) molecules can make alignment and quantification very difficult because these genes are present in many genomic copies. The workflow therefore includes a first-pass alignment against rRNA sequences only, which allows separate alignment parameters for these reads while keeping the ability to count and analyze them.

Check out scripts/GRCm39_GENCODEM31_STAR_rRNA.sh for an example script showing how to generate an rRNA-only STAR reference using GENCODE annotations.
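One step in building an rRNA-only reference is selecting the rRNA records from the annotation. The sketch below illustrates that step only, assuming GENCODE-style GTF attributes (`gene_type "rRNA"` and `gene_type "Mt_rRNA"`). `filter_rrna_gtf` is a hypothetical helper; the script in scripts/ may take a different approach, so treat it as the reference.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the repository's script): keep GTF header lines
# plus records whose gene_type is exactly "rRNA" or "Mt_rRNA".
# Assumes GENCODE-style attributes in column 9.
filter_rrna_gtf() {
  local gtf_in="$1"
  awk -F'\t' '
    /^#/ { print; next }                          # keep header comments
    $9 ~ /gene_type "(Mt_)?rRNA"/ { print }       # excludes rRNA_pseudogene
  ' "$gtf_in"
}
```

Usage: `filter_rrna_gtf gencode.vM31.annotation.gtf > rRNA_only.gtf`. The closing quote in the pattern is what keeps `rRNA_pseudogene` entries out; whether to include pseudogenes is a choice this sketch does not settle.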

Genomic STAR reference

This is a typical STAR reference that you would use for any other alignment job. Here is an example code snippet:

# Paths to the GENCODE genome FASTA and annotation GTF
FASTA_GENOME="/path/to/GENCODE_M31/GRCm39.genome.fa"
GENES_GTF="/path/to/GENCODE_M31/gencode.vM31.annotation.gtf"

# Output directory for the STAR index
OUTDIR="/workdir/dwm269/genomes/mm39_all/STAR_GRCm39_GENCODEM31"

mkdir -p "${OUTDIR}"
cd "${OUTDIR}"

# Build the genome index, with splice junctions taken from the GTF
STAR \
--runThreadN 16 \
--runMode genomeGenerate \
--genomeDir "${OUTDIR}" \
--genomeFastaFiles "${FASTA_GENOME}" \
--sjdbGTFfile "${GENES_GTF}" \
--sjdbGTFfeatureExon exon

You can find the reference files on GENCODE's website.

small/micro RNA analysis

miRge3.0

Tree of Outputs:

{SAMPLE_ID}/
├── log.cutadapt.json
├── postTrim_fastqc_R2
│   ├── {SAMPLE_ID}_R2_final_fastqc.html
│   └── {SAMPLE_ID}_R2_final_fastqc.zip
├── preTrim_fastqc_R1
│   ├── {SAMPLE_ID}_R1_fastqc.html
│   └── {SAMPLE_ID}_R1_fastqc.zip
├── preTrim_fastqc_R2
│   ├── {SAMPLE_ID}_R2_fastqc.html
│   └── {SAMPLE_ID}_R2_fastqc.zip
├── qualimap
│   ├── css
│   │   ├── agogo.css
│   │   ├── ajax-loader.gif
│   │   ├── basic.css
│   │   ├── bgfooter.png
│   │   ├── bgtop.png
│   │   ├── comment-bright.png
│   │   ├── comment-close.png
│   │   ├── comment.png
│   │   ├── doctools.js
│   │   ├── down.png
│   │   ├── down-pressed.png
│   │   ├── file.png
│   │   ├── jquery.js
│   │   ├── minus.png
│   │   ├── plus.png
│   │   ├── pygments.css
│   │   ├── qualimap_logo_small.png
│   │   ├── report.css
│   │   ├── searchtools.js
│   │   ├── underscore.js
│   │   ├── up.png
│   │   ├── up-pressed.png
│   │   └── websupport.js
│   ├── images_qualimapReport
│   │   ├── Coverage Profile Along Genes (High).png
│   │   ├── Coverage Profile Along Genes (Low).png
│   │   ├── Coverage Profile Along Genes (Total).png
│   │   ├── Junction Analysis.png
│   │   ├── Reads Genomic Origin.png
│   │   └── Transcript coverage histogram.png
│   ├── qualimapReport.html
│   ├── raw_data_qualimapReport
│   │   ├── coverage_profile_along_genes_(high).txt
│   │   ├── coverage_profile_along_genes_(low).txt
│   │   └── coverage_profile_along_genes_(total).txt
│   └── rnaseq_qc_results.txt
├── rRNA_filtered_fastqc
│   ├── {SAMPLE_ID}_R1_final_filtered_fastqc.html
│   ├── {SAMPLE_ID}_R1_final_filtered_fastqc.zip
│   ├── {SAMPLE_ID}_R2_final_filtered_fastqc.html
│   └── {SAMPLE_ID}_R2_final_filtered_fastqc.zip
├── STARsolo
│   ├── Aligned.sortedByCoord.out.bam
│   ├── Aligned.sortedByCoord.out.bam.bai
│   ├── Log.final.out
│   ├── Log.out
│   ├── Log.progress.out
│   ├── SJ.out.tab
│   ├── Solo.out
│   │   ├── Barcodes.stats
│   │   ├── Gene
│   │   │   ├── Features.stats
│   │   │   ├── filtered
│   │   │   │   ├── barcodes.tsv
│   │   │   │   ├── features.tsv
│   │   │   │   └── matrix.mtx
│   │   │   ├── raw
│   │   │   │   ├── barcodes.tsv
│   │   │   │   ├── features.tsv
│   │   │   │   ├── matrix.mtx
│   │   │   │   └── UniqueAndMult-EM.mtx
│   │   │   ├── Summary.csv
│   │   │   └── UMIperCellSorted.txt
│   │   ├── GeneFull
│   │   │   ├── Features.stats
│   │   │   ├── filtered
│   │   │   │   ├── barcodes.tsv
│   │   │   │   ├── features.tsv
│   │   │   │   └── matrix.mtx
│   │   │   ├── raw
│   │   │   │   ├── barcodes.tsv
│   │   │   │   ├── features.tsv
│   │   │   │   ├── matrix.mtx
│   │   │   │   └── UniqueAndMult-EM.mtx
│   │   │   ├── Summary.csv
│   │   │   └── UMIperCellSorted.txt
│   │   ├── SJ
│   │   │   ├── Features.stats
│   │   │   ├── raw
│   │   │   │   ├── barcodes.tsv
│   │   │   │   ├── features.tsv 
│   │   │   │   └── matrix.mtx
│   │   │   └── Summary.csv
│   │   └── Velocyto
│   │       ├── Features.stats
│   │       ├── filtered
│   │       │   ├── ambiguous.mtx
│   │       │   ├── barcodes.tsv
│   │       │   ├── features.tsv
│   │       │   ├── spliced.mtx
│   │       │   └── unspliced.mtx
│   │       ├── raw
│   │       │   ├── ambiguous.mtx
│   │       │   ├── barcodes.tsv
│   │       │   ├── features.tsv
│   │       │   ├── spliced.mtx
│   │       │   └── unspliced.mtx
│   │       └── Summary.csv
│   ├── Unmapped.out.mate1.fastq.gz
│   └── Unmapped.out.mate2.fastq.gz
├── STARsolo_rRNA
│   ├── Aligned.sortedByCoord.out.bam
│   ├── Aligned.sortedByCoord.out.bam.bai
│   ├── Log.final.out
│   ├── Log.out
│   ├── Log.progress.out
│   ├── SJ.out.tab
│   └── Solo.out
│       ├── Barcodes.stats
│       └── GeneFull
│           ├── Features.stats
│           ├── filtered
│           │   ├── barcodes.tsv
│           │   ├── features.tsv
│           │   └── matrix.mtx
│           ├── raw
│           │   ├── barcodes.tsv
│           │   ├── features.tsv
│           │   ├── matrix.mtx
│           │   └── UniqueAndMult-EM.mtx
│           ├── Summary.csv
│           └── UMIperCellSorted.txt
├── tmp
│   ├── {SAMPLE_ID}_R1_final_filtered.fq.gz
│   ├── {SAMPLE_ID}_R1_final.fq.gz
│   ├── {SAMPLE_ID}_R1.fq.gz
│   ├── {SAMPLE_ID}_R2_final_filtered.fq.gz
│   ├── {SAMPLE_ID}_R2_final.fq.gz
│   └── {SAMPLE_ID}_R2.fq.gz
└── Unmapped_fastqc
    ├── Unmapped.out.mate1_fastqc.html
    ├── Unmapped.out.mate1_fastqc.zip
    ├── Unmapped.out.mate2_fastqc.html
    └── Unmapped.out.mate2_fastqc.zip
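After a run, a quick check that a few key files from the tree above exist can catch incomplete samples early. This is a minimal sketch: `check_outputs` is a hypothetical helper, and the selection of files to check is an assumption drawn from the tree, not a list defined by the workflow.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of txg_snake): report which of a few key
# per-sample outputs are missing. The file list is an illustrative subset
# of the output tree shown above.
check_outputs() {
  local sample_dir="$1"
  local missing=0
  local f
  for f in \
    "STARsolo/Aligned.sortedByCoord.out.bam" \
    "STARsolo/Solo.out/Gene/Summary.csv" \
    "STARsolo/Solo.out/GeneFull/Summary.csv" \
    "STARsolo_rRNA/Solo.out/GeneFull/Summary.csv" \
    "qualimap/rnaseq_qc_results.txt"
  do
    if [ ! -e "${sample_dir}/${f}" ]; then
      echo "missing: ${f}"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_outputs /path/to/{SAMPLE_ID}` prints one line per missing file and returns a non-zero status if anything is absent.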

