DropseqTools is required. See the Drop-seq tools page on GitHub.
You also need to download the scripts used in this pipeline to your local machine.
Generate the annotations used by DropseqTools, and convert the GTF annotation file into non-overlapping exon and intron BED files.
sh prepare_annotation_files.sh
## change the directories and parameters in the config.yaml file to your own settings
## move to the working folder, and place the config.yaml, samples.tsv and Snakefiles there.
## create subfolder 'output': the path for all the output files generated by this pipeline
The data processing pipeline includes the main Drop-seq pipeline, annotating the unique feature (gene) each read maps to, annotating the splice status, annotating mismatches, merging reads from the same UMI, and distinguishing true mismatches from sequencing errors.
snakemake --cores N -s Snakefile_data_processing
This R script only needs to be run once using the no-4sU and 24h-4sU samples. Once you have identified the background conversion rate and 4sU incorporation rate in your experiment, you can omit this step.
The background conversions are identified from control samples (no 4sU). Positions with high-confidence conversions are removed from all samples when quantifying the mismatches for each molecule.
The 4sU incorporation rate is estimated from the long-term 4sU-labeled samples (24 hours of 4sU), in which all unspliced molecules are assumed to have been newly synthesized since the onset of labeling.
/path_to_Rscript/Rscript background_conversions_incoporation_rate.R
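The two estimates described above can be illustrated with a minimal Python sketch. It is not the logic of the R script itself: the input layout (lists of `(position, converted)` observations at T positions), the function names, and the `min_cov` and `min_frac` thresholds are all hypothetical choices made for illustration.

```python
from collections import Counter

def background_positions(control_obs, min_cov=10, min_frac=0.5):
    """Return positions that look converted in the no-4sU control.

    control_obs: iterable of (position, converted) pairs from control reads.
    min_cov and min_frac are illustrative thresholds, not the pipeline's values.
    """
    coverage, conversions = Counter(), Counter()
    for pos, converted in control_obs:
        coverage[pos] += 1
        if converted:
            conversions[pos] += 1
    return {
        pos for pos in coverage
        if coverage[pos] >= min_cov and conversions[pos] / coverage[pos] >= min_frac
    }

def conversion_rate(molecules, masked=frozenset()):
    """Fraction of covered T positions that are converted, skipping masked positions.

    molecules: list of molecules, each a list of (position, converted) pairs.
    """
    n_t = n_conv = 0
    for molecule in molecules:
        for pos, converted in molecule:
            if pos in masked:
                continue
            n_t += 1
            n_conv += int(converted)
    return n_conv / n_t if n_t else float("nan")

# Toy control sample: position 100 is converted in every read, so it is masked.
control = [(100, True)] * 12 + [(101, False)] * 12 + [(102, True)] + [(102, False)] * 11
masked = background_positions(control)

# Toy unspliced molecules from the 24h-4sU sample, assumed to be all newly synthesized.
unspliced_24h = [
    [(100, True), (101, True), (102, False), (103, False)],
    [(101, False), (104, True), (105, False), (106, False)],
]
incorporation_rate = conversion_rate(unspliced_24h, masked)
print(sorted(masked), incorporation_rate)
```

Masking the control-derived positions before counting keeps positions that are converted without 4sU from inflating the incorporation rate estimated from the labeled sample.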
The newly synthesized molecules and the pre-existing ones are identified using Bayesian inference.
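As a rough illustration of the idea, the sketch below applies Bayes' rule to a two-component binomial model: a molecule with `k` conversions among `n` covered T positions is either newly synthesized (conversion probability `p_new`, the incorporation rate) or pre-existing (conversion probability `p_bg`, the background rate). This model, the function names, and the default `prior_new` are assumptions for illustration and may differ from what `Snakefile_quantification` actually implements.

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def posterior_new(k, n, p_new, p_bg, prior_new=0.5):
    """Posterior probability that a molecule is newly synthesized.

    k: number of conversions observed on the molecule
    n: number of T positions covered by the molecule
    p_new, p_bg: per-position conversion probabilities for new and old molecules
    prior_new: prior fraction of new molecules (hypothetical default)
    """
    weight_new = binom_pmf(k, n, p_new) * prior_new
    weight_old = binom_pmf(k, n, p_bg) * (1 - prior_new)
    total = weight_new + weight_old
    return weight_new / total if total > 0 else float("nan")

# Illustrative rates only; use the values estimated from your own experiment.
p_unconverted = posterior_new(k=0, n=20, p_new=0.05, p_bg=0.001)
p_converted = posterior_new(k=3, n=20, p_new=0.05, p_bg=0.001)
print(p_unconverted, p_converted)
```

Note that a molecule without any conversion still has a non-zero posterior probability of being new, because at a low incorporation rate many new molecules carry no conversions at all.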
Lastly, gene expression count matrices are obtained for four types of RNA molecules (labeled mature, unlabeled mature, labeled precursor and unlabeled precursor).
snakemake --cores N -s Snakefile_quantification