install.packages("devtools")
devtools::install_github("hilgers-lab/LATER", build = TRUE, build_vignettes = TRUE)
The vignette contains some examples and interpretation of the results of the analysis
library(LATER)
vignette("LATER")
LATER estimates transcriptional biases in APA using long read sequencing data
- Genome Alignment bam files minimap2 using parameters
minimap2 -ax splice -u f annotation/genome.fa long_read.fastq.gz | samtools sort -@ 4 -o output.bam - samtools index output.bam
- Reference annotation in gtf format. Example file here
First, a database of 5'-3' isoforms is created based on the reference annotation provided. Combinations are computed based on isoform sets TSS and PA sites are merged in a window. This outputs a dataframe with the classification of genes by their TSS and PA site status
ref_path <- system.file("exdata/dmel_reference_annotation.gtf.gz",
package = 'LATER')
reference_annotation <- rtracklayer::import.gff(ref_path)
protein_coding_exons <- reference_annotation[reference_annotation$type == "exon" & reference_annotation$gene_biotype == "protein_coding"]
protein_coding_genes <- reference_annotation[reference_annotation$type == "gene" & reference_annotation$gene_biotype == "protein_coding"]
# Prepare isoformData object
isoformData <- prepareIsoformDatabase(protein_coding_exons,
tss.window = 50,
tes.window = 150)
To account for accurate quantification we develop a counter for long read sequencing data. Aligned reads to the genome are trimmed to their most 5' and 3' end keeping the read identity only reads mapping to both TSS and PA site in the reference, are considered for the analisys. Reads are then summarized in counts per million for further processing.
bamPath <- system.file("exdata/testBam.bam", package = 'LATER')
countData <- countLinks(bamPath, isoformData)
Promoter dominance estimates are calculated as perfomed in (Alfonso-Gonzalez, et al., 2022). This function outputs per promoter biases in expression of a given 3'end of the gene.
Transcriptional biases are calculated by estimating using the joint frequencies of TSS-PA site combinations per gene. Coupling events per gene are estimated using multinomial testing using chi-square. Statistical testing is also available with fisher.test()
using method="fisher".
gene_bias_estimates <- estimatePromoterDominance(countData, isoformData, method="fisher")
Release 0.1.2.1
Release date: 9th Dec 2024
Developer Carlos Alfonso-Gonzalez. For questions or feedback you can contact: alfonso@ie-freiburg.mpg.de