Various functions to inspect, extract, and visualise long-read RNA-seq data.

eleni-chr/long-read-RNA-seq-plotting-suite

long-read-RNA-seq-plotting-suite

Note that some functions may need modification to suit your particular data types or file types.

Contents

  • A. Functions to inspect data, extract data, and prepare for plotting
  • B. Functions to plot the data

A. Functions to inspect data, extract data, and prepare for plotting

fastq_headers This function checks if there are entries with identical Headers in a FASTQ file. Multiple FASTQ files are checked at once.
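
As an illustration of the duplicate-header check described above, here is a minimal Python sketch. It is not the repository's implementation: the function name `duplicate_fastq_headers` is hypothetical, and the code assumes the simple four-line FASTQ layout (header, sequence, `+`, quality) with no wrapped sequences.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicate_fastq_headers(lines: Iterable[str]) -> Dict[str, int]:
    """Return every header that occurs more than once, with its count.

    Assumption: each record spans exactly four lines, so headers sit on
    lines 0, 4, 8, ... of the input.
    """
    counts = Counter(
        line.rstrip("\n") for i, line in enumerate(lines) if i % 4 == 0
    )
    return {header: n for header, n in counts.items() if n > 1}


# Made-up records for illustration only.
example = [
    "@read1\n", "ACGT\n", "+\n", "IIII\n",
    "@read2\n", "GGCC\n", "+\n", "IIII\n",
    "@read1\n", "TTAA\n", "+\n", "IIII\n",
]
print(duplicate_fastq_headers(example))
```

To check several files at once, the same function could be called on each open file handle in turn.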

medianReadLengths This function calculates the median and rounded mean read length in each sample of sequenced reads.

featureCountsSummaryGalaxy This function extracts useful information from the results generated by running featureCounts online on Galaxy.

featureCountsSummary This function extracts useful information from the results generated by running featureCounts in Windows Command Line.

entrezToSymbol This function replaces the Entrez ID of each gene with its gene symbol in the CSV files produced by running DESeq2 in R on the featureCounts output from Galaxy. It also replaces the IDs in the TXT file of gene counts that featureCounts creates on Galaxy.

featureCountsGalaxyMerge This function merges the COUNTS data from multiple TXT files into a single TXT file (COUNTS data are obtained for each sample in a different TXT file, by running featureCounts on Galaxy).

getMappedIDs This function creates a TXT file containing the read IDs of the mapped reads only.
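
One common way to tell mapped from unmapped reads is the SAM FLAG field, where bit `0x4` marks a segment as unmapped. The Python sketch below rests on that assumption; the README does not say which input format getMappedIDs reads, and the name `mapped_read_ids` is hypothetical.

```python
from typing import Iterable, List


def mapped_read_ids(sam_lines: Iterable[str]) -> List[str]:
    """Collect the QNAME of every alignment whose FLAG lacks the 0x4 bit.

    Each read ID is reported once, in order of first appearance, even if
    the read has several alignment records.
    """
    seen = set()
    ids = []
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & 0x4:
            continue  # read is unmapped
        if qname not in seen:
            seen.add(qname)
            ids.append(qname)
    return ids


# Made-up, truncated SAM records for illustration only.
example_sam = [
    "@HD\tVN:1.6\n",
    "readA\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "readB\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "readA\t256\tchr2\t500\t0\t4M\t*\t0\t0\tACGT\tIIII\n",
]
print(mapped_read_ids(example_sam))
```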

checkGeneID This function prints the gene IDs of a particular gene. Both the gene_id and the db_xref are printed.

appendFC This function calculates fold-change (FC) values from the log2FoldChange (LFC) values in a CSV file output by DESeq2.
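
The conversion itself is FC = 2^LFC. The Python sketch below shows that arithmetic on rows read from a DESeq2-style CSV. The function name `append_fold_change` and the output column name `FC` are assumptions made for illustration, and the repository's function may handle missing values or negative fold changes differently.

```python
from typing import Dict, Iterable, List


def append_fold_change(rows: Iterable[Dict[str, str]]) -> List[dict]:
    """Add an 'FC' entry equal to 2 ** log2FoldChange to each row.

    Rows whose log2FoldChange is missing or not numeric (for example
    'NA') get an empty FC value instead of raising an error.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        try:
            new_row["FC"] = 2.0 ** float(row.get("log2FoldChange", ""))
        except ValueError:
            new_row["FC"] = ""
        out.append(new_row)
    return out


# Made-up rows, shaped like the output of csv.DictReader.
example_rows = [
    {"gene": "A", "log2FoldChange": "1"},
    {"gene": "B", "log2FoldChange": "-2"},
    {"gene": "C", "log2FoldChange": "NA"},
]
for row in append_fold_change(example_rows):
    print(row["gene"], row["FC"])
```

With this convention a downregulated gene has an FC between 0 and 1, so an LFC of -2 becomes 0.25.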

getGTFgeneID This function extracts the "gene_id" and "db_xref" fields from column 9 of a GTF file.
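
A minimal Python sketch of this kind of extraction follows. It assumes attributes in column 9 are written as `key "value";` pairs and that `db_xref` may occur more than once per line. The function name `gene_id_and_xrefs` and the example line are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches one attribute of the form: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')


def gene_id_and_xrefs(gtf_line: str) -> Tuple[Optional[str], List[str]]:
    """Return the first gene_id and all db_xref values from one GTF line."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) < 9:
        raise ValueError("expected at least 9 tab-separated columns")
    gene_id = None
    xrefs = []
    for key, value in ATTR_RE.findall(fields[8]):
        if key == "gene_id" and gene_id is None:
            gene_id = value
        elif key == "db_xref":
            xrefs.append(value)
    return gene_id, xrefs


# Made-up GTF line for illustration only.
example_line = (
    "chr1\tsource\texon\t100\t200\t.\t+\t.\t"
    'gene_id "GENE1"; transcript_id "TX1"; db_xref "GeneID:12345";\n'
)
print(gene_id_and_xrefs(example_line))
```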

fastq_lengths This function calculates the length of each sequence in a FASTQ file. Multiple FASTQ files are checked at once.

fastq_seqs This function extracts the sequence of each entry in a FASTQ file. Multiple FASTQ files are checked at once.

header_qname_comp This function counts how many entries in a SAM file generated by guppy_aligner are NOT present in the original FASTQ file. Multiple SAM files are checked at once.

lst2txt This function extracts the read IDs from an LST file and saves them in a TXT file in the current directory.

countflags This function displays which flags appear in a SAM file and how many times each flag appears. Multiple SAM files are checked at once.
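
Tallying flags amounts to counting the values in the second column (FLAG) of each alignment line. The Python sketch below is one way to do that, not the repository's code; the name `count_sam_flags` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_sam_flags(sam_lines: Iterable[str]) -> Counter:
    """Count how often each FLAG value occurs among the alignment lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        flag = int(line.split("\t")[1])
        counts[flag] += 1
    return counts


# Made-up, truncated SAM records for illustration only.
example_records = [
    "@HD\tVN:1.6\n",
    "r1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t16\tchr1\t20\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r3\t0\tchr1\t30\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
]
for flag, n in sorted(count_sam_flags(example_records).items()):
    print(flag, n)
```

Flag 0 is a forward-strand primary alignment, 16 a reverse-strand alignment, and 4 an unmapped read.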

countmapqs This function checks the mapping quality (MAPQ) of unmapped reads in a SAM file. Multiple SAM files are checked at once.

countqnames This function counts how many times each mapped read mapped to the reference genome.

fastq_analysis This function extracts the Header of each entry in a FASTQ file. Multiple FASTQ files are checked at once.

flag_analysis This function gives information about the Flags for each entry in a SAM file. Multiple SAM files are checked at once.

getMappedLengths This function calculates the sequence length of mapped reads.

getMappedQuals This function extracts the basecall qualities of mapped reads.

getUnmappedLengths This function calculates the sequence length of unmapped reads.

getUnmappedQuals This function extracts the basecall qualities of unmapped reads.

mapq_analysis This function gives information about the Mapping Quality (MAPQ) for each entry in a SAM file. Multiple SAM files are checked at once.

qname_analysis This function gives information about the QNAME (Query Name) for each read in a SAM file. Multiple SAM files are checked at once. The QNAME should be identical to the Header in a FASTQ file.

B. Functions to plot the data

plotGeneCounts This function creates plots for the top 6 most upregulated and the top 6 most downregulated genes, using data generated by DESeq2. The plots show the number of times each gene is present in each sample, separated by group.

plotDEvolcano This function creates two volcano plots of the differentially expressed genes outputted by DESeq2. Each plot has a different set of cut-off values.

plotReadQualityHist This function creates a histogram of read basecall qualities before filtering out low basecall quality reads. The x-axis scale is logarithmic.

plotReadLengthVsQualScatterhistLogData This function creates a scatterplot with marginal density histograms of the read length vs the basecall quality. The x-axis scale is logarithmic.

plotReadLengthVsQualDensityLogData This function creates a density plot of the read length vs the basecall quality before filtering out low basecall quality reads. The x-axis scale is logarithmic.

plotReadLengthHistLogData This function creates a histogram of log-transformed read lengths before filtering out low basecall quality reads. The data themselves are log-transformed, rather than the axis scale.

plotReadLengthHistLogAxis This function creates a histogram of read lengths before filtering out low basecall quality reads. The x-axis scale is logarithmic.

plotPercentIdentity This function creates a plot of Percent identity histograms.

plotfeatureCountsSummaryGalaxy This function creates a plot of the results of featureCounts, when run on Galaxy.

plotMAPQmapped This function creates a plot of Mapping quality (MAPQ) vs Number of mapped reads.

plotMapVsUnmapLength This function creates a plot showing the distribution of the lengths of the mapped and unmapped reads separately.

plotDEUvolcano This function creates a volcano plot of the differentially used exons outputted by DEXSeq.

plotVariantCounts This function creates a plot of the splicing variant type counts for each group.

plotGeneCountsGalaxy This function creates plots for the top 6 most upregulated and the top 6 most downregulated genes, using data generated by DESeq2. The plots show the number of times each gene is present in each sample, separated by group.

plotNumberOfReads This function creates a plot of the Number of sequenced and Aligned reads.

plotCompPercentAlignReads This function creates a plot comparing the percentage of reads that aligned at least once by two different aligners.

plotCompNumAlignments This function creates a plot comparing the number of alignments generated by two different aligners.

plotfeatureCountsSummary This function creates a plot of the results of featureCounts.

plotSequencedReadLengthVsAlignedReadLength This function creates a plot of Sequenced read length vs Aligned read length.

plotNumberOfMappings This function creates a plot of the Number of alignments for the mapped reads.

plotMappingInfoPercent This function creates a plot of Percentage of mapped reads vs Type of alignment.

plotMappingInfo This function creates a plot of the Number of mapped reads vs the Type of alignment.

plotMapVsUnmapQual This function creates a plot showing the distribution of the basecall qualities of the mapped and unmapped reads separately.

plotCompareAlignments This function creates a plot comparing the number of reads that aligned at least once by two different aligners.