Given a set of BAM files and a gene annotation BED file, this tool calculates the Transcript Integrity Number (TIN) for each transcript.
python calculate-tin.py [-h] [options]
--version show program's version number and exit
-h, --help show this help message and exit
-i INPUT_FILES, --input=INPUT_FILES
Input BAM file(s). "-i" accepts these inputs: 1) a single
BAM file. 2) "," separated BAM files (no spaces
allowed). 3) directory containing one or more bam
files. 4) plain text file containing the path of one
or more bam files (Each row is a BAM file path). All
BAM files should be sorted and indexed using samtools.
[required]
-r REF_GENE_MODEL, --refgene=REF_GENE_MODEL
Reference gene model in BED format. Must be a standard
12-column BED file. [required]
-c MINIMUM_COVERAGE, --minCov=MINIMUM_COVERAGE
Minimum number of reads mapped to a transcript.
default=10
-n SAMPLE_SIZE, --sample-size=SAMPLE_SIZE
Number of equally spaced nucleotide positions picked
from mRNA. Note: if this number is larger than the
length of mRNA (L), it will be halved until it's
smaller than L. default=100
--names=SAMPLE_NAMES sample names, comma separated (no spaces allowed);
number must match the number of provided bam_files
-s, --subtract-background
Subtract background noise (estimated from intronic
reads). Only use this option if there are substantial
intronic reads.
-p NRPROCESSES, --processes=NRPROCESSES
Number of child processes for the parallelization.
Default: 1
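The -n/--sample-size help text describes two steps: shrinking the requested number of positions for short mRNAs, then picking equally spaced positions along the mRNA. The sketch below is one reading of that text, not the tool's actual code. The helper names are hypothetical, "halved until it's smaller than L" is interpreted as repeated integer halving (never going below one position), and offsets are assumed to be 0-based along the spliced mRNA.

```python
from typing import List


def effective_sample_size(n: int, mrna_length: int) -> int:
    """Shrink the requested number of positions for short mRNAs.

    Assumed reading of the -n/--sample-size help text: if n exceeds the
    mRNA length L, halve it (integer division) until it is smaller than L.
    """
    if n > mrna_length:
        # Stop at 1 so that very short mRNAs still get one position.
        while n >= mrna_length and n > 1:
            n //= 2
    return n


def equally_spaced_positions(mrna_length: int, n: int) -> List[int]:
    """Return n equally spaced 0-based offsets along an mRNA of length L.

    Integer arithmetic keeps the first offset at 0 and the last at L - 1
    without floating-point rounding surprises.
    """
    if n <= 1:
        return [0]
    return [i * (mrna_length - 1) // (n - 1) for i in range(n)]


# Example: a 60-nt mRNA with the default n=100.
n = effective_sample_size(100, 60)  # 100 -> 50
print(n, equally_spaced_positions(60, n)[:5])
```

Because n ends up no larger than L, consecutive offsets are at least one nucleotide apart, so no position is sampled twice.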
Sample output (TSV):
transcript sample_name
ENST00000303113 80.6328743265
ENST00000427445 0
ENST00000430792 59.7324017312
ENST00000647504 84.8860204563
ENST00000398647 64.4764470574
ENST00000400202 69.6331415873
ENST00000455813 85.3605191157
ENST00000397854 92.3965306733
ENST00000630077 72.8829044591
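The sample output is a tab-separated table with one transcript per row and one column per sample, so it can be loaded with the standard library alone. The sketch below assumes exactly that layout (a header row whose first column holds the transcript ID and whose remaining columns are sample names); read_tin_table is a hypothetical helper, not part of this package.

```python
import csv
import io
from typing import Dict, TextIO


def read_tin_table(handle: TextIO) -> Dict[str, Dict[str, float]]:
    """Parse a TIN table into {sample_name: {transcript_id: tin}}.

    Assumes a tab-separated header row whose first column is the transcript ID
    and whose remaining columns are sample names, as in the sample output.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]
    scores: Dict[str, Dict[str, float]] = {sample: {} for sample in samples}
    for row in reader:
        if not row:  # tolerate a trailing blank line
            continue
        transcript, *values = row
        for sample, value in zip(samples, values):
            scores[sample][transcript] = float(value)
    return scores


example = io.StringIO(
    "transcript\tsample_name\n"
    "ENST00000303113\t80.6328743265\n"
    "ENST00000427445\t0\n"
)
print(read_tin_table(example)["sample_name"]["ENST00000427445"])  # 0.0
```

For a file on disk, pass the handle from open(path, newline="") instead of the in-memory example.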
The tool was forked from the script tin.py (v2.6.4) of the RSeQC package to achieve some speed-up.
This program calculates the transcript integrity number (TIN) for each transcript (or gene) in the BED file. TIN is conceptually similar to RIN (RNA integrity number), but it provides a transcript-level measurement of RNA quality and is more sensitive for measuring low-quality RNA samples:
- TIN score of a transcript is used to measure the RNA integrity of the transcript.
- Median TIN score across all transcripts can be used to measure RNA integrity of that "RNA sample".
- TIN ranges from 0 (the worst) to 100 (the best). TIN = 60 means that 60% of the transcript would be covered if the read coverage were uniform.
- TIN is set to 0 if the transcript has no coverage or if the number of reads mapped to it is below the cutoff (-c/--minCov).
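The "TIN = 60 means 60% covered under uniform coverage" interpretation follows from an entropy-based definition. The sketch below assumes the formula used by RSeQC's tin.py, TIN = 100 * exp(H) / k, where H is the Shannon entropy of the read coverage at the k sampled positions after normalizing it to sum to 1. It does not model the -c/--minCov read cutoff or the -s background subtraction, and tin_score is a hypothetical name.

```python
import math
from statistics import median
from typing import Sequence


def tin_score(coverage: Sequence[float]) -> float:
    """Entropy-based TIN for read coverage at k sampled positions.

    Assumed definition (RSeQC-style): TIN = 100 * exp(H) / k, where H is the
    Shannon entropy of the coverage profile normalized to sum to 1.
    """
    k = len(coverage)
    total = sum(coverage)
    if k == 0 or total == 0:
        return 0.0  # no coverage -> TIN 0
    entropy = -sum((c / total) * math.log(c / total) for c in coverage if c > 0)
    # exp(H) is the number of positions that would yield the same entropy
    # under perfectly uniform coverage.
    return 100.0 * math.exp(entropy) / k


print(round(tin_score([5] * 10), 6))           # uniform over all positions: 100.0
print(round(tin_score([5] * 6 + [0] * 4), 6))  # uniform over 60% of positions: 60.0

# Median TIN of a sample, using the values from the sample output above.
sample_tins = [80.6328743265, 0, 59.7324017312, 84.8860204563, 64.4764470574,
               69.6331415873, 85.3605191157, 92.3965306733, 72.8829044591]
print(median(sample_tins))  # 72.8829044591
```

The median here includes transcripts whose TIN was set to 0; whether to drop them before summarizing a sample is a choice the text above does not specify.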
Additionally, this repository includes three simple Python scripts.
Merge TIN score tables for multiple samples.
python merge-tin.py [-h] [options]
-h, --help show this help message and exit
-v {DEBUG,INFO,WARN,ERROR,CRITICAL}, --verbosity {DEBUG,INFO,WARN,ERROR,CRITICAL}
Verbosity/Log level. Defaults to ERROR
-l LOGFILE, --logfile LOGFILE
Store log to this file.
--input-files INFILES
Space-separated paths to the input tables.
--output-file OUTFILE
Path for the outfile with merged TIN scores
The output file is also formatted as a TSV table.
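Merging per-sample tables amounts to joining them on the transcript column. The sketch below is a hypothetical stand-in for merge-tin.py, not its implementation: it assumes each input has a transcript column followed by sample columns, performs an outer join, fills transcripts missing from a table with "NA", and writes rows in sorted transcript order. The real script may use a different join, placeholder, or ordering, and duplicate sample names across inputs would collide here.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO


def merge_tin_tables(handles: Iterable[TextIO], out: TextIO, missing: str = "NA") -> None:
    """Outer-join per-sample TIN tables on the transcript column.

    Hypothetical re-implementation: transcripts absent from a table get
    `missing`, and rows are written in sorted transcript order.
    """
    samples: List[str] = []
    scores: Dict[str, Dict[str, str]] = {}
    for handle in handles:
        reader = csv.reader(handle, delimiter="\t")
        file_samples = next(reader)[1:]
        samples.extend(file_samples)
        for row in reader:
            if not row:
                continue
            transcript, *values = row
            scores.setdefault(transcript, {}).update(zip(file_samples, values))
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["transcript", *samples])
    for transcript in sorted(scores):
        per_sample = scores[transcript]
        writer.writerow([transcript, *(per_sample.get(s, missing) for s in samples)])


sample_1 = io.StringIO("transcript\tsample_1\nT1\t80\nT2\t0\n")
sample_2 = io.StringIO("transcript\tsample_2\nT1\t70\nT3\t55\n")
merged = io.StringIO()
merge_tin_tables([sample_1, sample_2], merged)
print(merged.getvalue())
```

Values are kept as the original strings, so scores pass through the merge without any reformatting of their decimals.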
Create per-sample boxplots of TIN scores.
python plot-tin.py [-h] [options]
-h, --help show this help message and exit
-v {DEBUG,INFO,WARN,ERROR,CRITICAL}, --verbosity {DEBUG,INFO,WARN,ERROR,CRITICAL}
Verbosity/Log level. Defaults to ERROR
-l LOGFILE, --logfile LOGFILE
Store log to this file.
--input-file INFILE Path to the table with merged TIN scores
--output-file-prefix OUTFILE_PREFIX
Prefix for the path to the TIN boxplots.
The boxplots are generated in PDF and PNG formats and saved as <output-file-prefix>.pdf and <output-file-prefix>.png.
Calculate simple summary statistics for the per-sample TIN scores.
python summarize-tin.py [-h] [options]
-h, --help show this help message and exit
-v {DEBUG,INFO,WARN,ERROR,CRITICAL}, --verbosity {DEBUG,INFO,WARN,ERROR,CRITICAL}
Verbosity/Log level. Defaults to ERROR
-l LOGFILE, --logfile LOGFILE
Store log to this file.
--input-file INFILE Path to the table with merged TIN scores
--output-file OUTFILE
Path for the output table with TIN statistics.
The output file is also formatted as a TSV table.
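Per-sample summary statistics can be computed column by column from the merged table. The sketch below assumes the merged layout (transcript column first, one column per sample) and skips empty or "NA" cells. The choice of statistics (count, mean, median, min, max) is illustrative and may not match what summarize-tin.py actually reports; summarize_tin is a hypothetical name.

```python
import csv
import io
import statistics
from typing import Dict, List, TextIO


def summarize_tin(handle: TextIO) -> Dict[str, Dict[str, float]]:
    """Per-sample summary statistics of a merged TIN table.

    Assumptions: the first column holds transcript IDs, every other column is a
    sample, and empty or "NA" cells are skipped. The statistics chosen here
    are illustrative and may not match what summarize-tin.py reports.
    """
    reader = csv.reader(handle, delimiter="\t")
    samples = next(reader)[1:]
    columns: Dict[str, List[float]] = {s: [] for s in samples}
    for row in reader:
        for sample, value in zip(samples, row[1:]):
            if value not in ("", "NA"):
                columns[sample].append(float(value))
    summary: Dict[str, Dict[str, float]] = {}
    for sample, values in columns.items():
        if not values:
            continue  # nothing to summarize for this sample
        summary[sample] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "median": statistics.median(values),
            "min": min(values),
            "max": max(values),
        }
    return summary


merged = io.StringIO("transcript\ts1\ts2\nT1\t80\t70\nT2\t0\tNA\nT3\t40\t50\n")
stats = summarize_tin(merged)
print(stats["s1"]["median"])  # 40.0
```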
To use the scripts, clone this repository and install the package together with its dependencies:
git clone https://github.com/zavolanlab/tin-score-calculation
cd tin-score-calculation
pip install .
Alternatively, you can install it from PyPI:
pip install tin-score-calculation
Alternatively, you can install it via conda:
conda install -c bioconda -c conda-forge tin-score-calculation
NOTES:
- You may want to install the dependencies inside a virtual environment, e.g., using virtualenv. Alternatively, if you use conda, we provide an environment recipe too; in that case, just run conda env create.
- Some of the dependencies require specific system libraries to be installed; this, however, should be taken care of by the package manager.
You can then find the scripts in the scripts/ directory and run them as described in the Main usage and Extended usage sections.
To run the tools on the minimal test files, try:
calculate-tin.py \
-i .test/calculate-tin/sample.bam \
-r .test/calculate-tin/transcripts.bed \
--names "sample_name" \
1> .test/calculate-tin/test.tsv
merge-tin.py \
--input-files .test/merge-tin/sample_1.tsv .test/merge-tin/sample_2.tsv \
--output-file .test/merge-tin/test.tsv
plot-tin.py \
--input-file .test/plot-tin/merged.tsv \
--output-file-prefix .test/plot-tin/test
summarize-tin.py \
--input-file .test/summarize-tin/merged.tsv \
--output-file .test/summarize-tin/test.tsv
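The calculate-tin.py test command above can also be driven from Python, which is convenient inside a larger pipeline. The sketch below builds the argument list using only the options documented in the help text (-i, -r, --names, -c, -n, -p), checks that --names has one entry per BAM file, and redirects stdout to a file just as the shell example does. The helper names are hypothetical, and actually running the command requires the package to be installed.

```python
import subprocess
from typing import List, Optional, Sequence


def calculate_tin_command(
    bams: Sequence[str],
    bed: str,
    names: Optional[Sequence[str]] = None,
    min_cov: Optional[int] = None,
    sample_size: Optional[int] = None,
    processes: Optional[int] = None,
) -> List[str]:
    """Build a calculate-tin.py argument list from the documented options."""
    for item in [*bams, *(names or [])]:
        if "," in item:
            raise ValueError(f"comma in {item!r} would break the comma-separated list")
    if names is not None and len(names) != len(bams):
        raise ValueError("--names needs exactly one name per BAM file")
    cmd = ["calculate-tin.py", "-i", ",".join(bams), "-r", bed]
    if names is not None:
        cmd += ["--names", ",".join(names)]
    if min_cov is not None:
        cmd += ["-c", str(min_cov)]
    if sample_size is not None:
        cmd += ["-n", str(sample_size)]
    if processes is not None:
        cmd += ["-p", str(processes)]
    return cmd


def run_calculate_tin(cmd: Sequence[str], out_path: str) -> None:
    """Run the command and redirect stdout (the TIN table) to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(list(cmd), stdout=out, check=True)


cmd = calculate_tin_command(
    [".test/calculate-tin/sample.bam"],
    ".test/calculate-tin/transcripts.bed",
    names=["sample_name"],
)
print(" ".join(cmd))
# run_calculate_tin(cmd, ".test/calculate-tin/test.tsv")  # needs the package installed
```

Passing a list (rather than a shell string) to subprocess.run avoids shell quoting issues with file paths.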
If you have Docker installed, you can also pull the Docker image:
docker pull quay.io/biocontainers/tin-score-calculation:0.6--pyh5e36f6f_0
You can then execute the scripts as follows:
docker run -it quay.io/biocontainers/tin-score-calculation:0.6--pyh5e36f6f_0 calculate-tin.py --help
docker run -it quay.io/biocontainers/tin-score-calculation:0.6--pyh5e36f6f_0 merge-tin.py --help
docker run -it quay.io/biocontainers/tin-score-calculation:0.6--pyh5e36f6f_0 plot-tin.py --help
docker run -it quay.io/biocontainers/tin-score-calculation:0.6--pyh5e36f6f_0 summarize-tin.py --help
NOTE: To run the tool on your own data in that manner, you will probably need to mount a volume so that the container can read input files from, and write persistent output to, the host file system.
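Mounting a volume, as the note above suggests, means passing a bind mount (-v host_dir:container_dir) to docker run and pointing the scripts at paths inside the container. The sketch below builds such a command for the image tag shown above. The mount point /data is an arbitrary, hypothetical choice; -w sets the working directory so that relative paths in the script arguments resolve inside the mounted folder, and --rm removes the container after it exits. The -it flags from the examples above are only needed for interactive use.

```python
import os
from typing import List, Sequence

IMAGE = "quay.io/biocontainers/tin-score-calculation:0.6--pyh5e36f6f_0"


def docker_command(host_dir: str, script_args: Sequence[str], container_dir: str = "/data") -> List[str]:
    """Wrap a script call in `docker run`, bind-mounting host_dir into the container.

    The mount point /data is a hypothetical choice; paths in script_args must be
    relative to it (or absolute paths inside the container).
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{os.path.abspath(host_dir)}:{container_dir}",  # host dir -> container dir
        "-w", container_dir,  # run the script from inside the mounted directory
        IMAGE,
        *script_args,
    ]


print(" ".join(docker_command("/home/user/project", [
    "merge-tin.py",
    "--input-files", "sample_1.tsv", "sample_2.tsv",
    "--output-file", "merged.tsv",
])))
```

Because calculate-tin.py writes its table to stdout, its output can still be redirected to a host file outside the container, as in the test command.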
Version: 0.6.3
Please see the list of contributors for contact information.