seekCRIT: seek for circular RNA in transcriptome (identifies differentially expressed circRNAs between two samples)

Version: v1.0.0.b

Last Modified: 2019-04-25

Authors: Bioinformatics Lab, University of Louisville, Kentucky Biomedical Research Infrastructure Network (KBRIN)

A schematic flow shows the pipeline


Different Junction Counts (linear junction counts and circular junction counts) used in the circular RNA expression level estimation.


Percent Back-spliced In (PBI) calculation (circular junction counts with respect to all junction counts)


Example PBI calculation for a differentially expressed circular RNA.

Circular RNA derived from exon 2 of the Cdp gene (rat hippocampus, somata and neuropil). This circular RNA is more highly expressed in neuropil (31.9%) than in somata (11.9%).
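As a rough illustration of the PBI idea described above, the following sketch computes the fraction of circular junction reads among all junction reads. The function name `percent_backspliced_in` and the read counts are hypothetical, and the formula is only an assumption based on the description "circular junction counts with respect to all junction counts"; seekCRIT itself may combine or normalize the linear junction counts differently.

```python
def percent_backspliced_in(circular_count, linear_count):
    """Fraction of junction reads that support the circular (back-splice) junction.

    Assumption: PBI = circular / (circular + linear). The real pipeline may
    weight or average the linear junction counts differently.
    """
    total = circular_count + linear_count
    if total == 0:
        return 0.0  # no junction reads at all; treated as 0 here by convention
    return circular_count / total


# Hypothetical junction counts for one circRNA in two samples
pbi_1 = percent_backspliced_in(30, 70)
pbi_2 = percent_backspliced_in(10, 90)
delta_pbi = pbi_1 - pbi_2
print(f"PBI_1={pbi_1:.3f} PBI_2={pbi_2:.3f} deltaPBI={delta_pbi:.3f}")
```

Returning 0 when there are no junction reads is a simplification for this sketch; a real analysis would more likely skip such events.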


Software / Package

  • STAR Aligner: v2.5.2b



1 Download seekCRIT

git clone

cd seekCRIT

2 Install required packages (Some prerequisite packages might require admin access rights. Please contact your system admin to install such packages.)

pip3 install -r Prerequisites.txt

3 Install seekCRIT

python install

4 testing seekCRIT with

In order to run

 ./ gtf/Rattus_norvegicus.Ensembl.rn6.r84.gtf fasta/rn6.fa


usage: [-h] -s1 S1 -s2 S2 -gtf GTF -o OUTDIR -t {SE,PE} 
               --genomeIndex GENOMEINDEX -fa FASTA -ref REFSEQ
               [--threadNumber numThreads]
               [--aligner aligner]
               [--deltaPSI DELTAPSI] [--highConfidence HIGHCONFIDENCE]
               [--libType {fr-unstranded,fr-firststrand,fr-secondstrand}]
               [--keepTemp {Y,N}]

Identifying and Characterizing Differentially Spliced circular RNAs between
two samples

Required arguments:
  -s1 S1, --sample1 S1  fastq files for sample_1. Replicates are separated by
                        comma. Paired-end reads are separated by colon.
                        e.g., s1-1.fastq,s1-2.fastq for single-end reads;
                        s1-1.R1.fastq:s1-1.R2.fastq,s1-2.R1.fastq:s1-2.R2.fastq
                        for paired-end reads
  -s2 S2, --sample2 S2  fastq files for sample_2. Replicates are separated by
                        comma. Paired-end reads are separated by colon.
                        e.g., s2-1.fastq,s2-2.fastq for single-end reads;
                        s2-1.R1.fastq:s2-1.R2.fastq,s2-2.R1.fastq:s2-2.R2.fastq
                        for paired-end reads
  -gtf GTF, --gtf GTF   The gtf annotation file. e.g., hg38.gtf
  -o OUTDIR, --output OUTDIR
                        Output directory
  -t {SE,PE}, --readType {SE,PE}
                        Read type. SE for Single-end read, PE for Paired-end read
  --genomeIndex GENOMEINDEX
                        Genome indexes for the aligner
  -fa FASTA, --fasta FASTA
                        Genome sequence. e.g., hg38.fa
  -ref REFSEQ, --refseq REFSEQ
                        Transcriptome in refseq format. e.g., hg38.ref.txt
Optional arguments:
  -h, --help            show this help message and exit
  --threadNumber numThreads
                        Number of threads for multi-threading feature [default = 4]
  --aligner aligner     Aligner to use (currently only STAR is supported; support for more aligners is planned)

  --deltaPSI DELTAPSI   Delta PSI cutoff. i.e., significant event must show
                        bigger deltaPSI than this cutoff [default = 0.05]
  --highConfidence HIGHCONFIDENCE
                        Minimum number of circular junction counts required [default = 1]
  --libType  {fr-unstranded,fr-firststrand,fr-secondstrand}
                        library type used by Tophat aligner [default ='fr-unstranded']
  --keepTemp {Y,N}      Keep temp files or not  [default='Y']
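The replicate and mate separators described for -s1 and -s2 can be illustrated with a small parser. This is only a sketch of the convention (comma between replicates, colon between paired-end mates); the function name `parse_sample_arg` is hypothetical and is not taken from the seekCRIT source.

```python
def parse_sample_arg(sample_arg, read_type):
    """Split a -s1/-s2 value into replicates.

    Replicates are separated by commas; for paired-end data the two mates
    of a replicate are separated by a colon.
    Returns a list of tuples: (fastq,) for SE, (r1, r2) for PE.
    """
    expected = 2 if read_type == "PE" else 1
    replicates = []
    for rep in sample_arg.split(","):
        files = tuple(rep.split(":"))
        if len(files) != expected:
            raise ValueError(
                f"{rep!r}: expected {expected} file(s) for read type {read_type}"
            )
        replicates.append(files)
    return replicates


se = parse_sample_arg("s1-1.fastq,s1-2.fastq", "SE")
pe = parse_sample_arg(
    "s1-1.R1.fastq:s1-1.R2.fastq,s1-2.R1.fastq:s1-2.R2.fastq", "PE"
)
print(se)
print(pe)
```

Validating the number of files per replicate catches the common mistake of passing colon-separated mates together with -t SE, or comma-separated mates together with -t PE.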


Paired-end reads

python3 -o PEtest -t PE -fa fa/hg19.fa -ref ref/hg19.ref.txt --genomeIndex /media/bio/data/STARIndex/hg19 -s1 testData/231ESRP.25K.rep-1.R1.fastq:testData/231ESRP.25K.rep-1.R2.fastq,testData/231ESRP.25K.rep-2.R1.fastq:testData/231ESRP.25K.rep-2.R2.fastq -s2 testData/231EV.25K.rep-1.R1.fastq:testData/231EV.25K.rep-1.R2.fastq,testData/231EV.25K.rep-2.R1.fastq:testData/231EV.25K.rep-2.R2.fastq -gtf testData/test.gtf --threadNumber 12 

Single-end reads

python3 -o SEtest -t SE -fa fa/hg19.fa -ref ref/hg19.ref.txt --genomeIndex /media/bio/data/STARIndex/hg19 -s1 testData/231ESRP.25K.rep-1.R1.fastq,testData/231ESRP.25K.rep-1.R2.fastq,testData/231ESRP.25K.rep-2.R1.fastq,testData/231ESRP.25K.rep-2.R2.fastq -s2 testData/231EV.25K.rep-1.R1.fastq,testData/231EV.25K.rep-1.R2.fastq,testData/231EV.25K.rep-2.R1.fastq,testData/231EV.25K.rep-2.R2.fastq -gtf testData/test.gtf --threadNumber 12 


  • The transcriptome should be in the refseq format below (see more details in the example):

    Field         Description
    geneName      Name of gene
    isoform_name  Name of isoform
    chrom         Chromosome
    strand        Strand (+/-)
    txStart       Transcription start position
    txEnd         Transcription end position
    cdsStart      Coding region start
    cdsEnd        Coding region end
    exonCount     Number of exons
    exonStarts    Exon start positions
    exonEnds      Exon end positions

  • Providing a refseq file is not obligatory: if none is given, the main code uses the bundled script (GTFtoREFSEQ) to convert the GTF annotation to refseq format.


See details in the example file

Field                            Description
chrom                            Chromosome
circRNA_start                    Circular RNA 5' end position
circRNA_end                      Circular RNA 3' end position
strand                           DNA strand (+/-)
exonCount                        Number of exons included in the circular RNA transcript
exonSizes                        Sizes of exons included in the circular RNA transcript
exonOffsets                      Offsets of exons included in the circular RNA transcript
circType                         circRNA, ciRNA, ccRNA
geneName                         Name of gene
isoformName                      Name of isoform
exonIndexOrIntronIndex           Index (starting from 1) of exon (for circRNA) or intron (for ciRNA) in the given isoform
FlankingIntrons                  Left intron/Right intron
CircularJunctionCount_Sample_1   Read count of the circular junction in sample 1
LinearJunctionCount_Sample_1     Read count of the linear junction in sample 1
CircularJunctionCount_Sample_2   Read count of the circular junction in sample 2
LinearJunctionCount_Sample_2     Read count of the linear junction in sample 2
PBI_Sample_1                     Percent Back-spliced In (PBI) for sample 1
PBI_Sample_2                     Percent Back-spliced In (PBI) for sample 2
deltaPBI(PBI_1-PBI_2)            Difference between the PBI values of the two samples
pValue                           p-value

To call a circular RNA significantly differentially expressed, use the following criteria:

  • At least a 5% change in Percent Back-spliced In (PBI), i.e., |deltaPBI| >= 5%

  • FDR < 0.05
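The two criteria above could be applied to the output table along the following lines. This is a minimal sketch under stated assumptions: the output lists a pValue column but no FDR column, so the example derives an FDR with the Benjamini-Hochberg procedure, which may not be the correction the authors intend. PBI values are treated as fractions (0.05 = 5%), and the event names and numbers are made up for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_events(rows, delta_cutoff=0.05, fdr_cutoff=0.05):
    """rows: list of (name, deltaPBI, pValue); returns names passing both criteria."""
    fdr = benjamini_hochberg([p for _, _, p in rows])
    return [
        name
        for (name, delta, _), q in zip(rows, fdr)
        if abs(delta) >= delta_cutoff and q < fdr_cutoff
    ]


# Hypothetical events: (name, deltaPBI, pValue)
rows = [
    ("circA", 0.20, 0.001),
    ("circB", 0.02, 0.002),   # small p-value, but deltaPBI is below 5%
    ("circC", -0.10, 0.030),
    ("circD", 0.15, 0.400),   # large deltaPBI, but not significant
]
print(significant_events(rows))
```

Using the absolute value of deltaPBI keeps events that change in either direction between the two samples.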


Copyright (C) 2017 . See the LICENSE file for license rights and limitations (MIT).


