
Parser for annotation files in General Feature Format (gff) written in Python


Jverma/GFF-Parser

GFF-Parser

A Python GFF parser.

General Feature Format (GFF), also known as Gene-Finding Format, is a file format that describes the features of genomic and protein sequences. A GFF file is a tab-delimited text file in which each feature is described on a single line.

More information about the GFF format can be found at the Wellcome Trust Sanger Institute.

For example, the maize GFF file I used looks like this:

9	ensembl	chromosome	1	156750706	.	.	.	ID=9;Name=chromosome:AGPv2:9:1:156750706:1
9	ensembl	gene	66347	68582	.	-	.	ID=GRMZM2G354611;Name=GRMZM2G354611;biotype=protein_coding
9	ensembl	mRNA	66347	68582	.	-	.	ID=GRMZM2G354611_T01;Parent=GRMZM2G354611;Name=GRMZM2G354611_T01;biotype=protein_coding
9	ensembl	intron	68433	68561	.	-	.	Parent=GRMZM2G354611_T01;Name=intron.1
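
To make the column layout concrete, here is a minimal sketch of how one such line can be split into the nine standard GFF fields and its attribute column into key/value pairs. The helper `parse_gff_line` and the `GFF_COLUMNS` tuple are hypothetical names used only for illustration; they are not part of this parser, and the sketch does not claim to mirror its implementation.

```python
# The nine tab-separated columns of a GFF feature line.
GFF_COLUMNS = ("seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes")


def parse_gff_line(line):
    """Split one GFF feature line into a dict (hypothetical helper, not part of gff.py)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF_COLUMNS):
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    record = dict(zip(GFF_COLUMNS, fields))
    # Start and end are 1-based integer coordinates.
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # The attribute column holds semicolon-separated key=value pairs.
    record["attributes"] = dict(
        pair.split("=", 1)
        for pair in record["attributes"].split(";")
        if "=" in pair
    )
    return record


# The gene line from the maize sample above.
sample = ("9\tensembl\tgene\t66347\t68582\t.\t-\t.\t"
          "ID=GRMZM2G354611;Name=GRMZM2G354611;biotype=protein_coding")
gene = parse_gff_line(sample)
print(gene["type"], gene["start"], gene["end"], gene["attributes"]["ID"])
# prints: gene 66347 68582 GRMZM2G354611
```

Keeping the attributes as a dictionary makes the `ID` and `Parent` values, which link genes, mRNAs and their sub-features, easy to look up.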

Usage:

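from gff import gffParser
import sys

# Path to the GFF file, given as the first command-line argument
input_file = sys.argv[1]
out = gffParser(input_file)

# Get the genes on chromosome 1
out.getGenes("1")

# Example identifiers from the maize sample above
# (assumed here to be the chromosome name and the ID attribute values)
chrom = "9"
gene = "GRMZM2G354611"
mRNA = "GRMZM2G354611_T01"

# Get the mRNAs corresponding to a gene
out.getmRNA(chrom, gene)

# Get the coding regions in an mRNA
out.getCDS(chrom, mRNA)

# Get the introns and exons of an mRNA
out.getIntrons(chrom, mRNA)
out.getExons(chrom, mRNA)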
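
Lookups from gene to mRNA to intron rely on the `Parent` attribute seen in the sample file. The following is a minimal sketch of one way such a parent-to-children index could be built; `index_children` is a hypothetical helper written for illustration, and nothing here is taken from the actual implementation in `gff.py`.

```python
from collections import defaultdict

# (feature type, attributes) pairs copied from the maize sample;
# a real file would be read and parsed line by line.
features = [
    ("gene", {"ID": "GRMZM2G354611"}),
    ("mRNA", {"ID": "GRMZM2G354611_T01", "Parent": "GRMZM2G354611"}),
    ("intron", {"Parent": "GRMZM2G354611_T01", "Name": "intron.1"}),
]


def index_children(features):
    """Map each parent ID to its child features, grouped by feature type."""
    children = defaultdict(lambda: defaultdict(list))
    for ftype, attrs in features:
        parent = attrs.get("Parent")
        if parent is not None:  # top-level features such as genes have no Parent
            children[parent][ftype].append(attrs)
    return children


children = index_children(features)
mrna_ids = [a["ID"] for a in children["GRMZM2G354611"]["mRNA"]]
intron_names = [a["Name"] for a in children["GRMZM2G354611_T01"]["intron"]]
print(mrna_ids)      # prints: ['GRMZM2G354611_T01']
print(intron_names)  # prints: ['intron.1']
```

Indexing by parent ID once, up front, keeps each later lookup a simple dictionary access rather than a scan over the whole file.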
