Genomes_project

Statistical analysis of annotated genomes

Goals and objectives

Find correlations between genomic features (such as SNPs, methylation, TFBS) and functional genomic regions across different genomes

Plot sequence features such as TFBS, SNPs, methylation, and RNA-seq coverage
Map them onto functional genomic regions
Find correlations and check their reproducibility across different genomes
Consider annotation quality and its consequences for predicting functional features (such as promoters) in unannotated genomes

Data:

Graphs for Oryza sativa [1]

Arabidopsis thaliana

reference genome: TAIR10_toplevel (ftp://ftp.ensemblgenomes.org/pub/plants/release-39/fasta/arabidopsis_thaliana/dna/)
annotation: TAIR10_GFF3_genes.gff3
variation: VCF file from the 1001 Genomes project (TAIR)
methylation data

Drosophila melanogaster

reference assembly: dmel_r5.57_FB2014_03 from FlyBase, dmel-all-chromosome-r5.57.fasta.gz
annotation: dmel_r5.57_FB2014_03, dmel-all-filtered-r5.57.gff.gz
variation: downloaded from the PopFly Browser in .vcf format, one file per chromosome covering all populations. Hervas S, Sanz E, Casillas S, Pool JE, and Barbadilla A (2017) PopFly: the Drosophila population genomics browser. Bioinformatics, 33, 2779-2780.

Danio rerio

Scripts for data preprocessing:

get_ATGs.py
get_4tss.py
get_4tts.py
get_promoters.py
get_fin_anno.py

Data preprocessing:

to create a file with ATGs: python3 get_ATGs.py annotation.gff
to create a file with TSSs: python3 get_4tss.py annotation.gff
to create files with promoter regions (.bed + .txt): python3 get_promoters.py 4tss.txt
to obtain promoter region sequences, first make the chromosome names in the FASTA consistent with the names in the bed file: sed 's/^>1.*$/>Chr1/' Arabidopsis_thaliana.TAIR10.dna.toplevel.fa | sed 's/^>2.*$/>Chr2/' | sed 's/^>3.*$/>Chr3/' | sed 's/^>4.*$/>Chr4/' | sed 's/^>5.*$/>Chr5/' | sed 's/^>Mt.*$/>ChrM/' | sed 's/^>Pt.*$/>ChrC/' > corrected_reference.fasta
then extract the sequences: bedtools getfasta -fi corrected_reference.fasta -bed promoters.bed -name -s -fo promoters_sequences.fasta
to create fin_anno: python3 get_fin_anno.py annotation.gff
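
The chromosome renaming and the promoter-window step above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the code in get_promoters.py: the function names, the 500 nt flank (borrowed from the ±500 nucleotides mentioned for the TFBS plot) and the choice of "gene" records as the TSS source are all hypothetical. The chromosome mapping is the one used in the sed pipeline, but matched on the exact sequence ID rather than on a prefix.

```python
# Hypothetical sketch; names and defaults are assumptions, not the repository's code.
CHROM_NAMES = {"1": "Chr1", "2": "Chr2", "3": "Chr3", "4": "Chr4",
               "5": "Chr5", "Mt": "ChrM", "Pt": "ChrC"}

FLANK = 500  # assumed half-width of the promoter window around the TSS


def rename_header(line):
    """Map an Ensembl-style FASTA header ('>1 dna:chromosome ...') to '>Chr1'."""
    if not line.startswith(">"):
        return line  # sequence lines pass through unchanged
    fields = line[1:].split()
    seq_id = fields[0] if fields else ""
    if seq_id in CHROM_NAMES:
        return ">" + CHROM_NAMES[seq_id] + "\n"
    return line  # unknown sequence IDs are left as they are


def promoter_bed_rows(gff_lines, flank=FLANK):
    """Yield BED6 rows (chrom, start, end, name, score, strand) around each gene's TSS."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        chrom, start, end = cols[0], int(cols[3]), int(cols[4])
        strand, attrs = cols[6], cols[8]
        # GFF3 is 1-based inclusive; the TSS is the 5' end of the gene on its strand.
        tss = start if strand == "+" else end
        # BED is 0-based half-open; clip the window at the chromosome start.
        bed_start = max(0, tss - 1 - flank)
        bed_end = tss + flank
        pairs = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        yield (chrom, bed_start, bed_end, pairs.get("ID", "."), ".", strand)
```

Writing six columns matters for the bedtools call: -name takes the label from column 4 and -s reads the strand from column 6, so minus-strand promoters come out reverse-complemented.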

Plots visualization:

the first (and most important) file is snp_custom_annotation.r, which contains a function that creates a custom annotation of SNPs; all other scripts use this function
ATG_plot.r visualizes the SNP distribution around the start codon (required packages: dplyr, scales)
intron_exon_junctions.r visualizes the SNP distribution around exon-intron boundaries
promoter-terminator.r visualizes the SNP distribution around the terminator
transcr_stop_plot.r visualizes the SNP distribution around the transcription stop codon
transfac.r visualizes the distribution of TFBSs in the promoter region (±500 nucleotides around the TSS)
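
The quantity these plots show, SNP counts at each position relative to an anchor such as the start codon or TSS, can be sketched as below. The repository's plotting scripts are written in R; this Python version only illustrates the aggregation step, and the function name, input layout and default flank are assumptions. Offsets are flipped for minus-strand anchors so that negative values always mean upstream.

```python
# Hypothetical sketch of the aggregation behind the distribution plots.
from bisect import bisect_left, bisect_right
from collections import Counter


def snp_offset_counts(anchors, snp_positions, flank=500):
    """Count SNPs at each signed offset from a set of anchors.

    anchors: iterable of (position, strand) pairs on one chromosome
    snp_positions: SNP coordinates on the same chromosome and coordinate system
    """
    snps = sorted(snp_positions)
    counts = Counter()
    for pos, strand in anchors:
        # Binary search keeps only SNPs inside [pos - flank, pos + flank].
        lo = bisect_left(snps, pos - flank)
        hi = bisect_right(snps, pos + flank)
        for snp in snps[lo:hi]:
            offset = snp - pos
            # On the minus strand, upstream lies at higher coordinates.
            counts[-offset if strand == "-" else offset] += 1
    return counts
```

Calling it once per chromosome and summing the resulting counters gives a genome-wide profile that can be normalized by the number of anchors before plotting.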

Several results:

Arabidopsis thaliana

Medicago truncatula
