Analysis and Classification of Anopheles Populations in AG1000-3R dataset using Intergenic SNPs. Target-Malaria Group in Burt Lab, Imperial College London

JARACH-209/TargetMalaria_UROP_Imperial_College

Analysis and Classification of Anopheles Populations in AG1000-3R dataset using Intergenic SNPs

Target-Malaria Group in Burt Lab, Imperial College London

Objectives Covered

  • Exploratory Data Analysis
    • SNPs Filtering
      • Mega Base Pair Selection
      • MAF Filtering
      • LD Pruning
    • Minor Allele Filtering - Exploring the right MAF threshold for filtering rare alleles while preserving private alleles
    • Unsupervised exploration - PCA and UMAP visualizations for 4.8 million SNPs and samples from 16 populations. UMAP hyperparameter tuning for chromosome arm 3R.
  • Classification of 13 Populations
    • Pipeline for population classification using genetic sequences
    • Further improvement through dimensionality reduction and domain-related techniques
  • Pairwise analysis of 66 population pairs
  • Exploring SNP contribution and importance for population differentiation
  • Generic Python functions to reproduce and automate most of the analyses
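
The MAF filtering and PCA steps listed above can be illustrated with a minimal sketch. It assumes the haplotypes are held in a NumPy array of shape (n_snps, n_haplotypes) with alleles coded 0 or 1. The function names, the threshold values, and the toy data are hypothetical and are not taken from this repository's code.

```python
import numpy as np


def minor_allele_frequency(haplotypes):
    """Return the minor allele frequency of each SNP (one SNP per row)."""
    alt_freq = haplotypes.mean(axis=1)          # frequency of allele 1
    return np.minimum(alt_freq, 1.0 - alt_freq)  # fold to the rarer allele


def filter_by_maf(haplotypes, threshold=0.01):
    """Keep SNPs whose MAF is at least `threshold` (an illustrative default)."""
    keep = minor_allele_frequency(haplotypes) >= threshold
    return haplotypes[keep], keep


def pca_coordinates(haplotypes, n_components=2):
    """Project haplotypes onto their leading principal components via SVD."""
    samples = haplotypes.T.astype(float)         # rows become haplotypes
    samples -= samples.mean(axis=0)              # center each SNP
    u, s, _ = np.linalg.svd(samples, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


# Toy data only: 200 random SNPs across 20 haplotypes.
rng = np.random.default_rng(0)
toy = rng.integers(0, 2, size=(200, 20))

filtered, keep = filter_by_maf(toy, threshold=0.05)
coords = pca_coordinates(filtered, n_components=2)
print(filtered.shape, coords.shape)
```

A low threshold removes fewer SNPs and so retains more private alleles, which is the trade-off the MAF exploration above refers to. For millions of SNPs, a full SVD like this one would likely be too expensive, so a randomized or incremental PCA would be a more practical choice.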

Dataset

  • MalariaGEN AG1000 Phase 2 AR1 release
  • 2,284 phased haplotypes (1,142 individuals) from 16 populations: BFcol, BFgam, AOcol, CIcol, CMgam, FRgam, GAgam, GHcol, GHgam, GM, GNcol, GNgam, GQgam, GW, KE, and UGgam

  • 4,836,295 Intergenic SNPs from chromosome arm 3R
  • Phased, biallelic haplotype data (alleles coded 0 or 1)
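
The relationship between the 2,284 haplotypes and the 1,142 individuals can be sketched as follows. The sketch assumes that the two haplotypes of each individual sit in adjacent columns (2i and 2i + 1), which is a common layout but is not confirmed by this README. The function name is hypothetical.

```python
import numpy as np


def haplotypes_to_genotypes(haplotypes):
    """Collapse a (n_snps, n_haplotypes) 0/1 array into per-individual
    alternate-allele counts (0, 1, or 2).

    Assumes columns 2i and 2i + 1 belong to individual i.
    """
    n_snps, n_haplotypes = haplotypes.shape
    if n_haplotypes % 2 != 0:
        raise ValueError("expected an even number of haplotypes")
    # Group adjacent column pairs, then sum the two alleles per individual.
    return haplotypes.reshape(n_snps, n_haplotypes // 2, 2).sum(axis=2)


# Toy data only: 2 SNPs, 4 haplotypes, i.e. 2 individuals.
toy = np.array([
    [0, 1, 1, 1],
    [0, 0, 1, 0],
])
genotypes = haplotypes_to_genotypes(toy)
print(genotypes.shape)
```

Working at the haplotype level keeps the phase information, while the collapsed genotype counts halve the number of samples.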
