Target-Malaria Group in Burt Lab, Imperial College London
- Exploratory Data Analysis
- SNPs Filtering
- Mega Base Pair Selection
- MAF Filtering
- LD Pruning
- Minor Allele Filtering - Exploring the right MAF threshold for rare-allele filtering while preserving private alleles
- Unsupervised exploration - PCA and UMAP visualizations of 4.8 million SNPs across samples from 16 populations, with UMAP hyperparameter tuning for chromosome arm 3R
- SNPs Filtering
- Classification of 13 Populations
- Pipeline for population classification using genetic sequences
- Further improvement through dimensionality reduction and domain-related techniques
- Pairwise analysis of 66 population pairs
- Exploring SNP contribution and importance for population differentiation
- Generic Python functions to reproduce and automate most of the analyses
- MalariaGEN Ag1000G Phase 2 AR1 release
- 2,284 haplotype samples (two per individual, 1,142 individuals) from 16 populations: BFcol, BFgam, AOcol, CIcol, CMgam, FRgam, GAgam, GHcol, GHgam, GM, GNcol, GNgam, GQgam, GW, KE, and UGgam
- 4,836,295 Intergenic SNPs from chromosome arm 3R
- Phased, biallelic haplotype data (each call coded 0 or 1)
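
The MAF threshold exploration listed above can be illustrated on 0/1 haplotype calls of the kind this dataset provides. The following is a minimal sketch, not the project's actual code: the function names, the toy calls, and the working definition of a "private" SNP (minor allele observed in exactly one population) are all assumptions made for illustration.

```python
def minor_allele_frequency(calls):
    """MAF of one biallelic SNP, given one 0/1 call per haplotype."""
    alt_freq = sum(calls) / len(calls)
    return min(alt_freq, 1.0 - alt_freq)


def is_private(calls, populations):
    """True if the minor allele is carried by exactly one population.

    Assumption: ties (frequency exactly 0.5) treat allele 1 as the minor allele.
    """
    minor = 1 if sum(calls) * 2 <= len(calls) else 0
    carriers = {pop for call, pop in zip(calls, populations) if call == minor}
    return len(carriers) == 1


def summarize_threshold(snps, populations, threshold):
    """Return (SNPs kept, private SNPs kept) for a MAF >= threshold filter."""
    kept = [s for s in snps if minor_allele_frequency(s) >= threshold]
    private_kept = sum(is_private(s, populations) for s in kept)
    return len(kept), private_kept


# Toy data (hypothetical): 8 haplotypes, 4 from each of two populations.
populations = ["BFcol"] * 4 + ["KE"] * 4
snps = [
    [1, 0, 0, 0, 0, 0, 0, 0],  # rare, carried only in BFcol
    [1, 1, 0, 0, 1, 1, 0, 0],  # common, shared by both populations
    [0, 0, 0, 0, 1, 1, 0, 0],  # carried only in KE
    [0, 0, 0, 0, 0, 0, 0, 0],  # monomorphic
]

# Compare how many SNPs, and how many private SNPs, survive each threshold.
for threshold in (0.0, 0.2, 0.3):
    n_kept, n_private = summarize_threshold(snps, populations, threshold)
    print(f"MAF >= {threshold}: kept {n_kept} SNPs, {n_private} private")
```

Scanning a few thresholds this way shows the trade-off directly: a stricter cutoff removes more rare variants but also discards private alleles that may be informative for telling populations apart.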