Skip to content

Yet another Python module for reading BGEN files

Notifications You must be signed in to change notification settings

nadavbra/bgen_parser

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

What is bgen_parser

bgen_parser is a simple, lightweight and (hopefully) efficient Python parser for the BGEN format. It is nothing more than a Python wrapper to the bgenix C++ library of Gavin Band.

The main motivation for developing this package was that, at the time, I couldn't find a decent BGEN parser that would parse the imputed genotypes of the UK Biobank in a reasonable time (it took them too long to initially load the data). I needed a parser that would work in real time.

Usage

For example, to parse the imputed genotypes of the UK Biobank on chromosome 14:

import os
from bgen_parser import BgenParser

UKBB_IMPUTATION_V3_DIR = '/path/to/uk_biobank/EGAD00010001474'
chrom = '14'

bgen_file_path = os.path.join(UKBB_IMPUTATION_V3_DIR, 'ukb_imp_chr%s_v3.bgen' % chrom)
bgi_file_path = os.path.join(UKBB_IMPUTATION_V3_DIR, 'ukb_imp_chr%s_v3.bgen.bgi' % chrom)
sample_file_path = os.path.join(UKBB_IMPUTATION_V3_DIR, 'ukb26664_imp_chr%s_v3.sample' % chrom)

chrom_imputation_data = BgenParser(bgen_file_path, bgi_file_path, sample_file_path)

chrom_imputation_data.sample_ids # A series with the sample IDs
chrom_imputation_data.variants # A dataframe of all the variants
chrom_imputation_data.read_variant_probs(4) # Will read the genotyping of the fifth variant, returning a numpy array of shape (n_samples, 3)

Installation

Python dependencies

  • cython
  • numpy
  • pandas

Step 1: Install bgenix

The following instructions worked at the time they were written, but it could very well be that bgenix has since changed. If it doesn't work for you, please refer to their website for instructions.

To install bgenix at ~/third_party/bgenix, do the following:

cd /tmp
wget http://bitbucket.org/gavinband/bgen/get/master.tar.gz
tar xvfz master.tar.gz
mv gavinband-bgen-44fcabbc5c38 ~/third_party/bgenix
cd ~/third_party/bgenix
./waf configure
./waf

Step 2: Install bgen_parser

  1. Set the BGENIX_DIR environment variable to whatever directory you have installed bgenix at. For example, in cshell it would look like:
setenv BGENIX_DIR /cs/phd/nadavb/third_party/bgenix
  1. Run:
python setup.py install

Cite us

If you use bgen_parser as part of work contributing to a scientific publication, we ask that you cite our paper: Brandes, N., Linial, N. & Linial, M. PWAS: proteome-wide association study—linking genes and phenotypes by functional variation in proteins. Genome Biol 21, 173 (2020). https://doi.org/10.1186/s13059-020-02089-x

About

Yet another Python module for reading BGEN files

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published