-
Notifications
You must be signed in to change notification settings - Fork 6
Code to do data processing and create figures for "The geography of recent genetic ancestry across Europe", http://arxiv.org/abs/1207.3815
License
petrelharp/euroibd
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
This is the collection of codes used in analysis of european POPRES data (see Nelson et al, http://www.ncbi.nlm.nih.gov/pubmed/18760391; obtain access to data at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000145.v1.p1 .) for the paper "The geography of recent genetic ancestry across Europe", by Peter Ralph and Graham Coop, published May 7, 2013, in PLoS Biology, at http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001555 To see what makes what, see the makefile. Most of the files are pretty clean, but a few have some clutter. None will actually do anything without obtaining the POPRES data and running BEAGLE http://faculty.washington.edu/browning/beagle/beagle.html on them. Please use these if they are useful, and credit us if appropriate. -------------- If you are looking to use the inference method, look for the functions sinv() and squoosh(). Also, linv() gives the least-squares approximation, useful for finding a good starting point. For instance, see the block in inversion-writeup-plots.R that sets things up to infer histories for all the pairs of populations: # Set-up: the operator L and the false positive rate estimates: gens <- cumsum( rep(1:36, each=10) ) L <- error.disc.trans( lenbins=lenbins, gens=gens, chrlens=.chrlens[setdiff(seq_along(.chrlens),omitchroms)] )) # this takes a while!! do it once and save. fp <- disc.fp(lenbins, chrlens=.chrlens) fp <- runmed(fp,k=15) # Set-up for intial least-squares estimate: S <- as.vector( smooth( hist( blocks$maplen, breaks=c(lenbins,100), plot=FALSE )$counts ) ) S <- S/mean(S) S[lenbins>20] <- exp( predict( lm( log(S) ~ lenbins + I(lenbins^2), subset=(lenbins>20) & (S>0) ), newdata=list(lenbins=lenbins[lenbins>20]) ) ) est.Sigma <- function ( np, X, alpha=100 ) { # estimate Sigma by combining the population average with the smoothed, observed data # where Sigma is the covariance matrix of X/np smX <- as.vector( smooth(X) ) ppp <- (alpha*np/(2543640+alpha*np)) # 2543640 = total # pairs Sigma <- ppp*smX + (1-ppp)*sum(smX)*S/sum(S) return( Sigma/np^2 ) } Ksvd <- svd( 1/sqrt(as.vector(S)) * L ) # pre-compute SVD of L total.npairs <- choose( length(unique(blocks$id1)), 2 ) linverse <- linv( X=lendist/total.npairs, S=as.vector(S), Sigma=est.Sigma(np=total.npairs,X=lendist), Ksvd=Ksvd, maxk=30, fp=fp ) sminverse <- smoothed.nonneg( lS=linverse, Ksvd=Ksvd, bestk=15, lam=1 ) # Here is where we get the actual estimate: ans <- sinv( lendist, sminverse, L, fp, npairs=total.npairs, lam=0, gamma=1e6 ) # squoosh iterates sinv, looking for the smoothest solution that drops no more than 2 units of log-likelihood smooth = squoosh( lendist, xinit=ans, L, fp, npairs=ans$npairs, minlen=minlen, gamma=100, lam=0, fitfn="loglik", relthresh=2/ans$npairs, twid=.05 ) -------------- Notes: * The .gmap files provide a mapping from the endpoint indices that BEAGLE uses to genetic distances (e.g. in centiMorgans). Important: BEAGLE uses 0-based indices. * BEAGLE 4.0 is out, and improved over the version we used (3.3).
About
Code to do data processing and create figures for "The geography of recent genetic ancestry across Europe", http://arxiv.org/abs/1207.3815
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published