Skip to content

Latest commit

 

History

History
65 lines (34 loc) · 1.94 KB

04.Analyses_ANGSD.md

File metadata and controls

65 lines (34 loc) · 1.94 KB

Analyses with ANGSD

wd: /newhome/aj18951/G3_Hesperia_comma/ANGSD

1. Test the best Genotype Likelihood model to use

I will test this with G1, G2, and G3 following the example on ANGSD

Where 1=Samtools, 2=GATK (original model), 3=SOAPsnp, 4=SYK

ANGSD parallelises analyses, but not file reading (-P specifies the number of threads). See here for comments on the issue. The most efficient solution is to split the regions file (-rf) - similar to the approach used to speed up variant calling with samtools/bcftools.

I'll use the same regions files, but they need to be modified for ANGSD input.

cp ~/G3_Hesperia_comma/03_variants/regions* ~/G3_Hesperia_comma/ANGSD/

for i in $(ls regions); do sed -i 's/,/\n/g' $i; done

For the initial tests I will use just a small subset of regions to minimise computing time. For G3 each regions file contains 5200 regions and the last file (regionsbe) contains 300. I'll split regionsbe into 3 files and run scripts in parallel. For the final script I'll run arrays of 100 regions in each job

split -l 100 regionsbe regions100

And make three ANGSD input files

~/software/angsd/angsd -bam ~/G3_Hesperia_comma/mapped.modern.core.names -minInd 18 -minQ 20 -minMapQ 20 -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -GL 1 -doSaf 1 -out G3.GL1.test -rf regions100aa -anc ../RefGenome/Hesperia_comma

Following this tutorial by Fumagalli.

I've downloaded ANGSD into by bin folder and added to PATH

PATH="$PATH:~bin/angsd/"

Download NGSTools to bin folder and add to PATH

module load tools/git-2.18.0
module load languages/gcc-5.0
module load tools/zlib-1.2.8

module load languages/R-3.0.2
module load languages/perl-5.14.2


Calculate Fst and PBS (Population Branch statistic)