wd: /newhome/aj18951/G3_Hesperia_comma/ANGSD
I will test this with G1, G2, and G3 following the example on ANGSD
Where 1=Samtools, 2=GATK (original model), 3=SOAPsnp, 4=SYK
ANGSD parallelises analyses, but not file reading (-P specifies the number of threads). See here for comments on the issue. The most efficient solution is to split the regions file (-rf) - similar to the approach used to speed up variant calling with samtools/bcftools.
I'll use the same regions files, but they need to be modified for ANGSD input.
cp ~/G3_Hesperia_comma/03_variants/regions* ~/G3_Hesperia_comma/ANGSD/
for i in $(ls regions); do sed -i 's/,/\n/g' $i; done
For the initial tests I will use just a small subset of regions to minimise computing time. For G3 each regions file contains 5200 regions and the last file (regionsbe) contains 300. I'll split regionsbe into 3 files and run scripts in parallel. For the final script I'll run arrays of 100 regions in each job
split -l 100 regionsbe regions100
And make three ANGSD input files
~/software/angsd/angsd -bam ~/G3_Hesperia_comma/mapped.modern.core.names -minInd 18 -minQ 20 -minMapQ 20 -uniqueOnly 1 -remove_bads 1 -only_proper_pairs 1 -GL 1 -doSaf 1 -out G3.GL1.test -rf regions100aa -anc ../RefGenome/Hesperia_comma
Following this tutorial by Fumagalli.
I've downloaded ANGSD into by bin folder and added to PATH
PATH="$PATH:~bin/angsd/"
Download NGSTools to bin folder and add to PATH
module load tools/git-2.18.0
module load languages/gcc-5.0
module load tools/zlib-1.2.8
module load languages/R-3.0.2
module load languages/perl-5.14.2
Calculate Fst and PBS (Population Branch statistic)