Skip to content

Latest commit

 

History

History
60 lines (47 loc) · 3.44 KB

14.ehh_pbs.md

File metadata and controls

60 lines (47 loc) · 3.44 KB

Population Branch Statistic (PBS) in EHH sweep regions

Regions identified as sweeps via EHH statistics also tended to coincide with extreme values of the population branch statistic (PBS)

Figure 1: Manhattan plots showing the coincidence of extreme values of the population branch statistic (PBS) and regions under selection identified by EHH based scans. PBS estimates for each population are shown as points with the other two considered as outgroups. Points are shown in black and grey to indicate transitions between alternating pseudo-chromosomes via mapping to the A. millepora assembly from Fuller et al (Fuller et al. 2020). The purple shaded baseline shows the location of regions identified as candidates for positive selection using EHH-based scans. Blue points indicate windows where outlying PBS values are coincident with EHH scans.

Genes associated with the intersection of significant PBS and EHH regions

We used a python script to identify regions where the population branch statistic for each population exceeded the significance threshold for a false discovery rate of 1%.

cat data/hpc/selection2/pbs/plink2_noheader.pbs | awk 'BEGIN{OFS="\t"}{print $1,$2,$3}' | ./scripts/pbs2bed.py -t 0.76 > data/hpc/selection2/pbs/pbs_in.bed
cat data/hpc/selection2/pbs/plink2_noheader.pbs | awk 'BEGIN{OFS="\t"}{print $1,$2,$4}' | ./scripts/pbs2bed.py -t 0.49 > data/hpc/selection2/pbs/pbs_no.bed
cat data/hpc/selection2/pbs/plink2_noheader.pbs | awk 'BEGIN{OFS="\t"}{print $1,$2,$5}' | ./scripts/pbs2bed.py -t 0.44 > data/hpc/selection2/pbs/pbs_so.bed

# 10% FDR
cat data/hpc/selection2/pbs/plink2_noheader.pbs | awk 'BEGIN{OFS="\t"}{print $1,$2,$3}' | ./scripts/pbs2bed.py -t 0.6 > data/hpc/selection2/pbs/pbs_in.bed
cat data/hpc/selection2/pbs/plink2_noheader.pbs | awk 'BEGIN{OFS="\t"}{print $1,$2,$4}' | ./scripts/pbs2bed.py -t 0.47 > data/hpc/selection2/pbs/pbs_no.bed
cat data/hpc/selection2/pbs/plink2_noheader.pbs | awk 'BEGIN{OFS="\t"}{print $1,$2,$5}' | ./scripts/pbs2bed.py -t 0.41 > data/hpc/selection2/pbs/pbs_so.bed

Bedtools (v2.30.0) was then used to find genes that intersected with these significant intervals and these in turn were then intersected with sweep regions identified via EHH statistics.

cd data/hpc/selection2/
bedtools intersect -wo -b pbs_in.bed -a ../../../genome/adig-v2-ncbi.gff | awk '$14>0 && $3=="gene"' > pbs_genes_in.gff
bedtools intersect -wao -a pbs_genes_in.gff -b ../../tracks/sweeps.gff3 | grep 'inshore' > pbs_genes_in_sweeps.tsv

bedtools intersect -wo -b pbs_no.bed -a ../../../genome/adig-v2-ncbi.gff | awk '$14>0 && $3=="gene"' > pbs_genes_no.gff
bedtools intersect -wao -a pbs_genes_no.gff -b ../../tracks/sweeps.gff3 | grep 'northoffshore' > pbs_genes_no_sweeps.tsv

bedtools intersect -wo -b pbs_so.bed -a ../../../genome/adig-v2-ncbi.gff | awk '$14>0 && $3=="gene"' > pbs_genes_so.gff
bedtools intersect -wao -a pbs_genes_so.gff -b ../../tracks/sweeps.gff3 | grep 'southoffshore' > pbs_genes_so_sweeps.tsv

Bedtools was also used to simply find ehh-based sweeps overlapping with significant pbs regions

bedtools intersect -u -b pbs_in.bed -a ../../tracks/sweeps.gff3  | grep 'inshore' > sweeps_in_pbs.gff
bedtools intersect -u -b pbs_no.bed -a ../../tracks/sweeps.gff3  | grep 'northoffshore' > sweeps_no_pbs.gff
bedtools intersect -u -b pbs_so.bed -a ../../tracks/sweeps.gff3  | grep 'southoffshore' > sweeps_so_pbs.gff