Scripts for Lin, Y., Darolti, I., Furman, B. L. F., Almeida, P., Sandkam, B. A., Breden, F., Wright, A. E., Mank, J. E. (2022) Gene duplication to the Y chromosome in Trinidadian Guppies. Molecular Ecology. 31, 1853–1863. https://doi.org/10.1111/mec.16355
Note: All Python scripts are written with Python3, and are NOT compatible with Python2.
-
Quality control using FastQC and Trimmomatic
python 01.quality_control_fqs.py -fqs [/path/fq_files/] \ -o [./qc/] \ -t [10] \ -trim \ -qc
-
Align high-quality reads to female reference genome using BWA MEM
python 02.alignment_bwa.py -r [/path/ref_genome/] \ -bwa [bwa] \ -fq [./trim/] \ -t [10] \ -a
-
Fixmate, sort and mark duplications using SAMtools
sh 03.align_postprocess.sh [sample_ID]
-
Genotyping, SNP filtering and calculating intersexual Fst
#(1) Genotyping using BCFtools
sh 04.GenotypingSNPFiltering/SNP_calling.sh [sample_ID] [region_name]
#(2) SNP filtering using VCFtools, here we only keep SNPs in CDS regions by specifying
--bed guppy_cds_coords.txt
vcftools --vcf [input.vcf] \ --maf 0.05 \ --mac 1 \ --min-alleles 2 \ --max-alleles 2 \ --max-missing 0.9 \ --min-meanDP 10 \ --max-meanDP 100 \ --bed guppy_cds_coords.txt \ --recode --recode-INFO-all \ --minGQ 25 \ --out [outprefix]
#(3) Calculating intersexual Fst for each SNP using VCFtools
vcftools --vcf [input.vcf] \ --weir-fst-pop [female.txt] \ --weir-fst-pop [male.txt] \ --out [outprefix]
#(4) SNPs with top 1% Fst
# we simply do 1% cut-off here, for example in Python3, we use quantile function from numpy pkg numpy.quantile (data, 0.01)
#(5) Fisher's exact test using PLINK1.9, please check details in 05.FisherExactTest
python 05.FisherExactTest/changeid.py [original.vcf] [newid.vcf] sh 05.FisherExactTest/fisher_cds_vcf.sh [newid.vcf] [outprefix]
#(6) Permutation test, please check details in 06.PermutationTest
sh 06.permutation_test/run_permutation_sex.sh [filtered.cds.vcf]
-
Male-to-Female read depth ratio, please check details in 07.M2FReadDepth
sh 07.M2FReadDepth/MFReadDepth.sh
-
Tajima's D
#(1) exclude genes with immune and MHC function in reference genome based on Ensembl Biomart info.
python 08.TajimaD/exclude_immune_MHC.py [immune_MHC.bed] [outprefix.csv]
#(2) calculate gene-based Tajima's D
python 08.TajimaD/gene_TajimaD.py [gene_boundary.csv] [site.TajimaD] [outprefix.csv]
-
Relatedness inference using ngsRelate and KING
#(1) ngsRelate, here, please use filtered VCF and DON'T exclude non-CDS regions
/path/ngsrelate -h [filtered.vcf.gz] -O [vcf.res]
#(2) KING, before using KING, please convert VCF to
.bed
,.fam
and.bim
using PLINK1.9/path/king -b [my.bed] \ --fam [my.fam] \ --bim [my.bim] \ --related