To develop the app, we used a subset of the data: the variants and genes on chr21. Both dbVar versions (GRCh37 and GRCh38) and the genes were subset to chr21 as shown in the extract-chr21-genes-variants.ipynb notebook.
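The chr21 subsetting step can be pictured as a simple line filter. The sketch below is not the notebook's code. It assumes tab-separated text input with the chromosome in the first column and `#`-prefixed header lines. It also assumes that GRCh37-based files name the chromosome `21` and GRCh38-based files name it `chr21`. The `CHR21_NAMES` set and `subset_chr21` function are hypothetical names used for illustration.

```python
# Assumption: GRCh37-based files name the chromosome "21" and GRCh38-based
# files name it "chr21"; both spellings are accepted here.
CHR21_NAMES = {"21", "chr21"}


def subset_chr21(lines):
    """Yield header lines plus the data lines whose first column is chr21."""
    for line in lines:
        if line.startswith("#"):
            # Keep header/comment lines so the output keeps its header.
            yield line
            continue
        chrom = line.split("\t", 1)[0]
        if chrom in CHR21_NAMES:
            yield line


# Toy input (not real dbVar records).
example = [
    "#chrom\tstart\tend\tid\n",
    "21\t100\t200\tsv1\n",
    "chr21\t300\t400\tsv2\n",
    "22\t500\t600\tsv3\n",
]
kept = list(subset_chr21(example))
```

Streaming line by line keeps memory use flat, so the same filter works on full genome-wide files.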
The AF_extract.py script reads a gnomAD-SV VCF file, extracts the allele frequencies of all variants, and writes them to a CSV file.
Usage:
python AF_extract.py -vcf <path to the gnomAD_SV vcf file> -out <path to the output file>
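For readers who want to see the idea without opening the script, here is a minimal sketch of AF extraction. It is not the contents of AF_extract.py. It assumes that each variant's allele frequency is stored in the `AF` key of the VCF INFO column and its type in `SVTYPE`. The CSV columns and the helper names `open_vcf`, `parse_info`, `extract_af`, and `write_af_csv` are hypothetical. Only the standard library is used.

```python
import csv
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def parse_info(info):
    """Turn an INFO string such as 'END=2000;SVTYPE=DEL;AF=0.05' into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag-type keys map to an empty string
    return fields


def extract_af(lines):
    """Yield (ID, CHROM, POS, SVTYPE, AF) for every data line of a VCF."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column-header lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        yield cols[2], cols[0], cols[1], info.get("SVTYPE", ""), info.get("AF", "")


def write_af_csv(vcf_path, out_path):
    """Write one CSV row per variant; AF is kept as the raw INFO string."""
    with open_vcf(vcf_path) as vcf, open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["ID", "CHROM", "POS", "SVTYPE", "AF"])
        writer.writerows(extract_af(vcf))
```

Keeping AF as the raw string avoids guessing how multi-value fields should be split. A command-line wrapper like the usage line above would parse `-vcf` and `-out` and call `write_af_csv`.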
The gnomad-variants-af.ipynb notebook shows how we matched variants in dbVar (GRCh37) to variants in gnomAD-SV. Briefly, two variants were matched if they had the same SV type and at least 50% reciprocal overlap.
R/Bioconductor packages used include:
- GenomicRanges
- dplyr
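The matching rule can be stated precisely: the overlap must cover at least 50% of each variant, so the smaller of the two overlap fractions is compared with 0.5. The notebook does this in R with the packages listed above. The pure-Python sketch below only illustrates the rule. It assumes half-open `[start, end)` coordinates, SVs stored as dicts with hypothetical keys `id`, `chrom`, `start`, `end`, and `svtype`, and toy IDs. It uses a simple all-pairs loop rather than an interval index.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return min(overlap/len(a), overlap/len(b)), or 0.0 if there is none.

    Intervals are half-open [start, end). Zero-length intervals (e.g.
    insertions) return 0.0 here and would need separate handling.
    """
    if a_end <= a_start or b_end <= b_start:
        return 0.0
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a_end - a_start), overlap / (b_end - b_start))


def match_svs(query, reference, min_ro=0.5):
    """Pair query SVs with reference SVs of the same chromosome and SV type
    whose reciprocal overlap is at least min_ro."""
    matches = []
    for q in query:
        for r in reference:
            if q["chrom"] != r["chrom"] or q["svtype"] != r["svtype"]:
                continue  # the rule is applied per SV type
            ro = reciprocal_overlap(q["start"], q["end"], r["start"], r["end"])
            if ro >= min_ro:
                matches.append((q["id"], r["id"], ro))
    return matches


# Toy records, not real dbVar or gnomAD-SV entries.
dbvar = [{"id": "nsv_a", "chrom": "21", "start": 1000, "end": 2000, "svtype": "DEL"}]
gnomad = [
    {"id": "g1", "chrom": "21", "start": 1200, "end": 2200, "svtype": "DEL"},  # RO 0.8
    {"id": "g2", "chrom": "21", "start": 1000, "end": 5000, "svtype": "DEL"},  # RO 0.25
    {"id": "g3", "chrom": "21", "start": 1000, "end": 2000, "svtype": "DUP"},  # other type
]
matches = match_svs(dbvar, gnomad)
```

`g2` shows why the overlap must be reciprocal. It covers all of `nsv_a` but only a quarter of itself is shared, so it is rejected.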
We used annotGeneSV.R (originally named linkGeneWithSV.R) to read the dbVar variants and the Gencode annotation and to extract the relevant information from them.
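Reading gene records from the Gencode annotation amounts to parsing a GTF file. That file has nine tab-separated columns, with `key "value";` pairs in the ninth. annotGeneSV.R is an R script, and the Python below is only an illustrative sketch, not a port of it. It keeps `gene` features and pulls out `gene_id`, `gene_type`, and `gene_name`. The helper names and the toy input are hypothetical.

```python
def parse_attributes(attr):
    """Parse a GTF attribute column, e.g. 'gene_id "X"; gene_name "Y";'."""
    fields = {}
    for item in attr.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        fields[key] = value.strip('"')  # repeated keys keep the last value
    return fields


def read_genes(lines):
    """Return one dict per 'gene' feature in a GTF file."""
    genes = []
    for line in lines:
        if line.startswith("#"):
            continue  # skip the header lines
        cols = line.rstrip("\n").split("\t")
        if cols[2] != "gene":
            continue  # ignore transcripts, exons, etc.
        attrs = parse_attributes(cols[8])
        genes.append({
            "chrom": cols[0],
            "start": int(cols[3]),  # GTF coordinates are 1-based, inclusive
            "end": int(cols[4]),
            "strand": cols[6],
            "gene_id": attrs.get("gene_id"),
            "gene_type": attrs.get("gene_type"),
            "gene_name": attrs.get("gene_name"),
        })
    return genes


# Toy GTF lines, not real Gencode records.
gtf = [
    "##description: toy example\n",
    'chr21\tHAVANA\tgene\t1000\t5000\t.\t+\t.\tgene_id "ENSG_TOY1"; '
    'gene_type "protein_coding"; gene_name "GENE1"; level 2;\n',
    'chr21\tHAVANA\ttranscript\t1000\t5000\t.\t+\t.\tgene_id "ENSG_TOY1"; '
    'transcript_id "ENST_TOY1";\n',
]
genes = read_genes(gtf)
```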
The variant-gene-overlap.ipynb notebook shows how we overlapped variants in chr21 with Gencode to extract gene impact information.
R/Bioconductor packages used include:
- rtracklayer
- GenomicRanges
- dplyr
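One way to derive gene impact from the variant-gene overlap is to record, for each SV, which genes it touches and how much of each gene it covers. The notebook does this in R. The Python below is a sketch under stated assumptions, not the notebook's logic. It assumes 1-based inclusive coordinates and the same chromosome naming in both inputs. The `whole_gene`/`partial_gene` labels are hypothetical and only illustrate the kind of information that can be extracted.

```python
def gene_overlaps(svs, genes):
    """For each SV, list overlapping genes and the fraction of each gene covered.

    SVs and genes are dicts using 1-based, inclusive coordinates.
    """
    hits = []
    for sv in svs:
        for g in genes:
            if sv["chrom"] != g["chrom"]:
                continue
            start = max(sv["start"], g["start"])
            end = min(sv["end"], g["end"])
            if start > end:
                continue  # no shared base
            covered = (end - start + 1) / (g["end"] - g["start"] + 1)
            # Hypothetical impact labels, for illustration only.
            whole = sv["start"] <= g["start"] and sv["end"] >= g["end"]
            impact = "whole_gene" if whole else "partial_gene"
            hits.append((sv["id"], g["gene_name"], round(covered, 3), impact))
    return hits


# Toy records, not real dbVar or Gencode entries.
svs = [{"id": "sv1", "chrom": "chr21", "start": 100, "end": 600}]
genes = [
    {"gene_name": "A", "chrom": "chr21", "start": 200, "end": 300},
    {"gene_name": "B", "chrom": "chr21", "start": 500, "end": 1000},
    {"gene_name": "C", "chrom": "chr21", "start": 2000, "end": 3000},
]
hits = gene_overlaps(svs, genes)
```

Gene `A` lies entirely inside `sv1`. `sv1` covers 101 of gene `B`'s 501 bases. Gene `C` is not touched.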
We overlapped the SVs with ClinVar using bedtools. The overlaps were then summarized in a TSV file using code in the make-misc-tsvs.ipynb notebook.
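To illustrate the summarizing step, assume the overlaps come from a command such as `bedtools intersect -a svs.bed -b clinvar.bed -wa -wb > overlaps.txt`, where the file names are hypothetical. With `-wa -wb`, each output line holds the columns of the SV record followed by those of the overlapping ClinVar record. The sketch further assumes an SV BED with four columns (chrom, start, end, SV ID) and a ClinVar BED with five (chrom, start, end, ClinVar ID, clinical significance). Those layouts and the output columns are hypothetical, not the notebook's.

```python
import csv
import io
from collections import defaultdict

# Assumed layout: the first 4 columns come from the SV BED (-a), the rest
# from a 5-column ClinVar BED (-b).
N_SV_COLS = 4


def summarize_overlaps(lines):
    """Group bedtools intersect -wa -wb lines by SV ID."""
    summary = defaultdict(lambda: {"ids": [], "significance": set()})
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        sv_id = cols[3]
        clinvar = cols[N_SV_COLS:]
        summary[sv_id]["ids"].append(clinvar[3])
        summary[sv_id]["significance"].add(clinvar[4])
    return summary


def write_tsv(summary, out):
    """Write one row per SV with its ClinVar hits."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sv_id", "n_clinvar", "clinvar_ids", "significance"])
    for sv_id, info in sorted(summary.items()):
        writer.writerow([
            sv_id,
            len(info["ids"]),
            ",".join(info["ids"]),
            ",".join(sorted(info["significance"])),
        ])


# Toy intersect output, not real ClinVar records.
overlaps = [
    "chr21\t100\t500\tsv1\tchr21\t150\t200\tcv1\tPathogenic\n",
    "chr21\t100\t500\tsv1\tchr21\t300\t400\tcv2\tBenign\n",
    "chr21\t900\t950\tsv2\tchr21\t910\t920\tcv3\tPathogenic\n",
]
buffer = io.StringIO()
write_tsv(summarize_overlaps(overlaps), buffer)
tsv_text = buffer.getvalue()
```

Writing to any file-like object keeps the function testable. In practice `out` would be an open file.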
We also used dbVar study nstd102, which contains clinical SVs. The code that makes the TSV for this annotation is also in the make-misc-tsvs.ipynb notebook.