Tested with Hail version 0.2.107 and Snakemake 7.32
`data/` contains Hail and Snakemake code that must be run on Google Cloud and that saves files to a Google Cloud Storage (GS) bucket. Copies of the generated files are available in `files/`.
1. Create a new cluster:
   `hailctl dataproc start <cluster_name> --packages snakemake --requester-pays-allow-buckets gnomad-public-requester-pays --project <project_name> --bucket <bucket_name> --region <region> --num-workers <N> --image-version=2.0.27-debian10`
2. Connect to the cluster:
   `gcloud beta compute ssh <user_name>@<cluster_name>-m --project "<project_name>"`
3. `git clone` this repository and navigate to `data/`.
4. Run the pipeline:
   `snakemake --cores all --configfile config.yaml --config gcp_rootdir="<bucket_name>/some_directory/"`
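If you create clusters repeatedly, it can help to build the cluster-creation command from parameters instead of editing it by hand. A minimal sketch that uses only the flags shown above; `build_start_command` and the example values (`caps-cluster`, `my-project`, `my-bucket`, `us-central1`) are hypothetical placeholders, not part of this repository.

```python
import shlex


def build_start_command(cluster, project, bucket, region, num_workers,
                        image_version="2.0.27-debian10"):
    """Hypothetical helper: argument list for `hailctl dataproc start`."""
    return [
        "hailctl", "dataproc", "start", cluster,
        "--packages", "snakemake",
        # Allows the cluster to read gnomAD's requester-pays bucket
        "--requester-pays-allow-buckets", "gnomad-public-requester-pays",
        "--project", project,
        "--bucket", bucket,
        "--region", region,
        "--num-workers", str(num_workers),
        f"--image-version={image_version}",
    ]


if __name__ == "__main__":
    # Placeholder values; replace with your own project, bucket and region
    cmd = build_start_command("caps-cluster", "my-project", "my-bucket",
                              "us-central1", 2)
    # Print a copy-pasteable command; subprocess.run(cmd, check=True) would run it
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids quoting problems if you later pass it to `subprocess.run`.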
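Alternatively, in Step 4 you can submit the pipeline as a Dataproc job. Create `job.py` containing the following:

```python
# job.py: run the Snakemake pipeline from Python so it can be submitted as a Dataproc job
import snakemake

snakemake.main(
    [
        "--snakefile",
        "/path/to/Snakefile",
        "--cores",
        "all",
        "--configfile",
        "/path/to/config.yaml",
        "--config",
        # No shell is involved here, so the whole KEY=VALUE pair is one argument
        'gcp_rootdir="<bucket_name>/some_directory/"',
    ]
)
```

Submit the script with `hailctl dataproc submit <cluster_name> job.py`.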
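In `job.py` no shell strips the quotes around the `gcp_rootdir` value, so the string that reaches Snakemake still contains them. As far as we understand Snakemake 7.x, each `--config` value is tried as an int, a float and `True`/`False`, and is then parsed as YAML, which removes the surrounding quotes. The sketch below is a simplified approximation of that behaviour, not Snakemake's own code; `parse_config_entries` is a hypothetical name.

```python
def parse_config_entries(entries):
    """Hypothetical, simplified approximation of Snakemake's KEY=VALUE handling.

    Tries int, then float, then the literals True/False; otherwise strips one
    pair of matching surrounding quotes and keeps the value as a string.
    """
    config = {}
    for entry in entries:
        key, sep, value = entry.partition("=")  # split on the first "=" only
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {entry!r}")
        for cast in (int, float):
            try:
                config[key] = cast(value)
                break
            except ValueError:
                pass
        else:  # neither int nor float succeeded
            if value in ("True", "False"):
                config[key] = value == "True"
            elif len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                config[key] = value[1:-1]  # quotes that no shell removed
            else:
                config[key] = value
    return config


if __name__ == "__main__":
    print(parse_config_entries(['gcp_rootdir="my-bucket/some_directory/"', "gcp=True"]))
```

This is why the quoted value in `job.py` and the shell-quoted value in the command-line version should end up as the same config entry.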
`analysis/` contains scripts that calculate and visualise CAPS scores using the files created in `data/`.
1. Navigate to `CAPS/analysis/`.
2. Run either `snakemake --cores all --config gcp="False"` (faster: uses the copies in `files/`) or `snakemake --cores all --config gcp="True" gcp_rootdir="<bucket_name>/some_directory/"` (slower: uses the files in the GS bucket).
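The `gcp` switch only changes where the analysis reads its inputs from. A minimal sketch of that choice, assuming (this README does not confirm it) that GS paths are built as `gs://` + `gcp_rootdir` + filename and that the local copies sit in a `files` directory; `input_path` and `scores.tsv` are hypothetical names, not taken from this repository's Snakefile.

```python
def input_path(filename, gcp, gcp_rootdir="", local_dir="files"):
    """Hypothetical helper: choose where the analysis reads `filename` from."""
    if gcp:
        if not gcp_rootdir:
            raise ValueError("gcp_rootdir is required when gcp is True")
        # Slower option: read the pipeline output from the GS bucket
        return "gs://" + gcp_rootdir.rstrip("/") + "/" + filename
    # Faster option: read the copy stored in the repository
    return local_dir.rstrip("/") + "/" + filename


if __name__ == "__main__":
    print(input_path("scores.tsv", gcp=False))
    print(input_path("scores.tsv", gcp=True, gcp_rootdir="my-bucket/some_directory/"))
```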
To get CAPS estimates for your own set of variants, use the `template` file: `snakemake -s template -c1 -C [KEY=VALUE ...]` (`-c1` runs Snakemake on one core and `-C` is short for `--config`). The required values are:

- `obs`: grouped variants annotated with at least the `context`, `ref`, `alt`, `methylation_level`, `singleton_count` and `variant_count` fields
- `exp`: expected proportions, one per `context`-`ref`-`alt`-`methylation_level` group
- `var`: the variable of interest; must be a valid field in `obs`
- `calculate_caps_script`: path to `calculate_caps.R`
- `viz_scores_script`: path to `viz_scores.R`
- `scores`: filename for the output scores
- `plot`: filename for the output plot
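Before launching the template, it can be worth checking that `obs` meets the field requirements listed above. A minimal sketch, assuming `obs` is a tab-separated file with a header row (as the `.tsv` example suggests); `check_obs_header` is a hypothetical helper, not part of this repository.

```python
import csv
import io

# Fields that `obs` must contain, as listed in this README
REQUIRED_OBS_FIELDS = {
    "context", "ref", "alt", "methylation_level",
    "singleton_count", "variant_count",
}


def check_obs_header(obs_handle, var):
    """Hypothetical helper: validate the header of a tab-separated `obs` file."""
    header = next(csv.reader(obs_handle, delimiter="\t"), None)
    if header is None:
        raise ValueError("obs is empty")
    missing = REQUIRED_OBS_FIELDS - set(header)
    if missing:
        raise ValueError(f"obs is missing required fields: {sorted(missing)}")
    if var not in header:
        raise ValueError(f"var={var!r} is not a field in obs")
    return header


if __name__ == "__main__":
    # Header-only stand-in; in practice pass open("path/to/obs.tsv", newline="")
    demo = io.StringIO(
        "context\tref\talt\tmethylation_level\tsingleton_count\tvariant_count\tworst_csq\n"
    )
    print(check_obs_header(demo, "worst_csq"))
```

Only the header is read, so the check is cheap even for large files.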
For example, `snakemake -s template -c1 -C obs=analysis/canonical_splice_site_vars.tsv exp=model/phat.tsv var=worst_csq calculate_caps_script=analysis/calculate_caps.R viz_scores_script=analysis/viz_scores.R scores=scores.tsv plot=plot.pdf`.
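The same command can be assembled programmatically, for example when scoring several variant sets. A sketch under the assumption that all seven required values are passed via `-C` in the order of the example; `template_command` is a hypothetical helper. With the example values it reproduces the example command, and the returned list can be passed to `subprocess.run(cmd, check=True)`.

```python
import shlex

# Required values for the template, in the order used in the README example
REQUIRED_KEYS = (
    "obs", "exp", "var", "calculate_caps_script",
    "viz_scores_script", "scores", "plot",
)

# Values from the README example command
EXAMPLE = {
    "obs": "analysis/canonical_splice_site_vars.tsv",
    "exp": "model/phat.tsv",
    "var": "worst_csq",
    "calculate_caps_script": "analysis/calculate_caps.R",
    "viz_scores_script": "analysis/viz_scores.R",
    "scores": "scores.tsv",
    "plot": "plot.pdf",
}


def template_command(values, cores=1):
    """Hypothetical helper: build the `snakemake -s template` argument list."""
    missing = [key for key in REQUIRED_KEYS if key not in values]
    if missing:
        raise KeyError(f"missing required values: {missing}")
    config = [f"{key}={values[key]}" for key in REQUIRED_KEYS]
    return ["snakemake", "-s", "template", f"-c{cores}", "-C", *config]


if __name__ == "__main__":
    print(shlex.join(template_command(EXAMPLE)))
```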