The DataLad data set contains the following structure:
+---genome-data
+---pre-trained-models
+---patientVcf
+---liftover
+---scratch
+---README.org
The DataLad data set can be seen as a software package that comes with the necessary data and code pre-configured.
It contains two Docker images:
- gatk image: runs the picard LiftoverVcf tool, which ‘lifts’ the hg19 annotation of a patient .vcf file over to hg38. This is necessary because the model data is annotated with hg38.
- varpp-predict-utils image: contains the scripts needed to run VARPP-RuleFit, predict, or a combination of the two for a combined project workflow.
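The liftover step handled by the gatk image can be sketched roughly as follows. This is a dry run that only prints a command; it is not the invocation used by this repository. The helper name liftover_cmd, the /data mount point, and every file name are hypothetical placeholders, and the sketch assumes the image exposes a gatk launcher on its PATH. The options shown (-I, -O, -CHAIN, -REJECT, -R) are the standard LiftoverVcf arguments, where -R is the reference of the target build (hg38 here).

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a docker command that lifts a VCF from hg19 to hg38.
# Assumptions (not taken from this repository): the image name, the /data
# mount point, and all file names are placeholders for illustration only.
liftover_cmd() {
    local image="$1"   # docker image providing the gatk launcher
    local vcf="$2"     # input VCF annotated against hg19
    local chain="$3"   # hg19 -> hg38 chain file
    local ref="$4"     # hg38 reference FASTA (target build)
    local out="${vcf%.vcf}.hg38.vcf"
    local rejected="${vcf%.vcf}.rejected.vcf"
    # Print only; nothing is executed, so the command can be inspected first.
    printf 'docker run --rm -v "$PWD":/data %s gatk LiftoverVcf -I /data/%s -O /data/%s -CHAIN /data/%s -REJECT /data/%s -R /data/%s\n' \
        "$image" "$vcf" "$out" "$chain" "$rejected" "$ref"
}
```

Calling `liftover_cmd gatk patient.vcf hg19ToHg38.over.chain hg38.fa` prints the assembled command. Variants that cannot be mapped to hg38 end up in the file given to -REJECT, so that file is worth checking after a real run.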
To run the model for a project, run the following commands on the command line, in the folder where you want the project to live:
conda activate datalad # skip this step if DataLad is installed on the system without conda
git clone https://github.com/Hobbeist/varpp-project-datalad.git
cd varpp-project-datalad
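Whether the conda activation step is needed depends on how DataLad was installed. A minimal sketch of that check is shown below; the helper names have_cmd and setup_hint are hypothetical and not part of this repository, and the sketch only reports what it finds rather than changing the environment.

```shell
#!/usr/bin/env bash
# Sketch: report whether `conda activate datalad` is needed before cloning.
# have_cmd and setup_hint are illustrative helper names.

# Succeeds if the given command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Prints a one-line hint describing the current setup.
setup_hint() {
    if have_cmd datalad; then
        echo "datalad found on PATH; the conda activation step can be skipped"
    elif have_cmd conda; then
        echo "datalad not on PATH; run: conda activate datalad"
    else
        echo "neither datalad nor conda found; install DataLad first"
    fi
}
```

Running `setup_hint` before the clone step shows which of the three cases applies on the current machine.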