HEAL (Hierarchical Estimate from Agnostic Learning)

Machine learning-based genome analysis and risk prediction framework.

Supported file type

CSV file (mutation burden matrix)
- rows: sample IDs; columns: gene names
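
As a rough illustration of the layout described above, the following sketch loads a small matrix with pandas and checks its shape. The sample IDs, gene names, values, and the header name of the first column are made up for the example; the files in toy_data may differ in detail, and how case/control labels are supplied is not covered here.

```python
import io

import pandas as pd

# Hypothetical toy matrix: IDs, gene names, and counts are illustrative only.
csv_text = """sample_id,GENE_A,GENE_B,GENE_C
S1,0,2,1
S2,1,0,0
S3,0,0,3
"""

# Treat the first column as the row index (sample ID).
burden = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Basic sanity checks on the rows-by-genes layout.
assert burden.index.is_unique, "sample IDs should be unique"
assert burden.columns.is_unique, "gene names should be unique"
assert all(pd.api.types.is_numeric_dtype(t) for t in burden.dtypes)

print(burden.shape)  # (number of samples, number of genes)
```

To check a real file, replace `io.StringIO(csv_text)` with the path to the CSV.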

Requirements

python3
pandas
numpy
scikit-learn
scipy

How to run

Step 0 Prepare mutation burden matrix.

Annotate the VCF file from whole-exome or whole-genome sequencing with gene names, deleteriousness scores, and allele frequencies.
Preprocess the annotated genotype data to calculate the mutation burden. A sample mutation burden file is available in toy_data.
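
The preprocessing step is not spelled out here, so the sketch below shows one common way to turn annotated variant calls into a sample-by-gene burden matrix: keep rare, predicted-deleterious variants and sum alternate allele counts per gene. The column names, the thresholds `SCORE_MIN` and `AF_MAX`, and the burden definition itself are assumptions for illustration, not necessarily what HEAL's authors used.

```python
import pandas as pd

# Hypothetical long-format table: one row per (sample, variant) genotype call.
variants = pd.DataFrame({
    "sample_id": ["S1", "S1", "S2", "S2", "S3"],
    "gene": ["GENE_A", "GENE_B", "GENE_A", "GENE_B", "GENE_B"],
    "score": [25.0, 5.0, 30.0, 22.0, 28.0],            # deleteriousness score
    "allele_freq": [0.001, 0.0005, 0.20, 0.002, 0.0001],
    "alt_allele_count": [1, 1, 2, 1, 2],                # 1 = het, 2 = hom-alt
})

SCORE_MIN = 20.0  # assumed deleteriousness cutoff
AF_MAX = 0.01     # assumed rare-variant allele frequency cutoff

# Keep only rare, predicted-deleterious variants.
qualifying = variants[
    (variants["score"] >= SCORE_MIN) & (variants["allele_freq"] <= AF_MAX)
]

# Sum alternate allele counts per sample and gene.
burden = qualifying.pivot_table(
    index="sample_id",
    columns="gene",
    values="alt_allele_count",
    aggfunc="sum",
    fill_value=0,
)

# Keep samples that carry no qualifying variants as all-zero rows.
burden = burden.reindex(sorted(variants["sample_id"].unique()), fill_value=0)

print(burden)
```

The resulting frame could then be written with `burden.to_csv(...)` to obtain a file in the rows-by-genes layout described under the supported file type.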

Step 1 Run the HEAL script

Input file: Mutation burden matrix.

Step 2 The model outputs

Disease gene lists.
Genetic risk prediction model.
Prediction performance summary.

Usage

Run the HEAL script from the command line with the following arguments:

python HEAL.py --file_path <path_to_input_file> [options]
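
For scripted runs, the command can be assembled in Python. This is a minimal sketch: `build_heal_command` is a hypothetical helper, and the input file name is a placeholder rather than an actual file in toy_data. Only flags listed under Command-line Arguments are used.

```python
import shlex


def build_heal_command(file_path, output=None, splits=None, trials=None):
    """Assemble an argument list for HEAL.py (hypothetical helper)."""
    cmd = ["python", "HEAL.py", "--file_path", str(file_path)]
    if output is not None:
        cmd += ["--output", str(output)]
    if splits is not None:
        cmd += ["--splits", str(splits)]
    if trials is not None:
        cmd += ["--trials", str(trials)]
    return cmd


# Placeholder input path; substitute the real mutation burden matrix.
cmd = build_heal_command(
    "toy_data/mutation_burden.csv", output="results", splits=5, trials=3
)

# Print the shell-quoted command for inspection.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` from the repository root would launch the run.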

Command-line Arguments

| Argument | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `--file_path` | str | Yes | - | Full path to the input file |
| `--output` | str | No | Current working directory | Output path |
| `--splits` | int | No | 5 | Number of splits for cross-validation |
| `--trials` | int | No | 1 | Number of trials to run |
| `--l1` | float | No | 1.0 | Lower bound of lambda candidates |
| `--l2` | float | No | 40.0 | Upper bound of lambda candidates |
| `--lfidelity` | int | No | 5 | Fidelity of linspace of lambda candidates |
| `--scoring` | str | No | 'roc_auc' | Scoring metric to maximize |
| `--random_state` | int | No | 42 | Random state to start from |
| `--tts` | bool | No | False | Use train_test_split instead of StratifiedKFold for outer CV |
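
To make the tuning-related arguments more concrete, the sketch below shows how the defaults could map onto NumPy and scikit-learn calls: a `np.linspace` grid of lambda candidates searched in an inner loop, scored by `roc_auc`, inside an outer `StratifiedKFold`. It is not HEAL's implementation. The L1-penalized logistic regression, the mapping `C = 1 / lambda`, reading `--lfidelity` as the number of grid points, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Defaults taken from the argument table.
l1, l2, lfidelity = 1.0, 40.0, 5
splits, random_state, scoring = 5, 42, "roc_auc"

# Evenly spaced lambda candidates between the lower and upper bounds
# (assumes --lfidelity is the number of points).
lambdas = np.linspace(l1, l2, lfidelity)

# Synthetic stand-in for a mutation burden matrix with binary labels.
X, y = make_classification(
    n_samples=200, n_features=30, n_informative=5, random_state=random_state
)

inner_cv = StratifiedKFold(n_splits=splits, shuffle=True, random_state=random_state)
outer_cv = StratifiedKFold(n_splits=splits, shuffle=True, random_state=random_state + 1)

# Assumption: lambda is a penalty strength, so scikit-learn's C is its inverse.
search = GridSearchCV(
    LogisticRegression(penalty="l1", solver="liblinear"),
    param_grid={"C": (1.0 / lambdas).tolist()},
    scoring=scoring,
    cv=inner_cv,
)

# Outer loop: each fold refits the inner search and scores on held-out data.
outer_scores = cross_val_score(search, X, y, scoring=scoring, cv=outer_cv)

print(lambdas)
print(outer_scores.mean())
```

According to the table, `--tts` replaces the outer `StratifiedKFold` with a single `train_test_split`; in a sketch like this one, that would mean fitting `search` on the training part and scoring once on the held-out part.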

Citation

Please cite the following paper:

Hirotaka Ieki, Kaoru Ito, Sai Zhang, Satoshi Koyama, Martin Kjellberg, Hiroki Yoshida, Ryo Kurosawa, Hiroshi Matsunaga, Kazuo Miyazawa, Nobuyuki Enzan, Changhoon Kim, Jeong-Sun Seo, Koichiro Higasa, Kouichi Ozaki, Yoshihiro Onouchi, Koichi Matsuda, Yoichiro Kamatani, Chikashi Terao, Fumihiko Matsuda, Michael Snyder, Issei Komuro. "Machine Learning Reveals the Contribution of Rare Genetic Variants and Enhances Risk Prediction for Coronary Artery Disease in the Japanese Population." medRxiv, 2024. doi.org/10.1101/2024.08.13.24311909
