Skip to content

This repo reimplements deepCRE using tensorflow 2.10.0 and SHAP. Thus replacing tensorflow 1.14.0 and DeepLIFT

Notifications You must be signed in to change notification settings

gnoffl/deepCRE_reimplemented

 
 

Repository files navigation

deepCRE-core

This repo reimplements deepCRE using shap and modisco. It also upgrades to tensorflow 2.10.0.

Required directory structure

Within the current directory containing all python scripts, you should have the following sub directories:

  • genome : Contains the reference genomes you wish to train deepCRE on.

  • gene_models: Contains the reference gtf files matching the genomes.

  • tmp_counts: Contains target files. This file must have at least two columns named exactly: gene_id and target.

    The target column should be the already assigned true labels. This reimplementation allows you the freedom to assign true labels as you see fit.

Examples

I have provided two example input files:

example_input.csv: For training. This is an example of how your input training file should look. You can add more rows, where each row will be one species of interest. This file contains primarily 6 columns:

genome| gtf | tpm_target | output_name |number of chromosomes | pickle_key_id |

example_predict_input.csv: For predictions, modisco, shap runs. his file contains the following columns:

genome| gtf | tpm_target | output_name |number of chromosomes |

Usage

To train models use the following commands

python train_ssr_models.py --input example_input.csv --pickle validation_genes.pickle --model_case SSR --ignore_small_genes yes
python train_ssr_models.py --input example_input.csv --pickle validation_genes.pickle --model_case SSC --ignore_small_genes yes

To run predictions use

python deepcre_predict.py --input example_predict_input.csv --model_case SSR --ignore_small_genes yes

To compute shap importance scores. You also get a ..shap_meta.csv file for genes and predictions This file is a one-to-one match with scores in the ..shap_score.h5 output

python deepcre_interprete.py --input example_predict_input.csv --model_case SSR --ignore_small_genes yes

To run modisco

python deepcre_motifs.py --input example_predict_input.csv --model_case SSR --ignore_small_genes yes

About

This repo reimplements deepCRE using tensorflow 2.10.0 and SHAP. Thus replacing tensorflow 1.14.0 and DeepLIFT

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%