EPCOT (comprehensively predicting EPigenome, Chromatin Organization and Transcription) is a model that jointly predicts epigenomic features, gene expression, high-resolution chromatin contact maps, and enhancer activities from DNA sequence and cell-type-specific chromatin accessibility data.
We have developed resources to help users predict other genomic modalities from ATAC-seq: a Google Colab notebook and a webpage at https://liu-bioinfo-lab.github.io/EPCOT_APP.github.io/.
- einops (0.3.2)
- kipoiseq (0.5.2)
- numpy (1.19.5)
- torch (1.10.1)
- scipy (1.7.3)
- scikit-learn (1.0.2)
You can use conda and pip to install the required packages:
conda create -n epcot python==3.9
conda activate epcot
pip install -r requirements.txt
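After installation, it can help to confirm that the environment matches the pinned versions listed above. The following is a minimal sketch using only the Python standard library; the helper name `check_versions` is hypothetical and is not part of EPCOT.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the dependency list above.
PINNED = {
    "einops": "0.3.2",
    "kipoiseq": "0.5.2",
    "numpy": "1.19.5",
    "torch": "1.10.1",
    "scipy": "1.7.3",
    "scikit-learn": "1.0.2",
}


def check_versions(pinned=PINNED):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        # Some torch wheels carry a local suffix such as "+cu113"; ignore it.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions().items():
        print(f"{name}: expected {expected}, found {installed}")
```

Run inside the activated `epcot` environment, the script prints nothing when every package matches its pinned version.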
Please see the directory Input/ for how to generate the inputs to EPCOT (one-hot representations of DNA sequences and normalized DNase-seq). All human data used in EPCOT use the reference genome hg38, and the data-processing code is also written for hg38.
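To illustrate what a one-hot representation of a DNA sequence looks like, here is a minimal NumPy sketch. The channel order (A, C, G, T), the `(4, length)` layout, and the function name `one_hot_encode` are assumptions for illustration only; the scripts in Input/ define the format EPCOT actually expects.

```python
import numpy as np

# Assumed channel order; check the scripts in Input/ for the real one.
BASES = "ACGT"


def one_hot_encode(seq):
    """Encode a DNA string as a (4, len(seq)) float32 array.

    Symbols other than A, C, G, T (for example N) become all-zero columns.
    """
    # Map each ASCII code to a channel index, or -1 for unknown symbols.
    lookup = np.full(256, -1, dtype=np.int64)
    for i, base in enumerate(BASES):
        lookup[ord(base)] = i
        lookup[ord(base.lower())] = i

    codes = lookup[np.frombuffer(seq.encode("ascii"), dtype=np.uint8)]
    encoded = np.zeros((4, len(seq)), dtype=np.float32)
    valid = codes >= 0
    # Set a single 1.0 per valid position, in the row of its base.
    encoded[codes[valid], np.nonzero(valid)[0]] = 1.0
    return encoded
```

For example, `one_hot_encode("ACGTN")` returns an array of shape `(4, 5)` whose last column is all zeros because `N` is not one of the four bases.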
You can download EPCOT models trained on DNA sequence and DNase-seq or ATAC-seq from Google Drive or
For the trained downstream models, and for instructions on training downstream models from scratch, see the corresponding directories GEP/, COP/, and EAP/.
We provide a Google Colab notebook, EPCOT_usage.ipynb, that shows how to use EPCOT to predict multiple modalities.
We provide a GitHub page that shares our TF sequence binding patterns along with Tomtom motif comparison results, and we summarize the results in an Excel file, motif_comparison_summary.xls.