MAO QSAR

This repository contains the data and code used to train machine learning models for the MAO-A and MAO-B QSAR study. The study is based on activity data downloaded from ChEMBL and on the results of molecular docking with Smina. The prepared datasets for both targets and the pre-computed docking scores are in the data folder. The code for training QSAR models is in the qsar directory.

Conda Environment

Create the conda environment by running the following command inside the repository directory:

conda env create -f environment.yml

Example Code

To train machine learning models on the datasets pulled from ChEMBL (reported activity or pre-computed docking scores), adapt the code snippet below:

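from qsar.data import load_data, split_data
from qsar.fingerprints import calculate_fingerprints
from qsar.ml import train_rf


# Load the MAO-A docking score dataset.
dataset = load_data(
    "data/mao_a_docking_score.csv",
    "docking_score"
)
# Compute Morgan fingerprints to use as descriptors.
dataset = calculate_fingerprints(dataset, "morgan")
# Split randomly into training, validation, and test sets.
data_train, data_valid, data_test = split_data(
    dataset,
    method="random"
)
# Boolean mask over the columns: True for every descriptor column,
# False for the SMILES strings and the target values.
desc_cols = [
    column not in ("smiles", "y")
    for column in data_train.columns
]
# Train a random forest on the descriptors and report the test score.
score_valid, score_test, parameters = train_rf(
    data_train.iloc[:, desc_cols],
    data_valid.iloc[:, desc_cols],
    data_test.iloc[:, desc_cols],
    data_train["y"],
    data_valid["y"],
    data_test["y"],
)
print(score_test)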

KS Data Splitting Method

This repository contains our custom method for regression data stratification that makes the distribution of training labels similar to the distribution of testing labels. This method is based on the Kolmogorov-Smirnov D statistic and can be combined with the scaffold splitting method.
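
To illustrate the idea only, here is a minimal sketch of how the Kolmogorov-Smirnov D statistic can guide a split. It is not the implementation shipped in this repository: the helper names `ks_statistic` and `ks_guided_split`, their parameters, and the "try several random splits and keep the best" strategy are all assumptions made for this example. The sketch needs only NumPy.

```python
import numpy as np


def ks_statistic(a, b):
    """Two-sample KS D statistic: the largest gap between two empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    # Empirical CDF of each sample evaluated at every observed value.
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def ks_guided_split(y, test_fraction=0.2, n_trials=100, seed=0):
    """Hypothetical helper: draw random splits and keep the one with the smallest D."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n_test = max(1, int(round(test_fraction * y.size)))
    best = None
    for _ in range(n_trials):
        order = rng.permutation(y.size)
        test_idx, train_idx = order[:n_test], order[n_test:]
        d = ks_statistic(y[train_idx], y[test_idx])
        if best is None or d < best[0]:
            best = (d, train_idx, test_idx)
    return best


# Usage on synthetic labels (illustrative data, not from the study).
labels = np.random.default_rng(1).normal(size=200)
d, train_idx, test_idx = ks_guided_split(labels)
print(f"D = {d:.3f}, train = {train_idx.size}, test = {test_idx.size}")
```

A lower D means the training and test label distributions are closer. The same selection criterion could in principle be applied to candidate scaffold-based splits instead of random ones, although how the repository combines the two is not shown here.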
