Getting started

Package to annote binding type of bioactivity measures based on keyword search of

abstracts from PubMed, PubChem assay description, CrossRef or Google Patents
assay descriptions from ChEMBL assay descriptions

The annotation is currently supported for two types of targets

Class A GPCRs with a 3-level hierarchical keyword search to annotate compounds as orthosteric, allosteric, bitopic, covalent or unknown based work in Burggraaf et al., J. Chem. Inf. Model. (2020)
Protein Kinases with 1-level keyword search to annotate compounds as allosteric or unknown based (extended) keywords from Christmann-Franck et al., J. Chem. Inf. Model. (2016)

Getting started

Install

pip install git+https://github.com/sohviluukkonen/BindingType.git@main

Usage

The package has both an API and a CLI which can process either

Papyrus datasets
lists of document and/or assay IDs

Papyrus data

In the case of Papyrus-dataframe, the annotation will a new BindingType column to the dataframe and can be done from the command line with

bindtype_papyrus -i <dataset.csv/.tsv> -tt <GPCR/Kinase>

or with the API with

from bindtype.papyrus import add_binding_type_to_papyrus
df = add_binding_type_to_papyrus(df, target_type=GPCR/Kinase)

There is also an option to annotate all 'unknown' compounds that based on their Tanimoto similarity to the annotated compounds: -sim, --similarity flag in the CLI and similarity=True in the API.

General usage

In the more general case, the annotation will create dictionaries based list of document IDs and/or assays IDs. This can be done either from the command line with

bindtype -did <document_id_file_path> -aid <assay_id_file_path> -tt <GPCR/Kinase>

or with the API with

# for the GPCRs
from bindtype import ClassA_GPCR_HierachicalBindingTypeAnnotation
parser = ClassA_GPCR_HierachicalBindingTypeAnnotation()

# for the kinases
from bindtype import Kinase_AllostericAnnotation
parser = Kinase_AllostericAnnotation()

# Only abstracts
dct_doc_annotations = parser(document_ids=list_of_document_ids)

# Only assay descriptions
dct_assay_annotations = parser(assay_ids=list_of_assay_ids)

# Both
dct_doc_annotations, dct_assay_annotations = parser(document_ids=list_of_document_ids, assay_ids=list_of_assay_ids)

As the scripts were developed with data from Papyrus and uses document and assay description IDs should be in the format used in the all_doc_ids and AID columns: PMID:<pubchem_id>, PubChemAID:<pubchem_assay_id>, DOI:, PATENT:<patent_id> and <chembl_assay_id>.

Name		Name	Last commit message	Last commit date
Latest commit History 27 Commits
data		data
src/bindtype		src/bindtype
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Getting started

Install

Usage

Papyrus data

General usage

About

Releases

Packages

Languages

License

LindeSchoenmaker/BindingType

Folders and files

Latest commit

History

Repository files navigation

Getting started

Install

Usage

Papyrus data

General usage

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages