Text-Mining Based Feature Selection for Anticancer Drug Response Prediction

Abstract

Motivation: Predicting anticancer treatment response from baseline genomic data is a critical challenge in personalized medicine. Machine learning methods are commonly used to predict drug response from gene expression data. One of the most significant challenges in constructing these models is selecting appropriate features from among a massive number of genes.

Results: In this study, we use features (genes) extracted by text-mining the scientific literature. Using two independent cancer pharmacogenomic datasets, we demonstrate that text-mining-based features outperform traditional feature selection techniques in machine learning tasks. In addition, our analysis reveals that machine learning models built on text-mining features and trained on in vitro data also perform well when predicting the response of in vivo cancer models. Our results demonstrate that text-mining-based feature selection is an easy-to-implement approach that is suitable for building machine learning models for anticancer drug response prediction.

Descriptions of some scripts in /R

For the supported values of each argument, see the "Reference for valid arguments" table below.

/code/init_train.R

Purpose: calls functions in train_functions.R to produce models
Call with arguments: pSet, method, problem, drugname (optional)
Example: Rscript init_train.R CCLE glmnet regression
Note: If drugname is specified, a model is trained only for that drug. Otherwise, models are trained for all drugs in /R/genes.
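To train models for every supported combination in one pass, a loop like the one below could be used. This is a minimal sketch with several assumptions. It assumes it is run from the directory containing init_train.R, with Rscript on the PATH. The function name train_all and the DRY_RUN switch, which prints the commands instead of running them, are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): run init_train.R for every
# pSet/method/problem combination listed in "Reference for valid arguments".
# Assumes the current directory contains init_train.R and Rscript is on PATH.
# Set DRY_RUN=1 to print the Rscript commands instead of running them.
train_all() {
  local run=${DRY_RUN:+echo}   # expands to "echo" only when DRY_RUN is non-empty
  local pset method problem
  for pset in GDSC CCLE; do
    for method in glmnet rf; do
      for problem in regression class; do
        # An optional drug name ($1) restricts training to that drug.
        $run Rscript init_train.R "$pset" "$method" "$problem" ${1:+"$1"} || return 1
      done
    done
  done
}
```

For example, `DRY_RUN=1 train_all` lists the eight commands without running them, and `train_all Adavosertib` trains every combination for that single drug. The loop stops at the first failing run, so a broken configuration is not hidden behind later successes.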

/code/crossvalidate.R

Purpose: uses models produced by init_train.R to test on another dataset
Call with arguments: pSet, method, problem, drugname (optional)
Example: Rscript crossvalidate.R GDSC glmnet regression
Note: If drugname is specified, the model is tested only for that drug. Otherwise, models are tested for all drugs in /R/genes.
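The paired examples (train on CCLE with init_train.R, then run crossvalidate.R with GDSC) suggest a train-then-test workflow. The sketch below assumes that crossvalidate.R's pSet argument names the dataset the models are tested on, and that those models were trained on the other dataset. That reading is inferred from the examples, not documented. The names other_pset, train_then_test and DRY_RUN are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the repository).
# Assumption: crossvalidate.R evaluates models trained on the *other* pSet.
other_pset() {
  case "$1" in
    GDSC) echo CCLE ;;
    CCLE) echo GDSC ;;
    *) echo "unknown pSet: $1" >&2; return 1 ;;
  esac
}

# Train on one dataset, then evaluate the resulting models on the other one.
# Usage: train_then_test TRAIN_PSET METHOD PROBLEM [DRUGNAME]
# Set DRY_RUN=1 to print the commands instead of running them.
train_then_test() {
  local run=${DRY_RUN:+echo}
  local test_pset
  test_pset=$(other_pset "$1") || return 1
  $run Rscript init_train.R "$1" "$2" "$3" ${4:+"$4"} || return 1
  $run Rscript crossvalidate.R "$test_pset" "$2" "$3" ${4:+"$4"}
}
```

With this reading, `train_then_test CCLE glmnet regression` reproduces the two documented example commands in order.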

/code/plot_data.R

Purpose: produces a box-and-whisker plot from train or test results
Call with arguments: pSet, method, problem, drugname, metric, type
Example: Rscript plot_data.R GDSC glmnet regression Adavosertib COR test
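Unlike the two scripts above, plot_data.R takes drugname as a required argument. Plotting several drugs therefore means one call per drug. The sketch below assumes results for each listed drug already exist. The function name plot_drugs and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): call plot_data.R once per
# drug for a fixed pSet/method/problem/metric/type.
# Usage: plot_drugs PSET METHOD PROBLEM METRIC TYPE DRUG [DRUG ...]
# Set DRY_RUN=1 to print the commands instead of running them.
plot_drugs() {
  local run=${DRY_RUN:+echo}
  local pset=$1 method=$2 problem=$3 metric=$4 type=$5 drug
  shift 5 || return 1   # fewer than five leading arguments: nothing to do
  for drug in "$@"; do
    # Argument order follows plot_data.R: pSet method problem drugname metric type
    $run Rscript plot_data.R "$pset" "$method" "$problem" "$drug" "$metric" "$type" || return 1
  done
}
```

`plot_drugs GDSC glmnet regression COR test Adavosertib` issues the same command as the example above. Additional drug names simply extend the loop.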

/code/Fig-*.R

Purpose: produces the figures used in the manuscript. No call arguments required.
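Because the figure scripts take no arguments, they can all be regenerated with one loop. The sketch below assumes each Fig-*.R script expects to be run from its own directory (here /code). For that reason it changes into that directory inside a subshell, leaving the caller's working directory untouched. The function name make_figures and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): run every Fig-*.R script
# in the given directory (default: current directory), in glob order.
# The body is a subshell, so the cd does not affect the caller.
# Set DRY_RUN=1 to print the commands instead of running them.
make_figures() (
  cd "${1:-.}" || exit 1
  run=${DRY_RUN:+echo}
  for f in Fig-*.R; do
    [ -e "$f" ] || continue   # the glob matched nothing
    $run Rscript "$f" || exit 1
  done
)
```

For example, `make_figures code` run from the repository root would regenerate every figure. The loop stops at the first script that fails.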

Reference for valid arguments

Argument	Valid inputs
pSet	GDSC, CCLE
method	glmnet, rf
problem	regression, class
metric	Accuracy, BalancedAccuracy, AUC, COR
type	train, test, all
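Training runs can be long, so it may help to reject a mistyped argument before calling Rscript. The sketch below encodes the table above. It assumes values are case-sensitive exactly as listed, which the README does not state. The function name valid_arg is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): check a value against the
# "Reference for valid arguments" table. Assumes exact, case-sensitive matches.
# Usage: valid_arg ARGUMENT VALUE
# Returns 0 if VALUE is listed, 1 if it is not, 2 if ARGUMENT is unknown.
valid_arg() {
  local allowed
  case "$1" in
    pSet)    allowed="GDSC CCLE" ;;
    method)  allowed="glmnet rf" ;;
    problem) allowed="regression class" ;;
    metric)  allowed="Accuracy BalancedAccuracy AUC COR" ;;
    type)    allowed="train test all" ;;
    *) echo "unknown argument: $1" >&2; return 2 ;;
  esac
  # Pad with spaces so only whole words match (e.g. "rf" but not "r").
  case " $allowed " in
    *" $2 "*) return 0 ;;
    *) echo "invalid $1: '$2' (valid: $allowed)" >&2; return 1 ;;
  esac
}
```

A script could guard a call with, for instance, `valid_arg pSet "$pset" && valid_arg method "$method" && Rscript init_train.R "$pset" "$method" regression`.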

Contact:

Arvind Mer: Email

spocks/text_features

Folders and files

Latest commit

History

Repository files navigation

Text-Mining Based Feature Selection for Anticancer Drug Response Prediction

Abstract

Descriptions of some scripts in /R

/code/init_train.R

/code/crossvalidate.R

/code/plot_data.R

code/Fig-*.R

Reference for valid arguments

Contact:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages