STAT 844 Statistical Learning - Advanced Regression
Empirical Evaluation of Elastic Net in Cancer Gene Analysis
We carried out a data-analysis and applied-statistics project on modeling high-dimensional cancer gene data. Variable selection is central to gene data analysis because the number of genes (predictors) is often far larger than the number of samples. Among variable selection techniques, the Elastic Net stands out for three reasons: it performs shrinkage estimation and variable selection at the same time, it balances goodness of fit against a penalty that mixes the Lasso (L1) and Ridge (L2) terms, and it has a grouping effect, tending to keep or drop groups of highly correlated predictors together.
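To make the "balance of fitting and penalizing" concrete, here is a minimal pure-Python sketch of elastic-net penalized logistic regression. It assumes the glmnet-style objective (mean logistic loss + lam * (alpha * ||beta||_1 + (1 - alpha)/2 * ||beta||_2^2), with alpha = 1 giving the Lasso and alpha = 0 giving Ridge) and fits it by proximal gradient descent. The names `fit_elastic_net_logistic`, `lam`, `alpha`, `step` and `n_iter` are hypothetical, and the solver is not necessarily the one used in the project (glmnet itself uses coordinate descent).

```python
import math
from typing import List, Sequence, Tuple


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t*|z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(u: float) -> float:
    # Numerically stable logistic function.
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def fit_elastic_net_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float,
    alpha: float,
    step: float = 0.1,
    n_iter: int = 2000,
) -> Tuple[float, List[float]]:
    """Hypothetical sketch: proximal gradient descent for
    mean logistic loss + lam * (alpha*||beta||_1 + (1-alpha)/2 * ||beta||_2^2).
    The intercept b0 is not penalized; step should not exceed 1/L of the loss."""
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability - label) at the current iterate.
        r = [sigmoid(b0 + sum(bj * xij for bj, xij in zip(beta, row))) - yi
             for row, yi in zip(X, y)]
        # Plain gradient step for the unpenalized intercept.
        b0 -= step * sum(r) / n
        for j in range(p):
            grad_j = sum(ri * row[j] for ri, row in zip(r, X)) / n
            z = beta[j] - step * grad_j
            # Prox of the elastic-net penalty: soft-threshold (L1 part),
            # then proportional shrinkage (L2 part).
            beta[j] = soft_threshold(z, step * lam * alpha) / (
                1.0 + step * lam * (1.0 - alpha))
    return b0, beta
```

With two identical predictor columns (an extreme case of highly correlated genes) and alpha < 1, the penalty is strictly convex in beta, so the fitted coefficients for the duplicated columns come out equal: a small illustration of the grouping effect. A large `lam` instead shrinks every coefficient exactly to zero.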
In our project, we trained several classifiers for TP53 gene inactivation and compared Lasso, Ridge, and Elastic Net regularization in penalized logistic regression and support vector machines. We found that logistic regression outperformed the support vector machines in this setting, and that the Elastic Net outperformed Lasso and Ridge regularization, with a higher test AUC (93.6%) while selecting a reasonable proportion of the variables. We also carried out further analysis to check that this Elastic Net model is biologically meaningful and interpretable.
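The two comparison criteria above, test AUC and the proportion of selected variables, can be sketched in a few lines of pure Python. This is an illustrative sketch, not the project's evaluation code: `auc` computes the area under the ROC curve through its Mann-Whitney interpretation (the fraction of positive/negative pairs ranked correctly, ties counting one half), and `selected_fraction` counts the coefficients a penalty left nonzero. Both function names and the `tol` parameter are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic:
    fraction of (positive, negative) pairs ranked correctly, ties counting 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def selected_fraction(beta: Sequence[float], tol: float = 0.0) -> float:
    """Share of coefficients the penalty left nonzero (|beta_j| > tol)."""
    return sum(abs(b) > tol for b in beta) / len(beta)
```

The pairwise loop is quadratic in the test-set size, which is fine for the small sample sizes typical of gene studies; a rank-based formula gives the same value in O(n log n) for larger sets.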