In this module, we evaluate the final and shuffled baseline ML models.
After training the models in 2.train_model, we use these models to predict the labels of the training and testing datasets and evaluate their predictive performance.
We evaluate each model for each combination of model type (final, shuffled_baseline), feature type (CP, DP, CP_and_DP, CP_zernike_only, CP_areashape_only), balance type (balanced, unbalanced), dataset type (ic, no_ic) and dataset (train, test).
See 2.train_model/README.md for more information on model combinations.
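As a rough illustration of how many evaluations this implies, the combinations listed above can be enumerated with `itertools.product`. This is only a sketch: the variable names are hypothetical, and it assumes every combination is actually evaluated.

```python
from itertools import product

# Values copied from the combination list above; variable names are illustrative.
model_types = ["final", "shuffled_baseline"]
feature_types = ["CP", "DP", "CP_and_DP", "CP_zernike_only", "CP_areashape_only"]
balance_types = ["balanced", "unbalanced"]
dataset_types = ["ic", "no_ic"]
datasets = ["train", "test"]

# Every (model, feature, balance, dataset type, dataset) tuple.
combinations = list(
    product(model_types, feature_types, balance_types, dataset_types, datasets)
)

print(len(combinations))  # 2 * 5 * 2 * 2 * 2 = 80
print(combinations[0])    # ('final', 'CP', 'balanced', 'ic', 'train')
```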
In get_model_predictions.ipynb, we derive the predicted and true phenotypic class for each model, feature type, and dataset combination. These predictions are saved in predictions.
In confusion_matrices.ipynb, we evaluate these sets of predictions with a confusion matrix to see the true/false positives and negatives (see sklearn.metrics.confusion_matrix for more details). The confusion matrix data are saved to confusion_matrices.
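A minimal sketch of what `sklearn.metrics.confusion_matrix` returns, using a handful of made-up labels rather than the notebook's actual predictions. The three phenotype names and the label lists are illustrative assumptions.

```python
from sklearn.metrics import confusion_matrix

# Made-up true and predicted phenotypes for six cells (illustration only).
y_true = ["Interphase", "Prometaphase", "Interphase", "Apoptosis", "Prometaphase", "Interphase"]
y_pred = ["Interphase", "Interphase", "Interphase", "Apoptosis", "Prometaphase", "Prometaphase"]

# Fixing the label order makes rows and columns easy to interpret.
labels = ["Apoptosis", "Interphase", "Prometaphase"]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# [[1 0 0]
#  [0 2 1]
#  [0 1 1]]
```

Diagonal entries count correct predictions; off-diagonal entries show which phenotypes are confused with each other.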
In F1_scores.ipynb, we evaluate each model to determine per-phenotype and weighted F1 scores. The F1 score is the harmonic mean of precision and recall, so it summarizes both aspects of the model's performance for each phenotypic class (see sklearn.metrics.f1_score for more details). The F1 score data are saved to F1_scores.
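The difference between per-phenotype and weighted F1 scores can be seen on a small made-up example. The labels below are illustrative assumptions, not data from this module.

```python
from sklearn.metrics import f1_score

# Made-up true and predicted phenotypes for six cells (illustration only).
y_true = ["Interphase", "Prometaphase", "Interphase", "Apoptosis", "Prometaphase", "Interphase"]
y_pred = ["Interphase", "Interphase", "Interphase", "Apoptosis", "Prometaphase", "Prometaphase"]
labels = ["Apoptosis", "Interphase", "Prometaphase"]

# average=None returns one F1 score per class, in the order given by `labels`.
per_class_f1 = f1_score(y_true, y_pred, labels=labels, average=None)

# average="weighted" averages the per-class scores, weighted by class support.
weighted_f1 = f1_score(y_true, y_pred, labels=labels, average="weighted")

for phenotype, score in zip(labels, per_class_f1):
    print(f"{phenotype}: {score:.3f}")   # 1.000, 0.667, 0.500
print(f"Weighted: {weighted_f1:.3f}")    # (1*1 + 3*(2/3) + 2*0.5) / 6 = 0.667
```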
In class_PR_curves.ipynb, we use sklearn.metrics.precision_recall_curve to derive the precision-recall (PR) curves for each model, feature type, and dataset combination. A PR curve is created for each label of the logistic regression model. For example, each multi-class model has 15 labels (one for each phenotypic class) and therefore 15 PR curves, while each single-class model has 2 labels (positive and negative for its respective phenotype) and therefore 2 PR curves. The PR curves and their data are saved to precision_recall_curves.
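One way to obtain a PR curve per label is to treat each class one-vs-rest and pass that class's predicted probability column to `precision_recall_curve`. The sketch below assumes this approach; the probabilities, class names, and the `pr_curves` dictionary are hypothetical and only three classes are used to keep it short.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

classes = ["Apoptosis", "Interphase", "Prometaphase"]
y_true = np.array(
    ["Interphase", "Prometaphase", "Interphase", "Apoptosis", "Prometaphase", "Interphase"]
)

# Made-up predicted probabilities: one row per cell, one column per class,
# in the same order as `classes`.
probas = np.array([
    [0.1, 0.7, 0.2],
    [0.1, 0.5, 0.4],
    [0.2, 0.6, 0.2],
    [0.8, 0.1, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.4, 0.5],
])

pr_curves = {}
for column, phenotype in enumerate(classes):
    # One-vs-rest: 1 if the cell belongs to this phenotype, 0 otherwise.
    is_phenotype = (y_true == phenotype).astype(int)
    precision, recall, thresholds = precision_recall_curve(is_phenotype, probas[:, column])
    pr_curves[phenotype] = (precision, recall, thresholds)

print(len(pr_curves))  # one curve per class
```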
In get_LOIO_probabilities.ipynb, we use the optimal hyperparameters from each final logistic regression model to fit and evaluate new models in a Leave One Image Out (LOIO) fashion. These optimal hyperparameters are found with grid search cross-validation in train_model.ipynb and are saved with the model data in models/. LOIO evaluation estimates how well a model performs on cells from an image it has never seen before. If a model performs well in LOIO evaluation, we can be more confident that it will generalize to unseen images. LOIO belongs to the family of leave-one-out cross-validation methods, with images rather than individual cells as the held-out unit. The LOIO evaluation procedure is as follows:
- Load in the entire MitoCheck labeled cell dataset (from labeled_data.csv.gz).
- For each image in the MitoCheck labeled cell dataset (as specified by the Metadata_DNA field):
  - Train a logistic regression model with the optimal hyperparameters (C and l1_ratio) determined for a particular model in train_model.ipynb on every cell that is not in the specific image.
  - Predict probabilities on every cell that is in the specific image.
We save these probabilities to LOIO_probas.
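The LOIO loop described above can be sketched with scikit-learn's `LeaveOneGroupOut`, using the image identifier as the group. This is not the notebook's code: the data are synthetic, the hyperparameter values are placeholders, and the use of an elastic-net penalty with the `saga` solver is an assumption inferred from the `C` and `l1_ratio` hyperparameters.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut

# Synthetic stand-in for the labeled cell dataset (illustration only).
rng = np.random.default_rng(0)
n_cells = 40
features = rng.normal(size=(n_cells, 5))
labels = np.array(["Interphase", "Prometaphase"] * (n_cells // 2))
features[labels == "Prometaphase"] += 1.0  # make the two classes separable-ish
# Stand-in for the Metadata_DNA field: 4 images with 10 cells each.
images = np.repeat(["image_1", "image_2", "image_3", "image_4"], n_cells // 4)

# Placeholder hyperparameters; the real values come from the trained models.
C, l1_ratio = 1.0, 0.5

loio_rows = []
for train_idx, test_idx in LeaveOneGroupOut().split(features, labels, groups=images):
    # Train on every cell that is NOT in the held-out image.
    model = LogisticRegression(
        penalty="elasticnet", solver="saga", C=C, l1_ratio=l1_ratio, max_iter=5000
    )
    model.fit(features[train_idx], labels[train_idx])

    # Predict probabilities for every cell that IS in the held-out image.
    probas = model.predict_proba(features[test_idx])
    for cell, cell_probas in zip(test_idx, probas):
        loio_rows.append({
            "cell": int(cell),
            "Metadata_DNA": images[cell],
            "true_label": labels[cell],
            **dict(zip(model.classes_, cell_probas)),
        })

print(len(loio_rows))  # each cell is held out exactly once
```

Because each image is held out exactly once, every cell receives one set of probabilities from a model that never saw its image during training.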
Notes:
- Intermediate .tsv data are stored in tidy format, a standardized data structure (see Tidy Data by Hadley Wickham for more details).
- SCM stands for "single cell model(s)" and is used as an abbreviation for the binary, single-class models throughout this module.
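To illustrate the tidy format mentioned in the notes, the sketch below reshapes a small wide confusion matrix into one row per observation with `pandas.DataFrame.melt`. The counts and the column names (`True_Label`, `Predicted_Label`, `Count`) are hypothetical and may differ from the files this module writes.

```python
import pandas as pd

phenotypes = ["Apoptosis", "Interphase", "Prometaphase"]

# Wide format: one row per true class, one column per predicted class.
wide = pd.DataFrame(
    [[1, 0, 0], [0, 2, 1], [0, 1, 1]],
    index=phenotypes,
    columns=phenotypes,
)
wide.index.name = "True_Label"

# Tidy format: every row is a single (true label, predicted label, count) observation.
tidy = wide.reset_index().melt(
    id_vars="True_Label", var_name="Predicted_Label", value_name="Count"
)

print(tidy.shape)  # (9, 3)
print(tidy.head())
```

Tidy tables like this can be written with `tidy.to_csv(path, sep="\t", index=False)` and filtered or plotted without further reshaping.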
Use the commands below to evaluate the ML models:
# Make sure you are located in 3.evaluate_model
cd 3.evaluate_model
# Activate phenotypic_profiling conda environment
conda activate phenotypic_profiling
# Evaluate model
bash evaluate_model.sh