This project classifies a new picture of a skin lesion as malignant or benign.
(Interpret the results with care: they will never replace a doctor's opinion.)
Remember to adapt the CONFIG.txt file to your setup (the default configuration works as is).
!pip install -U efficientnet
from API import *
CFG = get_config()
- First, before extracting features, we need to segment every image in the dataset. Here is the API call:
# CFG: the config dictionary
# dataframe: a pandas.DataFrame with at least one column, "filename", containing the name (including the extension)
# of each image, and an OPTIONAL column, "target", with the labels "benign" or "malignant"
# images_path: the path where all input images are located
# segmentations_path: the path where all output segmentations will be saved
get_segmentations(CFG, dataframe, images_path, segmentations_path)
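The `dataframe` argument can be assembled in many ways. As a minimal sketch, the hypothetical helper below (`build_rows` is not part of the API, and the file names are made up) builds the rows with the two column names described above and checks the labels. Passing the result to `pandas.DataFrame(rows)` should then give a frame with the expected columns.

```python
VALID_LABELS = {"benign", "malignant"}

def build_rows(filenames, labels=None):
    """Return one dict per image with the column names the API expects.

    Hypothetical helper: `filenames` include the extension; `labels` is
    optional and, when given, may only contain "benign" or "malignant".
    """
    if labels is None:
        return [{"filename": name} for name in filenames]
    if len(labels) != len(filenames):
        raise ValueError("filenames and labels must have the same length")
    unknown = set(labels) - VALID_LABELS
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return [{"filename": n, "target": t} for n, t in zip(filenames, labels)]

rows = build_rows(["img_001.jpg", "img_002.jpg"], ["benign", "malignant"])
# pandas.DataFrame(rows) would yield the "filename" and "target" columns.
```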
- Then we can compute the tabular features:
# CFG: the config dictionary
# dataframe: a pandas.DataFrame with at least one column, "filename", containing the name (including the extension)
# of each image, and an OPTIONAL column, "target", with the labels "benign" or "malignant"
# images_path: the path where all input images are located
# segmentations_path: the path where all output segmentations are located
df = get_tabular_dataframe(CFG, dataframe, images_path, segmentations_path)
The returned DataFrame is normalized with a min-max scaler.
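Min-max scaling maps each feature column to the [0, 1] range with x' = (x - min) / (max - min). The sketch below illustrates the idea on a plain list. It is not the project's code, and the function name and the sample values are made up for illustration.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

areas = [120.0, 300.0, 480.0]      # made-up feature values
scaled = min_max_scale(areas)      # [0.0, 0.5, 1.0]
```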
Features are computed from region properties. I also tried to reproduce the well-known ABCD rule (Asymmetry, Border irregularity and Color descriptors). In the following table, A = area, P = perimeter, d = minor axis length and D = major axis length; the other variables are explained in the references.
Morphological formulas | Asymmetry ref | Border irregularity ref | Color ref |
---|---|---|---|
extent | F4, F5, F6 | ||
solidity | F10, F11, F12 | ||
F13, F14, F15 | |||
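To make one of the region properties above concrete, here is a minimal sketch of `extent`, taken as the lesion area divided by the area of its bounding box (the usual region-property definition, as exposed for example by scikit-image's `regionprops`, which also provides `solidity`). Whether the project computes it exactly this way is an assumption; the function and the tiny mask are illustrative only.

```python
def extent(mask):
    """Lesion area divided by the area of its bounding box.

    `mask` is a list of rows of 0/1 values (1 = lesion pixel).
    """
    coords = [(r, c) for r, row in enumerate(mask)
              for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask contains no lesion pixel")
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return len(coords) / bbox_area

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
value = extent(mask)  # 3 lesion pixels in a 2x2 bounding box -> 0.75
```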
- Writing a TFRecord
# CFG: the config dictionary
# df: a pandas.DataFrame with columns in this order: "filename", containing the name (including the extension)
# of each image; "target" (OPTIONAL), with the labels "benign" or "malignant"; then all the feature columns.
# In our example, there are 21 columns ("filename", "target", + 19 features)
# images_path: the path where all input images are located
# output_path: the path where the TFRecord files will be stored
# nb: (OPTIONAL, default=1) the number of TFRecord files to create
# preprocess_function: (OPTIONAL) a function to preprocess images. None = no preprocessing. Default: white balancing + hair removal.
write_tfrecord(CFG, df, images_path, output_path, preprocess_function=preprocess_function)
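The `nb` parameter spreads the records over several TFRecord files. How `write_tfrecord` actually distributes the rows is not documented here, so the following is only a sketch of one common approach, contiguous shards of near-equal size, using a hypothetical helper name.

```python
def split_into_shards(items, nb=1):
    """Split `items` into `nb` contiguous chunks of near-equal size."""
    if nb < 1:
        raise ValueError("nb must be at least 1")
    base, extra = divmod(len(items), nb)
    shards, start = [], 0
    for i in range(nb):
        # The first `extra` shards take one additional item.
        size = base + (1 if i < extra else 0)
        shards.append(items[start:start + size])
        start += size
    return shards

shards = split_into_shards(list(range(10)), nb=3)
# [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```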
- Reading a TFRecord
# CFG: the config dictionary
# tfrecord_train (tfrecord_test): the path containing the TFRecord files built from the training (test) data
# labeled: whether the images are labeled (True for training, False for testing)
# augment (OPTIONAL): whether images should be augmented (only applies when labeled=True, i.e. for training)
dataset_train = read_tfrecord(CFG, tfrecord_train, labeled=True)
dataset_test = read_tfrecord(CFG, tfrecord_test, labeled=False)
# dataset_train(test) (output): a dataset to give to our model.
Pre-trained weights are available for the following configurations:
- B0 (net_count=1, fine_tune=False, preprocess_function=None): Download
- B0 (net_count=1, fine_tune=True, preprocess_function=None): Download
- B0-2 (net_count=3, fine_tune=True, preprocess_function=None): Download
- B0-4 (net_count=5, fine_tune=False, preprocess_function=None): Download
# CFG: the config dictionary
# fine_tune (OPTIONAL, default: False): True = fine-tuning, False = transfer learning
# model_weights (OPTIONAL): the path to valid weights for the model
model = get_model(CFG, fine_tune=False, model_weights="???")
# model(output): a tensorflow model
- Fit
import tensorflow as tf

callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(),
    # tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5),
    # tf.keras.callbacks.ModelCheckpoint("models/best.h5", save_best_only=True, monitor='val_auc', mode='max', save_weights_only=True),
]
history = model.fit(
    dataset_train,
    # steps_per_epoch must be a whole number of batches
    steps_per_epoch=len(df) // (CFG['batch_size'] * CFG['REPLICAS']),
    epochs=CFG['epochs'],
    callbacks=callbacks,
)
- Predict
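import math

preds = model.predict(
    dataset_test,
    # Assumes df holds the test rows; round up so the last partial batch is included.
    steps=math.ceil(len(df) / (CFG['batch_size'] * CFG['REPLICAS'])),
)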
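Both `steps_per_epoch` and `steps` count batches, so they should be whole numbers. The sketch below shows the arithmetic with a hypothetical helper (`steps_for` is not part of the API, and the sample sizes are made up): rounding down drops the last partial batch, rounding up includes it.

```python
import math

def steps_for(n_samples, batch_size, replicas=1, drop_remainder=False):
    """Number of batches needed to go through `n_samples` once."""
    global_batch = batch_size * replicas
    if drop_remainder:
        return n_samples // global_batch
    return math.ceil(n_samples / global_batch)

# Made-up numbers: 1000 images, batch size 32, a single replica.
train_steps = steps_for(1000, batch_size=32, drop_remainder=True)  # 31
test_steps = steps_for(1000, batch_size=32)                        # 32
```

Rounding up is the safer choice for prediction, where every image needs a score; for training on a repeating dataset, either choice works.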
- An easier way to plug in a custom preprocessing function (when writing a TFRecord file)
- TPU support (not tested yet, so it may not work)
- Improve the segmentation part to improve features' quality
CC 4.0 Attribution-NonCommercial International
The software is for educational and academic research purposes only.
This work is made available under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License.