Abstract: this repo contains a Catalyst-based pipeline for training UNet with different encoders on the steel defect detection problem. Weights for the trained models are provided; the results are:
- UNet with ResNet-50 - IoU 0.413
- UNet with EfficientNet-B3 - IoU 0.541
- UNet with EfficientNet-B4 - IoU 0.592
Important: the dataset, balanced over the defect classes, contains 1000 images, roughly 250 per class. Training on the whole dataset might yield better metrics.
I have not included EDA here; in general the data seems clean (given that our new dataset is balanced).
First, let's choose the main architecture. UNet is a better fit for this problem than Mask R-CNN: semantic segmentation is enough to complete the task, so there is no need for a more complex instance segmentation model like Mask R-CNN. I reviewed several Kaggle kernels and papers from sources such as arxiv.org before deciding.
So:
- Architecture: UNet
- Encoder: EfficientNet-B3,B4; ResNet-50
- Loss function: DiceBCELoss, TverskyLoss (alpha=0.1, beta=0.9)
- Optimizer: Adam (learning rate 1e-3 for the encoder, 1e-2 for the decoder), since the encoder is much deeper and pretrained
- Learning rate scheduler: ReduceLROnPlateau(factor=0.15, patience=2)
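The per-module learning rates above can be expressed with PyTorch parameter groups. A minimal sketch, using a stand-in encoder/decoder pair instead of the real UNet (with segmentation_models.pytorch the same idea applies to `model.encoder` and `model.decoder`):

```python
import torch
import torch.nn as nn

# Stand-in model: in the real pipeline the encoder is a pretrained
# ResNet/EfficientNet and the decoder is the UNet upsampling path.
model = nn.ModuleDict({
    "encoder": nn.Conv2d(3, 16, 3, padding=1),
    "decoder": nn.Conv2d(16, 4, 3, padding=1),
})

# Lower LR for the pretrained encoder, higher LR for the
# randomly initialised decoder, as described above.
optimizer = torch.optim.Adam([
    {"params": model["encoder"].parameters(), "lr": 1e-3},
    {"params": model["decoder"].parameters(), "lr": 1e-2},
])

# LR is multiplied by 0.15 after 2 epochs without improvement;
# call scheduler.step(val_loss) once per epoch.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.15, patience=2
)
```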
Note that the dataset is quite imbalanced with respect to the defect/no_defect classes (true positives vs. true negatives), so picking an appropriate loss is important. I tried DiceBCELoss and Tversky loss (alpha=0.1, beta=0.9); the best results were obtained with DiceBCELoss.
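A common way to implement DiceBCELoss is the sum of binary cross-entropy on logits and a soft Dice loss; a minimal sketch (not necessarily identical to the version in utils/losses.py):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiceBCELoss(nn.Module):
    """BCE (on logits) + soft Dice loss for binary masks."""

    def __init__(self, smooth: float = 1.0):
        super().__init__()
        self.smooth = smooth

    def forward(self, logits, targets):
        bce = F.binary_cross_entropy_with_logits(logits, targets)
        probs = torch.sigmoid(logits).reshape(-1)
        targets = targets.reshape(-1)
        intersection = (probs * targets).sum()
        dice = (2.0 * intersection + self.smooth) / (
            probs.sum() + targets.sum() + self.smooth
        )
        return bce + (1.0 - dice)

# Example: loss on a random batch of 4-channel masks (one per defect class)
criterion = DiceBCELoss()
logits = torch.randn(2, 4, 64, 64)
targets = torch.randint(0, 2, (2, 4, 64, 64)).float()
loss = criterion(logits, targets)
```

The BCE term gives well-behaved per-pixel gradients, while the Dice term directly optimizes region overlap, which matters when defect pixels are rare.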
All encoders were pretrained on ImageNet. However, I believe one more trick could be fruitful: fine-tuning the encoders on the whole dataset as a defect/no_defect classifier first. This could improve the results, but train.csv contains no images of the no_defect class at all.
Also, given the class imbalance, it would make sense to train only on images that actually contain defects (true positives); but, as noted above, train.csv contains no other images anyway.
We could also try multi-scale training, increasing the image resolution from small to large, but I haven't done that.
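One hypothetical way to set this up (not implemented in this repo) is progressive resizing: train a few epochs at a low resolution, then continue at larger ones. A sketch that resizes batches with `F.interpolate`:

```python
import torch
import torch.nn.functional as F

def resize_batch(images, masks, size):
    """Resize images bilinearly and masks with nearest-neighbour,
    so mask values stay binary."""
    images = F.interpolate(images, size=size, mode="bilinear",
                           align_corners=False)
    masks = F.interpolate(masks, size=size, mode="nearest")
    return images, masks

# Hypothetical schedule: small -> large resolution
stages = [(128, 384), (192, 576), (256, 768)]
images = torch.randn(2, 3, 256, 1600)
masks = torch.randint(0, 2, (2, 4, 256, 1600)).float()
for size in stages:
    imgs, msks = resize_batch(images, masks, size)
    # ... train for a few epochs at this resolution ...
```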
I should add that I was limited by CUDA memory capacity, so I basically could not try bigger encoders with batch size > 8.
Encoder | IoU | DiceBCELoss | Mask Resolution | Epochs |
---|---|---|---|---|
ResNet-50 | 0.4132 | | (256, 1600) | |
EfficientNet-B3 | 0.513 | 0.444 | (256, 768) | 11 |
EfficientNet-B4 | 0.597 | 0.36 | (256, 768) | 37 |
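For reference, the IoU in the table is the intersection over union of the predicted and ground-truth binary masks; a minimal NumPy sketch of the metric (the pipeline's own implementation may differ):

```python
import numpy as np

def iou(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """IoU for binary masks (arrays of 0/1)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # eps keeps the ratio defined when both masks are empty
    return float((intersection + eps) / (union + eps))

# Toy example: one overlapping pixel out of two marked pixels -> IoU 0.5
pred = np.array([[1, 1], [0, 0]])
target = np.array([[1, 0], [0, 0]])
score = iou(pred, target)
```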
Link to TensorBoard for EfficientNet-B4: tap here
Example inferences on validation data:
- EfficientNet-B4
Required libraries are catalyst, segmentation_models.pytorch and albumentations.
P.S. I've used segmentation_models for fast prototyping.
Installation:
!pip install git+https://github.com/qubvel/segmentation_models.pytorch
!pip install -U git+https://github.com/albu/albumentations
!pip install catalyst
The directory tree should be:
```
├── Predict_masks.py
├── Train.py
├── config.py
├── data
│   ├── results            # results
│   ├── test.csv
│   ├── test_images        # download test images here
│   ├── train.csv
│   ├── train_balanced.csv
│   └── train_images       # download train images here
├── images
│   └── readme.md
├── utils
│   ├── losses.py
│   └── utils.py
└── weights
    ├── UnetEfficientNetB4_IoU_059.pth
    └── UnetResNet50_IoU_043.pth
```
There is a Predict_masks.py script which can be used to evaluate the model and predict masks for the test dataset (from test.csv). The weights are stored in the ./weights directory.
Pictures with predicted masks and source images will be stored in data/results folder.
Important: masks for ResNet-50 are (256, 1600) px and masks for EfficientNet-B3/B4 are (256, 768) px. The free Colab tier doesn't provide more CUDA memory :(
Usage example:
python3 Predict_masks.py -dir /Users/user/Documents/steel_defect_detection/data/ -weights_dir /Users/user/Documents/steel_defect_detection/data/weights
-dir : Pass the full path of a directory containing the folder "test_images" and "test.csv".
-num_of_images : Number of test images from test.csv to segment.
-weights_dir : Pass a weights directory.
Predict_masks.py doesn't save binary masks; it saves pictures combining the source image and the predicted mask for better presentation.
The model is meant to be trained on the dataset from the Kaggle competition. You can choose which encoder to use and the batch size; the default is EfficientNet-B4. The mask size is set to (256, 768) in config.py; you can set your own.
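The exact contents of config.py aren't shown here; a hypothetical fragment of the settings mentioned above (the actual variable names in the repo may differ) might look like:

```python
# Hypothetical config.py fragment -- names are illustrative only.
ENCODER = "efficientnet-b4"   # or "efficientnet-b3", "resnet50"
BATCH_SIZE = 8
MASK_SIZE = (256, 768)        # (height, width); (256, 1600) was used for ResNet-50
```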
You must point the script to the directory where the train folder and train.csv are stored.
Usage example:
python3 Train.py -dir /Users/user/Documents/steel_defect_detection/data/ -num_of_workers 4
-dir : Pass the full path of a directory containing the folder "train_images" and "train.csv".
-encoder : Backbone to use as encoder for UNet, default='efficientnet-b3'.
-batch_size : Batch size for training, default=8.
-num_of_workers : Number of workers for training, default=0.