This repository covers greyscale scene image classification for two tasks: an in-class Kaggle challenge and the NCTU Computer Vision homework (CV HW).
The two datasets differ slightly:
- Kaggle challenge: 3859 greyscale images in 13 categories (train: 2819, test: 1040)
- CV HW: 1650 greyscale images in 15 categories (train: 1500, test: 150)
- VGG16 (imagenet pretrain) + 2*FC layers & Dropout
- ResNet50 (imagenet pretrain) on Keras 2.2.4, where freezing BatchNorm layers is broken
- Image Size: 224, VGG16 preprocess_input + horizontal_flip (on-the-fly data augmentation)
- Train on a split training set (which loses some of the training data)
- Ensemble prediction: 0.899 accuracy on Kaggle
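
The list above does not say how the two models were combined. A minimal sketch of one common option, assuming the ensemble averages the per-class probabilities of each model (the function name `ensemble_predict` and the sample arrays are hypothetical, not taken from this repository):

```python
import numpy as np

def ensemble_predict(prob_list):
    """Average per-model class probabilities and return predicted class ids.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    """
    stacked = np.stack(prob_list, axis=0)   # (n_models, n_samples, n_classes)
    mean_probs = stacked.mean(axis=0)       # (n_samples, n_classes)
    return mean_probs.argmax(axis=1), mean_probs

# Hypothetical softmax outputs of two models for 3 images and 4 classes.
vgg_probs = np.array([[0.6, 0.2, 0.1, 0.1],
                      [0.1, 0.5, 0.3, 0.1],
                      [0.3, 0.3, 0.2, 0.2]])
resnet_probs = np.array([[0.5, 0.3, 0.1, 0.1],
                         [0.1, 0.2, 0.6, 0.1],
                         [0.1, 0.2, 0.2, 0.5]])

labels, mean_probs = ensemble_predict([vgg_probs, resnet_probs])
print(labels)  # -> [0 2 3]
```

Averaging probabilities lets a confident model outvote an uncertain one, which is why the second and third images follow the ResNet50 output here.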
- ResNet50 (imagenet pretrain) on TF2.2 classification_models
- CosineAnnealingScheduler
- Image Size: 256, + horizontal_flip + brightness + zoom + rotation (on-the-fly data augmentation)
- Train on whole training set
- Single-model prediction: 0.98 accuracy on CV HW
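
For readers unfamiliar with the scheduler named above, here is a minimal sketch of the usual cosine annealing formula. The values of `t_max`, `eta_max` and `eta_min` are assumptions chosen for illustration; the settings actually used in this repository are not stated here.

```python
import math

def cosine_annealing_lr(epoch, t_max, eta_max, eta_min=0.0):
    """Learning rate that decays from eta_max to eta_min along half a cosine wave."""
    return eta_min + (eta_max - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2

# Assumed values: 100 epochs, learning rate from 1e-3 down to 1e-5.
schedule = [cosine_annealing_lr(e, t_max=100, eta_max=1e-3, eta_min=1e-5)
            for e in range(101)]
print(schedule[0], schedule[50], schedule[100])
```

In Keras, a function like this can be passed to `tf.keras.callbacks.LearningRateScheduler`, which calls it at the start of each epoch.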

| Model | Batch_size | Accuracy | Extra |
| --- | --- | --- | --- |
| EfficientNetB0 | 64 | 0.92 | |
| EfficientNetB0 | 64 | 0.906 | noisy-student pretrain |
| EfficientNetB1 | 64 | 0.926 | |
| EfficientNetB1 | 64 | 0.906 | noisy-student pretrain |
| EfficientNetB4 | 16 | 0.92 | |
| EfficientNetB4 | 32 | 0.95 | |
| EfficientNetB4 | 32 | 0.89 | Freeze 1st Block (Conv+BN+Activation) |
| EfficientNetB4 | 32 | 0.9 | Freeze 1~2 Blocks (Conv+BN+Activation) |
| EfficientNetB5 | 16 | 0.926 | |
| EfficientNetB6 | 16 | 0.9 | Freeze 1st Block (Conv+BN+Activation) |
| EfficientNetB6 | 16 | 0.926 | Freeze 1~2 Blocks (Conv+BN+Activation) |
| EfficientNetB6 | 16 | 0.94 | Freeze 1~3 Blocks (Conv+BN+Activation) |
| EfficientNetB6 | 16 | 0.85 | Freeze 1~4 Blocks (Conv+BN+Activation) |

Freeze first 12 layers (layers 0~47 in the implementation)

| Model | Batch_size | Accuracy | Extra |
| --- | --- | --- | --- |
| ResNet50 | 64 | 0.953 | Generate New Data |
| ResNet50 | 64 | 0.966 | on-the-fly |
| ResNet50 | 64 | 0.946 | on-the-fly + constrast_pil |
| ResNet50 | 64 | 0.98 | on-the-fly + rotation 5 |
| ResNet50 | 64 | 0.96 | on-the-fly + rotation 7 |
| ResNet50 | 64 | 0.953 | on-the-fly + rotation 10 |

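
The table above contrasts generating an augmented dataset once with augmenting on the fly. As a rough illustration of the on-the-fly idea only, the sketch below re-draws a random horizontal flip every time a batch is produced, so each epoch sees a different variant of the same images. The helper `augmented_batches` and the fake data are hypothetical; the repository itself presumably relies on the Keras data generator rather than code like this.

```python
import numpy as np

def augmented_batches(images, labels, batch_size, rng):
    """Yield shuffled batches, flipping each image left-right with probability 0.5."""
    order = rng.permutation(len(images))
    for start in range(0, len(images), batch_size):
        idx = order[start:start + batch_size]
        batch = images[idx].copy()               # never modify the stored images
        flip = rng.random(len(idx)) < 0.5        # fresh random decision per image
        batch[flip] = batch[flip][:, :, ::-1]    # reverse the width axis
        yield batch, labels[idx]

rng = np.random.default_rng(0)
images = rng.random((10, 8, 8)).astype("float32")  # 10 fake greyscale 8x8 images
labels = np.arange(10)

for epoch in range(2):
    for batch, batch_labels in augmented_batches(images, labels, batch_size=4, rng=rng):
        pass  # a training step would consume (batch, batch_labels) here
```

Because the random draw happens per batch, no augmented copies are stored on disk and the model rarely sees exactly the same input twice.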
- BiT-M (pre-trained on ImageNet-21k), on-the-fly

| Model | Batch_size | Accuracy | Extra |
| --- | --- | --- | --- |
| R50x1 | 64 | 0.966 | |
| R50x3 | 64 | 0.96 | |
| R101x1 | 64 | 0.96 | |
| R101x3 | 64 | 0.953 | |

- Use ResNet50 with imagenet pretrain and freeze first 12 layers
- Large batch size might be helpful
- Use on-the-fly (random) data augmentation instead of generating new data in advance
- Use Brightness, Zoom and Rotation instead of Equalize and RandomResizedCropped
- Use TF2 if you want to freeze BN layers
- Sparse labels might improve accuracy (Dense without softmax, class_mode='sparse', loss=SparseCategoricalCrossentropy)
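
To make the last point concrete, here is a small NumPy sketch of what a sparse categorical cross-entropy computes when the final Dense layer has no softmax, that is, when the loss receives raw logits and integer class ids. It assumes the Keras loss is configured as `SparseCategoricalCrossentropy(from_logits=True)`, which the note above implies but does not state; the function name and sample values are hypothetical.

```python
import numpy as np

def sparse_categorical_crossentropy_from_logits(logits, labels):
    """Mean cross-entropy for integer labels and raw (pre-softmax) logits."""
    # Subtract the row maximum first so the exponentials cannot overflow.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for every sample.
    return -log_probs[np.arange(len(labels)), labels].mean()

logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0]])
labels = np.array([0, 2])          # integer class ids, no one-hot encoding
loss = sparse_categorical_crossentropy_from_logits(logits, labels)
print(loss)
```

The value is the same as a one-hot categorical cross-entropy on the same logits; the sparse form simply skips building the one-hot matrix.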