FoodVision 101


This is a personal project to develop a computer vision classification model for food images. Conveniently, the Food101 data is available as a TensorFlow dataset object (for an overview of all the datasets available in TensorFlow Datasets, see https://www.tensorflow.org/datasets/overview). There are 101 classes altogether, some of which look similar, e.g. baklava and apple pie. We aim to beat the state-of-the-art classification accuracy of 77% reported in the DeepFood paper (https://www.researchgate.net/publication/304163308_DeepFood_Deep_Learning-Based_Food_Image_Recognition_for_Computer-Aided_Dietary_Assessment).

Our final model achieved a validation accuracy of 84.4%.

The approach - harnessing the power of transfer learning

  1. Preprocess the data by scaling all images to the same dimensions with normalized pixel values.
  2. Develop a data loader to feed batches of images for model training.
  3. Experiment with different pre-trained image classification models from TensorFlow as feature extractors (see the sketch after this list).
  4. Unfreeze 5-10 convolution layers in the best-performing feature extractor and finetune the model for 5 more epochs.
  5. Evaluate the model - identify the classes it struggles to classify.
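
Below is a minimal sketch of steps 1-3, assuming TensorFlow 2.x with the tensorflow_datasets package installed; the image size, batch size, and backbone choice are illustrative rather than the exact values used in the notebook:

```python
import tensorflow as tf
import tensorflow_datasets as tfds

IMG_SIZE = (224, 224)

# Steps 1-2: load Food101 and build a data loader that resizes and batches images.
(train_ds, test_ds), ds_info = tfds.load(
    "food101", split=["train", "validation"], as_supervised=True, with_info=True
)

def preprocess(image, label):
    # Resize to a common shape. EfficientNet rescales inputs internally,
    # so no extra pixel normalization is needed (see note 2 under "Notes").
    return tf.image.resize(image, IMG_SIZE), label

train_ds = (train_ds.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
                    .shuffle(1000).batch(32).prefetch(tf.data.AUTOTUNE))
test_ds = (test_ds.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
                  .batch(32).prefetch(tf.data.AUTOTUNE))

# Step 3: a frozen pre-trained backbone as a feature extractor with a new
# classification head for the 101 food classes.
base = tf.keras.applications.EfficientNetV2B0(include_top=False)
base.trainable = False

inputs = tf.keras.Input(shape=(*IMG_SIZE, 3))
x = tf.keras.layers.GlobalAveragePooling2D()(base(inputs, training=False))
outputs = tf.keras.layers.Dense(101, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```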

To set up the environment

```
git clone https://github.com/tituslhy/FoodVision-Tensorflow
cd FoodVision-Tensorflow
pip install -r requirements.txt
```

Using the model to run inference

The flags for the python3 command are:

  1. --img: The path of the image. This is a required argument.
  2. --plot: Boolean. Defaults to False. If True, the script plots the image and its predicted classification.

```
python3 main.py --img "img.jpeg" --plot "True"
```
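
The argument parsing in main.py presumably looks something like the sketch below (a hypothetical reconstruction, not the script's actual code; note that --plot is passed as a quoted string in the usage example):

```python
import argparse

parser = argparse.ArgumentParser(description="Classify a food image.")
# --img is required; --plot arrives as a string (e.g. "True") and is
# converted to a boolean, matching the usage example above.
parser.add_argument("--img", type=str, required=True,
                    help="Path to the image file")
parser.add_argument("--plot", type=str, default="False",
                    help="If 'True', plot the image and its predicted class")
args = parser.parse_args()
plot = args.plot.lower() == "true"
```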

Notes

  1. There is a bug when using mixed precision training with EfficientNet. The current fix is to use TensorFlow version 2.4 instead (see the sketch after these notes).
  2. EfficientNet models have input normalization built in, so there is no need to rescale image pixel values from the 0-255 range during preprocessing.
  3. Model training and annotations are under Notebooks > FoodVision101 Model Development.ipynb.
  4. Although we compare our performance to the DeepFood paper, this is not a like-for-like comparison. The DeepFood researchers use object detection algorithms to identify multiple objects within the same image, whereas this project is trained only on images containing a single object and should only be used for single-object classification.
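
For note 1, enabling mixed precision in TensorFlow 2.4+ is a one-line policy change; a minimal sketch, assuming the Keras mixed precision API:

```python
import tensorflow as tf

# Set the global mixed precision policy (stable API from TF 2.4 onward).
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Keep the final softmax in float32 for numerical stability when the rest
# of the model runs in float16.
outputs = tf.keras.layers.Dense(101, activation="softmax", dtype="float32")
```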

Experimental results and findings

We experimented with the following feature extractors before finetuning:

| Experiment | Feature extractor | Validation accuracy (on 15% of test data) | Findings |
|---|---|---|---|
| InceptionResNetV2 with data augmentation | InceptionResNetV2 applies residual connections to Google's in-house InceptionV3 model (https://arxiv.org/pdf/1602.07261.pdf). | 55% | Data augmentation causes the model to severely underfit instead of helping it generalize. |
| InceptionResNetV2 | As above. | 61% | Performance improved significantly but is still not great; this could be due to the choice of feature extractor. |
| EfficientNetB0 | EfficientNet boasts very high accuracy despite having few parameters. Its backbone is similar to MobileNetV2 in that it uses mobile inverted bottleneck convolutions; where it differs is its efficiency when scaling width, depth, resolution, or a combination of the three (https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html). | 75% | Performance improved significantly, with much faster training times than prior experiments. |
| EfficientNetB4 | A larger version of EfficientNetB0. The thinking behind this experiment: if some is good, more must be better. | 72% | Performance is slightly lower, with much slower training times. |
| EfficientNetV2B0 | EfficientNetV2 uses progressive learning: images start small at the beginning of training and grow progressively, with scaling and regularization adjusted dynamically throughout (https://towardsdatascience.com/google-releases-efficientnetv2-a-smaller-faster-and-better-efficientnet-673a77bdd43c). | 75.5% | The best performance across all experiments, so we chose this model for finetuning. |
| EfficientNetV2B0 finetuned | EfficientNetV2B0 with all layers set to trainable. | 84.6% | We successfully exceeded the target we set ourselves! |
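
A minimal sketch of the finetuning step, reusing `base`, `model`, and the datasets from the feature-extraction sketch above (the learning rate is an assumption, not necessarily the notebook's exact value):

```python
# Unfreeze the backbone (the final experiment set all layers to trainable;
# step 4 of the approach instead unfreezes only the last 5-10 convolution layers).
base.trainable = True

# Recompile with a lower learning rate so finetuning does not wipe out the
# pre-trained weights, then train for 5 more epochs.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5, validation_data=test_ds)
```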
