This is a personal project to develop a computer vision classification model for food images. Conveniently, the Food101 dataset is available as a TensorFlow Datasets object (for an overview of all datasets available in TensorFlow Datasets, see https://www.tensorflow.org/datasets/overview). There are 101 classes altogether, some of which look similar, e.g. baklava and apple pie. We aim to beat the state-of-the-art classification accuracy of 77% reported in the DeepFood paper (https://www.researchgate.net/publication/304163308_DeepFood_Deep_Learning-Based_Food_Image_Recognition_for_Computer-Aided_Dietary_Assessment).
Our final model achieved a validation accuracy of 84.4%.
- Preprocess the data by scaling all images to the same dimensions and normalizing pixel values.
- Develop a data loader to feed batches of images for model training (a sketch follows this list).
- Experiment with different pre-trained image classification models from TensorFlow as feature extractors.
- Unfreeze 5-10 convolutional layers in the best-performing feature extractor and finetune the model for 5 more epochs.
- Model evaluation: ascertain which classes the model struggles to classify.
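The preprocessing and loading steps can be expressed as a single `tf.data` pipeline. Below is a minimal sketch, not the notebook's exact code: the TFDS `food101` dataset name is real, but the 224x224 input size and batch size of 32 are illustrative choices.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

IMG_SIZE = 224   # assumed input resolution; adjust to the chosen backbone
BATCH_SIZE = 32  # illustrative batch size

# Load Food101 as (image, label) pairs straight from TensorFlow Datasets.
(train_data, test_data), ds_info = tfds.load(
    "food101",
    split=["train", "validation"],
    as_supervised=True,
    with_info=True,
)

def preprocess(image, label):
    """Resize to a fixed size and cast to float32. No 0-1 rescaling:
    EfficientNet models normalize inputs internally (noted below)."""
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])
    return tf.cast(image, tf.float32), label

# Map, shuffle, batch, and prefetch so the GPU never waits on input.
train_data = (
    train_data.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(1000)
    .batch(BATCH_SIZE)
    .prefetch(tf.data.AUTOTUNE)
)
test_data = (
    test_data.map(preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    .batch(BATCH_SIZE)
    .prefetch(tf.data.AUTOTUNE)
)
```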
git clone https://github.com/tituslhy/FoodVision-Tensorflow
pip install -r requirements.txt
The flags for the python3 run are:
- --img: The path to the image to classify. Required.
- --plot: Boolean. Defaults to False. If True, the script plots the image and its predicted classification.
python3 main.py --img "img.jpeg" --plot "True"
- There is a bug when using mixed precision training with EfficientNet. The current fix is to use TensorFlow version 2.4 instead (see the sketch at the end of these notes).
- EfficientNet models have input normalization built in, so there is no need to rescale pixel values to the 0-1 range during preprocessing; raw pixel values in the 0-255 range can be fed directly.
- Model training and annotations can be found under
Notebooks > FoodVision101 Model Development.ipynb
- Though we compare our performance to the DeepFood paper, this is not a like-for-like comparison: the DeepFood researchers used object detection algorithms to identify multiple objects within the same image, whereas this project was trained only on single-object images and should only be used for single-object classification.
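For reference, here is a minimal sketch of how mixed precision is typically enabled in Keras (the `tf.keras.mixed_precision` API shown is the non-experimental one introduced in TensorFlow 2.4). The backbone, input size, and head are illustrative, not the notebook's exact architecture:

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# Compute in float16 while keeping variables in float32.
mixed_precision.set_global_policy("mixed_float16")

# Illustrative model: EfficientNetB0 base with a 101-class head.
inputs = tf.keras.layers.Input(shape=(224, 224, 3))
x = tf.keras.applications.EfficientNetB0(include_top=False, pooling="avg")(inputs)
x = tf.keras.layers.Dense(101)(x)
# Keep the final softmax in float32 for numerical stability.
outputs = tf.keras.layers.Activation("softmax", dtype=tf.float32)(x)
model = tf.keras.Model(inputs, outputs)
```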
We experimented with the following feature extractors before finetuning:
| Experiment | Feature extractor | Validation accuracy (on 15% of test data) | Findings |
| --- | --- | --- | --- |
| InceptionResNetV2 with data augmentation | InceptionResNetV2 applies residual connections to Google's InceptionV3 model (https://arxiv.org/pdf/1602.07261.pdf). | 55% | Data augmentation caused the model to underfit severely rather than helping it generalize. |
| InceptionResNetV2 | Same as above, without data augmentation. | 61% | Performance improved significantly but remained mediocre, possibly due to the choice of feature extractor. |
| EfficientNetB0 | EfficientNet boasts very high accuracy despite very few parameters. Its backbone resembles MobileNetV2 in its use of mobile inverted bottleneck convolutions; where it differs is how efficiently it scales width, depth, resolution, or a combination of the three (https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html). | 75% | Performance improved significantly, with much faster training than prior experiments. |
| EfficientNetB4 | A larger version of EfficientNetB0. The thinking behind this experiment: if some is good, more must be better. | 72% | Performance was slightly lower, with much slower training. |
| EfficientNetV2B0 | EfficientNetV2 uses progressive learning: training starts on small images whose size increases progressively, with scaling and regularization adjusted dynamically throughout training (https://towardsdatascience.com/google-releases-efficientnetv2-a-smaller-faster-and-better-efficientnet-673a77bdd43c). | 75.5% | The best performance of all experiments, so we chose this model for finetuning. |
| EfficientNetV2B0 finetuned | EfficientNetV2B0 with all layers set to trainable. | 84.6% | We successfully exceeded the target we set ourselves! |
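For completeness, here is a sketch of the two-stage recipe behind the final row, plus the per-class error analysis from the evaluation step. It builds on the `train_data`/`test_data` pipeline sketched earlier; the hyperparameters are illustrative, and note that `tf.keras.applications.EfficientNetV2B0` only ships with newer TensorFlow releases (roughly 2.8+), so older versions would need the equivalent model from TensorFlow Hub:

```python
import numpy as np
import tensorflow as tf

# Stage 1: feature extraction - train only the classification head.
base = tf.keras.applications.EfficientNetV2B0(include_top=False, pooling="avg")
base.trainable = False

inputs = tf.keras.layers.Input(shape=(224, 224, 3))
x = base(inputs, training=False)  # keep BatchNorm layers in inference mode
outputs = tf.keras.layers.Dense(101, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_data, validation_data=test_data, epochs=5)

# Stage 2: finetuning - unfreeze the base and recompile with a much
# lower learning rate so the pre-trained weights shift only gently.
base.trainable = True
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_data, validation_data=test_data, epochs=5, initial_epoch=5)

# Evaluation: per-class accuracy reveals the classes the model struggles
# with (test_data must not be shuffled for true/predicted labels to align).
y_true = np.concatenate([y.numpy() for _, y in test_data])
y_pred = np.argmax(model.predict(test_data), axis=-1)
cm = tf.math.confusion_matrix(y_true, y_pred).numpy()
per_class_acc = cm.diagonal() / cm.sum(axis=1)
worst = np.argsort(per_class_acc)[:10]  # indices of the 10 hardest classes
```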