This project aims to develop a product classification system for grocery store shelves using convolutional neural networks. Such a system has practical applications in customer assistance, for example helping shoppers, including those with visual impairments, quickly locate specific products on shelves.
- Single Product Classification: Implement a neural network model that recognizes individual products based on images taken from store shelves.
- Optimization through Fine-Tuning: Enhance the performance of pre-trained models like ResNet-18 to adapt them specifically for grocery products.
To improve model accuracy, images go through several preprocessing steps (a sketch of the corresponding pipeline follows the list):
- Resizing and Center Cropping: Images are resized and then center-cropped to 224x224 pixels, producing a fixed input size without distorting the aspect ratio.
- Data Augmentation: Transformations such as RandomHorizontalFlip, RandomRotation, and ColorJitter are applied to make the model more robust to variations in lighting and orientation.
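The torchvision sketch below illustrates how such a pipeline could be assembled; the augmentation parameters and the ImageNet normalization constants are assumptions, not values taken from the project code.

```python
import torchvision.transforms as T

# Training pipeline: augmentation first, then deterministic resizing/cropping.
# Augmentation parameters below are illustrative assumptions.
train_transform = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomRotation(degrees=15),
    T.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    T.Resize(224),        # resize shorter side, preserving aspect ratio
    T.CenterCrop(224),    # crop to a fixed 224x224 input
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Validation/test pipeline: no augmentation, only deterministic preprocessing.
eval_transform = T.Compose([
    T.Resize(224),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```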
The base model, GroceryModelFull, is inspired by the VGG architecture and includes:
- Three convolutional blocks with pooling layers.
- A global average pooling layer to reduce complexity.
This architecture balances representational capacity with computational efficiency, making it suitable for product images in a retail setting.
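A minimal PyTorch sketch of this kind of architecture is shown below. The exact channel counts, number of convolutions per block, and classifier layout are illustrative assumptions; the actual GroceryModelFull implementation may differ.

```python
import torch.nn as nn

class GroceryModelFull(nn.Module):
    """VGG-style model: three convolutional blocks, global average pooling,
    and a linear classifier. Channel sizes here are assumptions."""

    def __init__(self, num_classes):
        super().__init__()

        def block(in_ch, out_ch):
            # Two 3x3 convolutions with BatchNorm and ReLU, then max pooling.
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(
            block(3, 32),
            block(32, 64),
            block(64, 128),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)       # global average pooling
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = self.gap(x).flatten(1)
        return self.classifier(x)
```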
An ablation study was conducted to understand the importance of various architectural components by creating the following model variations:
- GroceryModelNoBN: Base model without Batch Normalization.
- GroceryModelLessChannels: Model with reduced channels in each block.
- GroceryModelLessConvs: Model with a single convolution per block instead of multiple.
- GroceryModelLessBlocks: Model with two convolutional blocks instead of three.
Comparing the validation performance of these variants against the full model shows how much each architectural component contributes to overall accuracy.
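One way to express all of these variants is through a single configurable builder. The function name, flag names, and default channel sizes below are hypothetical and only illustrate how each ablation maps to a configuration change.

```python
import torch.nn as nn

def make_grocery_model(num_classes, channels=(32, 64, 128),
                       convs_per_block=2, use_batchnorm=True):
    """Hypothetical configurable builder covering the ablation variants:

    - GroceryModelNoBN         -> use_batchnorm=False
    - GroceryModelLessChannels -> channels=(16, 32, 64)
    - GroceryModelLessConvs    -> convs_per_block=1
    - GroceryModelLessBlocks   -> channels=(32, 64)   # two blocks
    """
    layers, in_ch = [], 3
    for out_ch in channels:
        for _ in range(convs_per_block):
            layers.append(nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1))
            if use_batchnorm:
                layers.append(nn.BatchNorm2d(out_ch))
            layers.append(nn.ReLU(inplace=True))
            in_ch = out_ch
        layers.append(nn.MaxPool2d(2))
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, num_classes)]
    return nn.Sequential(*layers)
```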
In this part, we fine-tune a pretrained ResNet-18 model on the GroceryStoreDataset to improve classification accuracy for grocery products. This fine-tuning process is divided into two stages (a setup sketch follows the list):
- Initial Fine-Tuning: Applying the training hyperparameters from the best model in Part 1.
- Hyperparameter Adjustment: Further tuning hyperparameters to achieve a validation accuracy target of 80%-90%.
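A possible stage-one setup is sketched below. The class count and optimizer settings are placeholders standing in for the Part 1 hyperparameters, which are not reproduced here.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Start from ImageNet-pretrained weights and adapt the head to the grocery classes.
num_classes = 81  # hypothetical; use the actual class count of GroceryStoreDataset
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Optimizer and loss; learning rate is an assumed placeholder value.
optimizer = optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()
```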
To further enhance performance, specific adjustments were made to the model configuration, as shown in the sketch after this list:
- Fully Connected Layer with Dropout: The fully connected layer (fc) was replaced with a Sequential block consisting of a Dropout layer (p=0.3) followed by a fully connected layer. The dropout helps reduce overfitting by randomly deactivating neurons during training.
- Batch Size Adjustment: The batch size was reduced from 64 to 32, introducing more variability into the gradient updates and helping the model avoid overfitting.
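The adjustments could look like the following sketch. The train_dataset and val_dataset objects are assumed to come from the earlier data-loading step, and num_classes is the same placeholder used above.

```python
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import models

num_classes = 81  # hypothetical class count, as in the previous sketch
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Replace the ResNet-18 head with Dropout followed by a linear classifier.
in_features = model.fc.in_features  # 512 for ResNet-18
model.fc = nn.Sequential(
    nn.Dropout(p=0.3),
    nn.Linear(in_features, num_classes),
)

# Reduce the batch size from 64 to 32; the dataset objects are assumed to
# exist from the earlier data-loading code.
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
```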
After applying these fine-tuning adjustments, the model showed a marked improvement in validation accuracy. The added dropout layer and the smaller batch size helped the model generalize better to unseen data, bringing validation accuracy into the 80%-90% target range.