Convolutional Neural Networks and Support Vector Machine based multi label classification for restaurants.

amwat/Yelp-Photo-Classification

Yelp Restaurant Photo Classification

Description:

In this project, we build a model that automatically tags restaurants with multiple labels using a dataset of user-submitted photos. Currently, Yelp users select restaurant labels manually when they submit a review. Because selecting labels is optional, some restaurants are left uncategorized or only partially categorized. In an age of food selfies and photo-centric social storytelling, it is perhaps no surprise that Yelp's users upload an enormous number of photos every day alongside their written reviews.

You must have:

  • Numpy - For handling the datasets (pip install numpy)
  • Pandas - For handling the datasets (pip install pandas)
  • Scikit Learn - To use classification algorithms like SVM (pip install -U scikit-learn)
  • Python

The following dependencies are required only if you wish to extract image and business features from scratch. We have already done that for you; you just need to download the feature files from the links provided in the table below. Make sure that you put these files in the "features" directory.

  • H5Py - To store the features extracted from CNN (pip install h5py)
  • Caffe - To extract features from the images (Refer to the link)
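
Before running anything, it can help to confirm which of the packages above are importable. The following is a minimal sketch and not part of the repository: the helper name `missing_modules` is hypothetical, and the import names (`numpy`, `pandas`, `sklearn`, `h5py`, `caffe`) are assumed to be the usual ones for these packages.

```python
import importlib.util

# Import names are assumptions based on the packages listed above.
REQUIRED = ["numpy", "pandas", "sklearn"]
OPTIONAL = ["h5py", "caffe"]  # only needed for feature extraction from scratch


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required packages:", missing_modules(REQUIRED))
    print("Missing optional packages:", missing_modules(OPTIONAL))
```

An empty list for the required packages means the final classification step should at least be able to import what it needs.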

Folder description:

  • code/ - contains programs to extract features and perform the final classification.
  • data/ - contains training and testing images + metadata from Yelp dataset (We have already extracted and stored the features for ease of project execution).
  • features/ - contains the extracted features from images and restaurants (For ease of project execution).
  • models/ - contains the trained SVM model, which can be used for future predictions without retraining (It is generated automatically when classify.py is run for the first time; for ease of project execution, we have included this model as well).
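
The "generated on first run, reused afterwards" behaviour described for models/ can be sketched as a load-or-train helper. This is an illustration only: how classify.py actually stores and loads its model is not shown here, so the function name `load_or_train`, the use of `pickle`, and the file path are all assumptions.

```python
import os
import pickle


def load_or_train(model_path, train_fn):
    """Load a pickled model if it exists; otherwise train, save, and return it.

    `train_fn` is a hypothetical zero-argument callable that returns a
    trained model object.
    """
    if os.path.exists(model_path):
        with open(model_path, "rb") as f:
            return pickle.load(f)

    model = train_fn()
    # Create the target directory (e.g. models/) if it is missing.
    os.makedirs(os.path.dirname(model_path) or ".", exist_ok=True)
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    return model
```

With this pattern, deleting the saved file forces retraining on the next run.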

Dataset:

If you choose to extract image and business features from scratch, you will need this dataset. It is available here. A dataset description is also available. Download the dataset and extract its files and folders into the "data" directory.

For ease of project execution, we have already extracted the features and stored them in the following files:

| Filename | Size | Description | Command used for generation |
| --- | --- | --- | --- |
| train_features.h5 | 3.59 GB | Format: [PhotoId, ImageFeatures]. Contains ImageNet features of the training dataset. | python extract_image_features_train.py |
| test_features.h5 | 18.2 GB | Format: [PhotoId, ImageFeatures]. Contains ImageNet features of the test dataset. | python extract_image_features_test.py |
| train_business_features.csv | 91.7 MB | Format: [BusinessId, BusinessFeatures, ClassLabels]. Contains features extracted for businesses in the training dataset. These features are extracted using train_features.h5. | python extract_business_features_train.py |
| test_business_features.csv | 460 MB | Format: [BusinessId, BusinessFeatures]. Contains features extracted for businesses in the test dataset. These features are extracted using test_features.h5. | python extract_business_features_test.py |
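
The business feature files are derived from the per-photo features, which means several photo vectors must be combined into one vector per business. The sketch below shows one common way to do that, mean pooling. This is an assumption: the aggregation actually used by the extract_business_features scripts is not stated here, and the function name, the photo-to-business mapping, and the toy values are hypothetical.

```python
import numpy as np


def pool_business_features(photo_ids, photo_features, photo_to_business):
    """Average the feature vectors of all photos belonging to each business.

    Mean pooling is an assumed aggregation, chosen for illustration.
    """
    sums, counts = {}, {}
    for photo_id, feature in zip(photo_ids, photo_features):
        business = photo_to_business[photo_id]
        sums[business] = sums.get(business, 0) + np.asarray(feature, dtype=float)
        counts[business] = counts.get(business, 0) + 1
    return {business: sums[business] / counts[business] for business in sums}


# Toy data: three photos with 2-dimensional features, two businesses.
photo_ids = [1, 2, 3]
photo_features = np.array([[1.0, 3.0], [3.0, 5.0], [10.0, 20.0]])
photo_to_business = {1: "a", 2: "a", 3: "b"}

business_features = pool_business_features(photo_ids, photo_features, photo_to_business)
```

Here business "a" ends up with the average of its two photo vectors, and business "b" keeps the vector of its single photo.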

To perform final classification:

$ cd code
$ python classify.py
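
For readers unfamiliar with multi-label classification using SVMs, the following is a minimal scikit-learn sketch of the general technique: one binary linear SVM per label, trained on business-level feature vectors. It is not the contents of classify.py; the one-vs-rest strategy, the choice of `LinearSVC`, and the toy features and labels are all assumptions made for illustration.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Toy business-level features (one row per business) and label sets.
# All values are made up for illustration.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = [(0,), (0,), (1, 2), (1, 2)]

# Convert label sets into a binary indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
Y_train = mlb.fit_transform(y_train)

# Train one binary linear SVM per label.
clf = OneVsRestClassifier(LinearSVC(random_state=0))
clf.fit(X_train, Y_train)

# Predict label sets for two unseen businesses.
X_test = np.array([[0.05, 0.1], [0.95, 1.0]])
Y_pred = clf.predict(X_test)
predicted_labels = mlb.inverse_transform(Y_pred)
```

`predicted_labels` holds one tuple of labels per test row, and a tuple can be empty if no per-label classifier fires.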
