Classification involves predicting which class an item belongs to. For example, classifying dogs and cats or defective and okay components.
When we want to segregate objects based on complex features into two or more classes or groups, we use the classification module. This is a black box classifier and the system learns important features on it's own. For example:
For simple images with simple features, use shallow models - C0, C1, C2. For images with complex features and patterns, use deeper models - C3, C4, C5. A shallow model will be faster during training as well as deployment. Best practice is to start with shallow models and move to deeper models only when accuracy is low.
Image sizes corresponding to each of the models:
- C0 - 224*224
- C1 - 240*240
- C2 - 260*260
- C3 - 300*300
- C4 - 380*380
- C5 - 456*456
To ensure higher accuracy, the system carries out basic preprocessing and image augmentation. If required, the user can take control and set the following parameters:
shear_range - 0-1, Default = 0
zoom_range- 0-1, Default = 0
horizontal_flip - True/False, Default = True
vertical_flip - True/False, Default = True
rotation_range - 0-360 degrees, Default = 10
width_shift_range - 0-1, Default = 0
height_shift_range - 0-1, Default = 0
Common examples are provided in Example.py
Takes 2 or more arguments.
* required
classifier(Model_Name*, Number_of_Classes*,
Path_to_Data*, Model_Save_Path="./Models/")
classifier(Path_to_Model*, List_of_Class_Labels=[1,2,3...n])
This function attempts to automatically load as much data as possible. If the function finds separate Train and Validation folders, it will use them as is. If the function finds only a Train folder, the user can define the Validation split to ensure the model does not overfit. It also contains augmentation options.
data_loader(Path_to_Data*, batch_size=16,
rescale_val=1./255, shear_range=0, zoom_range=0,
rotation_range=0, width_shift_range=0.0, height_shift_range=0.0, brightness_range=None,
horizontal_flip=True, vertical_flip=True,
validation_split=0.0)
This function is used to train the model. For most cases, start with 10 epochs. Based on the loss and accuracy curves, decide on whether to increase or decrease number of epochs.
train(Number_of_Epochs*, plot=True)
This method is used to check the real world performance of the model. It uses data that is in the same structure as the Train data but has never been seen by the data. It is essentaially data that has already been labeled by the user but has never been shown to the model. This function provides a loss and accuracy score for this test set. evaluate() will not dispaly individual sample outputs. It will only provide overall accuracy and loss.
evaluate(Path_to_Data*)
This function can be used to test both a folder containing images as well as single images. In the case of folder testing, a csv file is saved with the file name, the predicted class and the confidence score.
predict(Path/to/Folder/*)
predict(Path/to/File.jpg*)
Only the Train folder is compulsary. Validation and Test folders are optional but recommended. Number and names of folders within Train and Validation must be the same.
Root
|---- Train
|---- Class One
|---- img1.jpg
|---- img2.jpg
|---- img3.jpg
|---- ...
|---- Class Two
|---- img1.jpg
|---- img2.jpg
|---- img3.jpg
|---- ...
|---- Class Three
|---- img1.jpg
|---- img2.jpg
|---- img3.jpg
|---- ...
|---- ....
|---- Validation
|---- Class One
|---- img1.jpg
|---- img2.jpg
|---- img3.jpg
|---- ...
|---- Class Two
|---- img1.jpg
|---- img2.jpg
|---- img3.jpg
|---- ...
|---- Class Three
|---- img1.jpg
|---- img2.jpg
|---- img3.jpg
|---- ...
|---- ...
|---- Test
|---- img1.jpg
|---- img2.jpg
|---- img3.jpg
|---- ...