# Machine Learning for Computer Vision
### Meet Your Instructors
Welcome to Computer Vision for Engineering and Science. We’re excited you’re here to gain skills in this rapidly expanding field. We are all passionate about teaching and are excited to bring you a fun, interactive, and inclusive learning experience.
You’ll see and hear some of us in the videos, but everyone below developed examples, quizzes, and content to help you learn about computer vision. We had a lot of fun building this course and hope you’ll find the material useful in your career. If you get stuck, just post your question on the forums. Good luck!
Amanda Wang is an Online Content Developer at MathWorks. She earned a B.S. in Mathematics with Computer Science and a B.S. in Business Analytics from MIT in 2020. In addition to developing MATLAB-based courses with the Online Course Development team, she is currently pursuing an M.S. in Computer Science from the University of Illinois Urbana-Champaign.
Isaac Bruss is a Senior Online Content Developer at MathWorks. He earned his Ph.D. from the University of Massachusetts Amherst in 2015, performing research in a number of projects related to biophysics. One such project involved using confocal microscope videos to track the migration of nanoparticles tethered to a surface using DNA. Most recently, he taught undergraduate physics at Hampshire College. Now at MathWorks, he happily supports and designs MATLAB-based online courses.
Matt Rich is a Senior Online Content Developer at MathWorks. He holds a Ph.D. and M.S. in Electrical Engineering from Iowa State University. His Ph.D. research developed new methods to design control systems over networks with communication interrupted by random processes. His M.S. research focused on system identification and robust control methods for small UAVs with uncertain physical parameters. Prior to his current role, he supported MathWorks Model-Based Design tools in industry and academia.
Megan Thompson is a Senior Online Content Developer at MathWorks. She earned her Ph.D. in bioengineering from the University of California at Berkeley and San Francisco in 2018. As a medical imaging research scientist, she used image processing to study concussions in football, dementia, schizophrenia and more. Now at MathWorks, she designs and supports MATLAB-based online courses to help others analyze data and chase their own answers.
Brandon Armstrong is a Senior Team Lead in Online Course Development at MathWorks. He earned a Ph.D. in physics from the University of California at Santa Barbara in 2010. His research in magnetic resonance has been cited over 1000 times, and he is a co-inventor on 4 patents. He is excited to create courses on image and video processing as he owns a green screen just for fun!
### Course files and MATLAB
This course is part of the Computer Vision for Engineering and Science specialization and assumes you have prior experience with image processing.
To gain access to MATLAB, visit the Introduction to Computer Vision course, which includes a MATLAB license that is valid for the whole specialization.
The required files for this course are also available in Introduction to Computer Vision (course 1 of this specialization).
Machine learning is essential for many computer vision applications, from automated driving to fresh food distribution to disease diagnosis. This video introduces a machine learning workflow that you will follow throughout this course to train models that perform image classification and object detection. Think of this workflow as a roadmap that you can refer to as you apply machine learning to your own datasets. The final goal is to make predictions on new images that the model has never seen. Ultimately, given a new unlabeled image, you'll be able to extract features that the trained model can interpret to assign a label.

You create a model by training it with data called predictor features. In computer vision, these predictor features are extracted from a set of images. Before you can extract features, you must prepare your images. Classification and object detection models require two things to train: a collection of image features and a label for each of these images. Creating labels for your dataset is the first step of preparing your data; they will serve as the ground truth. The next preparation task is to split your labeled data into training and test sets. The training set will be used to train your model, and the test set is put aside until later. With any machine learning application, perform this split before proceeding with the workflow to avoid inadvertently biasing your test set, which would give you misleading or incorrect results. The last step of data preparation is to perform image processing as necessary. For example, spatial filtering or contrast adjustment can improve results for the next part of the workflow: extracting features.

You've already learned some ways to extract features in Course 1 of the specialization, and you'll learn new ways here. Once you have extracted features, you can use them in conjunction with the image or object labels to train your model. You will try a variety of different model types and learn how to improve your results by tuning model parameters. The last part of the workflow is to evaluate your trained model. In this step, you determine which trained model works best for your application using evaluation metrics such as accuracy and confusion matrices. Remember the test set you set aside earlier? Now's the time to use it. Apply your model to the test images and evaluate the results. This gives you an estimate of how well the model will perform on new unlabeled images.

It's important to remember that you're not meant to go straight through this workflow only once per application. Machine learning is an iterative process, so you'll often need to update your strategy and retry various steps to get the best results. It's common to try a variety of approaches, including choosing different models to train, tuning model parameters, and selecting different types of features.

In this course, we focus on traditional machine learning. Deep learning follows a very similar workflow; the key difference is that with deep learning, the extract-features step is performed by the model during training, taking the prepared images directly as inputs. Traditional machine learning is therefore well suited to applications where your images have discernible features in common. In this course, you'll learn to use a variety of tools to develop the best models for your datasets. As you perform each of these steps, think about them in the context of the machine learning workflow.
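The workflow above maps naturally onto a few MATLAB steps. The snippet below is a minimal sketch of that skeleton, assuming a folder of images sorted into one subfolder per class and the relevant toolboxes installed; the folder name "myImages" and the simple intensity features are illustrative placeholders, not part of the course files.

```matlab
% Minimal sketch of the workflow (assumed folder layout: one subfolder per class)
imds = imageDatastore("myImages", "IncludeSubfolders", true, "LabelSource", "foldernames");

% Prepare data: split into training and test sets before doing anything else
[trainImds, testImds] = splitEachLabel(imds, 0.8, "randomized");

% Extract features: here, two simple intensity statistics per image
features = zeros(numel(trainImds.Files), 2);
for k = 1:numel(trainImds.Files)
    I = im2gray(im2double(readimage(trainImds, k)));
    features(k, :) = [mean(I(:)), std(I(:))];
end

% Train a model, then later evaluate it on features extracted from testImds
mdl = fitcknn(features, trainImds.Labels);
```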
### Glossary of Common Terms
As you progress through the course, you'll be introduced to many new terms and concepts. Use this reference to define them and to see where they fit into the machine learning workflow.
### The Machine Learning Workflow: Glossary of Terms
- **machine learning model**: Generally, an algorithm that predicts a response using a set of predictor features. A model is trained on existing data and used to make predictions about new data.
- **deep learning model**: A type of machine learning model that employs multiple layers of neural networks. These can have higher potential accuracy for complex problems, but at the cost of computational resources and time compared to traditional machine learning.
- **image classification model**: A model that predicts an image's label. For example, image classification models could use medical images to predict a disease diagnosis, "healthy" or "cancerous".
- **object detection model**: A model that locates and labels objects within an image, usually with a bounding box. For example, locating signs in dashcam footage.
- **training data**: Data used to train a model.
- **validation data**: A portion of the training data used during model training to properly evaluate performance and tune model hyperparameters.
- **test data**: Data used to simulate new observations. Test data is split from the full dataset early in the machine learning process and is not used during model training. It is only used to evaluate the final model.
- **predictor features**: The variables used by the model to make predictions. These can be hand-selected, like an image's average intensity, or generated automatically, like the frequency of similar SURF features.
- **model parameters**: Values that determine how a model is trained. Some model parameters are learned by the machine learning algorithm during training. Other parameters are set by the user prior to training.
- **model hyperparameters**: Parameters that the user specifically sets before training. Hyperparameters are often determined through an optimization process of training multiple models.
- **bag of features**: Clusters of similar image feature descriptors that are used to create predictor features for image classification. Also known as "bag of visual words".
- **ground truth**: The labeled data used to train or evaluate a detection model. This consists of labels and bounding-box coordinates of the objects being detected in each image.
- **model evaluation**: The process of quantitatively assessing a model's performance. Often done by comparing multiple types of trained models with the goal of iterative improvement.
Previously, you learned that classification models predict the discrete value or values of a response using one or more predictors. You can think of this as categorizing unknown items into a discrete set of classes. Classification is widely applicable: it is used for email filtering, speech and handwriting recognition, medical diagnosis, and much more. So how does classification work? In this video, you'll learn the basics of two popular image classification models.

Let's begin with the K-nearest neighbors model, also known as KNN. KNN, like all classification models, can work with any number of predictor features and response classes, but for the following examples we'll use just two of each. This classification model assumes that similar things exist in close proximity, or in other words, are near to each other. KNN predicts a response by looking at a given number K of neighboring observations. To better understand how KNN works, consider an example in which you set K equal to 3. For a new data point, the classification takes into account the point's three nearest neighbors. Here, notice that two of the data points are labeled as class 2 and one is labeled as class 1. Since the majority of the neighbors belong to class 2, the new data point is classified as class 2. This is known as a majority voting mechanism.

A KNN model differs from most other classification models in that making a new prediction requires comparing the new observation against all existing data rather than running it through a fitted mathematical equation. Therefore, KNN models can be computationally expensive for large datasets. You also need to be mindful of the right value for K. A value of one might lead to predictions that are less robust to noise or outliers. Larger values of K will produce more stable predictions due to majority voting, but eventually a very large value of K will make less accurate predictions as it becomes difficult to capture complex behavior. You'll need to adjust K to find the most appropriate value for a particular dataset. Depending on the value of K, it is common to use the terms fine, medium, and coarse when describing KNN classifiers.

In general, the KNN classification model is among the easiest to understand and interpret. KNN's main disadvantage is that it becomes significantly slower as the volume of data increases. This can make it an impractical choice in environments where predictions must be made rapidly or where there are tight memory constraints, since all the data must be available when making a prediction.
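As a minimal sketch of what this looks like in MATLAB, assuming a table whose variables are two intensity features plus a Label response (like the one you will build later in this module; the variable names are illustrative):

```matlab
% Train a KNN classifier; NumNeighbors is the K discussed above
mdl = fitcknn(trainingTable, "Label", "NumNeighbors", 3);

% Classify a new observation (feature values here are made up for illustration)
newObservation = table(0.52, 0.11, 'VariableNames', {'MeanIntensity', 'StdIntensity'});
predictedLabel = predict(mdl, newObservation);
```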
The next type of model covered in this video is called the support vector machine, or SVM. SVM models are also a popular choice for classification because of their flexibility. In a binary classification problem, suppose you want to separate the orange squares representing class 1 from the blue circles representing class 2. Any line shown on this plot is a viable option; they would all perfectly separate the orange squares from the blue circles. But is there an optimal line, or decision boundary? To best capture the behavior of the data, the goal is to find the line that will most accurately classify new observations into one of the two classes. You would probably want a line that is evenly spaced between these two classes and provides a buffer for each class. That's exactly what SVM does. The algorithm tries to find a line that's right in the middle of your two classes, maximizing the distance between them, called the margin.

To find the line that maximizes the margin, the SVM algorithm first finds the points closest to the line from both classes. These points are called support vectors. The SVM algorithm then tries to find a decision boundary such that the separation between the two classes is as wide as possible. In this two-dimensional case, the decision boundary corresponds to a line, but in general this boundary is known as a hyperplane, which applies in higher dimensions. In short, a support vector machine is a classifier that finds an optimal hyperplane maximizing the margin between two classes.

In real examples, it's usually impossible to find a hyperplane that perfectly separates the two classes. A point inside the margin but correctly classified is called a margin error. A point on the wrong side of the separating boundary is a classification error. The total error is the sum of the margin error and the classification error.

What happens when the data cannot be separated by a straight line or hyperplane, as shown here? In these situations, you can use a kernel method, which projects the data into an extra dimension. Instead of a decision line, there is now a decision surface that separates the points. This concept can be generalized to higher dimensions: with the kernel method, you map the data into a higher-dimensional space where it is linearly separable. The mathematical function used for the transformation is known as the kernel function, and there are different types of kernel functions. Linear is the most common, but other options include polynomial and radial basis function kernels, particularly the Gaussian kernel. Each of these functions has its own characteristics and its own expression. The kernel method is a real strength of SVM, as it enables you to handle non-linear data efficiently. However, the kernel function must be chosen carefully to avoid drastically increasing the training time.
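A minimal MATLAB sketch of training an SVM with and without a nonlinear kernel, assuming the same kind of feature-plus-Label table as above (the kernel choice and variable names are illustrative, not prescribed by the course):

```matlab
% Linear SVM: finds the maximum-margin hyperplane in the original feature space
linearMdl = fitcsvm(trainingTable, "Label", "KernelFunction", "linear");

% Gaussian (RBF) kernel SVM: handles classes that are not linearly separable
rbfMdl = fitcsvm(trainingTable, "Label", "KernelFunction", "gaussian", "KernelScale", "auto");
```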
There are many more classification models available for you to choose from. Each model has its own advantages and disadvantages in terms of accuracy, speed, and memory requirements. The best way to know how they perform on a particular dataset is to try them out. Next, you'll learn how to quickly train models in MATLAB and compare their results.
To train a classification model, you first need to turn your images into a collection of numbers. In this video, you will prepare your data and extract features so that you are ready to train, and later evaluate, a classification model. First, you'll learn how to adjust the labels of an image datastore and split the data into training and testing datasets. The dataset you'll be working with does not require extensive image processing. However, for some datasets this step will include additional processing to enable effective feature extraction, such as spatial filtering to remove noise. Next, you'll create features to use for machine learning. Here, you'll use simple features that are easy to interpret, like standard deviation. The end goal is to turn the images of concrete, some with cracks and some without, into a table where each image has a label and features describing it.

Let's begin by assigning labels. Fortunately, with this concrete dataset, your images have already been sorted into two folders with names describing the images: "Positive" for images with cracks and "Negative" for images without cracks. When you create a datastore from this collection of images, you can use the folder names as labels. The current labels are "Negative" and "Positive". To make these labels more descriptive, use the renamecats function to change the existing labels to "No Crack" and "Crack".

Next, split the datastore into training and test sets. The splitEachLabel command takes a subset of each label and assigns it to a new datastore. The fraction you input determines the size of each subset. For a dataset of this size, it's common to use 80 percent of your images for training and the remaining 20 percent as the test set to later evaluate your final model. To avoid biasing which images within a label go into the training and test sets, use the "randomized" option. For the rest of this video, you'll only be working with the training set of images.
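A minimal sketch of the labeling and splitting steps just described, assuming the concrete images live in "Positive" and "Negative" subfolders of a folder whose path is stored in dataFolder (the path variable is a placeholder):

```matlab
% Create a datastore that labels each image by its folder name
imds = imageDatastore(dataFolder, "IncludeSubfolders", true, "LabelSource", "foldernames");

% Rename the categorical labels to something more descriptive
imds.Labels = renamecats(imds.Labels, ["Negative", "Positive"], ["No Crack", "Crack"]);

% Put 80% of each label into the training set and keep 20% aside for final evaluation
[trainImds, testImds] = splitEachLabel(imds, 0.8, "randomized");
```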
Now that your dataset is prepared, the next step is to extract features. Looking at several of the images, notice how the cracks are quite dark compared to the lighter concrete. Therefore, you might expect images without cracks to have higher average intensities and fewer intensity differences compared to images with cracks. Let's use these observations to create features based on intensity.

Read the first image from the datastore. Recall that before most computations, you need to convert images to datatype double. Then convert to grayscale to isolate just the intensities of the pixels. Based on the intensity differences we observed between the labels, it makes sense to use the mean and standard deviation of each image's intensity as features. To link the features back to their original images, extract the file name using the fileparts function. Because you want to repeat this process many times, initialize the variables and use a while loop to extract features for every image in the training set. Finally, create a table with all your outputs, including the image labels, names, and features.

You may want to perform the exact same steps every time you extract features from a new set of concrete images, so it's a good idea to save this code as a function. Save the function as a code file in the same folder as the rest of the code from this video, and give it a descriptive name, such as extractConcreteFeatures. You can now call this function to extract these features anytime you need to. Make sure to save the table your function created; you're going to use it later to train your image classification model. Also, save your training and test datastores. Let's test out your new function by calling it on the training set. Perfect! Now you have a table with two features and a label for each image.

Now for the important question: are you ready for classification? It's a good idea to investigate your features before you train your model. In this case, you might want to know whether your chosen features are descriptive enough to reasonably differentiate concrete images with cracks from those without. To explore whether the features you extracted will help distinguish between these labels, use gscatter to plot both mean intensity and standard deviation of intensity, color-coded by label. Voila! You can see a clear distinction between the orange "Crack" class and the blue "No Crack" class. It seems likely that you'll be able to find a decision boundary that can mostly separate these classes. Now that you've prepared your data and extracted features, you've got everything you need to start training machine learning models.
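A hedged sketch of the extraction loop and the feature plot described above, assuming the trainImds datastore from the previous step; the table variable names (MeanIntensity, StdIntensity) are illustrative:

```matlab
% Loop over the training datastore and compute two intensity features per image
labels = trainImds.Labels;
names = strings(numel(labels), 1);
meanIntensity = zeros(numel(labels), 1);
stdIntensity = zeros(numel(labels), 1);

k = 0;
while hasdata(trainImds)
    k = k + 1;
    [img, info] = read(trainImds);                 % read the next image and its file info
    gray = im2gray(im2double(img));                % convert to double, then grayscale
    [~, names(k)] = fileparts(info.Filename);      % keep just the file name
    meanIntensity(k) = mean(gray(:));
    stdIntensity(k) = std(gray(:));
end
reset(trainImds);                                  % rewind the datastore for later use

% Collect everything in one table, then visualize the two features by label
trainingTable = table(labels, names, meanIntensity, stdIntensity, ...
    'VariableNames', {'Label', 'Name', 'MeanIntensity', 'StdIntensity'});
gscatter(trainingTable.MeanIntensity, trainingTable.StdIntensity, trainingTable.Label)
xlabel("Mean intensity"); ylabel("Std. of intensity")
```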
### Preparing the Concrete Images for Classification
You now know how to prepare a collection of images and extract basic features. So, how can you extract and visualize features from the images of concrete?
Navigate to the Module 1 folder and open the file preparingYourImagesForClassification.mlx. Work through the live script to extract features and save them for later use in training and evaluating a classification model.
Previously, you extracted some intensity-based features from the concrete image dataset.
### Automated Hyperparameter Optimization in MATLAB
For the remainder of this reading, you'll continue training a model to classify images with cracks in the concrete image dataset. Specifically, you'll find the value of K that yields the highest-accuracy KNN classifier.
1. Choosing Hyperparameters to Optimize
Load up a previously saved session in the Classification Learner App, or start a new one using the concrete data training table. Then select the Optimizable KNN Model from the dropdown menu.
In the model's Summary window, you'll see multiple hyperparameters available to optimize. Hover your cursor over each one to see a quick summary of how it affects the model.
Leave just the Number of neighbors hyperparameter checked, as this is the only one we want to optimize for now. Select "Read more about KNN model options" to learn more about the other hyperparameters.
2. Choosing the Optimization Method
You can also choose the automation method for finding the optimal hyperparameter values from this window.
The Optimizer and Acquisition function options specify the algorithm used to determine which hyperparameter values to adjust. For most cases, you should stick with the default choice of "Bayesian optimization," which allows the optimizer to "intelligently" choose the hyperparameter values of each successive model based on the results of the previous ones. You can learn more about these options by selecting "Read more about Optimizer options".
The number of Iterations determines the total number of models that will be trained. If you're only optimizing a few hyperparameters, you can leave this number at its default value.
Additionally, if you have a large dataset, you can set a Training time limit to specify how long you want the optimizer to run. Overall, you'll need to balance decreasing the training time limit with the possibility of not finding better hyperparameter values.
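The same kind of optimization can also be run programmatically. Below is a minimal sketch using fitcknn's built-in Bayesian optimization, assuming a training table that contains only the predictor features plus a Label response; the variable name trainingTable and the 30-iteration budget are illustrative, not the course's exact settings.

```matlab
% Optimize only the number of neighbors K, using Bayesian optimization
mdl = fitcknn(trainingTable, "Label", ...
    "OptimizeHyperparameters", "NumNeighbors", ...
    "HyperparameterOptimizationOptions", struct( ...
        "Optimizer", "bayesopt", ...
        "MaxObjectiveEvaluations", 30, ...
        "ShowPlots", true));

% The chosen K is stored in the trained model
bestK = mdl.NumNeighbors;
```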
3. Training the Optimized Model
You're now ready to train your model using your customized optimization parameters. During training, you'll be able to watch the optimization progress in the "Minimum Classification Error Plot". Each iteration (along the x-axis) is a trained model, and the performance of the best model so far is measured using the minimum classification error (along the y-axis).
The light blue points are used by the Bayesian optimization algorithm to decide successive model hyperparameter values, while the dark blue points represent the minimum classification error seen by any trained model.
4. Final Results
Once complete, the final results can be interpreted using the Confusion Matrix. During this run, we achieved a validation accuracy of 97.4% for a KNN model with an optimized K = 5.
If you perform these steps yourself, you may notice slight differences in your accuracies compared to what is shown here. This is because the validation and test datasets are chosen randomly each time, leading to slightly different results. However, for larger datasets like this one, the differences should be fairly negligible (within a few percentage points of accuracy), and the optimized value of K should be similar.
Next Steps
Try out this process yourself. The accuracy here is already pretty high, but can you do better? There are additional KNN hyperparameters that haven't yet been optimized.
Later in the course, you'll encounter more datasets and predictor features with which to train additional classifiers. See if you can optimize your models then too.
### The Upcoming Assessments
Congratulations on reaching the end of the module! Over the following two quizzes, you'll complete a small project described below.
An image dataset titled Roadside Ground Cover is provided in the "Data/MathWorks Images/Roadside Ground Cover" folder of the course files. The images are organized into two subfolders: "Snow" and "No Snow." Here are some example images from each category:
For the first quiz, you'll follow the machine learning workflow to prepare your images for classification and extract "mean saturation" and "standard deviation of saturation" as predictor features. In the following quiz, you'll develop a model in MATLAB that classifies these images as having or not having snow.
You can attempt these quizzes an unlimited number of times, so we strongly encourage you to submit this quiz after each question to confirm your progress at each step.
Graded Quiz: Preparing Images for Classification (30 minutes)
To train a classification model, you need every image in your dataset to have a single value for each feature. This way, there is a quantitative means to relate images to each other based on their relative values for shared features. We will call these ready-to-train features predictor features. But how do you get these predictor features? Sometimes you can use simple predictor features based on calculations you perform yourself. Often, however, you will be unable to find features that are distinguishing enough to successfully classify images. For example, it was highly effective to use intensity-based predictor features to differentiate cracked and uncracked concrete, but those same features were less effective at classifying traffic signs. You can frequently get better results using one of the many algorithms available for extracting feature descriptors. Due to small variations in the images, such as changes in lighting, angle, and surroundings, even similar features will have different descriptor values. With no shared feature vectors, it is difficult to create the matrix we need to train a model. In this video, you'll see how the bag of features algorithm extracts feature descriptors from images and uses them to produce predictor features.

Consider this landscape image and its descriptors. Sometimes similar descriptors commonly occur in other images, sometimes they are rarely present in other images, and some are unique to just one image. The bag of features algorithm creates a way to compare the images by extracting descriptors across all of them and clustering them into groups based on how similar they are. This process is called creating a visual vocabulary. Descriptors with similar feature vectors will be closer than descriptors with dissimilar feature vectors. In practice, this comparison takes place in many dimensions, but only two are shown here for simplicity. The bag of features method uses the k-means algorithm to cluster the feature descriptors into groups. Similar descriptors will be clustered into the same group, while descriptors with dissimilar values will be assigned to different groups. Each group is called a visual word, and collectively the groups are the dataset's visual vocabulary. Because of this, bag of features is sometimes called bag of visual words. This terminology comes from a similar technique developed for text retrieval called bag of words.

After creating a visual vocabulary, the algorithm revisits each individual image. Each feature descriptor from a given image is assigned to a visual word. The occurrences of visual words are tallied and then scaled by the number of descriptors in that image. These values will be your predictor features. Visual words that occur frequently in an image will have a predictor feature value closer to one, and visual words that do not appear in the image will have a value of zero. These visual word rates are recorded in an M-by-N matrix, where M is the number of images in the dataset and N is the number of groups, or visual words, that appear across all images in the set. These are your predictor features.

In summary, the bag of features algorithm prepares a dataset of images for classification by extracting feature descriptors, clustering them into visual words, and tallying visual word occurrence rates in each image. Once you have this data, you are ready to train your model.
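To make the clustering idea concrete, here is a small conceptual sketch of the two core steps: building a visual vocabulary with k-means, then encoding one image as a normalized histogram of visual words. It assumes SURF descriptors pooled into a matrix named allDescriptors and a toy vocabulary size; the file name is a placeholder. In practice the bagOfFeatures function shown in the next video does all of this for you.

```matlab
% Step 1: build a visual vocabulary from descriptors pooled across many images
% (allDescriptors is assumed to be an N-by-64 matrix of SURF descriptors)
numWords = 20;                                        % toy vocabulary size
[~, vocabulary] = kmeans(allDescriptors, numWords);   % each centroid is one visual word

% Step 2: encode a single image as a histogram of visual words
I = im2gray(imread("landscape.jpg"));                 % placeholder file name
points = detectSURFFeatures(I);
descriptors = extractFeatures(I, points);

% Assign each descriptor to its nearest visual word, then normalize the counts
idx = knnsearch(vocabulary, double(descriptors));
predictorFeatures = histcounts(idx, 1:numWords+1) / numel(idx);
```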
The bag of features algorithm takes a collection of images, extracts feature descriptors, creates a visual vocabulary, and tabulates the visual word occurrences in each image to create predictor features. If this sounds exhausting, don't worry! MATLAB performs all these steps with a single function, bagOfFeatures. In this video, you'll use bagOfFeatures in MATLAB to create predictor features from images of roadside ground cover. These images contain either snow or no snow. You will explore several algorithm parameters to create predictor features. Specifically, you'll choose a point selection method, alter the grid size, and change the block width. Then you'll use these predictor features to train a classification model. Let's get started.

First, perform some data preparation in MATLAB. Use the folder of ground cover images to create a labeled image datastore, and then split the data into training and test sets. Now it's time to extract features. Input the training image datastore into the bagOfFeatures function. This single line of code performs the entire extract-features step. The function outputs a bagOfFeatures object, or bag. Create a matrix of predictor features by passing this bag and the training image datastore into the encode function. The resulting matrix will have one row for each image and one column for each feature; 500 features is the default. To prepare these features for the Classification Learner App, convert this matrix into a table. Use the variable names parameter to give each feature column a name; later, when you create a test set, you'll use the same variable names. Then add class labels.

Before you run this code, there are some optional parameters that you might consider changing to tailor the bagOfFeatures algorithm to your data. The point selection parameter lets you choose how the algorithm decides where to extract SURF features. It has two options: detector, which uses image characteristics to find extraction points, or grid, which extracts descriptors from pre-specified locations. Because SURF feature detection finds regions with high contrast and a specific size, detecting SURF features works best when distinct details are the most important parts of your image. However, in many images the most distinguishing characteristics don't have high contrast; there are large areas with similar textures, for example landscapes or road signs like this one. Despite their importance, these areas would yield few detected features. This issue occurs with the ground cover images. SURF detection identifies many points in the branches and rocks, but almost none in the snow, which is the part of the image that you are most interested in. In cases such as these, it's a good idea to extract feature descriptors along a grid. With this method, you extract descriptors at a series of pre-specified locations. This skips the detection step in favor of collecting information uniformly across the whole image. Grid is the default point selection method used by bagOfFeatures. It works well for many images, and it's a good choice when you're in doubt.

You can set the size of the grid, and by extension how many feature descriptors are collected, with the grid step parameter. Smaller step sizes are useful in low-resolution images or in images where there are a lot of different textures. In these cases, you need the smaller step size to extract enough descriptors to adequately describe your image. In the ground cover dataset, the images have a high resolution and contain large areas with similar textures. The default grid step size of eight pixels would give you more feature descriptors than you probably need to train a model and would take a long time to extract from every image. Back in MATLAB, speed things up by increasing the grid step size to 24.

Recall that with SURF, the gradient values of the surrounding neighborhood of pixels are used to calculate feature descriptors. The BlockWidth parameter determines the size of this neighborhood. By default, SURF uses four block sizes to extract feature descriptors. Because you're looking for large regions of snow, you can use only the two largest default block sizes to decrease training time. If the accuracy of your trained models does suffer, you can always create a new bag with a smaller grid size and more blocks. To give you an idea of the implications of creating a bag with a larger grid and fewer blocks: when I created a bag with these parameters, it ran more than 15 times faster than with the default settings. Run your code and create a table of predictor features. Depending on your dataset and parameters, this could take some time. Now you're ready to train a model using these features.
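A hedged sketch of the extraction just described, assuming a labeled datastore of the ground cover images. The grid step of 24 and the two largest default block widths follow the video, while the folder variable and feature-name scheme are illustrative.

```matlab
% Prepare the data: label by folder name and split into training and test sets
imds = imageDatastore(groundCoverFolder, "IncludeSubfolders", true, "LabelSource", "foldernames");
[trainImds, testImds] = splitEachLabel(imds, 0.8, "randomized");

% Extract features: descriptors on a coarse grid, using only the two largest block widths
bag = bagOfFeatures(trainImds, ...
    "PointSelection", "Grid", ...
    "GridStep", [24 24], ...
    "BlockWidth", [96 128]);          % default vocabulary size is 500 visual words

% Encode each training image as a row of visual-word frequencies
trainFeatures = encode(bag, trainImds);

% Convert to a table for the Classification Learner App and append the labels
featureNames = "f" + (1:size(trainFeatures, 2));
trainTable = array2table(trainFeatures, "VariableNames", featureNames);
trainTable.Label = trainImds.Labels;
```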
Open the Classification Learner App and start a new session with your predictor features. As you did in the Training Image Classification Models video, train a few SVM and KNN models. You can see that our best result is with one of the SVM models.

The last step is to evaluate your model with the test data you set aside at the beginning of this video. Back in MATLAB, prepare a set of predictor features using your bag and the test data. Do not create a new bag from the test data! That would create new clusters unique to the test set, so the new predictor features wouldn't be comparable to the training predictor features from earlier; it would be as if the training and test sets were speaking different languages. Once you've created a matrix of predictor features for your test set, convert it to a table and add labels. Remember the feature names you created for the training predictor features? Include them again as the VariableNames parameter. This way, the model can find the right features to make its prediction.
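Continuing the sketch above, the held-out test images are encoded with the same bag and the same feature names (again illustrative, not the course's exact script):

```matlab
% Encode the test images with the SAME bag so the features share the training vocabulary
testFeatures = encode(bag, testImds);

% Reuse the training feature names so the model can match columns at prediction time
testTable = array2table(testFeatures, "VariableNames", featureNames);
testTable.Label = testImds.Labels;
```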
Finally, import the test predictor feature table into the app and test all your models. It looks like the training accuracy is comparable to the test accuracy. Great! With just a few lines of code, you automatically extracted predictor features tailored to your data. For more ways to customize the bagOfFeatures function, check out the documentation.
### Practice Using Bag of Features
You have seen how the Bag of Features algorithm can be used to automatically create predictor features from a collection of images. Here, you will practice using the bagOfFeatures function with different input arguments to create these features.
Navigate to the Module 2 folder and open practiceUsingBagOfFeatures.mlx. Work through the live script to create predictor features and use them to train models in the Classification Learner App.
### Project: Introduction to Ground Cover Classification
You have seen multiple approaches to extracting features from images of roadside ground cover. Specifically:
• Hand-selecting features based on image saturation values
• Automatically generating features using bagOfFeatures
Now, you will use two models trained with features generated using the above approaches to classify a new, unlabeled image:
To perform this task, read this project introduction. Then, open predictUnlabeledGroundCoverImage.mlx to get started.
To classify the new, unlabeled image using hand-selected features, you should:
1. Create a table of saturation-based predictor features for the unlabeled image. This table should be named gcTableSaturation and have variables named avgSat and stdSat that contain the mean saturation and standard deviation of saturation, respectively. Note: To extract saturation-based predictor features, you must first convert the unlabeled image into the HSV color space.
2. Use gcClassifierSaturation.predictFcn to classify the unlabeled image. Attach the output of this classification to gcTableSaturation as a new variable named prediction. The predicted label should be a categorical variable containing either "Snow" or "No Snow". To review using predictor functions, revisit the "Training Image Classification Models" video in Week 1 of this course.
Then, to classify the new, unlabeled image using bag of features, you should:
1. Create a table of predictor features for the unlabeled image by encoding it using the provided bag of visual words object, bag. This table should be named gcTableBag and contain predictor feature variables named f1 through f500. To review using predictor functions, revisit the "Training Image Classification Models" video in Week 1 of this course.
2. Use gcClassifierBag.predictFcn to classify the unlabeled image. Attach the output of this classification to gcTableBag as a new variable named prediction. The predicted label should be a categorical variable containing either "Snow" or "No Snow". A hedged sketch of both approaches is shown after this list.
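The sketch below walks through the two classification paths described above. It assumes the variables provided by the project script (a trained gcClassifierSaturation and gcClassifierBag, and the bag object), and uses img as a placeholder name for the unlabeled image; your own script may read the image differently.

```matlab
% --- Approach 1: hand-selected saturation features ---
hsvImg = rgb2hsv(img);                       % convert to HSV to access the saturation channel
satChannel = hsvImg(:, :, 2);
gcTableSaturation = table(mean(satChannel(:)), std(satChannel(:)), ...
    'VariableNames', {'avgSat', 'stdSat'});
gcTableSaturation.prediction = gcClassifierSaturation.predictFcn(gcTableSaturation);

% --- Approach 2: bag of features ---
bagFeatures = encode(bag, img);              % encode the single image with the provided bag
gcTableBag = array2table(bagFeatures, "VariableNames", "f" + (1:numel(bagFeatures)));
gcTableBag.prediction = gcClassifierBag.predictFcn(gcTableBag);
```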
When you are ready, confirm that your code behaves as expected by submitting it for grading using the online MATLAB Grader following this reading. If you need help, post to the discussion forums.