Check the TFeat descriptor from BMVC 2016. Its results should be significantly better than those of PN-Net, presented below.
Code for the arXiv paper PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors. The network extracts feature descriptors from grayscale local patches of size 32x32.
nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> output]
(1): cudnn.SpatialConvolution(1 -> 32, 7x7)
(2): cudnn.Tanh
(3): cudnn.SpatialMaxPooling(2,2,2,2)
(4): cudnn.SpatialConvolution(32 -> 64, 6x6)
(5): cudnn.Tanh
(6): nn.View
(7): nn.Linear(4096 -> 128)
(8): cudnn.Tanh
}
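As a minimal usage sketch (the model file name pnnet.t7 is a placeholder here; use the trained network file provided with this repository), a descriptor can be extracted from a 32x32 grayscale patch as follows:

```lua
-- Minimal descriptor-extraction sketch. 'pnnet.t7' is a placeholder name;
-- substitute the trained network file shipped with this repository.
require 'cunn'
require 'cudnn'

local net = torch.load('pnnet.t7')
net:evaluate()  -- switch to evaluation mode

-- A single 32x32 grayscale patch, batched as 1x1x32x32.
local patch = torch.rand(1, 1, 32, 32):cuda()
local desc = net:forward(patch)  -- 1x128 descriptor
print(desc:size())
```

Matching then reduces to comparing descriptors by L2 distance, e.g. torch.dist(d1, d2).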
For optimization details, refer to the arXiv paper. Training code is now also available.
Get the Phototour datasets in .t7 format from UBC-Phototour-Patches-Torch and extract liberty, yosemite, and notredame.
Run th eval.lua. The script will print a series of evaluation results for the patch pairs from the notredame 100k dataset, e.g.
1 4.3915
0 10.6367
1 5.6122
0 10.5520
0 10.4561
0 10.1167
0 9.8624
1 3.2972
1 2.1507
1 2.9709
1 3.4437
1 7.6362
The first column contains the patch pair label (0 negative, 1 positive); the second column contains the L2 distance between the two patches of the pair, based on the features extracted from the last layer of our CNN.
From this output, one can compute the ROC curve (e.g. with cppROC).
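As a rough illustration (the file name results.txt and the "label distance" line format are assumptions based on the sample output above), the ROC curve can also be traced directly in Lua by sweeping a distance threshold:

```lua
-- Hypothetical sketch: compute ROC points from the eval.lua output above.
-- Assumes a file 'results.txt' where each line is "<label> <distance>".
local results = {}
for line in io.lines('results.txt') do
  local label, dist = line:match('(%d+)%s+([%d%.]+)')
  table.insert(results, { label = tonumber(label), dist = tonumber(dist) })
end

-- Sort by ascending distance; sweeping a threshold down this list
-- traces out the ROC curve (pairs below the threshold count as positive).
table.sort(results, function(a, b) return a.dist < b.dist end)

local totalPos, totalNeg = 0, 0
for _, r in ipairs(results) do
  if r.label == 1 then totalPos = totalPos + 1 else totalNeg = totalNeg + 1 end
end

local tp, fp = 0, 0
for _, r in ipairs(results) do
  if r.label == 1 then tp = tp + 1 else fp = fp + 1 end
  print(string.format('FPR %.4f  TPR %.4f', fp / totalNeg, tp / totalPos))
end
```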
In the train folder, run th run.lua.
When training with 1.2M triplets on a GTX TITAN X, each epoch takes approximately 2 minutes.
Examples of the training triplets used.
Examples of positive nearest-neighbour patch matching using the PN-Net descriptor on the Oxford matching dataset.
Efficiency comparison with MatchNet and deepcompare, both from CVPR 2015. For more results, refer to the arXiv paper.
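For reference, one simple way to measure descriptor-extraction throughput on a GPU is sketched below (the batch size and the pnnet.t7 file name are placeholder assumptions, not taken from the paper's protocol):

```lua
-- Rough GPU throughput measurement for descriptor extraction.
-- 'pnnet.t7' is a placeholder for the trained network file.
require 'cunn'
require 'cudnn'

local net = torch.load('pnnet.t7'):cuda()
net:evaluate()

local batch = torch.CudaTensor(1024, 1, 32, 32):normal()
net:forward(batch)        -- warm-up pass
cutorch.synchronize()

local timer = torch.Timer()
net:forward(batch)
cutorch.synchronize()
print(string.format('%.0f patches/sec', 1024 / timer:time().real))
```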