/ knn.q (forked from psaris/funq)
\c 20 100
\l funq.q
\l pendigits.q
-1"referencing pendigits data from global namespace";
`X`Xt`y`yt set' pendigits`X`Xt`y`yt;
k:4
df:`.ml.edist2
-1"checking accuracy of using ",string[k], " nearest neighbors and df=", string df;
-1"and weighting the points equally";
-1"using .ml.f2nd to peach across the 2nd dimension of Xt to build distance matrix";
avg yt=p:.ml.knn[0<=;k;y] d:.ml.f2nd[df X] Xt
-1"alternatively, we can peach the combination of knn+distance calculation";
avg yt=p:.ml.f2nd[.ml.knn[0<=;k;y] df[X]@] Xt
-1"we can also change the weighting function to be 1/distance";
avg yt=p:.ml.f2nd[.ml.knn[sqrt 1%;k;y] df[X]@] Xt
-1"the pairwise (squared) distance function uses matrix algebra for performance";
avg yt=p:.ml.knn[sqrt 1%;k;y] d:.ml.pedist2[X;Xt]
-1"computing the accuracy of each digit";
show avg each (p=yt)[i] group yt i:iasc yt
-1"viewing the confusion matrix, we can see 7 is often confused with 1";
show .util.totals[`TOTAL] .ml.cm[yt;p]
ks:1+til 10
-1"compare different choices of k: ", -3!ks;
t:([]k:ks)
t:update mdist:avg yt=.ml.knn[1%;k;y] .ml.f2nd[.ml.mdist X] Xt from t
t:update edist:avg yt=.ml.knn[1%;k;y] .ml.f2nd[.ml.edist X] Xt from t
show t;
n:5
-1"cross validate with ", string[n], " buckets";
Xs:flip (n;0N)#/:X
ys:(n;0N)#y
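/ illustration only, not part of the original script: a toy sketch of the
/ bucketing used above. (n;0N)# cuts a list into n rows and lets q infer the
/ row length; applying it to each row of a matrix with /: and then flipping
/ gives one sub-matrix per bucket. toyy and toyX are hypothetical names.
toyy:til 6
toyX:(til 6;10+til 6)
show (2;0N)#toyy          / labels split into two buckets
show flip (2;0N)#/:toyX   / features split into two buckets of 2-row matrices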
ff:{[y;X].ml.knn[sqrt 1%;ks;y] .ml.pedist2[X]::}
e:ys=p:(.ml.kfxvyx[ff;(::);ys;Xs]0N!) each til n
-1"find k with maximum accuracy";
k:0N!ks .ml.imax avg avg each e
-1"confirm accuracy against test dataset";
avg yt=p:.ml.knn[sqrt 1%;k;y] .ml.pedist2[X] Xt