First of all, thank you for the excellent package and companion articles.
While looking over the `aoa` code, it occurred to me that some of the complexity associated with handling categorical variables could be removed by switching to a different distance metric. Gower's generalized distance metric is well suited because it can integrate mixtures of ratio, nominal, and ordinal data types. The metric also handles scaling itself (each numeric variable's contribution is divided by its range), so no separate scaling / centering step is needed. There are a couple of implementations, including `cluster::daisy()` and `gower::gower_dist()`.

It would appear that the `knnx.dist` function does all of the heavy lifting in `aoa`, so I ran a quick benchmark of a couple of candidate methods (results below). The interface and resulting objects aren't directly compatible, but `gower::gower_dist()` does seem like a reasonable candidate in terms of speed. The main reason to consider `cluster::daisy()` is that it can accommodate all variable types, while `gower::gower_dist()` does not yet differentiate between nominal and ordinal factors.
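To make the metric concrete, here is a minimal base-R sketch of Gower's distance between two rows of a mixed-type data frame. It is an illustration only, not the code used by `gower` or `cluster`. The helper `gower_pair()` and the `toy` data are hypothetical, and the ordinal handling (ranks scaled by the number of levels minus one) is one common convention that I am assuming here.

```r
# Hand-rolled Gower distance between rows i and j of a data frame.
# Illustration only: equal weights, no missing-value handling.
gower_pair <- function(df, i, j) {
  d <- vapply(names(df), function(v) {
    x <- df[[v]]
    if (is.ordered(x)) {
      # Ordinal (assumed convention): use integer ranks, scaled by the rank range.
      r <- as.integer(x)
      abs(r[i] - r[j]) / (nlevels(x) - 1)
    } else if (is.factor(x)) {
      # Nominal: 0 if the two levels match, 1 otherwise.
      as.numeric(x[i] != x[j])
    } else {
      # Numeric: absolute difference scaled by the observed range.
      abs(x[i] - x[j]) / diff(range(x))
    }
  }, numeric(1))
  mean(d)  # average the per-variable contributions
}

# Hypothetical toy data with one numeric, one nominal, and one ordinal variable.
toy <- data.frame(
  elev    = c(100, 175, 400),
  geology = factor(c("granite", "basalt", "granite")),
  drain   = factor(c("poor", "moderate", "well"),
                   levels = c("poor", "moderate", "well"), ordered = TRUE)
)

gower_pair(toy, 1, 2)  # (0.25 + 1 + 0.5) / 3
gower_pair(toy, 1, 3)  # (1 + 0 + 1) / 3
```

Every per-variable contribution lies in [0, 1], which is why no separate scaling step is needed before mixing variable types.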
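Unit: microseconds

| expr | min | lq | mean | median | uq | max | neval | cld |
|---|---|---|---|---|---|---|---|---|
| gower | 395.7 | 444.70 | 523.737 | 497.35 | 559.0 | 874.3 | 100 | a |
| knn | 772.6 | 794.05 | 892.615 | 842.70 | 925.2 | 1382.7 | 100 | a |
| daisy | 56398.0 | 73496.70 | 100253.478 | 78571.80 | 88727.8 | 276262.1 | 100 | b |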
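For anyone who wants to try a comparison of this kind, the following is a rough sketch of how such a benchmark could be set up. It is not the code that produced the timings above: the simulated data, the sizes, and the wrapper functions `nn_knn()`, `nn_gower()`, and `nn_daisy()` are all assumptions made for illustration. It assumes the `FNN`, `gower`, `cluster`, and `microbenchmark` packages are installed.

```r
library(FNN)
library(gower)
library(cluster)
library(microbenchmark)

set.seed(1)
# Hypothetical data: numeric predictors only, so all three methods apply.
train <- as.data.frame(matrix(rnorm(200 * 5), ncol = 5))
new   <- as.data.frame(matrix(rnorm(50 * 5),  ncol = 5))

# Distance from each new point to its nearest training point.

# Euclidean nearest-neighbour distance on the raw values.
nn_knn <- function() {
  FNN::knnx.dist(data = as.matrix(train), query = as.matrix(new), k = 1)
}

# Gower distance to the single closest training row (columns matched by name).
nn_gower <- function() {
  gower::gower_topn(x = new, y = train, n = 1)$distance
}

# daisy() computes the full dissimilarity matrix of the stacked data,
# so the new-vs-training block has to be extracted afterwards.
nn_daisy <- function() {
  d <- as.matrix(cluster::daisy(rbind(new, train), metric = "gower"))
  idx <- seq_len(nrow(new))
  apply(d[idx, -idx, drop = FALSE], 1, min)
}

microbenchmark(
  gower = nn_gower(),
  knn   = nn_knn(),
  daisy = nn_daisy(),
  times = 100
)
```

Note that the `knn` variant measures Euclidean distance while the other two measure Gower distance, so only the timings are comparable here, not the distance values.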
Profiling data for `aoa` run in a single thread:
This was performed with a model based on 1,030 observations, applied to a raster stack with dimensions: 3628, 2351, 8529428, 18 (nrow, ncol, ncell, nlayers).
I'll follow up with a small example dataset that contains nominal and ordinal variables.