The prims_balltree, prims_kdtree, boruvka_balltree and sparse precomputed algorithms use a min_samples value that is off by one, presumably due to how the kth nearest neighbour is counted.
This can be seen using the following code snippet,
For prims_kdtree and prims_balltree, this can be fixed by changing min_samples to min_samples + 1 in the tree.query() call.
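To illustrate why the query needs min_samples + 1, here is a minimal sketch that uses a brute-force stand-in for a tree query. It is not the hdbscan code: `knn_distances` and the sample points are hypothetical, and the sketch assumes min_samples is meant to count neighbours excluding the point itself. Because each query point is also in the indexed data, the nearest result is always the point itself at distance 0.

```python
def knn_distances(points, query, k):
    """Hypothetical stand-in for a tree query: the k smallest distances, ascending."""
    return sorted(abs(p - query) for p in points)[:k]

points = [0.0, 1.0, 3.0, 6.0, 10.0]
min_samples = 2

# Querying with k = min_samples: the first hit is the point itself (distance 0),
# so the last distance returned is only the (min_samples - 1)-th true neighbour.
core_k = [knn_distances(points, p, min_samples)[-1] for p in points]

# Querying with k = min_samples + 1 skips past the self match and lands on
# the min_samples-th true neighbour.
core_k_plus_1 = [knn_distances(points, p, min_samples + 1)[-1] for p in points]

print(core_k)         # [1.0, 1.0, 2.0, 3.0, 4.0]
print(core_k_plus_1)  # [3.0, 2.0, 3.0, 4.0, 7.0]
```

With min_samples = 2, the first list holds distances to the nearest neighbour only, while the second holds distances to the second-nearest neighbour.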
For boruvka_balltree I think that it should be fixed by changing min_samples to min_samples + 1 in _compute_bounds(). As a side note, this makes the _compute_bounds() function of BallTreeBoruvkaAlgorithm better match that of KDTreeBoruvkaAlgorithm.
For sparse precomputed it should be fixed by changing min_samples to min_samples - 1 in sparse_mutual_reachability().
I also noticed that min_samples is changed when the match_reference_implementation flag is set. If the comparison to the reference implementation was done using one of the changed algorithms then that code may also need to be changed.
I've linked a pull request with these suggested changes (excluding the possible change to match_reference_implementation).
I'm not sure about sparse_mutual_reachability in the case where the sparse distance matrix contains the self loop.
When it evaluates core_distance[i] = sorted_row_data[min_points], the self loop occupies index 0, so once you discount it the code actually looks at min_points neighbors (and not min_points + 1), as it should.
My assumption is that min_points refers to the number of neighbors, not counting the data point itself.
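The two readings can be compared with a small sketch. The row values below are made up, and plain lists stand in for the sorted data of one sparse row. Whether a given sparse matrix actually stores the zero self distance is an assumption that depends on how the matrix was built.

```python
min_points = 2

# Sorted distances for one row, with the self loop (distance 0.0) stored.
row_with_self = [0.0, 1.0, 3.0, 6.0]
# The same row when the self loop is not stored in the sparse matrix.
row_without_self = [1.0, 3.0, 6.0]

# Index min_points on a row that includes the self loop:
# index 0 is the point itself, so this is the 2nd true neighbour.
core_with_self = row_with_self[min_points]

# The same index on a row without the self loop overshoots to the 3rd neighbour.
core_without_self = row_without_self[min_points]

# Using min_points - 1 on the row without the self loop recovers the 2nd neighbour.
core_without_self_fixed = row_without_self[min_points - 1]

print(core_with_self, core_without_self, core_without_self_fixed)  # 3.0 6.0 3.0
```

Under these assumptions, the min_samples - 1 change is only correct when the self loop is absent from the row. When it is present, the original index already selects the intended neighbour.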