Extract winequality-white.csv and put it in the project folder.
Replace every ; with , and remove the header row (as required by this library).
Add a println! to print the MST weight (this line seems to be a convenient place):
    let mut weight = A::zero();
    for (_, _, w) in &mst {
        weight += *w;
    }
    println!("weight: {:?}", weight);
Change the example code to disable Boruvka: boruvka = true => boruvka = false
Run the example code: cargo run --example hdbscan winequality-white.csv
This prints the following (which is the ground truth exact MST):
weight: 26787.419129474838
========= Report =========
# of events processed: 4898
# of features provided: 12
# of clusters: 8
# of events clustered: 1564
# of outliers: 3334
Change the example code to enable Boruvka: boruvka = false => boruvka = true
Run the example code again: cargo run --example hdbscan winequality-white.csv
This will output:
weight: 26788.24022864788
========= Report =========
# of events processed: 4898
# of features provided: 12
# of clusters: 8
# of events clustered: 1566
# of outliers: 3332
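The two reported weights differ by roughly 0.82 (26788.24022864788 versus 26787.419129474838). A graph can have several minimum spanning trees, but they all have the same total weight, so two exact algorithms should agree up to floating-point rounding. The sketch below illustrates that on a tiny point set by comparing Prim's and Kruskal's algorithms. It is not the library's code: the function names and the sample points are hypothetical, and it uses plain Euclidean distance as an assumption, whereas HDBSCAN works with mutual reachability distance.

```rust
// Sketch only: two exact MST algorithms must agree on the total weight.
// Assumption: plain Euclidean distance on a complete graph.

fn dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Total MST weight via Prim's algorithm on the complete graph, O(n^2).
fn prim_weight(points: &[Vec<f64>]) -> f64 {
    let n = points.len();
    if n == 0 {
        return 0.0;
    }
    let mut in_tree = vec![false; n];
    let mut best = vec![f64::INFINITY; n]; // cheapest edge into the tree
    best[0] = 0.0;
    let mut total = 0.0;
    for _ in 0..n {
        // Pick the vertex outside the tree with the cheapest connecting edge.
        let mut u = usize::MAX;
        for v in 0..n {
            if !in_tree[v] && (u == usize::MAX || best[v] < best[u]) {
                u = v;
            }
        }
        in_tree[u] = true;
        total += best[u];
        // Relax the edges from the newly added vertex.
        for v in 0..n {
            if !in_tree[v] {
                let d = dist(&points[u], &points[v]);
                if d < best[v] {
                    best[v] = d;
                }
            }
        }
    }
    total
}

/// Union-find root lookup with path halving.
fn find(parent: &mut [usize], mut x: usize) -> usize {
    while parent[x] != x {
        let grandparent = parent[parent[x]];
        parent[x] = grandparent;
        x = grandparent;
    }
    x
}

/// Total MST weight via Kruskal's algorithm.
fn kruskal_weight(points: &[Vec<f64>]) -> f64 {
    let n = points.len();
    let mut edges = Vec::new();
    for i in 0..n {
        for j in (i + 1)..n {
            edges.push((dist(&points[i], &points[j]), i, j));
        }
    }
    edges.sort_by(|a, b| a.0.total_cmp(&b.0));
    let mut parent: Vec<usize> = (0..n).collect();
    let mut total = 0.0;
    for (w, i, j) in edges {
        let (ri, rj) = (find(&mut parent, i), find(&mut parent, j));
        if ri != rj {
            parent[ri] = rj;
            total += w;
        }
    }
    total
}

/// Hypothetical sample with tied edge lengths, so the MST is not unique.
fn sample_points() -> Vec<Vec<f64>> {
    vec![
        vec![0.0, 0.0],
        vec![3.0, 0.0],
        vec![3.0, 4.0],
        vec![0.0, 4.0],
        vec![10.0, 10.0],
    ]
}

fn main() {
    let pts = sample_points();
    let (p, k) = (prim_weight(&pts), kruskal_weight(&pts));
    println!("prim: {p}, kruskal: {k}");
    // Allow only for rounding noise from different summation orders.
    assert!((p - k).abs() < 1e-9);
}
```

A gap of about 0.82 is far larger than rounding noise from summing the edges in a different order, which suggests that at least one edge in the heavier tree is not the minimum edge it should be.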
A quick fix would be not to use the lower bound function as suggested here, but this may increase the running time of Boruvka.
Interestingly, Python HDBSCAN does not have this bug if we run the same experiment (with the same configuration). The Ball-tree implementation might be causing this (e.g. an erroneous lower bound between tree nodes due to loss of precision?).
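To make the suspected failure mode concrete, here is a minimal sketch of the usual lower bound between two ball-tree nodes: the distance between the centers minus both radii, clamped at zero. The struct and function names are hypothetical and this is not the library's implementation. The bound is only safe if it never exceeds the true minimum distance between points of the two nodes. If rounding ever pushed it above that distance, a search could prune a node pair that holds the true nearest neighbor. Whether that happens here is unconfirmed.

```rust
// Sketch only: a ball-to-ball lower bound and a brute-force check of it.
// Assumption: Euclidean distance and non-empty point sets.

/// Hypothetical ball-tree node: a center and a radius covering its points.
struct Ball {
    center: Vec<f64>,
    radius: f64,
}

fn dist(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| (x - y).powi(2)).sum::<f64>().sqrt()
}

/// Ball centered on the centroid, with radius reaching the farthest point.
fn enclose(points: &[Vec<f64>]) -> Ball {
    let dim = points[0].len();
    let mut center = vec![0.0; dim];
    for p in points {
        for (c, x) in center.iter_mut().zip(p) {
            *c += x / points.len() as f64;
        }
    }
    let radius = points
        .iter()
        .map(|p| dist(p, &center))
        .fold(0.0, f64::max);
    Ball { center, radius }
}

/// Lower bound on the distance between any point in `a` and any point in `b`.
fn lower_bound(a: &Ball, b: &Ball) -> f64 {
    (dist(&a.center, &b.center) - a.radius - b.radius).max(0.0)
}

/// Exact minimum distance between the two point sets, by brute force.
fn min_pair_distance(xs: &[Vec<f64>], ys: &[Vec<f64>]) -> f64 {
    let mut best = f64::INFINITY;
    for x in xs {
        for y in ys {
            best = best.min(dist(x, y));
        }
    }
    best
}

fn main() {
    let left = vec![vec![0.0, 0.0], vec![1.0, 0.0], vec![0.0, 1.0]];
    let right = vec![vec![5.0, 5.0], vec![6.0, 5.0], vec![5.0, 6.0]];
    let (a, b) = (enclose(&left), enclose(&right));
    let bound = lower_bound(&a, &b);
    let exact = min_pair_distance(&left, &right);
    println!("bound: {bound}, exact: {exact}");
    // A valid lower bound must never exceed the exact minimum distance.
    assert!(bound <= exact);
}
```

A check like this, run on the actual tree nodes of the failing data set, would be one way to test the precision hypothesis before removing the lower bound altogether.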
azizkayumov changed the title from "inconsistent Boruvka and Prim's MSTs" to "inconsistent Boruvka MST" on Feb 27, 2024
As discussed in #67, there is a small difference between MSTs computed by Boruvka and Prim's algorithms in HDBSCAN.
The steps to reproduce are listed above.