inconsistent Boruvka MST #69

azizkayumov · 2024-02-27T05:01:39Z

As discussed in #67, there is a small difference between MSTs computed by Boruvka and Prim's algorithms in HDBSCAN.
Here are the steps to reproduce:

Download the Wine dataset.
Extract winequality-white.csv and put it in the project folder.
Replace every ; with , and remove the header line (as required by this lib).
Add println! to print the MST weight (this line seems to be a good place for simplicity):

        let mut weight = A::zero();
        for (_, _, w) in &mst {
            weight += *w;
        }
        println!("weight: {:?}", weight);

Change the example code to disable Boruvka: boruvka = true => boruvka = false
Run the example code:
cargo run --example hdbscan winequality-white.csv
This prints the following (Prim's result, which is the exact MST and serves as ground truth):

weight: 26787.419129474838
========= Report =========
# of events processed: 4898
# of features provided: 12
# of clusters: 8
# of events clustered: 1564
# of outliers: 3334

Change the example code to enable Boruvka: boruvka = false => boruvka = true
Run the example code again:
cargo run --example hdbscan winequality-white.csv
This will output:

weight: 26788.24022864788
========= Report =========
# of events processed: 4898
# of features provided: 12
# of clusters: 8
# of events clustered: 1566
# of outliers: 3332
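
The preprocessing step in the reproduction above (replacing ; with , and dropping the header) could be scripted roughly as follows. This is only a sketch: `normalize_csv` is a hypothetical helper, not part of this lib, and the sample rows in `main` are made up for illustration.

```rust
/// Drops the first (header) line and replaces every ';' with ','.
/// Hypothetical helper for preparing the input file; not part of the library.
fn normalize_csv(raw: &str) -> String {
    raw.lines()
        .skip(1) // skip the header line
        .map(|line| line.replace(';', ","))
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // Made-up sample in the same shape as a semicolon-separated file.
    let raw = "a;b;c\n1;2;3\n4.5;6;7\n";
    println!("{}", normalize_csv(raw));
}
```

For the real file, the input string could come from `std::fs::read_to_string` and the result could be written back with `std::fs::write`.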

A quick fix would be not to use the lower bound function as suggested here, but this may increase the running time of Boruvka.
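
To check any such fix independently of the library's own Prim path, the total weight can be compared against a brute-force reference. Below is a minimal sketch of a dense O(n^2) Prim's algorithm. `prim_mst_weight` and the `dist` callback are hypothetical names; for HDBSCAN, `dist` would presumably need to return the same mutual reachability distance the library uses, which is not shown here.

```rust
/// Total weight of a minimum spanning tree over the complete graph on `n`
/// points, using a dense O(n^2) Prim's algorithm.
/// `dist` is a hypothetical symmetric distance callback supplied by the caller.
fn prim_mst_weight(n: usize, dist: impl Fn(usize, usize) -> f64) -> f64 {
    if n == 0 {
        return 0.0;
    }
    let mut in_tree = vec![false; n];
    // best[v] is the cheapest known edge connecting v to the current tree.
    let mut best = vec![f64::INFINITY; n];
    best[0] = 0.0; // start from vertex 0 at no cost
    let mut total = 0.0;
    for _ in 0..n {
        // Pick the cheapest vertex that is not yet in the tree.
        let mut u = usize::MAX;
        for v in 0..n {
            if !in_tree[v] && (u == usize::MAX || best[v] < best[u]) {
                u = v;
            }
        }
        in_tree[u] = true;
        total += best[u];
        // Relax edges from the newly added vertex.
        for v in 0..n {
            if !in_tree[v] {
                let d = dist(u, v);
                if d < best[v] {
                    best[v] = d;
                }
            }
        }
    }
    total
}

fn main() {
    // Four points on a line at 0, 1, 3 and 6: the MST uses edges 1 + 2 + 3.
    let xs = [0.0_f64, 1.0, 3.0, 6.0];
    let w = prim_mst_weight(xs.len(), |i, j| (xs[i] - xs[j]).abs());
    println!("weight: {:?}", w); // prints "weight: 6.0"
}
```

With 4898 points this is about 24 million distance evaluations, which should be acceptable for a one-off check.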

Interestingly, Python HDBSCAN does not have this bug when running the same experiment with the same configuration. The Ball-tree implementation might be causing this (e.g. an erroneous lower bound between tree nodes due to loss of precision?).
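
For context on the lower-bounding hypothesis, here is a sketch of the textbook lower bound between two ball-tree nodes. It is not taken from this library, whose actual bound may differ (for example by involving core distances), and `euclidean` and `ball_lower_bound` are hypothetical names. The point is that the bound must never exceed the true minimum distance between points of the two nodes. If it does, even slightly, a node pair containing the true nearest neighbor can be pruned and Boruvka may pick a heavier edge.

```rust
/// Euclidean distance between two points of equal dimension.
fn euclidean(a: &[f64], b: &[f64]) -> f64 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f64>()
        .sqrt()
}

/// Textbook lower bound on the distance between any point in ball 1
/// (center `c1`, radius `r1`) and any point in ball 2 (center `c2`, radius `r2`).
/// Assumption: this is the generic bound, not necessarily the library's.
fn ball_lower_bound(c1: &[f64], r1: f64, c2: &[f64], r2: f64) -> f64 {
    // Overlapping balls give a negative gap, so clamp at zero.
    (euclidean(c1, c2) - r1 - r2).max(0.0)
}

fn main() {
    // Centers are 5 apart and the radii sum to 2.5, so the gap is 2.5.
    let lb = ball_lower_bound(&[0.0, 0.0], 1.0, &[3.0, 4.0], 1.5);
    println!("lower bound: {}", lb);

    // Overlapping balls: the bound is clamped to 0.
    let overlap = ball_lower_bound(&[0.0], 2.0, &[1.0], 2.0);
    println!("overlapping: {}", overlap);
}
```

A pruning rule of the form "skip the node pair if the lower bound is not smaller than the current best candidate" is only safe when the stored radius really covers every point in the node, so an underestimated radius is one place where precision loss could matter.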

azizkayumov changed the title ~~inconsistent Boruvka and Prim's MSTs~~ inconsistent Boruvka MST Feb 27, 2024