Example usage for outlier scores #82
Walkthrough
The changes in this pull request primarily involve renaming variables in the clustering functionality of the …
🔇 Additional comments (3)
src/hdbscan.rs (3)
The correction from "corruptted" to "corrupted" improves the error message clarity.
The comment updates correctly reflect that lambda_A corresponds to eps_A, which aligns with the mathematical formulation in the referenced paper (https://dl.acm.org/doi/10.1145/2733381).
The renaming from …
Also applies to: 495-496
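For context on the lambda/eps comment above: in the usual HDBSCAN formulation, the density level λ is the reciprocal of the radius ε, so the GLOSH outlier score from the referenced paper can be written in either form. The following is a minimal sketch under that assumption. The function names are hypothetical and are not taken from src/hdbscan.rs, and all radii are assumed to be strictly positive.

```rust
/// Converts a radius (eps) into a density level (lambda = 1 / eps).
/// Assumes eps > 0; hypothetical helper, not part of the crate.
fn lambda_from_eps(eps: f64) -> f64 {
    1.0 / eps
}

/// GLOSH score expressed with radii: 1 - eps_max / eps, where `eps` is the
/// radius at which the point leaves its cluster and `eps_max` is the smallest
/// radius at which that cluster (or any of its sub-clusters) still exists.
fn glosh_from_eps(eps: f64, eps_max: f64) -> f64 {
    1.0 - eps_max / eps
}

/// The same score expressed with density levels:
/// (lambda_max - lambda) / lambda_max.
fn glosh_from_lambda(lambda: f64, lambda_max: f64) -> f64 {
    (lambda_max - lambda) / lambda_max
}
```

With `eps = 2.0` and `eps_max = 0.5`, both forms give the same score once the radii are converted with `lambda_from_eps`. Since `eps_max <= eps`, the score stays in [0, 1).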
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
examples/hdbscan.rs (2)
44-44: LGTM! Consider adding a comment about the score interpretation.
The variable renaming improves clarity by distinguishing between noise events and their outlier scores. Since the PR focuses on demonstrating outlier score usage, consider adding a comment explaining that these scores can be interpreted as probabilities in the range [0, 1).
```diff
- let (clusters, noise_events, outlier_scores) = clustering.fit(&data.view());
+ // clusters: HashMap of cluster labels to point indices
+ // noise_events: Vector of indices for points not assigned to any cluster
+ // outlier_scores: Vector of scores in [0, 1) indicating the probability of each point being an outlier
+ let (clusters, noise_events, outlier_scores) = clustering.fit(&data.view());
```
53-57: LGTM! Consider enhancing the outlier statistics.
The new output effectively demonstrates the practical use of outlier scores. To provide even more insight, consider adding:
- The average outlier score for noise events (as mentioned in the PR description)
- A configurable threshold instead of the hardcoded 0.9
```diff
  println!("# of noise events: {}", noise_events.len());
+ if !noise_events.is_empty() {
+     let avg_noise_score: f64 = noise_events
+         .iter()
+         .map(|&idx| outlier_scores[idx])
+         .sum::<f64>()
+         / noise_events.len() as f64;
+     println!("Average outlier score for noise events: {:.3}", avg_noise_score);
+ }
+ const OUTLIER_THRESHOLD: f64 = 0.9;
  println!(
-     "# of outliers (prob > 0.9): {}",
-     outlier_scores.iter().filter(|&&score| score > 0.9).count()
+     "# of outliers (prob > {}): {}",
+     OUTLIER_THRESHOLD,
+     outlier_scores.iter().filter(|&&score| score > OUTLIER_THRESHOLD).count()
  );
```
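The two suggestions above can also be tried in isolation. Here is a minimal, standard-library-only sketch that assumes `noise_events` holds point indices into `outlier_scores`, as in the example. The helper names `mean_noise_score` and `count_above` are hypothetical and not part of the crate.

```rust
/// Mean outlier score over the points flagged as noise.
/// Returns `None` when there are no noise events, avoiding a division by zero.
/// Panics if an index in `noise_events` is out of bounds for `outlier_scores`.
fn mean_noise_score(noise_events: &[usize], outlier_scores: &[f64]) -> Option<f64> {
    if noise_events.is_empty() {
        return None;
    }
    let total: f64 = noise_events.iter().map(|&idx| outlier_scores[idx]).sum();
    Some(total / noise_events.len() as f64)
}

/// Number of points whose outlier score is strictly greater than `threshold`.
fn count_above(outlier_scores: &[f64], threshold: f64) -> usize {
    outlier_scores
        .iter()
        .filter(|&&score| score > threshold)
        .count()
}
```

For instance, with scores `[0.05, 0.95, 0.10, 0.75, 0.92]` and noise indices `[1, 3]`, `mean_noise_score` averages 0.95 and 0.75, and `count_above(&scores, 0.9)` counts the two scores above the threshold. Passing the threshold as a parameter keeps it configurable rather than hardcoded.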
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- examples/hdbscan.rs (2 hunks)
- src/hdbscan.rs (3 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/hdbscan.rs
Closes #80