Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of the incremental Kolmogorov-Smirnov Statistics #1354

Merged
merged 15 commits into from
Oct 30, 2023

Conversation

hoanganhngo610
Copy link
Contributor

This implementation features the incremental KS statistics, which is first proposed in the KDD 2016 paper.

This implementation is based on Treap (or Cartesian Tree), with bulk operation and lazy propagation. This allows the implementation to conduct insertion and removal operations in $O(log N)$ with high probability and calculate the KS test in $O(1)$, a significant improvement compared to the $O(N log N)$ within the non-incremental implementation.

As said, this implementation supports both insertion (via update) and deletion (via revert). Moreover, a closely related version of the KS Statistics, the Kuiper test, can also be calculated by setting statistic to kuiper.

@hoanganhngo610
Copy link
Contributor Author

hoanganhngo610 commented Oct 30, 2023

@MaxHalford Would you mind having a look through the PR? I think the only remaining problem would be the fact that ruff is continuously failing. (Update: Issue resolved by bumping ruff version locally from 0.1.3 to 0.1.2).

Copy link
Member

@MaxHalford MaxHalford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @hoanganhngo610! It's great that you added a unit test, it makes me trust the code is correct.

Can you add an entry to UNRELEASED.md?

river/stats/test_ks.py Show resolved Hide resolved
river/stats/test_ks.py Outdated Show resolved Hide resolved
river/stats/kolmogorov_smirnov.py Outdated Show resolved Hide resolved
river/stats/kolmogorov_smirnov.py Show resolved Hide resolved
river/stats/kolmogorov_smirnov.py Outdated Show resolved Hide resolved
hoanganhngo610 and others added 5 commits October 30, 2023 17:40
Co-authored-by: Max Halford <maxhalford25@gmail.com>
Co-authored-by: Max Halford <maxhalford25@gmail.com>
Co-authored-by: Max Halford <maxhalford25@gmail.com>
@hoanganhngo610
Copy link
Contributor Author

@MaxHalford I have successfully incorporated all of your proposed changes to the PR, and also provided a description within the UNRELEASED.md file. Would you mind if I merge the PR into the repository after all tests have passed?

@MaxHalford MaxHalford merged commit 0e87b80 into main Oct 30, 2023
11 checks passed
@MaxHalford MaxHalford deleted the incremental-ks branch October 30, 2023 17:05
@MaxHalford
Copy link
Member

Merged @hoanganhngo610, good job! It's really nice to have this :)

@hoanganhngo610
Copy link
Contributor Author

Thank you so much @MaxHalford! That's really encouraging to hear!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants