[Feature]: Kernel Density Estimation #193

Open
humphreylee opened this issue Sep 22, 2023 · 8 comments

Comments

@humphreylee

Thanks for sharing the good work. Is there any implementation for kernel density estimation available (univariate, bivariate or multivariate)? Thanks.

@wangjiawen2013

I think this would be very helpful! Here is a repo for kernel density estimation (https://github.com/seatonullberg/kernel-density-estimation).
It would be great to incorporate it into statrs.

@YeungOnion
Contributor

I'm unsure how well this would fit into the existing tooling in statrs. While we do have the Empirical distribution, introducing hyperparameters for data-driven distributions starts to move into a broader, data-driven realm of statistics.

@henryjac what would you say? We're in the midst of defining a new direction, but I think this would fit better in a different crate.

@henryjac
Contributor

henryjac commented Mar 9, 2024

I would not be against adding more data-driven distributions and functionality. Since this is a statistics crate, most features related to any kind of statistics fit quite well.

We even already have functionality for statistics in the statistics module, so expanding that with KDE etc. makes sense to me.

@YeungOnion
Contributor

Well, I would be happy to support data-driven distributions as long as we look ahead a little bit at what we'll choose (for near-future) as out of scope. Perhaps I'd have done better had I said that I'm averse to looking at it now since we've got some short-term priorities right now.

Overall, I think it just takes some discussion to choose the scope clearly, starting with an upper bound (things that will certainly be out of scope). What can you say won't be in scope (for the near future)?


But all cards on the table, I actually think KDE would be a good candidate compared to some other data-driven distributions, for these reasons:

  • the distribution function for it is similar to the named random variables in statrs::distribution in that it is
    • invariant under permutation of data
    • deterministic and closed form in terms of specified data and kernel (plus possible hyperparameters)
  • I'd expect real-world use cases where the data fits in memory (i.e. no need to develop an API that supports larger-than-memory data beyond considering IntoIterator)
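
To make the two properties above concrete, here is a minimal sketch of a 1D KDE with a Gaussian kernel. The names `gaussian_kernel` and `kde_pdf` are hypothetical and not part of statrs; the estimate is the usual f(x) = 1/(n h) · Σ K((x − x_i)/h), where the bandwidth h is the only hyperparameter.

```rust
use std::f64::consts::PI;

/// Standard normal density, used here as the kernel K(u).
/// (Hypothetical helper, not a statrs function.)
fn gaussian_kernel(u: f64) -> f64 {
    (-0.5 * u * u).exp() / (2.0 * PI).sqrt()
}

/// Evaluates f(x) = 1/(n h) * sum_i K((x - x_i) / h).
/// `bandwidth` is h; the function panics on empty data or non-positive h.
fn kde_pdf(data: &[f64], bandwidth: f64, x: f64) -> f64 {
    assert!(!data.is_empty(), "data must not be empty");
    assert!(bandwidth > 0.0, "bandwidth must be positive");
    // The result is a plain sum over the samples, so reordering the data
    // changes it only by floating-point rounding.
    let sum: f64 = data
        .iter()
        .map(|&xi| gaussian_kernel((x - xi) / bandwidth))
        .sum();
    sum / (data.len() as f64 * bandwidth)
}
```

Since the estimate is a sum over samples, it is closed form given the data, kernel, and bandwidth, and it does not depend on sample order beyond rounding.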

@humphreylee
Author

Don't mean to push things. I fully appreciate your time to make things happen. Just curious, any hope of having this feature?

@YeungOnion
Contributor

My first thoughts returning to this are that it would be great! It would really help statrs become a fuller statistical suite. As far as statistics goes, we mostly have the parametric stuff, without estimators or regression.

I don't think I could personally implement anything with practical performance for anything beyond 1D and I won't prioritize authoring it right now, but perhaps we could start with a simple implementation to set the API and define how this estimator fits (and in principle, future others fit) into the module structure we have or a different one that better suits future features.

As a would-be user, do you have any example workflows/use cases we could start the design around?

@humphreylee
Author

There is a Rust implementation for 1D in this crate, and a 2D one is in progress. But my main interest is building contour plots.
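
As a rough illustration of the contour-plot workflow, the sketch below evaluates a bivariate KDE on a regular grid, which is the kind of input a contour plotter typically consumes. It assumes a product Gaussian kernel with separate bandwidths `hx` and `hy`; the names `kde2d` and `density_grid` are hypothetical and do not come from statrs or the crate mentioned above.

```rust
use std::f64::consts::PI;

/// Bivariate KDE with a product Gaussian kernel (an assumption for this sketch).
/// `hx` and `hy` are the per-axis bandwidths.
fn kde2d(data: &[(f64, f64)], hx: f64, hy: f64, x: f64, y: f64) -> f64 {
    assert!(!data.is_empty() && hx > 0.0 && hy > 0.0);
    let norm = 1.0 / (2.0 * PI * hx * hy * data.len() as f64);
    let sum: f64 = data
        .iter()
        .map(|&(xi, yi)| {
            let ux = (x - xi) / hx;
            let uy = (y - yi) / hy;
            (-0.5 * (ux * ux + uy * uy)).exp()
        })
        .sum();
    norm * sum
}

/// Evaluates the density on an `nx` by `ny` grid spanning the given ranges
/// (end points included). Rows are indexed by y, columns by x.
fn density_grid(
    data: &[(f64, f64)],
    hx: f64,
    hy: f64,
    x_range: (f64, f64),
    y_range: (f64, f64),
    nx: usize,
    ny: usize,
) -> Vec<Vec<f64>> {
    assert!(nx >= 2 && ny >= 2, "need at least two grid points per axis");
    let dx = (x_range.1 - x_range.0) / (nx - 1) as f64;
    let dy = (y_range.1 - y_range.0) / (ny - 1) as f64;
    (0..ny)
        .map(|i| {
            let y = y_range.0 + i as f64 * dy;
            (0..nx)
                .map(|j| kde2d(data, hx, hy, x_range.0 + j as f64 * dx, y))
                .collect()
        })
        .collect()
}
```

This direct evaluation costs O(n · nx · ny), which is fine for small data but is not the kind of practical-performance implementation discussed earlier in the thread.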

@YeungOnion
Contributor

I can start with writing out the set of traits and perhaps with an example that won't run but would compile, but I might want to restructure modules. I'll refer to this issue when I put in the PR. Likely won't start until next week, so aiming for 2/17 on this.
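
For discussion purposes only, one possible shape for such traits is sketched below. Everything here (`Kernel`, `Gaussian`, `Epanechnikov`, `KernelDensity`) is a hypothetical design and not the API proposed in this issue or present in statrs. The constructor takes an `IntoIterator`, following the earlier comment about in-memory data.

```rust
use std::f64::consts::PI;

/// Hypothetical kernel abstraction: a non-negative function integrating to 1.
pub trait Kernel {
    fn evaluate(&self, u: f64) -> f64;
}

/// Gaussian kernel (standard normal density).
pub struct Gaussian;

impl Kernel for Gaussian {
    fn evaluate(&self, u: f64) -> f64 {
        (-0.5 * u * u).exp() / (2.0 * PI).sqrt()
    }
}

/// Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere.
pub struct Epanechnikov;

impl Kernel for Epanechnikov {
    fn evaluate(&self, u: f64) -> f64 {
        if u.abs() <= 1.0 { 0.75 * (1.0 - u * u) } else { 0.0 }
    }
}

/// Hypothetical 1D estimator that owns its data, a bandwidth, and a kernel.
pub struct KernelDensity<K: Kernel> {
    data: Vec<f64>,
    bandwidth: f64,
    kernel: K,
}

impl<K: Kernel> KernelDensity<K> {
    /// Returns `None` for empty data or a bandwidth that is not finite and positive.
    pub fn new<I>(data: I, bandwidth: f64, kernel: K) -> Option<Self>
    where
        I: IntoIterator<Item = f64>,
    {
        let data: Vec<f64> = data.into_iter().collect();
        if data.is_empty() || !bandwidth.is_finite() || bandwidth <= 0.0 {
            return None;
        }
        Some(Self { data, bandwidth, kernel })
    }

    /// Density estimate at `x`: 1/(n h) * sum_i K((x - x_i) / h).
    pub fn pdf(&self, x: f64) -> f64 {
        let sum: f64 = self
            .data
            .iter()
            .map(|&xi| self.kernel.evaluate((x - xi) / self.bandwidth))
            .sum();
        sum / (self.data.len() as f64 * self.bandwidth)
    }
}
```

Making the kernel a type parameter keeps evaluation statically dispatched; whether the estimator should also implement the existing distribution traits is left open here.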


4 participants