Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: experiment for reusing allocated buffers for Dyn multivariate #278

Conversation

YeungOnion
Copy link
Contributor

takes approach of making a few fields OnceCell, allowing for init-ing
precomputed values while allow taking them out to modify without
reallocate

Implementing similar way to update location via set or set_with would also be implemented.
Would also implement nearly identical in StudentsT.

@YeungOnion
Copy link
Contributor Author

@FreezyLemon an alternative to #273, what are your thoughts?


Noting as a tangential discussion, since #180 was referring to MCMC, perhaps there would also be a need for many more variables than nalgebra is well suited to. We've mentioned ideas for supporting different backends before but I figure if we can specify some of the API for nalgebra, then we can emulate that where possible for a different matrix crate.

Copy link

codecov bot commented Sep 7, 2024

Codecov Report

Attention: Patch coverage is 29.57746% with 50 lines in your changes missing coverage. Please review.

Project coverage is 93.38%. Comparing base (207e27d) to head (3657843).
Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
src/distribution/multivariate_normal.rs 25.37% 50 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #278      +/-   ##
==========================================
- Coverage   93.75%   93.38%   -0.37%     
==========================================
  Files          53       53              
  Lines       11801    11799       -2     
==========================================
- Hits        11064    11019      -45     
- Misses        737      780      +43     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@FreezyLemon
Copy link
Contributor

@FreezyLemon an alternative to #273, what are your thoughts?

This seems like it will result in a nicer API than 273 for sure. Hm.. without OnceLocks or some other thread synchronisation, this will make the struct non-Sync and non-Send though. And I can see complexity becoming a problem, with the internal state needing to be valid in any circumstance.

@YeungOnion
Copy link
Contributor Author

I realized this morning all of this works because of take, so Option would be viable

Also, you mentioned how a distribution should be immutable, and I agree, so semantically it would make more sense to have into_new_from_nalgebra that takes self and relies on take and clone_into.

takes approach of making a few fields `Option`, to use `take` since
creating `nalgebra::Cholesky` requires ownership
@YeungOnion YeungOnion force-pushed the update-multivariate-without-realloc branch from e5ab085 to e88c40b Compare September 10, 2024 03:37
@YeungOnion
Copy link
Contributor Author

redid this with Option, regarding async, I think this would be fine for an RWLock style of code since the only times the Options are None are when there's a mutable reference to them.

I considered a type that represented this kind of "field that owns a data" that always logically stores a value, but is just a wrapper to an Option.

@FreezyLemon
Copy link
Contributor

FreezyLemon commented Sep 11, 2024

Have you benched this? I've tried implementing a builder pattern for some time and benches have always shown it's not really any faster than just reallocating.

Unless the amount of variables (and thereby allocation size) is much larger than I've tested, I doubt this makes much difference at all performance-wise.

Some napkin math (please double-check) reveals that for n variables, we need to allocate 3 n x n matrices and one n x 1 matrix/vector. For 10 variables, that's 310 f64s (8 bytes each), so 2480 bytes i.e. less than 2.5KiB. Such small allocations will usually be incredibly fast unless on memory-constrained hardware.

Even with 1000 variables, we're talking about ~24MiB which still isn't huge.

@FreezyLemon
Copy link
Contributor

FreezyLemon commented Sep 11, 2024

I did a rudimentary profiling run for 1000 variables, just running MultivariateNormal::new 40 times (~10 secs runtime):

https://share.firefox.dev/4enR0hX

The code is basically:

fn main() {
    let (mean, cov) = create_mean_and_cov();

    for _ in 0..40 {
        MultivariateNormal::new_from_nalgebra(mean.clone(), cov.clone()).unwrap();
    }
}

If you look for allocating functions (mostly clone()), you'll find that apart from some calls very deep inside nalgebra that are probably not avoidable by statrs, the allocations that are done don't seem to be very heavy at all. Even the manual clones inside the loop in fn main are less than 0.5% of the total runtime.

Edit: What I'm getting at is that, if anything, other optimizations should probably be targeted first. Or a switch away from nalgebra if that is the bottleneck here... Even if allocations are slower on other hardware, surely it can't be so much slower that it becomes a problem?

@YeungOnion
Copy link
Contributor Author

Oh this is a smarter way to do preliminary benchmarks. I was going to check if this in-place mutation was faster 🙃. Thanks for doing this!

What did you use to profile and view with Firefox profiler?

Changing from nalgebra right now does seem a bit like it wastes effort, so I'm averse to it until we get some more feedback.

I'll close this and share thoughts about something benefitting MCMC on the initial issue.

@FreezyLemon
Copy link
Contributor

What did you use to profile and view with Firefox profiler?

I used samply, it's really easy to use after a bit of setup and does the firefox profiler stuff for you. perf record also works, but is a bit more barebones (flamegraph can convert the artifacts from that into nice graphs)

@YeungOnion YeungOnion deleted the update-multivariate-without-realloc branch September 13, 2024 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants