feat: experiment for reusing allocated buffers for Dyn multivariate #278

YeungOnion · 2024-09-07T02:50:09Z

takes approach of making a few fields OnceCell, allowing for init-ing
precomputed values while allow taking them out to modify without
reallocate

Implementing similar way to update location via set or set_with would also be implemented.
Would also implement nearly identical in StudentsT.

YeungOnion · 2024-09-07T02:57:00Z

@FreezyLemon an alternative to #273, what are your thoughts?

Noting as a tangential discussion, since #180 was referring to MCMC, perhaps there would also be a need for many more variables than nalgebra is well suited to. We've mentioned ideas for supporting different backends before but I figure if we can specify some of the API for nalgebra, then we can emulate that where possible for a different matrix crate.

codecov · 2024-09-07T05:27:52Z

Codecov Report

Attention: Patch coverage is 29.57746% with 50 lines in your changes missing coverage. Please review.

Project coverage is 93.38%. Comparing base (207e27d) to head (3657843).
Report is 5 commits behind head on master.

Files with missing lines	Patch %	Lines
src/distribution/multivariate_normal.rs	25.37%	50 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #278      +/-   ##
==========================================
- Coverage   93.75%   93.38%   -0.37%     
==========================================
  Files          53       53              
  Lines       11801    11799       -2     
==========================================
- Hits        11064    11019      -45     
- Misses        737      780      +43

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

FreezyLemon · 2024-09-07T22:54:51Z

@FreezyLemon an alternative to #273, what are your thoughts?

This seems like it will result in a nicer API than 273 for sure. Hm.. without OnceLocks or some other thread synchronisation, this will make the struct non-Sync and non-Send though. And I can see complexity becoming a problem, with the internal state needing to be valid in any circumstance.

YeungOnion · 2024-09-08T00:04:04Z

I realized this morning all of this works because of take, so Option would be viable

Also, you mentioned how a distribution should be immutable, and I agree, so semantically it would make more sense to have into_new_from_nalgebra that takes self and relies on take and clone_into.

takes approach of making a few fields `Option`, to use `take` since creating `nalgebra::Cholesky` requires ownership

YeungOnion · 2024-09-10T03:42:34Z

redid this with Option, regarding async, I think this would be fine for an RWLock style of code since the only times the Options are None are when there's a mutable reference to them.

I considered a type that represented this kind of "field that owns a data" that always logically stores a value, but is just a wrapper to an Option.

src/distribution/multivariate_normal.rs

FreezyLemon · 2024-09-11T21:06:58Z

Have you benched this? I've tried implementing a builder pattern for some time and benches have always shown it's not really any faster than just reallocating.

Unless the amount of variables (and thereby allocation size) is much larger than I've tested, I doubt this makes much difference at all performance-wise.

Some napkin math (please double-check) reveals that for n variables, we need to allocate 3 n x n matrices and one n x 1 matrix/vector. For 10 variables, that's 310 f64s (8 bytes each), so 2480 bytes i.e. less than 2.5KiB. Such small allocations will usually be incredibly fast unless on memory-constrained hardware.

Even with 1000 variables, we're talking about ~24MiB which still isn't huge.

FreezyLemon · 2024-09-11T21:47:51Z

I did a rudimentary profiling run for 1000 variables, just running MultivariateNormal::new 40 times (~10 secs runtime):

https://share.firefox.dev/4enR0hX

The code is basically:

fn main() {
    let (mean, cov) = create_mean_and_cov();

    for _ in 0..40 {
        MultivariateNormal::new_from_nalgebra(mean.clone(), cov.clone()).unwrap();
    }
}

If you look for allocating functions (mostly clone()), you'll find that apart from some calls very deep inside nalgebra that are probably not avoidable by statrs, the allocations that are done don't seem to be very heavy at all. Even the manual clones inside the loop in fn main are less than 0.5% of the total runtime.

Edit: What I'm getting at is that, if anything, other optimizations should probably be targeted first. Or a switch away from nalgebra if that is the bottleneck here... Even if allocations are slower on other hardware, surely it can't be so much slower that it becomes a problem?

YeungOnion · 2024-09-12T16:21:29Z

Oh this is a smarter way to do preliminary benchmarks. I was going to check if this in-place mutation was faster 🙃. Thanks for doing this!

What did you use to profile and view with Firefox profiler?

Changing from nalgebra right now does seem a bit like it wastes effort, so I'm averse to it until we get some more feedback.

I'll close this and share thoughts about something benefitting MCMC on the initial issue.

FreezyLemon · 2024-09-12T19:10:18Z

What did you use to profile and view with Firefox profiler?

I used samply, it's really easy to use after a bit of setup and does the firefox profiler stuff for you. perf record also works, but is a bit more barebones (flamegraph can convert the artifacts from that into nice graphs)

YeungOnion added 2 commits September 9, 2024 21:07

test: mark display implementations as ignored from coverage

be40c8c

feat: reusing buffers for Dyn multivariate

e88c40b

takes approach of making a few fields `Option`, to use `take` since creating `nalgebra::Cholesky` requires ownership

YeungOnion force-pushed the update-multivariate-without-realloc branch from e5ab085 to e88c40b Compare September 10, 2024 03:37

FreezyLemon reviewed Sep 10, 2024

View reviewed changes

src/distribution/multivariate_normal.rs Outdated Show resolved Hide resolved

YeungOnion mentioned this pull request Sep 10, 2024

Replace all instances of StatsError with custom error types or Option<T> #284

Merged

style: drop _unchecked suffix to match conventions from std

3657843

YeungOnion closed this Sep 12, 2024

YeungOnion mentioned this pull request Sep 12, 2024

Mutable/Movable parameters for multivariate normal #180

Open

YeungOnion deleted the update-multivariate-without-realloc branch September 13, 2024 13:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: experiment for reusing allocated buffers for Dyn multivariate #278

feat: experiment for reusing allocated buffers for Dyn multivariate #278

YeungOnion commented Sep 7, 2024

YeungOnion commented Sep 7, 2024

codecov bot commented Sep 7, 2024 •

edited

Loading

FreezyLemon commented Sep 7, 2024

YeungOnion commented Sep 8, 2024

YeungOnion commented Sep 10, 2024

FreezyLemon commented Sep 11, 2024 •

edited

Loading

FreezyLemon commented Sep 11, 2024 •

edited

Loading

YeungOnion commented Sep 12, 2024

FreezyLemon commented Sep 12, 2024

feat: experiment for reusing allocated buffers for Dyn multivariate #278

feat: experiment for reusing allocated buffers for Dyn multivariate #278

Conversation

YeungOnion commented Sep 7, 2024

YeungOnion commented Sep 7, 2024

codecov bot commented Sep 7, 2024 • edited Loading

Codecov Report

FreezyLemon commented Sep 7, 2024

YeungOnion commented Sep 8, 2024

YeungOnion commented Sep 10, 2024

FreezyLemon commented Sep 11, 2024 • edited Loading

FreezyLemon commented Sep 11, 2024 • edited Loading

YeungOnion commented Sep 12, 2024

FreezyLemon commented Sep 12, 2024

codecov bot commented Sep 7, 2024 •

edited

Loading

FreezyLemon commented Sep 11, 2024 •

edited

Loading

FreezyLemon commented Sep 11, 2024 •

edited

Loading