2x speed difference vs hand-unrolled #135

yongqli · 2015-06-15T10:09:59Z

Hi,

Based on this blog post, I decided to benchmark nalgebra and found it to be about 2x slower than a hand-unrolled version. Any ideas why? I'm new to Rust, so it's entirely possible I'm doing something wrong.

https://gist.github.com/yongqli/7ba8ef0e06fbfaebd98f takes 9.186 s to run, so 9.18 ms per 1 million iterations.

#[cfg(test)]
mod tests {
    extern crate nalgebra;
    extern crate test;

    use super::*;
    use nalgebra::*;

    #[bench]
    fn bench_4x4_mult(b: &mut test::Bencher) {
        b.iter(|| {
            let mut a = test::black_box(
                Mat4::new(1., 1., 1., 1.,
                          1., 2., 1., 1.,
                          1., 1., 4., 1.,
                          1., 1., 1., 1.,)
            );

            let b = test::black_box(new_identity::<Mat4<f64>>(4));

            for _ in 0..1_000_000 {
                // Mat4::inv_mut(&mut a);
                a = a * b;
            }
        });
    }
}

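The nalgebra benchmark above takes 22 ms per iteration according to cargo bench, where each iteration performs 1 million multiplications.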

sebcrozet · 2015-06-19T20:39:48Z

I see. If this is due to the lack of manual unrolling, this is very unfortunate. Perhaps we could somehow perform this unrolling automatically using macros.

yongqli · 2015-06-21T14:43:31Z

I've been using this, which you might also find useful:

macro_rules! new_Mat3x3(
    ($f: expr) => (
        Mat3x3(
            [[($f)(0, 0), ($f)(0, 1), ($f)(0, 2)],
             [($f)(1, 0), ($f)(1, 1), ($f)(1, 2)],
             [($f)(2, 0), ($f)(2, 1), ($f)(2, 2)]]
        )
    )
);


...


impl Add for $Mat {
    type Output = $Mat;
    #[inline(always)]
    fn add(self, rhs: $Mat) -> $Mat {
        $new_Mat!(|i, j| self[i][j] + rhs[i][j])
    }
}

This unrolls the closure into the "shape" of the matrix.
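The fragments above can be assembled into a self-contained sketch. This is only an illustration under assumptions: `Mat3x3` here is a hypothetical newtype over a row-major array, not nalgebra's `Mat3`, and the `Index` impl and the explicit `usize` annotations on the closure parameters are additions so the snippet type-checks on its own.

```rust
use std::ops::{Add, Index};

/// Hypothetical 3x3 matrix stored as rows; not nalgebra's `Mat3`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Mat3x3(pub [[f64; 3]; 3]);

impl Index<usize> for Mat3x3 {
    type Output = [f64; 3];
    fn index(&self, row: usize) -> &[f64; 3] {
        &self.0[row]
    }
}

// Pastes the closure expression nine times, once per literal (row, column)
// pair, so no runtime loop over indices remains after expansion.
macro_rules! new_Mat3x3 {
    ($f:expr) => {
        Mat3x3([
            [($f)(0, 0), ($f)(0, 1), ($f)(0, 2)],
            [($f)(1, 0), ($f)(1, 1), ($f)(1, 2)],
            [($f)(2, 0), ($f)(2, 1), ($f)(2, 2)],
        ])
    };
}

impl Add for Mat3x3 {
    type Output = Mat3x3;
    #[inline(always)]
    fn add(self, rhs: Mat3x3) -> Mat3x3 {
        // The parameter types are annotated so the indexing expressions
        // resolve without relying on inference from the call arguments.
        new_Mat3x3!(|i: usize, j: usize| self[i][j] + rhs[i][j])
    }
}
```

With these definitions, `a + b` on two `Mat3x3` values expands to nine element-wise additions with constant indices.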

yongqli · 2015-06-21T19:10:54Z

Here's an example of matrix multiplication:

macro_rules! unroll_sum_4 (
    ($f: expr) => (
        ($f)(0) + ($f)(1) + ($f)(2) + ($f)(3)
    )
);

...

macro_rules! unroll_Mat4x4(
    ($f: expr) => (
        Mat4x4(
            [[($f)(0, 0), ($f)(0, 1), ($f)(0, 2), ($f)(0, 3)],
             [($f)(1, 0), ($f)(1, 1), ($f)(1, 2), ($f)(1, 3)],
             [($f)(2, 0), ($f)(2, 1), ($f)(2, 2), ($f)(2, 3)],
             [($f)(3, 0), ($f)(3, 1), ($f)(3, 2), ($f)(3, 3)]]
        )
    )
);

...

impl Mul<Mat4x4> for Mat4x4 {
    type Output = Mat4x4;
    #[inline(always)]
    fn mul(self, rhs: Mat4x4) -> Mat4x4 {
        unroll_Mat4x4!(|i, j| unroll_sum_4!(|k| self[i][k] * rhs[k][j]))
    }
}
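For reference, here is a version of the multiplication that compiles on its own. It is a sketch under assumptions: `Mat4x4` is a hypothetical newtype over a row-major array, `identity` is a helper added for illustration, and the closure parameters carry explicit `usize` annotations that the snippet above omits.

```rust
use std::ops::{Index, Mul};

/// Hypothetical 4x4 matrix stored as rows; not nalgebra's `Mat4`.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Mat4x4(pub [[f64; 4]; 4]);

impl Mat4x4 {
    /// Illustrative helper: the 4x4 identity matrix.
    pub fn identity() -> Mat4x4 {
        let mut m = [[0.0; 4]; 4];
        for i in 0..4 {
            m[i][i] = 1.0;
        }
        Mat4x4(m)
    }
}

impl Index<usize> for Mat4x4 {
    type Output = [f64; 4];
    fn index(&self, row: usize) -> &[f64; 4] {
        &self.0[row]
    }
}

// Writes out a four-term sum, evaluated left to right.
macro_rules! unroll_sum_4 {
    ($f:expr) => {
        ($f)(0) + ($f)(1) + ($f)(2) + ($f)(3)
    };
}

// Writes out all sixteen entries with literal (row, column) indices.
macro_rules! unroll_Mat4x4 {
    ($f:expr) => {
        Mat4x4([
            [($f)(0, 0), ($f)(0, 1), ($f)(0, 2), ($f)(0, 3)],
            [($f)(1, 0), ($f)(1, 1), ($f)(1, 2), ($f)(1, 3)],
            [($f)(2, 0), ($f)(2, 1), ($f)(2, 2), ($f)(2, 3)],
            [($f)(3, 0), ($f)(3, 1), ($f)(3, 2), ($f)(3, 3)],
        ])
    };
}

impl Mul<Mat4x4> for Mat4x4 {
    type Output = Mat4x4;
    #[inline(always)]
    fn mul(self, rhs: Mat4x4) -> Mat4x4 {
        // Entry (i, j) is the dot product of row i of `self`
        // with column j of `rhs`.
        unroll_Mat4x4!(|i: usize, j: usize| {
            unroll_sum_4!(|k: usize| self[i][k] * rhs[k][j])
        })
    }
}
```

Multiplying the matrix from the original benchmark by `Mat4x4::identity()` returns it unchanged, which makes a quick sanity check before timing anything.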

bluss · 2015-08-23T14:35:52Z

@yongqli What kind of compilation flags did you use for C and Rust? This forum thread's later posts touch upon the issue of -ffast-math (the lack thereof in Rust) and also the lack of unrolling. The lack of vectorization in floating-point reduction (accumulation) loops is explicitly documented by LLVM.
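The reason a compiler cannot freely reorder a floating-point accumulation is that floating-point addition is not associative. A small sketch of the effect, with hypothetical function names: the first function sums in source order, and the second uses two independent accumulators, which is the kind of regrouping a vectorized reduction would perform.

```rust
/// Sums left to right, the order a plain accumulation loop uses.
pub fn sum_sequential(xs: &[f64]) -> f64 {
    let mut acc = 0.0;
    for &x in xs {
        acc += x;
    }
    acc
}

/// Sums even-indexed and odd-indexed elements separately, then combines
/// the two partial sums. Mathematically the same sum, but grouped differently.
pub fn sum_two_lanes(xs: &[f64]) -> f64 {
    let mut lanes = [0.0f64; 2];
    for (i, &x) in xs.iter().enumerate() {
        lanes[i % 2] += x;
    }
    lanes[0] + lanes[1]
}
```

On the input `[1e16, 1.0, -1e16, 1.0]` the sequential sum is `1.0` (the first `1.0` is lost to rounding when added to `1e16`), while the two-lane sum is `2.0`. Without an explicit opt-in to relaxed float semantics, the optimizer has to preserve the first result.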

milibopp · 2015-09-10T17:38:19Z

When issues like this come up, I always feel like they should be pushed to the compiler, as this will (hopefully) solve them in a more generally useful fashion.

bluss · 2015-09-10T18:23:39Z

The annotation for "fast" float semantics probably needs to be explicit. Maybe that's not the whole issue.

bluss · 2015-12-20T17:38:01Z

Here's the Rust issue on fast-math / imprecise float operations: rust-lang/rust/issues/21690

yongqli · 2017-02-17T22:59:08Z

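There's still a performance difference with the latest version of nalgebra. In the benchmarks below, nalgebra takes roughly 1.7x to 3.3x as long as the unrolled version (that is, about 0.7x to 2.3x slower).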

test tests::bench_4x4_mult_nalgebra   ... bench:      26,195 ns/iter (+/- 2,911)
test tests::bench_4x4_mult_unrolled   ... bench:       7,826 ns/iter (+/- 977)
test tests::bench_4x4_t_mult_nalgebra ... bench:      18,170 ns/iter (+/- 2,764)
test tests::bench_4x4_t_mult_unrolled ... bench:      10,420 ns/iter (+/- 1,630)

You can run it yourself by checking out https://github.com/yongqli/rust_linalgs_bench and running cargo bench
