
Use eigen elementwise methods for addition and subtraction #427

Closed

Conversation

@SteveBronder (Contributor)

Right now, vectorized code using .* and other elementwise operations is usually slower than just using a raw for loop (see #825 in brms).

This Stan statement

mu += r_1_1[J_1] .* Z_1_1 + r_1_2[J_1] .* Z_1_2;

where J_1 is a vector of ints, currently generates C++ like

        assign(mu, nil_index_list(),
          add(stan::model::deep_copy(mu),
            add(
              elt_multiply(
                rvalue(r_1_1, cons_list(index_multi(J_1), nil_index_list()),
                  "r_1_1"), Z_1_1),
              elt_multiply(
                rvalue(r_1_2, cons_list(index_multi(J_1), nil_index_list()),
                  "r_1_2"), Z_1_2))), "assigning variable mu");

This is not good because add returns an Eigen::Matrix<> and doesn't let the operations build up into an expression tree. But in Stan Math we've added var and fvar as custom scalar types in Eigen, so we should be able to build up an expression tree of Eigen operations here with something like

        assign(mu, nil_index_list(),
          stan::model::deep_copy(mu) +
            rvalue(r_1_1, cons_list(index_multi(J_1), nil_index_list()), "r_1_1").array() * Z_1_1.array() +
            rvalue(r_1_2, cons_list(index_multi(J_1), nil_index_list()),"r_1_2").array() * Z_1_2.array(),
            "assigning variable mu");

This PR adds this for addition of two matrices and for elementwise multiplication. If people think this is cool and good, I can add it for the other elementwise ops on Eigen matrices.

Tests

I added elementwise add and multiply to the mother model because they weren't there before.

Licensed under BSD-3.

@SteveBronder (Contributor, Author)

Also, it would be rad to have this in the release because I expect it would make some models a good deal faster (brms ones in particular, if Paul changes the loop he currently uses to do elementwise multiplication). Though we've hit the bug-fix period, so if that's a no-no that's totally reasonable.

Comment on lines 211 to 214
if
is_matrix (second es) && (is_matrix (first es))
then pp_scalar_binary ppf "(%a@ *@ %a)" "(%a.array()@ *@ %a.array())" es
else pp_scalar_binary ppf "(%a@ *@ %a)" "elt_multiply(@,%a,@ %a)" es
@SteveBronder (Contributor, Author)

Is there a smarter way to write this?

@SteveBronder (Contributor, Author)

Hmm, okay, this will be a bit trickier than I thought. I need the generated code to look like the example below, where:

  1. the deep_copy is also cast to an array,
  2. cwiseProduct is used for the elementwise products, and
  3. the code right now only catches raw Eigen types, but I need it to also catch functions that return an Eigen type.

        assign(mu, nil_index_list(),
          stan::model::deep_copy(mu).array() +
            rvalue(r_1_1, cons_list(index_multi(J_1), nil_index_list()),
              "r_1_1").cwiseProduct(Z_1_1).array() +
              rvalue(r_1_2, cons_list(index_multi(J_1), nil_index_list()),
                "r_1_2").cwiseProduct(Z_1_2).array() +
              rvalue(r_1_3, cons_list(index_multi(J_1), nil_index_list()),
                "r_1_3").cwiseProduct(Z_1_3).array() +
              rvalue(r_2_1, cons_list(index_multi(J_2), nil_index_list()),
                "r_2_1").cwiseProduct(Z_2_1).array() +
              rvalue(r_2_2, cons_list(index_multi(J_2), nil_index_list()),
                "r_2_2").cwiseProduct(Z_2_2).array() +
              rvalue(r_2_3, cons_list(index_multi(J_2), nil_index_list()),
                "r_2_3").cwiseProduct(Z_2_3).array() +
              rvalue(r_3_1, cons_list(index_multi(J_3), nil_index_list()),
                "r_3_1").cwiseProduct(Z_3_1).array() +
              rvalue(r_3_2, cons_list(index_multi(J_3), nil_index_list()),
                "r_3_2").cwiseProduct(Z_3_2).array() +
              rvalue(r_3_3, cons_list(index_multi(J_3), nil_index_list()),
                "r_3_3").cwiseProduct(Z_3_3).array(),
          "assigning variable mu");

But! Fiddling around with a large example on my computer, I can confirm this is a pretty dang nice speedup!

@SteveBronder (Contributor, Author)

Think I got it! Though it's a little hacky. Is there a one-liner I can run for the cmdstan perf tests, or do I just need to run them the old-fashioned way? I'm not sure how many of the perf tests use elementwise functions, so I'm not sure whether an effect would show up there.

@SteveBronder (Contributor, Author)

Ack, it looks like I need to fix some stuff in Math before we can use this generally. Gonna close for now and see how much work that would be.

@bob-carpenter (Contributor) commented Jan 14, 2020 via email (quoted in the reply below)

@SteveBronder (Contributor, Author)

> My understanding is that we've hit code freeze, so no new features for this coming release. The point of having regular releases is so that it's never too long a wait for the next release. Back when we tried to keep adding just one more thing, the release schedule got out of hand.

Yep, makes sense! I'm fine with waiting.

> I understand the code you're generating, but not how it'll get generated in general. Will addition always return the expression template for +?

Yes, + at the top level would return a CwiseBinaryOp. That feels good to me because it will be a vectorized op that can be combined with other expressions.

> Does it mean we should get rid of the functions add() and elt_multiply() for matrices?

Our base add and elt_multiply just call + and cwiseProduct under the hood and immediately cast the result to an Eigen matrix. I think they would still get called for arrays, so we should keep them.
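As a sketch of that pattern (simplified signatures and hypothetical code, not the actual Stan Math source), the eager helpers look roughly like this: the concrete Eigen::Matrix return type means whatever expression + or cwiseProduct builds is evaluated immediately on return.

    #include <Eigen/Dense>

    // Hypothetical eager wrappers in the spirit of add() and elt_multiply():
    // the concrete Matrix return type forces evaluation of the expression.
    template <typename T, int R, int C>
    Eigen::Matrix<T, R, C> add(const Eigen::Matrix<T, R, C>& a,
                               const Eigen::Matrix<T, R, C>& b) {
      return a + b;  // the CwiseBinaryOp is materialized into a Matrix here
    }

    template <typename T, int R, int C>
    Eigen::Matrix<T, R, C> elt_multiply(const Eigen::Matrix<T, R, C>& a,
                                        const Eigen::Matrix<T, R, C>& b) {
      return a.cwiseProduct(b);  // likewise evaluated eagerly
    }

Chaining these helpers produces one temporary matrix per call, which is what emitting + and cwiseProduct directly in the generated code avoids.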

> Is every function, etc., OK taking in these expression templates or might something break elsewhere from this?

Some break, which is why I closed this for now. Stan Math functions would have to be able to take in Eigen expressions. Most can, but I think something funky is happening with transpose that won't compile.

@SteveBronder (Contributor, Author)

The errors look like:

    tests-cmdstan/cmdstan/stan/lib/stan_math/stan/math/prim/mat/err/check_pos_definite.hpp:6:
    In file included from /mnt/nvme1n1/workspace/stanc3_PR-427/performance-tests-cmdstan/cmdstan/stan/lib/stan_math/stan/math/prim/scal/err/check_not_nan.hpp:7:
    /mnt/nvme1n1/workspace/stanc3_PR-427/performance-tests-cmdstan/cmdstan/stan/lib/stan_math/stan/math/prim/scal/fun/value_of_rec.hpp:25:10: error: cannot convert
      'const Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<stan::math::var, stan::math::var>,
         const Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<stan::math::var, stan::math::var>,
           const Eigen::CwiseBinaryOp<Eigen::internal::scalar_sum_op<stan::math::var, stan::math::var>,
             const Eigen::ArrayWrapper<Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1> >,
             const Eigen::ArrayWrapper<Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1> > >,
           const Eigen::ArrayWrapper<Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1> > >,
         const Eigen::ArrayWrapper<Eigen::Matrix<stan::math::var, -1, 1, 0, -1, 1> > >'
      to 'double' without a conversion operator
        return static_cast<double>(x);
               ^~~~~~~~~~~~~~~~~~~~~~

https://jenkins.mc-stan.org/job/stanc3/view/change-requests/job/PR-427/4/consoleFull

So the lower-level Math ops need to be able to handle Eigen expressions better.
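As a rough illustration of the overload problem behind that error (hypothetical names and a deliberately simplified setup, not the actual Stan Math source): a helper overloaded only for concrete Eigen::Matrix types cannot deduce its template arguments from an unevaluated expression such as a CwiseBinaryOp, so the generic scalar fallback gets selected and its static_cast<double> fails to compile. Templating on Eigen::DenseBase<Derived> is one way to accept expressions as well.

    #include <Eigen/Dense>

    // Generic scalar fallback: an Eigen expression lands here because it cannot
    // deduce against the Matrix overload below, and static_cast<double> of a
    // CwiseBinaryOp is ill-formed -- the compile error quoted above.
    template <typename T>
    double value_sketch(const T& x) {
      return static_cast<double>(x);
    }

    // Matrix overload: matches concrete Eigen::Matrix types only.
    template <typename T, int R, int C>
    Eigen::Matrix<T, R, C> value_sketch(const Eigen::Matrix<T, R, C>& m) {
      return m;
    }

    // One way to handle expressions: accept any dense expression (matrix or
    // array, evaluated or not) through DenseBase<Derived> and evaluate it
    // where a concrete result is needed.
    template <typename Derived>
    typename Derived::PlainObject value_sketch_expr(const Eigen::DenseBase<Derived>& m) {
      return m.eval();
    }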
