[FR] Use row-wise mult when adding random effects #825

Closed · SteveBronder opened this issue Jan 13, 2020 · 5 comments

SteveBronder commented Jan 13, 2020

Right now, for a simple random effects model, brms generates something like:

  for (n in 1:N) {
    // add more terms to the linear predictor
    mu[n] += r_1_1[J_1[n]] * Z_1_1[n] + r_1_2[J_1[n]] * Z_1_2[n] + r_2_1[J_2[n]] * Z_2_1[n] + r_2_2[J_2[n]] * Z_2_2[n] + r_3_1[J_3[n]] * Z_3_1[n] + r_3_2[J_3[n]] * Z_3_2[n];
  }

Then the generated C++ is

        for (size_t n = 1; n <= N; ++n) {
          assign(mu, cons_list(index_uni(n), nil_index_list()),
            (mu[(n - 1)] +
              ((((((r_1_1[(J_1[(n - 1)] - 1)] * Z_1_1[(n - 1)]) +
                    (r_1_2[(J_1[(n - 1)] - 1)] * Z_1_2[(n - 1)])) +
                   (r_2_1[(J_2[(n - 1)] - 1)] * Z_2_1[(n - 1)])) +
                  (r_2_2[(J_2[(n - 1)] - 1)] * Z_2_2[(n - 1)])) +
                 (r_3_1[(J_3[(n - 1)] - 1)] * Z_3_1[(n - 1)])) +
                (r_3_2[(J_3[(n - 1)] - 1)] * Z_3_2[(n - 1)]))),
            "assigning variable mu");
        }

Would using J_1 directly with multi-indexing give the same result? Something like:

  mu += r_1_1[J_1] .* Z_1_1 + r_1_2[J_1] .* Z_1_2 + r_2_1[J_2] .* Z_2_1 + r_2_2[J_2] .* Z_2_2 + r_3_1[J_3] .* Z_3_1 + r_3_2[J_3] .* Z_3_2;
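
For reference, these are roughly the declarations brms generates for the group-1 terms above (sketched from memory, so the exact form may differ slightly). Since J_1 is an integer array, r_1_1[J_1] is a length-N vector whose n-th element is r_1_1[J_1[n]], so the element-wise products line up with the loop term by term:

  data {
    int<lower=1> N;       // total number of observations
    int<lower=1> N_1;     // number of grouping levels for group 1
    int<lower=1> J_1[N];  // grouping indicator per observation
    vector[N] Z_1_1;      // group-level predictor values
    vector[N] Z_1_2;
  }
  // ... in transformed parameters ...
  vector[N_1] r_1_1;      // actual group-level effects
  vector[N_1] r_1_2;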

I'm asking because the resulting C++ looks a bit nicer. I can run some small tests, but the version below should also vectorize a bit better:

        assign(mu, nil_index_list(),
          add(stan::model::deep_copy(mu),
            add(
              add(
                add(
                  add(
                    add(
                      elt_multiply(
                        rvalue(r_1_1,
                          cons_list(index_multi(J_1), nil_index_list()),
                          "r_1_1"), Z_1_1),
                      elt_multiply(
                        rvalue(r_1_2,
                          cons_list(index_multi(J_1), nil_index_list()),
                          "r_1_2"), Z_1_2)),
                    elt_multiply(
                      rvalue(r_2_1,
                        cons_list(index_multi(J_2), nil_index_list()),
                        "r_2_1"), Z_2_1)),
                  elt_multiply(
                    rvalue(r_2_2,
                      cons_list(index_multi(J_2), nil_index_list()), "r_2_2"),
                    Z_2_2)),
                elt_multiply(
                  rvalue(r_3_1,
                    cons_list(index_multi(J_3), nil_index_list()), "r_3_1"),
                  Z_3_1)),
              elt_multiply(
                rvalue(r_3_2, cons_list(index_multi(J_3), nil_index_list()),
                  "r_3_2"), Z_3_2))), "assigning variable mu");

If we put the Z_* vectors into a single matrix instead of keeping them as separate vectors, and I added a colwise_sum method to Stan Math, we could do something like:

  // causes a deep copy of mu and a sort of r_1 once
  mu += colwise_sum(r_1[J_1,] .* Z_1);
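
To spell out the layout that assumes: Z_1 would be an N x M_1 matrix whose columns are the current Z_1_1, Z_1_2, ..., and r_1 an N_1 x M_1 matrix whose columns are r_1_1, r_1_2, .... The colwise_sum above is the new Stan Math function being proposed here (it doesn't exist yet); whatever its name, the reduction needs to collapse the M_1 coefficient columns of each row into a single value so that the result is a vector[N]. A rough sketch of the surrounding declarations:

  data {
    int<lower=1> M_1;      // number of coefficients per level of group 1
    matrix[N, M_1] Z_1;    // columns = Z_1_1, Z_1_2, ...
  }
  // ... in transformed parameters ...
  matrix[N_1, M_1] r_1;    // columns = r_1_1, r_1_2, ...
  // r_1[J_1, ] gathers one row of r_1 per observation, so the element-wise
  // product is N x M_1; summing its M_1 columns within each row yields vector[N]
  mu += colwise_sum(r_1[J_1, ] .* Z_1);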

SteveBronder commented:

It's only circumstantial evidence, but running the above model change through cmdstanr seems to make the model a lot faster.

I wonder if we could do something like the cmdstan performance tests, but with a handful of brms models:

https://github.com/stan-dev/performance-tests-cmdstan

SteveBronder commented:

I think this is too much fiddling and would require a bit too much of a rewrite, but the gist below has a semi-complicated model with a matrix version of the random effect accumulation that's pretty general. It only assumes the random-effects input vectors are all the same size.

The only major bummer is that columns_dot_product returns a row vector when we want a column vector, which is why you see all the odd transpose calls:

  mu += transpose(columns_dot_product(transpose(r_1_mu[J_1, ]), transpose(Z_1)));

https://gist.github.com/SteveBronder/796836a9ab72b5726b1fa8a17ce951b3
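
As an aside (not something tried in the gist), Stan Math's existing rows_dot_product takes the dot products of corresponding rows and already returns a column vector, so under the same r_1_mu / Z_1 layout it should be possible to drop the transposes; a minimal sketch:

  // assuming r_1_mu[J_1, ] and Z_1 are both N x M_1, this is the same as
  // transpose(columns_dot_product(transpose(r_1_mu[J_1, ]), transpose(Z_1)))
  mu += rows_dot_product(r_1_mu[J_1, ], Z_1);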

If you have any other ideas, or anything in Stan Math that would make things easier for brms models, let me know!

paul-buerkner commented Jan 13, 2020

Thank you so much, Steve, for diving into this!

I had the same feeling that we should be able to make this more efficient. See #772 and the 're-multiple-indexing' branch. I am not sure if I tried the raw multi-indexing approach, that is, just avoiding the loop but keeping everything else the same, but the more concise version (or at least the one I tried) actually seems to be slower.

A second problem is that, due to brms' ID syntax, r_1 may need to be split into multiple parts distributed across distributional parameters or response variables. This needs to be taken into account as well when performing the indexing.

SteveBronder commented:

Thanks!

> I had the same feeling that we should be able to make this more efficient. See #772 and the 're-multiple-indexing' branch. I am not sure if I tried the raw multi-indexing approach, that is, just avoiding the loop but keeping everything else the same, but the more concise version (or at least the one I tried) actually seems to be slower.

Can you share the model you are checking? .* being slower might be due to some weird things upstream that I can check out.

> A second problem is that, due to brms' ID syntax, r_1 may need to be split into multiple parts distributed across distributional parameters or response variables. This needs to be taken into account as well when performing the indexing.

Ah, in that case it's probably better to keep it as is!

paul-buerkner commented:

I will provide more information later on in the other thread about what exactly I have tried and with what model. Thank you for helping me with this!
