
Explicit solver in C++ #41

Merged (15 commits, Nov 29, 2020)

Conversation

david-cortes
Contributor

This PR adds a copycat version of the als_implicit_fun with minor changes for it to work in the explicit-feedback case. This is my first time using armadillo, so I'm not sure I'm doing things the best way here. In particular, I didn't figure out how to add values to the diagonal of a matrix (what's written in their docs doesn't compile), so this line can perhaps be done better:

 const arma::Mat<T> inv = X_nnz * X_nnz.t()
                            + regularization * arma::Mat<T>(X_nnz.n_rows, X_nnz.n_rows, arma::fill::eye);
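
A minimal sketch of an alternative that avoids materializing the identity matrix, assuming the matrix is not declared const so that its diagonal view can be modified in place (illustrative only, not the code in this PR):

 arma::Mat<T> inv = X_nnz * X_nnz.t();
 inv.diag() += regularization;  // add the regularization directly to the diagonal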

In addition, the PR also performs mean centering when the inputs are not constrained to be non-negative. However, I wasn't sure how to add an extra public attribute to an R6 class, so I added glob_mean to the list of public attribute definitions, but R complains that it's not documented. If I remove the line that adds it, it throws an error when trying to add it dynamically as an attribute:

Error in self$glob_mean = mean(c_ui@x) : 
  cannot add bindings to a locked environment

Oddly, it doesn't throw such an error when adding self$components, which was not declared beforehand either.

As a next step, I also wanted to add biases to the model, but I'm not sure about the best way of going about it from a usability point of view. I think the best way would be to create a matrix for components having dimension rank+2 and also outputting a matrix of dimension rank+2 when calling transform, with one row/column in each one set to ones to match the other. What do you think about it?
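
To make the rank+2 idea concrete, here is a hedged sketch of the layout (the function names, the row ordering, and the rank x n orientation are illustrative assumptions, not this PR's code): each side gets one extra row holding its own biases and one row of ones positioned to line up with the other side's bias row, so the plain dot product of two augmented columns equals factors plus user bias plus item bias.

#include <armadillo>

// users: rows [0..rank-1] = factors, row rank = user biases, row rank+1 = ones
arma::mat augment_users(const arma::mat &X, const arma::vec &user_bias) {
  arma::mat out(X.n_rows + 2, X.n_cols);
  out.rows(0, X.n_rows - 1) = X;
  out.row(X.n_rows) = user_bias.t();     // user biases
  out.row(X.n_rows + 1).ones();          // lines up with the item-bias row
  return out;
}

// items: rows [0..rank-1] = factors, row rank = ones, row rank+1 = item biases
arma::mat augment_items(const arma::mat &Y, const arma::vec &item_bias) {
  arma::mat out(Y.n_rows + 2, Y.n_cols);
  out.rows(0, Y.n_rows - 1) = Y;
  out.row(Y.n_rows).ones();              // lines up with the user-bias row
  out.row(Y.n_rows + 1) = item_bias.t(); // item biases
  return out;
}

// then dot(augment_users(X, ub).col(u), augment_items(Y, ib).col(i))
//   == dot(X.col(u), Y.col(i)) + ub(u) + ib(i)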

Also wanted to make other potential changes:

  • When calling transform, I think it'd be a better idea to always use the Cholesky solver, as the CG one usually doesn't reach the optimal solution, especially when starting from zero, which is what happens in transform (a quick sketch follows after this list).
  • The library has the non-negative solver in an installable header, but I don't think one would want that file to be installed. Better move the header to /src instead.
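
For reference, a hedged sketch of what such an exact, Cholesky-based update would look like for a single user (illustrative names, not this PR's or rsparse's code), assuming Y_nnz holds the item factors for that user's observed items and r the corresponding ratings:

#include <armadillo>

// x = (Y_nnz * Y_nnz.t() + lambda * I)^{-1} * (Y_nnz * r), via a Cholesky factorization
arma::vec chol_user_update(const arma::mat &Y_nnz,  // rank x nnz item factors
                           const arma::vec &r,      // nnz observed ratings
                           double lambda) {
  arma::mat A = Y_nnz * Y_nnz.t();
  A.diag() += lambda;                    // ridge regularization on the diagonal
  arma::mat L = arma::chol(A, "lower");  // A = L * L.t()
  arma::mat U = L.t();
  // two triangular solves instead of forming an explicit inverse
  arma::vec z = arma::solve(arma::trimatl(L), Y_nnz * r);
  return arma::solve(arma::trimatu(U), z);
}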

@david-cortes
Contributor Author

Never mind the comment about R6 attributes; I now noticed that components was defined in the base class, so I put the global mean there too.

@codecov

codecov bot commented Nov 23, 2020

Codecov Report

Merging #41 (cc5654a) into master (56e252e) will decrease coverage by 0.62%.
The diff coverage is 73.21%.


@@            Coverage Diff             @@
##           master      #41      +/-   ##
==========================================
- Coverage   73.76%   73.14%   -0.63%     
==========================================
  Files          28       28              
  Lines        1845     2022     +177     
==========================================
+ Hits         1361     1479     +118     
- Misses        484      543      +59     
Impacted Files                       Coverage Δ
src/TopProduct.cpp                   94.44% <33.33%> (-3.60%) ⬇️
src/SolverAlsWrmf.cpp                78.36% <68.34%> (-20.46%) ⬇️
R/model_WRMF.R                       78.05% <82.50%> (+0.10%) ⬆️
R/MatrixFactorizationRecommender.R   48.27% <100.00%> (ø)
R/utils.R                            89.28% <100.00%> (ø)
src/nnls.cpp                         0.00% <0.00%> (-100.00%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56e252e...cecfb3d. Read the comment docs.

@dselivanov
Owner

Thanks for the PR.

> As a next step, I also wanted to add biases to the model, but I'm not sure about the best way of going about it from a usability point of view. I think the best way would be to create a matrix for components having dimension rank+2 and also outputting a matrix of dimension rank+2 when calling transform, with one row/column in each one set to ones to match the other. What do you think about it?

That will be a great addition. The rank+2 approach makes sense; I would do the same.

> When calling transform, I think it'd be a better idea to always use the Cholesky solver, as the CG one usually doesn't reach the optimal solution, especially when starting from zero, which is what happens in transform

Makes sense. Do you have real-life examples of sub-optimal solution quality? Will it make any difference (hit@k / ndcg@k) when making recommendations using lastfm360k?

> The library has the non-negative solver in an installable header, but I don't think one would want that file to be installed. Better move the header to /src instead.

Eventually I would like to move the solvers to a standalone header-only C++ library (with arma as the only dependency). So one day there might be a Python interface to the solvers in rsparse (I wanted to try https://github.com/RUrlus/carma, but the Python build system is still a bit mysterious for me). I had this in mind for some time, but didn't have time to start. So nnls.hpp is a first step.
BTW, you may have noticed SolverRankMF.cpp. It is ~15-20x faster than lightfm and, if I remember correctly, ~10x faster than implicit's BPR MF on CPU.

@david-cortes
Contributor Author

I'm not sure how much of a difference a sub-optimal solution makes when computing recommendations, but here's an example showing that the difference can be rather big:

library(Matrix)
library(rsparse)
data("movielens100k")

X <- movielens100k
set.seed(123)
m <- WRMF$new(rank=100, lambda=1,
              feedback="explicit",
              solver="conjugate_gradient")
factors_fit <- m$fit_transform(X)
factors_transf <- m$transform(X)
# same user: factors from fit_transform vs. transform
difference = factors_fit[10, ] - factors_transf[10, ]
cat("||diff same||: ", sqrt(difference %*% difference), "\n")
# two different users, as a scale reference
diff_rnd = factors_fit[10, ] - factors_fit[11, ]
cat("||diff other||: ", sqrt(diff_rnd %*% diff_rnd), "\n")


set.seed(123)
m2 <- WRMF$new(rank=100, lambda=1,
               feedback="explicit",
               solver="cholesky")
factors_fit2 <- m2$fit_transform(X)
factors_transf2 <- m2$transform(X)
difference2 = factors_fit2[10, ] - factors_transf2[10, ]
cat("||diff same||: ", sqrt(difference2 %*% difference2), "\n")
diff_rnd2 = factors_fit2[10, ] - factors_fit2[11, ]
cat("||diff other||: ", sqrt(diff_rnd2 %*% diff_rnd2), "\n")

With CG:

||diff same||:  0.7800775
||diff other||:  3.566516

With Cholesky:

||diff same||:  0.09373769
||diff other||:  3.560296

And if you inspect the vector of differences it doesn't look small at all:

difference
  [1]  0.0127499651  0.0143486627  0.1208470001  0.0788502813 -0.0013807357 -0.0099117434  0.0204211998
  [8]  0.0412307875 -0.0056689127  0.1393863037 -0.1863619531 -0.0259938203  0.1340213561  0.0850409956
 [15] -0.1270411701  0.0912398027  0.0642978719 -0.0477396709 -0.0427490033 -0.0801674094 -0.0047753663
 [22] -0.0770143656 -0.0635077166 -0.0206334300 -0.0001753938  0.0547208784 -0.0090631809 -0.0854853832
 [29]  0.1741550885  0.0052555909 -0.0667294350  0.0087404726  0.1151789801  0.0101277141  0.0846622292
 [36] -0.0992792819  0.0890092132 -0.0735728099 -0.0857616448 -0.1264824624  0.0710994151 -0.0809849841
 [43] -0.0980387508  0.0508430290 -0.1405652974  0.1359459940 -0.1166028153  0.0421086208 -0.0140544701
 [50]  0.0344551605  0.0157901592  0.1276604140 -0.0109623805 -0.0439943739 -0.0946275574  0.0341311424
 [57] -0.1193419474 -0.0570416644  0.0441665448  0.1574439171  0.0374196889 -0.0577082312  0.0071502418
 [64]  0.0873868136 -0.0934141613  0.0071387634 -0.0207400443  0.0108728827 -0.0250645020 -0.0715137239
 [71] -0.0692711938 -0.0462565404 -0.0585163161  0.0565288607 -0.0964034757  0.1317269375 -0.0590797522
 [78]  0.0244796766 -0.0496083246 -0.0667767908  0.0341947939 -0.0465304349 -0.0759954706 -0.0642911897
 [85]  0.1403484631  0.1039142085 -0.0111111490  0.0830722849  0.1013928378 -0.1036686256 -0.0381310334
 [92]  0.0083235620 -0.0162638356  0.0177373712 -0.0003093340  0.1067158155  0.1184988123  0.0413801166
 [99]  0.0953847600 -0.0506076053

@dselivanov
Owner

Okay, you've spotted an interesting issue. Results from transform and fit_transform should be identical / very close regardless of the method. The issue here is that we start ALS by updating the user embeddings and finish by updating the item embeddings. We need to do it the other way around.

@david-cortes
Contributor Author

But even if you do it the other way around, you'll still see such a difference. Here's an example using the package cmfrec, which already does it the other way around:

library(Matrix)
library(rsparse)
library(cmfrec)
data("movielens100k")

X <- movielens100k
X <- as(X, "TsparseMatrix")
set.seed(123)
m <- CMF(X, k=100, lambda=1, use_cg=TRUE, finalize_chol=FALSE)
factors_fit <- t(m$matrices$A)
factors_transf <- factors(m, X)
difference = factors_fit[10, ] - factors_transf[10, ]
cat("||diff same||: ", sqrt(difference %*% difference), "\n")
diff_rnd = factors_fit[10, ] - factors_fit[11, ]
cat("||diff other||: ", sqrt(diff_rnd %*% diff_rnd), "\n")


set.seed(123)
m2 <- CMF(X, k=100, lambda=1, use_cg=FALSE)
factors_fit2 <- t(m2$matrices$A)
factors_transf2 <- factors(m2, X)
difference2 = factors_fit2[10, ] - factors_transf2[10, ]
cat("||diff same||: ", sqrt(difference2 %*% difference2), "\n")
diff_rnd2 = factors_fit2[10, ] - factors_fit2[11, ]
cat("||diff other||: ", sqrt(diff_rnd2 %*% diff_rnd2), "\n")

CG:

||diff same||:  1.345862
||diff other||:  5.431094

Cholesky:

||diff same||:  2.168532e-15
||diff other||:  3.405199

@david-cortes
Contributor Author

Added functionality to calculate biases. This has two problems, however:

  • At every iteration, it allocates new copies of the factor matrices, computes the factors there, and then copies them back to the originals, which is inefficient. I didn't figure out how to create memory views in armadillo, so perhaps you could check it.
  • It initializes the biases to random numbers, which is not optimal. A better procedure would start by estimating the biases without factors beforehand, and as it is, adding the biases will lead to worse results than discarding them.

The procedure for initializing the biases could be as follows:

  • Initialize them to e.g. item_bias = sum_by_item(R) / (cnt_by_item + lambda), then user_bias = sum_by_user(R - item_bias) / (cnt_by_user + lambda).
  • (Better alternative) Do a round of successive updates:
    • item_bias := 0, user_bias := 0
    • item_bias = sum_by_item(R - user_bias) / (cnt_by_item + lambda)
    • user_bias = sum_by_user(R - item_bias) / (cnt_by_user + lambda)

But I wouldn't know how to do that in armadillo either.
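
For illustration, here is a hedged sketch of that round of successive updates using armadillo's sparse-matrix iterators; it uses a plain arma::sp_mat rather than the package's dMappedCSC, and the function and variable names are illustrative, not the code in this PR:

#include <armadillo>

// One round of the successive bias updates described above, with user biases
// fixed at zero for the first item-bias pass.
void init_biases(const arma::sp_mat &R,            // users x items ratings
                 arma::vec &user_bias, arma::vec &item_bias, double lambda) {
  user_bias.zeros(R.n_rows);
  item_bias.zeros(R.n_cols);

  // item_bias = sum_by_item(R - user_bias) / (cnt_by_item + lambda)
  arma::vec item_sum(R.n_cols, arma::fill::zeros), item_cnt(R.n_cols, arma::fill::zeros);
  for (auto it = R.begin(); it != R.end(); ++it) {
    item_sum(it.col()) += *it - user_bias(it.row());
    item_cnt(it.col()) += 1.0;
  }
  item_bias = item_sum / (item_cnt + lambda);

  // user_bias = sum_by_user(R - item_bias) / (cnt_by_user + lambda)
  arma::vec user_sum(R.n_rows, arma::fill::zeros), user_cnt(R.n_rows, arma::fill::zeros);
  for (auto it = R.begin(); it != R.end(); ++it) {
    user_sum(it.row()) += *it - item_bias(it.col());
    user_cnt(it.row()) += 1.0;
  }
  user_bias = user_sum / (user_cnt + lambda);
}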

Also, it would be better if you could make the default lambda higher for explicit feedback, as it tends to give very bad results when there is no regularization.

@david-cortes
Contributor Author

david-cortes commented Nov 23, 2020

Correction to the comment above: even with random bias initialization, the biases still seem to provide a small lift in performance.

Got these numbers for the ML10M (rank=40+biases, lambda=35, 10 iterations):

  • RMSE without biases: ~0.813
  • RMSE with biases: ~0.801

Nevertheless, the initialization of the biases can still be improved.

arma::Col<T> scd_nnls_solver_explicit(const arma::Mat<T> &X_nnz,
                                      const arma::Col<T> &confidence,
                                      T regularization) {
  const arma::Mat<T> inv = X_nnz * X_nnz.t()
Owner

inv.diag() += regularization; should work

Contributor Author

That doesn't compile unfortunately.

Owner

That's because of the const. I've updated your branch.

@dselivanov
Owner

@david-cortes it will take a couple more days to review.

dselivanov added a commit that referenced this pull request Nov 29, 2020
- optional biases for explicit feedback

related to #41
@dselivanov
Owner

@david-cortes I've started to review and eventually modified a big part of the WRMF solver (pushed to master now). Now it is much easier to read and reason about. It will take a bit more time to incorporate your additions and merge this PR.

Also, I haven't noticed that different user and item bias initializations have a significant impact on the loss.

@dselivanov dselivanov merged commit 35247b4 into dselivanov:master Nov 29, 2020
@dselivanov
Owner

@david-cortes I've resolved conflicts and merged. Thanks a lot for your work!

However, I've commented out the code related to the bias initialization and the global mean (as I don't yet fully understand how it works). Would you be able to try the current version and add this logic in a new PR?

TODO from my side:

  • add your CG solver for explicit feedback to R interface
  • add biases for implicit feedback

@david-cortes
Contributor Author

OK, will re-add them. A small reminder, just in case: if you decide to add biases for the implicit-feedback version, you'd have to do it through t(X)*R rather than through R.

}

template <class T>
void initialize_biases(const dMappedCSC& ConfCSC,
Owner

@david-cortes I believe it will be easier/more efficient to pass global_mean here and to the ALS solver rather than subtracting it from c_ui and c_iu.
