
Explicit solver in C++ #41

Merged (15 commits, Nov 29, 2020)

Conversation

david-cortes
Contributor

This PR adds a copycat version of the als_implicit_fun with minor changes for it to work in the explicit-feedback case. This is my first time using armadillo, so I'm not sure I'm doing things the best way here. In particular, I didn't figure out how to add values to the diagonal of a matrix (what's written in their docs doesn't compile), so this line can perhaps be done better:

 const arma::Mat<T> inv = X_nnz * X_nnz.t()
                            + regularization * arma::Mat<T>(X_nnz.n_rows, X_nnz.n_rows, arma::fill::eye);
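
A minimal sketch of an alternative that avoids materializing the identity matrix, assuming the matrix is not declared const so that its diagonal view can be modified in place (illustrative only, not the code in this PR):

 arma::Mat<T> inv = X_nnz * X_nnz.t();
 inv.diag() += regularization;  // add the regularization directly to the diagonal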

In addition, the PR also performs mean centering when the inputs are not constrained to be non-negative. However, I wasn't sure how to add an extra public attribute to an R6 class, so I added glob_mean to the list of public attribute definitions, but R complains that it's not documented. If I remove the line that adds it, it throws an error when trying to add it dynamically as an attribute:

Error in self$glob_mean = mean(c_ui@x) : 
  cannot add bindings to a locked environment

Oddly, it doesn't throw such an error when adding self$components, which was not declared beforehand either.

As a next step, I also wanted to add biases to the model, but I'm not sure about the best way of going about it from a usability point of view. I think the best way would be to create a matrix for components having dimension rank+2 and also outputting a matrix of dimension rank+2 when calling transform, with one row/column in each one set to ones to match the other. What do you think about it?
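
To make the rank+2 idea concrete, here is a hedged sketch of the layout (the function names, the row ordering, and the rank x n orientation are illustrative assumptions, not this PR's code): each side gets one extra row holding its own biases and one row of ones positioned to line up with the other side's bias row, so the plain dot product of two augmented columns equals factors plus user bias plus item bias.

#include <armadillo>

// users: rows [0..rank-1] = factors, row rank = user biases, row rank+1 = ones
arma::mat augment_users(const arma::mat &X, const arma::vec &user_bias) {
  arma::mat out(X.n_rows + 2, X.n_cols);
  out.rows(0, X.n_rows - 1) = X;
  out.row(X.n_rows) = user_bias.t();     // user biases
  out.row(X.n_rows + 1).ones();          // lines up with the item-bias row
  return out;
}

// items: rows [0..rank-1] = factors, row rank = ones, row rank+1 = item biases
arma::mat augment_items(const arma::mat &Y, const arma::vec &item_bias) {
  arma::mat out(Y.n_rows + 2, Y.n_cols);
  out.rows(0, Y.n_rows - 1) = Y;
  out.row(Y.n_rows).ones();              // lines up with the user-bias row
  out.row(Y.n_rows + 1) = item_bias.t(); // item biases
  return out;
}

// then dot(augment_users(X, ub).col(u), augment_items(Y, ib).col(i))
//   == dot(X.col(u), Y.col(i)) + ub(u) + ib(i)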

Also wanted to make other potential changes:

  • When calling transform, I think it'd be a better idea to always use the Cholesky solver, as the CG one usually doesn't reach the optimal solution, especially when starting from zero, which is what happens in transform (a quick sketch follows after this list).
  • The library has the non-negative solver in an installable header, but I don't think one would want that file to be installed. Better move the header to /src instead.
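
For reference, a hedged sketch of what such an exact, Cholesky-based update would look like for a single user (illustrative names, not this PR's or rsparse's code), assuming Y_nnz holds the item factors for that user's observed items and r the corresponding ratings:

#include <armadillo>

// x = (Y_nnz * Y_nnz.t() + lambda * I)^{-1} * (Y_nnz * r), via a Cholesky factorization
arma::vec chol_user_update(const arma::mat &Y_nnz,  // rank x nnz item factors
                           const arma::vec &r,      // nnz observed ratings
                           double lambda) {
  arma::mat A = Y_nnz * Y_nnz.t();
  A.diag() += lambda;                    // ridge regularization on the diagonal
  arma::mat L = arma::chol(A, "lower");  // A = L * L.t()
  arma::mat U = L.t();
  // two triangular solves instead of forming an explicit inverse
  arma::vec z = arma::solve(arma::trimatl(L), Y_nnz * r);
  return arma::solve(arma::trimatu(U), z);
}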

@david-cortes
Contributor Author

Never mind the comment about R6 attributes; I now noticed that components was defined in the base class, so I put the global mean there too.

@codecov

codecov bot commented Nov 23, 2020

Codecov Report

Merging #41 (cc5654a) into master (56e252e) will decrease coverage by 0.62%.
The diff coverage is 73.21%.


@@            Coverage Diff             @@
##           master      #41      +/-   ##
==========================================
- Coverage   73.76%   73.14%   -0.63%     
==========================================
  Files          28       28              
  Lines        1845     2022     +177     
==========================================
+ Hits         1361     1479     +118     
- Misses        484      543      +59     
Impacted Files                       Coverage Δ
src/TopProduct.cpp                   94.44% <33.33%> (-3.60%) ⬇️
src/SolverAlsWrmf.cpp                78.36% <68.34%> (-20.46%) ⬇️
R/model_WRMF.R                       78.05% <82.50%> (+0.10%) ⬆️
R/MatrixFactorizationRecommender.R   48.27% <100.00%> (ø)
R/utils.R                            89.28% <100.00%> (ø)
src/nnls.cpp                         0.00% <0.00%> (-100.00%) ⬇️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 56e252e...cecfb3d. Read the comment docs.

@dselivanov
Owner

Thanks for the PR.

> As a next step, I also wanted to add biases to the model, but I'm not sure about the best way of going about it from a usability point of view. I think the best way would be to create a matrix for components having dimension rank+2 and also outputting a matrix of dimension rank+2 when calling transform, with one row/column in each one set to ones to match the other. What do you think about it?

That will be a great addition. The rank+2 approach makes sense; I would do the same.

> When calling transform, I think it'd be a better idea to always use the Cholesky solver, as the CG one usually doesn't reach the optimal solution, especially when starting from zero, which is what happens in transform

Makes sense. Do you have real-life examples of sub-optimal solution quality? Will it make any difference (hit@k / ndcg@k) when making recommendations using lastfm360k?

> The library has the non-negative solver in an installable header, but I don't think one would want that file to be installed. Better move the header to /src instead.

Eventually I would like to move the solvers to a standalone header-only C++ library (with arma as the only dependency). So one day there might be a Python interface to the solvers in rsparse (I wanted to try https://github.com/RUrlus/carma, but the Python build system is still a bit mysterious for me). I had this in mind for some time, but didn't have time to start. So nnls.hpp is a first step.
BTW, you may have noticed SolverRankMF.cpp. It is ~15-20x faster than lightfm and, if I remember correctly, ~10x faster than implicit's BPR MF on CPU.

@david-cortes
Contributor Author

I'm not sure how much of a difference a sub-optimal solution makes when computing recommendations, but here's an example showing that the difference can be rather big:

library(Matrix)
library(rsparse)
data("movielens100k")

X <- movielens100k
set.seed(123)
m <- WRMF$new(rank=100, lambda=1,
              feedback="explicit",
              solver="conjugate_gradient")
factors_fit <- m$fit_transform(X)
factors_transf <- m$transform(X)
# same user: factors from fit_transform vs. transform
difference = factors_fit[10, ] - factors_transf[10, ]
cat("||diff same||: ", sqrt(difference %*% difference), "\n")
# two different users, as a scale reference
diff_rnd = factors_fit[10, ] - factors_fit[11, ]
cat("||diff other||: ", sqrt(diff_rnd %*% diff_rnd), "\n")


set.seed(123)
m2 <- WRMF$new(rank=100, lambda=1,
               feedback="explicit",
               solver="cholesky")
factors_fit2 <- m2$fit_transform(X)
factors_transf2 <- m2$transform(X)
difference2 = factors_fit2[10, ] - factors_transf2[10, ]
cat("||diff same||: ", sqrt(difference2 %*% difference2), "\n")
diff_rnd2 = factors_fit2[10, ] - factors_fit2[11, ]
cat("||diff other||: ", sqrt(diff_rnd2 %*% diff_rnd2), "\n")

With CG:

||diff same||:  0.7800775
||diff other||:  3.566516

With Cholesky:

||diff same||:  0.09373769
||diff other||:  3.560296

And if you inspect the vector of differences it doesn't look small at all:

difference
  [1]  0.0127499651  0.0143486627  0.1208470001  0.0788502813 -0.0013807357 -0.0099117434  0.0204211998
  [8]  0.0412307875 -0.0056689127  0.1393863037 -0.1863619531 -0.0259938203  0.1340213561  0.0850409956
 [15] -0.1270411701  0.0912398027  0.0642978719 -0.0477396709 -0.0427490033 -0.0801674094 -0.0047753663
 [22] -0.0770143656 -0.0635077166 -0.0206334300 -0.0001753938  0.0547208784 -0.0090631809 -0.0854853832
 [29]  0.1741550885  0.0052555909 -0.0667294350  0.0087404726  0.1151789801  0.0101277141  0.0846622292
 [36] -0.0992792819  0.0890092132 -0.0735728099 -0.0857616448 -0.1264824624  0.0710994151 -0.0809849841
 [43] -0.0980387508  0.0508430290 -0.1405652974  0.1359459940 -0.1166028153  0.0421086208 -0.0140544701
 [50]  0.0344551605  0.0157901592  0.1276604140 -0.0109623805 -0.0439943739 -0.0946275574  0.0341311424
 [57] -0.1193419474 -0.0570416644  0.0441665448  0.1574439171  0.0374196889 -0.0577082312  0.0071502418
 [64]  0.0873868136 -0.0934141613  0.0071387634 -0.0207400443  0.0108728827 -0.0250645020 -0.0715137239
 [71] -0.0692711938 -0.0462565404 -0.0585163161  0.0565288607 -0.0964034757  0.1317269375 -0.0590797522
 [78]  0.0244796766 -0.0496083246 -0.0667767908  0.0341947939 -0.0465304349 -0.0759954706 -0.0642911897
 [85]  0.1403484631  0.1039142085 -0.0111111490  0.0830722849  0.1013928378 -0.1036686256 -0.0381310334
 [92]  0.0083235620 -0.0162638356  0.0177373712 -0.0003093340  0.1067158155  0.1184988123  0.0413801166
 [99]  0.0953847600 -0.0506076053

@dselivanov
Owner

Okay, you've spotted an interesting issue. Results from transform and fit_transform should be identical / very close regardless of the method. The issue here is that we start ALS by updating the user embeddings and finish by updating the item embeddings. We need to do it the other way around.

@david-cortes
Contributor Author

But even if you do it the other way around, you'll still see such a difference. Here's an example using the package cmfrec, which already does it the other way around:

library(Matrix)
library(rsparse)
library(cmfrec)
data("movielens100k")

X <- movielens100k
X <- as(X, "TsparseMatrix")
set.seed(123)
m <- CMF(X, k=100, lambda=1, use_cg=TRUE, finalize_chol=FALSE)
factors_fit <- t(m$matrices$A)
factors_transf <- factors(m, X)
difference = factors_fit[10, ] - factors_transf[10, ]
cat("||diff same||: ", sqrt(difference %*% difference), "\n")
diff_rnd = factors_fit[10, ] - factors_fit[11, ]
cat("||diff other||: ", sqrt(diff_rnd %*% diff_rnd), "\n")


set.seed(123)
m2 <- CMF(X, k=100, lambda=1, use_cg=FALSE)
factors_fit2 <- t(m2$matrices$A)
factors_transf2 <- factors(m2, X)
difference2 = factors_fit2[10, ] - factors_transf2[10, ]
cat("||diff same||: ", sqrt(difference2 %*% difference2), "\n")
diff_rnd2 = factors_fit2[10, ] - factors_fit2[11, ]
cat("||diff other||: ", sqrt(diff_rnd2 %*% diff_rnd2), "\n")

CG:

||diff same||:  1.345862
||diff other||:  5.431094

Cholesky:

||diff same||:  2.168532e-15
||diff other||:  3.405199

@david-cortes
Contributor Author

Added functionality to calculate biases. This has two problems, however:

  • At every iteration, it allocates new copies of the factor matrices, computes the factors there, and then copies them back to the originals, which is inefficient. I didn't figure out how to create memory views in armadillo, so perhaps you could check it.
  • It initializes the biases to random numbers, which is not optimal. A better procedure would start by estimating the biases without factors beforehand, and as it is, adding the biases will lead to worse results than discarding them.

The procedure for initializing the biases could be as follows:

  • Initialize them to e.g. item_bias = sum_by_item(R) / (cnt_by_item + lambda), then user_bias = sum_by_user(R - item_bias) / (cnt_by_user + lambda).
  • (Better alternative) Do a round of successive updates:
    • item_bias := 0, user_bias := 0
    • item_bias = sum_by_item(R - user_bias) / (cnt_by_item + lambda)
    • user_bias = sum_by_user(R - item_bias) / (cnt_by_user + lambda)

But I wouldn't know how to do that in armadillo either.
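
For illustration, here is a hedged sketch of that round of successive updates using armadillo's sparse-matrix iterators; it uses a plain arma::sp_mat rather than the package's dMappedCSC, and the function and variable names are illustrative, not the code in this PR:

#include <armadillo>

// One round of the successive bias updates described above, with user biases
// fixed at zero for the first item-bias pass.
void init_biases(const arma::sp_mat &R,            // users x items ratings
                 arma::vec &user_bias, arma::vec &item_bias, double lambda) {
  user_bias.zeros(R.n_rows);
  item_bias.zeros(R.n_cols);

  // item_bias = sum_by_item(R - user_bias) / (cnt_by_item + lambda)
  arma::vec item_sum(R.n_cols, arma::fill::zeros), item_cnt(R.n_cols, arma::fill::zeros);
  for (auto it = R.begin(); it != R.end(); ++it) {
    item_sum(it.col()) += *it - user_bias(it.row());
    item_cnt(it.col()) += 1.0;
  }
  item_bias = item_sum / (item_cnt + lambda);

  // user_bias = sum_by_user(R - item_bias) / (cnt_by_user + lambda)
  arma::vec user_sum(R.n_rows, arma::fill::zeros), user_cnt(R.n_rows, arma::fill::zeros);
  for (auto it = R.begin(); it != R.end(); ++it) {
    user_sum(it.row()) += *it - item_bias(it.col());
    user_cnt(it.row()) += 1.0;
  }
  user_bias = user_sum / (user_cnt + lambda);
}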

Also, it would be better if you could make the default lambda higher for explicit feedback, as it tends to give very bad results when there is no regularization.

@david-cortes
Contributor Author

david-cortes commented Nov 23, 2020

Correction to the comment above: even with random bias initialization, the biases still seem to provide a small lift in performance.

Got these numbers for the ML10M (rank=40+biases, lambda=35, 10 iterations):

  • RMSE without biases: ~0.813
  • RMSE with biases: ~0.801

Nevertheless, the initialization of the biases can still be improved.

arma::Col<T> scd_nnls_solver_explicit(const arma::Mat<T> &X_nnz,
                                      const arma::Col<T> &confidence,
                                      T regularization) {
  const arma::Mat<T> inv = X_nnz * X_nnz.t()
Owner

inv.diag() += regularization; should work

Contributor Author

That doesn't compile unfortunately.

Owner

That's because of the const. I've updated your branch.

@dselivanov
Owner

@david-cortes it will take a couple more days to review.

dselivanov added a commit that referenced this pull request Nov 29, 2020
- optional biases for explicit feedback

related to #41
@dselivanov
Owner

@david-cortes I've started to review and eventually modified a big part of the WRMF solver (pushed to master now). Now it is much easier to read and reason about. It will take a bit more time to incorporate your additions and merge this PR.

Also, I haven't noticed that different user and item bias initializations have a significant impact on the loss.

@dselivanov dselivanov merged commit 35247b4 into dselivanov:master Nov 29, 2020
@dselivanov
Owner

@david-cortes I've resolved conflicts and merged. Thanks a lot for your work!

However, I've commented out the code related to the bias initialization and the global mean (as I don't yet fully understand how it works). Would you be able to try the current version and add this logic in a new PR?

TODO from my side:

  • add your CG solver for explicit feedback to R interface
  • add biases for implicit feedback

@david-cortes
Contributor Author

OK, will re-add them. A small reminder, just in case: if you decide to add biases for the implicit-feedback version, you'd have to do it through t(X)*R rather than through R.

}

template <class T>
void initialize_biases(const dMappedCSC& ConfCSC,
Owner

@david-cortes I believe it will be easier/more efficient to pass global_mean here and to the ALS solver rather than subtracting it from c_ui and c_iu.
