WRMF user and item biases for implicit feedback data #53

dselivanov · 2020-12-09T19:02:31Z

@dselivanov I'm not so sure it's something desirable to have actually. I tried playing with centering and biases with implicit-feedback data, and I see that adding user biases usually gives a very small lift in metrics like HR@5, but item biases make them much worse.

You can play with cmfrec (version from git, the one from CRAN has bugs for this use-case) like this with e.g. the lastFM data or similar, which would fit the same model as WRMF with feedback="implicit":

library(cmfrec)
Xvalues <- Xcoo@x
Xcoo@x <- rep(1, length(Xcoo@x))
model <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE,
             center=TRUE, user_bias=TRUE, item_bias=TRUE)

Originally posted by @david-cortes in #44 (comment)


dselivanov · 2020-12-09T19:06:58Z

@david-cortes that's interesting information. Do biases help to reduce loss?

My intuition is that item biases should be highly correlated with item popularity. I would expect them to help significantly for users with few interactions.

As for centering - I'm not sure if it will help.

david-cortes · 2020-12-10T21:33:23Z

I'm realizing there might be a bug with the item biases in cmfrec. But here's a comparison with centering and user biases for now:

library(Matrix)
library(rsparse)
library(cmfrec)
data("movielens100k")

eval.full.loss <-  function(X, W, A, B, user_bias=NULL, item_bias=NULL, glob_mean=NULL) {
    Xdense <- as(X, "matrix")
    X@x <- W
    Wdense <- as(X, "matrix")
    pred <- A %*% t(B)
    if (!is.null(glob_mean))
        pred <- pred + glob_mean
    if (NROW(user_bias))
        pred <- pred + user_bias
    if (NROW(item_bias))
        pred <- pred + item_bias
    err <- Wdense * ((Xdense - pred)^2)
    return(mean(err))
}

Xcoo <- as(movielens100k, "TsparseMatrix")
Xvalues <- Xcoo@x
Xcoo@x <- rep(1, length(Xcoo@x))

set.seed(123)
m.nobias.nocenter <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                         center=FALSE, user_bias=FALSE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.nobias.nocenter$matrices$A),
               t(m.nobias.nocenter$matrices$B))



set.seed(123)
m.nobias.center <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                       center=TRUE, user_bias=FALSE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.nobias.center$matrices$A),
               t(m.nobias.center$matrices$B),
               glob_mean = m.nobias.center$matrices$glob_mean)


set.seed(123)
m.userbias.center <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                         center=TRUE, user_bias=TRUE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.userbias.center$matrices$A),
               t(m.userbias.center$matrices$B),
               user_bias = m.userbias.center$matrices$user_bias,
               glob_mean = m.userbias.center$matrices$glob_mean)

Results:

[1] 0.04399711
[1] 0.04234777
[1] 0.04188822

So the change doesn't look too big.

david-cortes · 2020-12-11T16:17:21Z

@dselivanov My bad, there's actually no bug in cmfrec, it's this loss function that was wrong. Doing the experiment again:

EDIT: just realized that R doesn't broadcast a row vector when adding it to a matrix (the vector is recycled column-wise instead), so here's the function redone and compared with all the biases:

eval.full.loss <-  function(X, W, A, B, user_bias=NULL, item_bias=NULL, glob_mean=NULL) {
    Xdense <- as(X, "matrix")
    X@x <- W - 1
    Wdense <- as(X, "matrix") + 1
    pred <- A %*% t(B)
    if (!is.null(glob_mean))
        pred <- pred + glob_mean
    if (NROW(user_bias))
        pred <- pred + user_bias
    if (NROW(item_bias))
        pred <- pred + matrix(rep(item_bias, nrow(pred)), nrow=nrow(pred), byrow = TRUE)
    err <- Wdense * ((Xdense - pred)^2)
    return(mean(err))
}

[1] 0.0726521 ## no bias, no center
[1] 0.07276726 ## center
[1] 0.07216536 ## center + user bias
[1] 0.07158445 ## center + user bias + item bias

So in the end the item biases do bring some small improvement in terms of loss, and my earlier tests were wrong (was summing them incorrectly).
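As an illustration of the summing pitfall mentioned above, here is a minimal base-R sketch on a toy matrix. The variable names and values are made up for the example and are not taken from cmfrec or from the experiment:

```r
# Toy predictions: 2 users (rows) x 3 items (columns), all zeros.
pred <- matrix(0, nrow = 2, ncol = 3)
user_bias <- c(10, 20)   # one value per row (hypothetical)
item_bias <- c(1, 2, 3)  # one value per column (hypothetical)

# A length-nrow vector is recycled down each column, so element i
# lands on row i: this is the per-user addition we want.
with_user <- pred + user_bias

# A length-ncol vector added the same way is ALSO recycled in
# column-major order, so it does not add one value per column.
# No warning is raised because 6 is a multiple of 3.
wrong_item <- pred + item_bias

# Two ways to add one value per column.
right_item <- sweep(pred, 2, item_bias, "+")
right_item2 <- pred + rep(item_bias, each = nrow(pred))

print(wrong_item)
print(right_item)
```

The `matrix(rep(item_bias, nrow(pred)), nrow=nrow(pred), byrow=TRUE)` construction in the corrected loss function builds the same per-column matrix as `sweep()` here.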

dselivanov · 2020-12-11T18:01:02Z

@david-cortes thanks for the example. I will try to play with cmfrec and user-item biases. Could you provide a link to the code where user-item biases for implicit feedback are implemented?

david-cortes · 2020-12-11T21:46:15Z

It's implemented through different functions. First it has some aggregated steps like this:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/collective.c#L7518

Then it calls function factors_closed_form in a loop here:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/common.c#L625
(key there are the variables named "bias")

If using the CG method, that function will then end up calling this other one:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/common.c#L1067

But overall, the idea is that you're solving a system like this:

solve(t(W*X)*X + diag(lambda),   t(W*X)*(Y-glob_bias-column_bias))

In which the RHS can be decomposed into some parts that apply to all rows and some parts that turn to zero for missing entries:

t(W*X)*(Y)  - t((W-1)*X)*(glob_bias+column_bias) - t(X)*(glob_bias+column_bias)
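The decomposition can be checked numerically. The following is only a sketch in base R on random toy data, not cmfrec's actual code: it assumes `X` holds the fixed factors (one row per entry of the row being solved), that missing entries have weight 1 and target 0, and that `bias` stands for `glob_bias + column_bias`. All names and sizes are hypothetical.

```r
set.seed(1)
n <- 8; k <- 3
X <- matrix(rnorm(n * k), n, k)               # fixed factors (toy values)
observed <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE)
w <- ifelse(observed, 1 + runif(n, 1, 5), 1)  # weight 1 on missing entries
y <- as.numeric(observed)                     # 1 if observed, 0 if missing
bias <- 0.3 + rnorm(n, sd = 0.1)              # glob_bias + column_bias
lambda <- 0.1

# Direct form: t(W*X) %*% (Y - bias). In R, w * X scales row i of X by w[i].
rhs_direct <- crossprod(w * X, y - bias)

# Decomposed form, written over all entries.
rhs_split <- crossprod(w * X, y) -
  crossprod((w - 1) * X, bias) -
  crossprod(X, bias)

# Same thing, but the first two terms only touch observed entries,
# because y and (w - 1) are both zero wherever an entry is missing.
rhs_sparse <- crossprod((w * X)[observed, , drop = FALSE], y[observed]) -
  crossprod(((w - 1) * X)[observed, , drop = FALSE], bias[observed]) -
  crossprod(X, bias)

# Solving the regularized system with the direct right-hand side.
factors <- solve(crossprod(w * X, X) + diag(lambda, k), rhs_direct)
```

In this sketch only the last term, `crossprod(X, bias)`, runs over every entry, and it does not depend on which entries are observed.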

dselivanov · 2020-12-24T08:16:43Z

@david-cortes I have some challenges with cmfrec... Would be great if you can provide an example on how to predict top n items for new users. Here I've put a template:

library(Matrix)
library(rsparse)
library(cmfrec)
data(movielens100k)

set.seed(1)
# take 100 users for validation
i = sample(nrow(movielens100k), 100)

val = movielens100k[i, ]
train = movielens100k[-i, ]

# now mark 30% of the interactions as observed and
# 70% as unobserved - will evaluate map@k at these 70%
val_split = rsparse:::train_test_split(val, test_proportion = 0.7)
str(val_split)

List of 2
$ train:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:3039] 10 13 24 32 45 46 52 54 55 59 ...
.. ..@ p : int [1:1683] 0 13 17 20 26 30 31 42 53 61 ...
.. ..@ Dim : int [1:2] 100 1682
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:100] "836" "679" "129" "930" ...
.. .. ..$ : chr [1:1682] "1" "2" "3" "4" ...
.. ..@ x : num [1:3039] 3 3 4 3 4 2 5 5 4 5 ...
.. ..@ factors : list()
$ test :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:6983] 1 3 5 6 9 11 14 30 31 33 ...
.. ..@ p : int [1:1683] 0 41 48 54 66 71 72 102 117 137 ...
.. ..@ Dim : int [1:2] 100 1682
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:100] "836" "679" "129" "930" ...
.. .. ..$ : chr [1:1682] "1" "2" "3" "4" ...
.. ..@ x : num [1:6983] 3 3 4 3 5 4 5 4 3 3 ...
.. ..@ factors : list()

train = as(train, "TsparseMatrix")
w = train@x
train@x = rep(1, length(train@x))

model = CMF(train, 
            weight = w, 
            NA_as_zero = TRUE, 
            k = 10, 
            verbose = TRUE, 
            center = FALSE, 
            user_bias = FALSE, 
            item_bias = FALSE)

Now the question is: how can I use topN_new() to make predictions based on val_split$train and validate against val_split$test?

david-cortes · 2020-12-24T10:58:05Z

Sorry had a bug with topN that threw incorrect results when not using biases, fixed now.

The easiest way to use that function is to pass the data as a sparseVector from Matrix:

val.csr = as(val, "RsparseMatrix")
w.val = val.csr@x
val.csr@x = rep(1, length(val.csr@x))

### TopN for row 1 in val
topN_new(model, as(val.csr[1L, , drop=FALSE], "sparseVector"),
         weight = w.val[seq(val.csr@p[1L] + 1L, val.csr@p[2L])])
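To apply the same weight slicing to every row rather than just row 1, the row pointers can be used as below. This is a sketch on plain vectors that mimic the `@p` and `@x` slots of a CSR matrix; the values are invented. It uses `seq_len()` because `seq(p[i] + 1L, p[i + 1L])` counts backwards when a row has no entries.

```r
# Hypothetical CSR slots for a 3-row matrix: row 1 has 2 entries,
# row 2 is empty, row 3 has 3 entries.
p <- c(0L, 2L, 2L, 5L)  # row pointers (0-based offsets), like val.csr@p
x <- c(4, 5, 3, 1, 2)   # stored values, like the saved weights w.val

# Values belonging to row i; returns an empty vector for an empty row.
row_values <- function(x, p, i) {
  x[p[i] + seq_len(p[i + 1L] - p[i])]
}

row_values(x, p, 1L)    # 4 5
row_values(x, p, 2L)    # numeric(0)
row_values(x, p, 3L)    # 3 1 2

# What seq() would index for the empty row 2: positions 3 and 2.
seq(p[2L] + 1L, p[3L])  # 3 2
```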

Although you could also take the matrices and replace the values of $components from a WRMF object. These are available under model$matrices (A is the user factors, B is the item factors, and the biases are user_bias, item_bias, glob_mean).

dselivanov · 2020-12-24T12:29:50Z

> Although you could also take the matrices and replace the values of $components from a WRMF object. These are available under model$matrices (A is the user factors, B is the item factors, and the biases are user_bias, item_bias, glob_mean)

But still there is an ALS step when item-embeddings are fixed. And this should call cmfrec solver under the hood.

map@k

Here is how I calculate map@k with cmfrec:

predict_cmfrec = function(model, X) {
  n = nrow(X)
  X = as(X, "RsparseMatrix")
  res = lapply(seq_len(n), function (i) {
    if (i %% 10 == 0)message(sprintf("%d/%d", i, n))
    x = as(X[i, , drop=FALSE], "sparseVector")
    w = x@x
    x@x = rep(1, length(x@x))
    preds = topN_new(
      model,
      x,
      weight = w,
      exclude = x@i
    )
    preds
  })
  do.call(rbind, res)
}

preds = predict_cmfrec(model, val_split$train)
mean(rsparse::ap_k(preds, val_split$test))

For lastfm360 it looks like there is a moderate lift in map@10:

0.2880954 without user and item biases
0.298398 with user and item biases

dselivanov · 2020-12-24T12:48:08Z

@david-cortes also, when lambda > 0 I observe a very strong correlation between item bias and item popularity. Interestingly, when lambda is close to 0 this is not the case. I think there is a chance that some issue in the code is related to this.

david-cortes · 2020-12-24T13:53:04Z

I'll have to guess that this increment in MAP@10 is less than what you'd see from using k+2 factors, ergo not worth it.

The low correlation might be due to numerical instability when using too small a lambda. By default it uses a CG solver and then switches to Cholesky in the last iteration, so perhaps it'd look a bit better using finalize_chol=FALSE.

Or perhaps could be due to how you're measuring popularity. There is also a model MostPopular which will calculate only the biases, using their closed-form solution:

model = MostPopular(X, implicit=TRUE, lambda=0)

dselivanov · 2020-12-24T15:23:49Z

> I'll have to guess that this increment in MAP@10 is less than what you'd see from using k+2 factors, ergo not worth it.

Well, it would be fairer to compare against the model with k+1 factors. Also, biases should make a huge difference for users with few/no interactions.

dselivanov · 2020-12-27T07:27:41Z

What I've figured out so far:

- Only the code for rhs is affected.
- rhs = X * C_u * (p_u - x_biases), which splits into X * eye * (0 - x_biases) over the p=0 items plus X * diag(1 + f(r_ui)) * (1 - x_biases) over the p=1 items.
- rhs_init = X * eye * (0 - x_biases) = -X * x_biases, taken over all items, can be precomputed.
- Then for each user we calculate:
  - rhs = rhs_init + X.cols(idx_nnz) * x_biases(idx_nnz) (removing the p=0 terms from init)
  - rhs = rhs + X.cols(idx_nnz) * diag(confidence(idx_nnz)) * (1 - x_biases(idx_nnz)) (adding the p=1 terms)

david-cortes · 2020-12-27T13:06:29Z

But that can be simplified further:

rhs = rhs_init + X_nnz * C_u - X_nnz * ((C_u-1) * (x_biases))
rhs = rhs_init + X_nnz * (C_u - (C_u-1) * x_biases)

That way you also avoid an extra matrix multiplication.
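A quick numerical check of the algebra above, as a base-R sketch on random toy data (this is not the rsparse or cmfrec implementation). It assumes `X` stores the item factors as columns, and that a user's confidence is 1 for items without interactions and `1 + f(r_ui)` otherwise. The concrete confidence values are made up.

```r
set.seed(2)
k <- 4; n_items <- 10
X <- matrix(rnorm(k * n_items), k, n_items)  # item factors as columns
x_biases <- rnorm(n_items, sd = 0.2)
idx_nnz <- c(2, 5, 9)                        # items this user interacted with
conf_nnz <- 1 + c(3, 1, 7)                   # 1 + f(r_ui), hypothetical values

# Reference: rhs = X * C_u * (p_u - x_biases) over all items.
c_u <- rep(1, n_items); c_u[idx_nnz] <- conf_nnz
p_u <- rep(0, n_items); p_u[idx_nnz] <- 1
rhs_full <- X %*% (c_u * (p_u - x_biases))

# Shared part, computed once for all users.
rhs_init <- -X %*% x_biases
X_nnz <- X[, idx_nnz, drop = FALSE]
b_nnz <- x_biases[idx_nnz]

# Two-step update: remove the p=0 terms, then add the p=1 terms.
rhs_two <- rhs_init + X_nnz %*% b_nnz
rhs_two <- rhs_two + X_nnz %*% (conf_nnz * (1 - b_nnz))

# Simplified update with a single product per user.
rhs_one <- rhs_init + X_nnz %*% (conf_nnz - (conf_nnz - 1) * b_nnz)
```

All three right-hand sides agree up to floating-point error, and the simplified form needs only one product with `X_nnz` per user.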

dselivanov · 2020-12-27T13:26:08Z

Yeah, I've done that in #54. However, the results are quite different from cmfrec, and map@k is significantly worse compared to the model without biases.
On the other hand, user/item biases are highly correlated with popularity...

david-cortes · 2020-12-28T19:08:15Z

By the way, it's also straightforward to add it to the CG method: you just need to modify the calculation for the first residual.

dselivanov changed the title ~~user and item biases in WRMF for implicit feedback~~ WRMF user and item biases for implicit feedback data Dec 9, 2020

dselivanov mentioned this issue Dec 23, 2020

initial work on biases for model with implicit feedback #54

Merged

david-cortes added a commit to david-cortes/cmfrec that referenced this issue Dec 24, 2020

fix topN without biases ref dselivanov/rsparse#53

160f86d

dselivanov closed this as completed in 459e33d May 9, 2021

david-cortes mentioned this issue Jun 27, 2024

Solve CRAN issues from Rcpp with UBSAN #76

Merged
