WRMF user and item biases for implicit feedback data #53

Closed
dselivanov opened this issue Dec 9, 2020 · 15 comments

Comments

@dselivanov
Owner

@dselivanov I’m not so sure it’s something desirable to have actually. I tried playing with centering and biases with implicit-feedback data, and I see that adding user biases usually gives a very small lift in metrics like HR@5, but item biases make them much worse.

You can try this with cmfrec (use the version from git; the one from CRAN has bugs for this use-case) on e.g. the lastFM data or similar. The following fits the same model as WRMF with feedback="implicit":

library(cmfrec)
Xvalues <- Xcoo@x
Xcoo@x <- rep(1, length(Xcoo@x))
model <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE,
             center=TRUE, user_bias=TRUE, item_bias=TRUE)

Originally posted by @david-cortes in #44 (comment)

@dselivanov dselivanov changed the title user and item biases in WRMF for implicit feedback WRMF user and item biases for implicit feedback data Dec 9, 2020
@dselivanov
Owner Author

@david-cortes that's interesting information. Do biases help to reduce loss?

My intuition is that item biases should be highly correlated with item popularity. I would expect them to help significantly for users with few interactions.

As for centering - I'm not sure if it will help.

@david-cortes
Contributor

I'm realizing there might be a bug with the item biases in cmfrec. But here's a comparison with centering and user biases for now:

library(Matrix)
library(rsparse)
library(cmfrec)
data("movielens100k")

eval.full.loss <-  function(X, W, A, B, user_bias=NULL, item_bias=NULL, glob_mean=NULL) {
    Xdense <- as(X, "matrix")
    X@x <- W
    Wdense <- as(X, "matrix")
    pred <- A %*% t(B)
    if (!is.null(glob_mean))
        pred <- pred + glob_mean
    if (NROW(user_bias))
        pred <- pred + user_bias
    if (NROW(item_bias))
        pred <- pred + item_bias
    err <- Wdense * ((Xdense - pred)^2)
    return(mean(err))
}

Xcoo <- as(movielens100k, "TsparseMatrix")
Xvalues <- Xcoo@x
Xcoo@x <- rep(1, length(Xcoo@x))

set.seed(123)
m.nobias.nocenter <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                         center=FALSE, user_bias=FALSE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.nobias.nocenter$matrices$A),
               t(m.nobias.nocenter$matrices$B))



set.seed(123)
m.nobias.center <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                       center=TRUE, user_bias=FALSE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.nobias.center$matrices$A),
               t(m.nobias.center$matrices$B),
               glob_mean = m.nobias.center$matrices$glob_mean)


set.seed(123)
m.userbias.center <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                         center=TRUE, user_bias=TRUE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.userbias.center$matrices$A),
               t(m.userbias.center$matrices$B),
               user_bias = m.userbias.center$matrices$user_bias,
               glob_mean = m.userbias.center$matrices$glob_mean)

Results:

[1] 0.04399711
[1] 0.04234777
[1] 0.04188822

So the change doesn't look too big.

@david-cortes
Contributor

david-cortes commented Dec 11, 2020

@dselivanov My bad, there's actually no bug in cmfrec, it's this loss function that was wrong. Doing the experiment again:

EDIT: just realized that R doesn't do matrix + row-vector sums (a vector added to a matrix is recycled down the columns). Here's the function redone and compared with all the biases:

eval.full.loss <-  function(X, W, A, B, user_bias=NULL, item_bias=NULL, glob_mean=NULL) {
    Xdense <- as(X, "matrix")
    X@x <- W - 1
    Wdense <- as(X, "matrix") + 1
    pred <- A %*% t(B)
    if (!is.null(glob_mean))
        pred <- pred + glob_mean
    if (NROW(user_bias))
        pred <- pred + user_bias
    if (NROW(item_bias))
        pred <- pred + matrix(rep(item_bias, nrow(pred)), nrow=nrow(pred), byrow = TRUE)
    err <- Wdense * ((Xdense - pred)^2)
    return(mean(err))
}
[1] 0.0726521 ## no bias, no center
[1] 0.07276726 ## center
[1] 0.07216536 ## center + user bias
[1] 0.07158445 ## center + user bias + item bias

So in the end the item biases do bring some small improvement in terms of loss, and my earlier tests were wrong (was summing them incorrectly).
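
The summing mistake mentioned above comes from R's recycling rule. A minimal base-R sketch of it follows; the toy matrix and bias values are made up for illustration and are not taken from the experiment:

```r
# Adding a vector to a matrix recycles the vector down the columns.
pred <- matrix(0, nrow = 2, ncol = 3)   # 2 users x 3 items
user_bias <- c(10, 20)                  # one value per row (user)
item_bias <- c(1, 2, 3)                 # one value per column (item)

# Fine for user biases: element [i, j] receives user_bias[i].
with_user <- pred + user_bias

# Wrong for item biases: the vector is recycled column-wise,
# so element [i, j] does not receive item_bias[j] in general.
wrong_item <- pred + item_bias

# Correct: add item_bias[j] to every entry of column j.
right_item <- sweep(pred, 2, item_bias, "+")
```

Here `sweep(pred, 2, item_bias, "+")` gives the same result as the `matrix(rep(item_bias, nrow(pred)), nrow=nrow(pred), byrow = TRUE)` construction in the corrected function, without building the intermediate matrix by hand.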

@dselivanov
Owner Author

@david-cortes thanks for the example. I will try to play with cmfrec and user-item biases. Could you provide a link to the code where user-item biases for implicit feedback are implemented?

@david-cortes
Contributor

It's implemented through different functions. First it has some aggregated steps like this:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/collective.c#L7518

Then it calls function factors_closed_form in a loop here:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/common.c#L625
(key there are the variables named "bias")

If using the CG method, that function will then end up calling this other one:
https://github.com/david-cortes/cmfrec/blob/259057fcb59f2c0115f9737c6a18cbe1347925e9/src/common.c#L1067

But overall, the idea is that you're solving a system like this:

solve(t(W*X)*X + diag(lambda), t(W*X)*(Y-glob_bias-column_bias))

In which the RHS can be decomposed into some parts that apply to all rows and some parts that turn to zero for missing entries:

t(W*X)*(Y) - t((W-1)*X)*(glob_bias+column_bias) - t(X)*(glob_bias+column_bias)
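
The decomposition can be checked numerically. The sketch below is not cmfrec's code: it reads the notation above for a single row being solved, with X as the fixed factor matrix, W as a weight vector equal to 1 on missing entries, and one `bias` vector standing in for glob_bias + column_bias. All values are made up.

```r
set.seed(1)
n <- 6; k <- 3                      # n entries in the row, k latent factors
X <- matrix(rnorm(n * k), n, k)     # fixed factor matrix, one row per entry
W <- c(3, 1, 1, 5, 1, 1)            # weights; 1 marks a missing (implicit zero) entry
Y <- c(1, 0, 0, 1, 0, 0)            # binarized targets
bias <- rnorm(n, sd = 0.1)          # stands in for glob_bias + column_bias

# Direct right-hand side: t(W*X) %*% (Y - bias)
rhs_direct <- crossprod(W * X, Y - bias)

# Decomposed: the first two terms vanish on missing entries
# (Y = 0 and W - 1 = 0 there); the last term is dense but does
# not depend on which entries are observed.
rhs_split <- crossprod(W * X, Y) -
  crossprod((W - 1) * X, bias) -
  crossprod(X, bias)
```

Because the dense term t(X) %*% bias is the same whichever entries are observed, it only needs to be computed once, and the remaining terms cost time proportional to the number of observed entries.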

@dselivanov
Owner Author

@david-cortes I have some challenges with cmfrec... It would be great if you could provide an example of how to predict the top n items for new users. Here I've put a template:

library(Matrix)
library(rsparse)
library(cmfrec)
data(movielens100k)

set.seed(1)
# take 100 users for validation
i = sample(nrow(movielens100k), 100)

val = movielens100k[i, ]
train = movielens100k[-i, ]

# now mark 30% of the interactions as observed and
# 70% as unobserved - will evaluate map@k at these 70%
val_split = rsparse:::train_test_split(val, test_proportion = 0.7)
str(val_split)

List of 2
$ train:Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:3039] 10 13 24 32 45 46 52 54 55 59 ...
.. ..@ p : int [1:1683] 0 13 17 20 26 30 31 42 53 61 ...
.. ..@ Dim : int [1:2] 100 1682
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:100] "836" "679" "129" "930" ...
.. .. ..$ : chr [1:1682] "1" "2" "3" "4" ...
.. ..@ x : num [1:3039] 3 3 4 3 4 2 5 5 4 5 ...
.. ..@ factors : list()
$ test :Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
.. ..@ i : int [1:6983] 1 3 5 6 9 11 14 30 31 33 ...
.. ..@ p : int [1:1683] 0 41 48 54 66 71 72 102 117 137 ...
.. ..@ Dim : int [1:2] 100 1682
.. ..@ Dimnames:List of 2
.. .. ..$ : chr [1:100] "836" "679" "129" "930" ...
.. .. ..$ : chr [1:1682] "1" "2" "3" "4" ...
.. ..@ x : num [1:6983] 3 3 4 3 5 4 5 4 3 3 ...
.. ..@ factors : list()

train = as(train, "TsparseMatrix")
w = train@x
train@x = rep(1, length(train@x))

model = CMF(train, 
            weight = w, 
            NA_as_zero = TRUE, 
            k = 10, 
            verbose = TRUE, 
            center = FALSE, 
            user_bias = FALSE, 
            item_bias = FALSE)

Now the question is: how can I use topN_new() to make predictions based on val_split$train and validate against val_split$test?

david-cortes added a commit to david-cortes/cmfrec that referenced this issue Dec 24, 2020
@david-cortes
Contributor

Sorry, I had a bug in topN that gave incorrect results when not using biases; it's fixed now.

The easiest way to use that function is to pass the data as a sparseVector from Matrix:

val.csr = as(val, "RsparseMatrix")
w.val = val.csr@x
val.csr@x = rep(1, length(val.csr@x))

### TopN for row 1 in val
topN_new(model, as(val.csr[1L, , drop=FALSE], "sparseVector"),
         weight = w.val[seq(val.csr@p[1L] + 1L, val.csr@p[2L])])

Although you could also take the matrices and replace the values of $components from a WRMF object. These are available under model$matrices (A is the user factors, B is the item factors, and the biases are user_bias, item_bias, glob_mean).

@dselivanov
Owner Author

Although you could also take the matrices and replace the values of $components from a WRMF object. These are available under model$matrices (A is the user factors, B is the item factors, and the biases are user_bias, item_bias, glob_mean)

But there is still an ALS step in which the item embeddings are fixed, and this should call the cmfrec solver under the hood.

map@k

Here is how I calculate map@k with cmfrec:

predict_cmfrec = function(model, X) {
  n = nrow(X)
  X = as(X, "RsparseMatrix")
  res = lapply(seq_len(n), function (i) {
    if (i %% 10 == 0)message(sprintf("%d/%d", i, n))
    x = as(X[i, , drop=FALSE], "sparseVector")
    w = x@x
    x@x = rep(1, length(x@x))
    preds = topN_new(
      model,
      x,
      weight = w,
      exclude = x@i
    )
    preds
  })
  do.call(rbind, res)
}

preds = predict_cmfrec(model, val_split$train)
mean(rsparse::ap_k(preds, val_split$test))

For lastfm360 it looks like there is a moderate lift in map@10:

  • 0.2880954 without user and item biases
  • 0.298398 with user and item biases
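
For readers unfamiliar with the metric, map@k is the mean over users of average precision at k. The base-R sketch below uses one common definition; `ap_at_k` is a hypothetical helper written for illustration, and its normalisation is an assumption that may differ from what rsparse::ap_k does internally.

```r
# Average precision at k for one user (one common variant).
# predicted: ranked item ids; relevant: held-out item ids.
ap_at_k <- function(predicted, relevant, k = length(predicted)) {
  predicted <- head(predicted, k)
  hits <- predicted %in% relevant
  if (!any(hits)) return(0)
  # Precision measured at each rank where a relevant item appears.
  precision_at_hits <- cumsum(hits)[hits] / which(hits)
  sum(precision_at_hits) / min(k, length(relevant))
}

# Relevant items are found at ranks 2 and 3, so the precisions
# at those ranks are 1/2 and 2/3.
ap_example <- ap_at_k(c(3, 1, 5), relevant = c(1, 5), k = 3)
```

Averaging `ap_at_k` over all validation users would give a map@k figure comparable in spirit to the numbers above, though not necessarily identical to them.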

@dselivanov
Owner Author

@david-cortes also, when lambda > 0 I observe a very strong correlation between item bias and item popularity. Interestingly, when lambda is close to 0 this is not the case. I think there is a chance that some issue in the code is related to this.

@david-cortes
Contributor

david-cortes commented Dec 24, 2020

I'll have to guess that this increment in MAP@10 is less than what you'd see from using k+2 factors, ergo not worth it.

The low correlation might be due to numerical instability when using too small a lambda. By default it uses a CG solver and then switches to Cholesky in the last iteration, so perhaps it'd look a bit better using finalize_chol=FALSE.

Or perhaps it could be due to how you're measuring popularity. There is also a model MostPopular which will calculate only the biases, using their closed-form solution:

model = MostPopular(X, implicit=TRUE, lambda=0)

@dselivanov
Owner Author

I'll have to guess that this increment in MAP@10 is less than what you'd see from using k+2 factors, ergo not worth it.

Well, it makes more sense to compare against the model with k+1 factors. Also, biases should make a huge difference for users with few/no interactions.

@dselivanov
Owner Author

dselivanov commented Dec 27, 2020

What I've figured out so far:

  • only code for rhs is affected

  • rhs = X * C_u * (p_u - x_biases) = X * eye * (0 - x_biases) + X * diag(1 + f(r_ui)) * (1 - x_biases)

  • rhs_init = X * eye * (0 - x_biases) = -X * x_biases can be precomputed

  • then for each user we calculate

    • rhs = rhs_init + X.cols(idx_nnz) * x_biases(idx_nnz) - removing p=0 terms from init
    • rhs = rhs + X.cols(idx_nnz) * diag(confidence(idx_nnz)) * (1 - x_biases(idx_nnz)) - adding p=1 terms

@david-cortes
Contributor

But that can be simplified further:

rhs = rhs_init + X_nnz * C_u - X_nnz * ((C_u-1) * (x_biases))
rhs = rhs_init + X_nnz * (C_u - (C_u-1) * x_biases)

That way you also avoid an extra matrix multiplication.
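
A small numeric sketch of this simplified update, checked against a dense computation over all items. The dimensions, item indices and confidences are made up for illustration, and the item biases are assumed fixed during the user step; this is not the code from either package.

```r
set.seed(42)
k <- 4; n_items <- 8
X <- matrix(rnorm(k * n_items), k, n_items)  # item factors, one column per item
x_biases <- rnorm(n_items, sd = 0.1)         # item biases (assumed fixed here)

# Precomputed once: every item treated as p = 0 with confidence 1.
rhs_init <- -X %*% x_biases

# One user's observed items and their confidences C_u = 1 + f(r_ui).
idx_nnz <- c(2L, 5L, 7L)
C_u <- c(3, 2, 6)

# Fast path: touches only the user's non-zero columns, one multiplication.
X_nnz <- X[, idx_nnz, drop = FALSE]
rhs_fast <- rhs_init + X_nnz %*% (C_u - (C_u - 1) * x_biases[idx_nnz])

# Reference: dense computation over all items.
conf <- rep(1, n_items); conf[idx_nnz] <- C_u
p <- numeric(n_items);   p[idx_nnz] <- 1
rhs_full <- X %*% (conf * (p - x_biases))
```

The two results agree, so the per-user cost stays proportional to the number of non-zero interactions, as in the model without biases.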

@dselivanov
Owner Author

Yeah, I've done that in #54. However, the results are quite different from cmfrec, and map@k is significantly worse compared to the model without biases.
User/item biases, on the other hand, are highly correlated with popularity...

@david-cortes
Contributor

By the way, it's also straightforward to add it to the CG method: you just need to modify the calculation for the first residual.
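
To illustrate the point about the first residual, here is a textbook conjugate gradient solver in base R; it is a generic sketch, not the CG code from cmfrec or rsparse. `cg_solve` and the toy data are hypothetical. The right-hand side, which is where the biases live, is read only when forming the initial residual.

```r
# Plain conjugate gradient for a symmetric positive-definite system A x = b.
cg_solve <- function(A, b, x0 = numeric(length(b)),
                     n_iter = 2 * length(b), tol = 1e-10) {
  x <- x0
  r <- b - drop(A %*% x)   # first residual: the only place b enters
  p <- r
  rs_old <- sum(r * r)
  for (it in seq_len(n_iter)) {
    if (sqrt(rs_old) < tol) break
    Ap <- drop(A %*% p)
    alpha <- rs_old / sum(p * Ap)
    x <- x + alpha * p
    r <- r - alpha * Ap
    rs_new <- sum(r * r)
    p <- r + (rs_new / rs_old) * p
    rs_old <- rs_new
  }
  x
}

set.seed(7)
k <- 4; n_items <- 8; lambda <- 0.1
X <- matrix(rnorm(k * n_items), k, n_items)   # item factors, one column per item
x_biases <- rnorm(n_items, sd = 0.1)          # item biases (assumed fixed)
conf <- c(1, 3, 1, 1, 2, 1, 6, 1)             # confidence 1 marks an unobserved item
p_u <- as.numeric(conf > 1)                   # binarized preferences

A <- X %*% (conf * t(X)) + diag(lambda, k)        # X C_u X^T + lambda I
b_bias <- drop(X %*% (conf * (p_u - x_biases)))   # right-hand side with biases
u_cg <- cg_solve(A, b_bias)
u_exact <- solve(A, b_bias)
```

On this small system the CG result matches the direct solve, and the loop body is identical to what it would be without biases.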
