WRMF user and item biases for implicit feedback data #53
@david-cortes that's interesting information. Do biases help to reduce the loss? My intuition is that item biases should be highly correlated with item popularity. I would expect them to help significantly for users with few interactions. As for centering, I'm not sure it will help.
I'm realizing there might be a bug with the item biases:

```r
library(Matrix)
library(rsparse)
library(cmfrec)

data("movielens100k")

eval.full.loss <- function(X, W, A, B, user_bias=NULL, item_bias=NULL, glob_mean=NULL) {
    Xdense <- as(X, "matrix")
    X@x <- W
    Wdense <- as(X, "matrix")
    pred <- A %*% t(B)
    if (!is.null(glob_mean))
        pred <- pred + glob_mean
    if (NROW(user_bias))
        pred <- pred + user_bias
    if (NROW(item_bias))
        pred <- pred + item_bias
    err <- Wdense * ((Xdense - pred)^2)
    return(mean(err))
}
```
```r
Xcoo <- as(movielens100k, "TsparseMatrix")
Xvalues <- Xcoo@x
Xcoo@x <- rep(1, length(Xcoo@x))

set.seed(123)
m.nobias.nocenter <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                         center=FALSE, user_bias=FALSE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.nobias.nocenter$matrices$A),
               t(m.nobias.nocenter$matrices$B))

set.seed(123)
m.nobias.center <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                       center=TRUE, user_bias=FALSE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.nobias.center$matrices$A),
               t(m.nobias.center$matrices$B),
               glob_mean = m.nobias.center$matrices$glob_mean)

set.seed(123)
m.userbias.center <- CMF(Xcoo, weight=Xvalues, NA_as_zero=TRUE, k=10, verbose=FALSE,
                         center=TRUE, user_bias=TRUE, item_bias=FALSE)
eval.full.loss(Xcoo, Xvalues,
               t(m.userbias.center$matrices$A),
               t(m.userbias.center$matrices$B),
               user_bias = m.userbias.center$matrices$user_bias,
               glob_mean = m.userbias.center$matrices$glob_mean)
```

Results: the change doesn't look too big.
@dselivanov My bad, there's actually no bug. EDIT: just realized that R doesn't do matrix + row-vector sums; here's the function redone and compared with all the biases:

```r
eval.full.loss <- function(X, W, A, B, user_bias=NULL, item_bias=NULL, glob_mean=NULL) {
    Xdense <- as(X, "matrix")
    X@x <- W - 1
    Wdense <- as(X, "matrix") + 1
    pred <- A %*% t(B)
    if (!is.null(glob_mean))
        pred <- pred + glob_mean
    if (NROW(user_bias))
        pred <- pred + user_bias
    if (NROW(item_bias))
        pred <- pred + matrix(rep(item_bias, nrow(pred)), nrow=nrow(pred), byrow = TRUE)
    err <- Wdense * ((Xdense - pred)^2)
    return(mean(err))
}
```

So in the end the item biases do bring some small improvement in terms of loss, and my earlier tests were wrong (I was summing them incorrectly).
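The recycling gotcha behind that fix, as a standalone base-R demo (toy numbers, not part of the thread's code): `+` recycles a vector column-wise down a matrix, so a row vector of item biases has to be expanded with `byrow = TRUE`.

```r
pred <- matrix(0, nrow = 2, ncol = 3)
item_bias <- c(10, 20, 30)

# Wrong: "+" recycles column-major, so the bias values land in the wrong cells
wrong <- pred + item_bias

# Right: expand the row vector into a full matrix first
right <- pred + matrix(item_bias, nrow = nrow(pred), ncol = ncol(pred), byrow = TRUE)

wrong[1, ]  # 10 30 20  -> not the item biases
right[1, ]  # 10 20 30
```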
@david-cortes thanks for the example, I will try to play with it.
It's implemented through different functions. First it has some aggregated steps, then it calls a function which, if using the CG method, will itself end up calling another one. But overall, the idea is that you're solving a system in which the RHS can be decomposed into some parts that apply to all rows and some parts that turn to zero for missing entries.
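A sketch of that system (my notation, reconstructed from the usual WRMF formulation rather than copied from the code): with item-factor matrix $B$, per-user confidence matrix $C_u = \mathrm{diag}(w_{u1},\dots,w_{un})$, global mean $\mu$, user bias $b_u$ and item-bias vector $\mathbf{b}$, each user vector $\mathbf{a}_u$ solves a weighted ridge regression:

```latex
\left(B^\top C_u B + \lambda I\right) \mathbf{a}_u
    = B^\top C_u \left(\mathbf{x}_u - \mu \mathbf{1} - b_u \mathbf{1} - \mathbf{b}\right)
```

Under `NA_as_zero`, missing entries have $x_{ui} = 0$ and unit weight, so the part $B^\top(-\mu\mathbf{1} - b_u\mathbf{1} - \mathbf{b})$ applies to all rows and can be precomputed once, while the remaining per-user corrections involve only the observed entries and turn to zero for missing ones.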
@david-cortes I have some challenges with cmfrec... Would be great if you could provide an example of how to predict top-n items for new users. Here I've put a template:

```r
library(Matrix)
library(rsparse)
library(cmfrec)

data(movielens100k)
set.seed(1)

# take 100 users for validation
i = sample(nrow(movielens100k), 100)
val = movielens100k[i, ]
train = movielens100k[-i, ]

# now mark 30% of the interactions as observed and
# 70% as unobserved - will evaluate map@k at these 70%
val_split = rsparse:::train_test_split(val, test_proportion = 0.7)
str(val_split)

train = as(train, "TsparseMatrix")
w = train@x
train@x = rep(1, length(train@x))

model = CMF(train,
            weight = w,
            NA_as_zero = TRUE,
            k = 10,
            verbose = TRUE,
            center = FALSE,
            user_bias = FALSE,
            item_bias = FALSE)
```

Now the question is how I can use the model to predict for new users.
Sorry, had a bug there. The easiest way is to use `topN_new`, passing the data as a `sparseVector`:

```r
val.csr = as(val, "RsparseMatrix")
w.val = val.csr@x
val.csr@x = rep(1, length(val.csr@x))

### TopN for row 1 in val
topN_new(model, as(val.csr[1L, , drop=FALSE], "sparseVector"),
         weight = w.val[seq(val.csr@p[1L] + 1L, val.csr@p[2L])])
```

Although you could also take the matrices and replace the values manually.
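For intuition, here's a from-scratch base-R sketch of what such a new-user prediction boils down to under the no-bias, `NA_as_zero` model (all names and the toy data below are made up for illustration; this is not the cmfrec API): solve the closed-form weighted system for the new user's factors, then rank items by score.

```r
set.seed(42)
n_items <- 6; k <- 3; lambda <- 0.1
B <- matrix(rnorm(n_items * k), nrow = n_items)  # item factors (items x k), assumed given

# New user: observed item indices with confidence weights; observed x values
# are 1, missing entries are treated as zeros with unit weight (NA_as_zero)
obs <- c(2L, 5L); w_obs <- c(3, 7)

# Closed-form WRMF step: (B'CB + lambda*I) a = B'Cx,
# where C is the identity except for w_obs on the observed entries
BtB <- crossprod(B)                               # B'B over all items
Bo  <- B[obs, , drop = FALSE]
lhs <- BtB + crossprod(Bo * sqrt(w_obs - 1)) + lambda * diag(k)
rhs <- colSums(Bo * w_obs)                        # B'Cx with x = 1 on observed, 0 elsewhere
a   <- solve(lhs, rhs)

# Rank unseen items by predicted score, excluding the observed ones
scores <- drop(B %*% a)
scores[obs] <- -Inf
top_n <- order(scores, decreasing = TRUE)[1:3]
top_n
```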
But still there is an ALS step when the item embeddings are fixed, and this should call the same code.

Here is how I calculate map@k with cmfrec:

```r
predict_cmfrec = function(model, X) {
    n = nrow(X)
    X = as(X, "RsparseMatrix")
    res = lapply(seq_len(n), function(i) {
        if (i %% 10 == 0) message(sprintf("%d/%d", i, n))
        x = as(X[i, , drop=FALSE], "sparseVector")
        w = x@x
        x@x = rep(1, length(x@x))
        preds = topN_new(
            model,
            x,
            weight = w,
            exclude = x@i
        )
        preds
    })
    do.call(rbind, res)
}

preds = predict_cmfrec(model, val_split$train)
mean(rsparse::ap_k(preds, val_split$test))
```

For lastfm360 it looks like there is a moderate lift in map@10.
I'll have to guess that this increment in MAP@10 is less than what you'd see from using k+2 factors, ergo not worth it. The low correlation might be due to numerical instability when using too small a lambda. By default it uses a CG solver and then switches to Cholesky in the last iteration, so perhaps it'd look a bit better using Cholesky for all iterations. Or perhaps it could be due to how you're measuring popularity. There is also a `MostPopular` model:

```r
model = MostPopular(X, implicit=TRUE, lambda=0)
```
Well, it's rather worth comparing to a model with k+1 factors. Also, biases should make a huge difference for users with few or no interactions.
What I've figured out so far: (derivation images not preserved)
But that can be simplified further; that way you also avoid an extra matrix multiplication.
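To illustrate the RHS decomposition numerically (a base-R sketch with made-up dimensions, not the cmfrec code): the item-bias term can be split into a part precomputed once for all users plus a small per-user correction over observed entries only, which agrees with the naive full multiplication.

```r
set.seed(1)
n_items <- 8; k <- 4
B <- matrix(rnorm(n_items * k), nrow = n_items)  # item factors
b <- rnorm(n_items)                              # item biases

# Per-user confidence: weight 1 everywhere, w_obs on observed entries
obs <- c(1L, 4L, 6L); w_obs <- c(2, 5, 3)
Cu <- rep(1, n_items); Cu[obs] <- w_obs

# Naive bias term on the RHS: full weighted multiplication per user
rhs_naive <- crossprod(B, Cu * b)                # t(B) %*% diag(Cu) %*% b

# Decomposed: precompute t(B) %*% b once, then correct only observed rows
Btb_precomp <- crossprod(B, b)                   # shared across all users
correction  <- crossprod(B[obs, , drop = FALSE], (w_obs - 1) * b[obs])
rhs_decomp  <- Btb_precomp + correction

all.equal(drop(rhs_naive), drop(rhs_decomp))     # the two agree
```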
Yeah, I've done that in #54. However, the results are quite different from cmfrec's, and map@k is significantly worse compared to the model without biases.
By the way, it's also straightforward to add it to the CG method: you just need to modify the calculation of the first residual.
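A minimal CG sketch of that point (base R, toy data, not the cmfrec implementation): the right-hand side, and hence any bias terms folded into it, enters only through the initial residual `r = rhs - M %*% x`; the iterations themselves are unchanged.

```r
set.seed(2)
k <- 5
M <- crossprod(matrix(rnorm(20 * k), ncol = k)) + diag(k)  # SPD, like B'CB + lambda*I
rhs <- rnorm(k)   # with biases, the adjusted terms get folded in here

cg_solve <- function(M, rhs, x = rep(0, length(rhs)), n_iter = 25L, tol = 1e-10) {
  r <- rhs - drop(M %*% x)  # first residual: the only place the modified RHS enters
  p <- r
  rs <- sum(r * r)
  for (it in seq_len(n_iter)) {
    Mp <- drop(M %*% p)
    alpha <- rs / sum(p * Mp)
    x <- x + alpha * p
    r <- r - alpha * Mp
    rs_new <- sum(r * r)
    if (rs_new < tol) break
    p <- r + (rs_new / rs) * p
    rs <- rs_new
  }
  x
}

a <- cg_solve(M, rhs)
all.equal(a, drop(solve(M, rhs)))  # matches the direct solve
```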
@dselivanov I'm not so sure it's something desirable to have, actually. I tried playing with centering and biases with implicit-feedback data, and I see that adding user biases usually gives a very small lift in metrics like HR@5, but item biases make them much worse.

You can play with cmfrec (the version from git; the one from CRAN has bugs for this use-case) like this with e.g. the lastFM data or similar, which would fit the same model as `WRMF` with `feedback="implicit"`:

Originally posted by @david-cortes in #44 (comment)