add precision@k and recall@k #58
Conversation
spotlight/evaluation.py (Outdated)

@@ -96,6 +96,60 @@ def sequence_mrr_score(model, test):
    return np.array(mrrs)


def precision_recall_at_k(model, test, k):
Could we make k an optional argument (with a default value)?
If yes, I guess the only natural default would be None, returning the P@k/R@k for the whole ranking (although the function name would then become a little misleading).
This is why I usually call the function just precision/recall, and put the "at"s only in the variable names. But I guess the current name is what most people expect to find in the library, because the literature chose to put the k in the metric name...
I think one name that would be consistent with the existing naming convention would be precision_recall_score.
spotlight/evaluation.py (Outdated)

        The model to evaluate.
    test: :class:`spotlight.interactions.Interactions`
        Test interactions.
    k: int or array of int,
I'm not sure where I sit on making this a list:
- yes, it's more efficient;
- but it is also more complex.

Could we maybe make it so that if a scalar is passed, a scalar is also returned?
I think having two options makes it more confusing. How about changing it to only scalar or only list? I designed it for a list because, in research, people usually plot experiment figures to see how performance changes when varying k.
I think doing both the list and scalar versions is a nice compromise. We should set k=10 as the default value. Then:
- if a list is passed, we return a tuple of lists of arrays;
- if a scalar is passed (or the default value is kept), we return a tuple of arrays.
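A minimal sketch of that compromise, assuming (as the other metrics in evaluation.py do) that model.predict(user_id) returns a score for every item; this illustrates the proposed behaviour rather than the code that was eventually merged:

```python
import numpy as np


def precision_recall_score(model, test, k=10):
    # Remember whether the caller passed a scalar so we can unwrap at the end.
    scalar_k = np.isscalar(k)
    ks = [k] if scalar_k else list(k)

    test = test.tocsr()

    precisions = [[] for _ in ks]
    recalls = [[] for _ in ks]

    for user_id, row in enumerate(test):

        if not len(row.indices):
            continue

        # Negate scores so that argsort puts the best items first.
        predictions = (-model.predict(user_id)).argsort()
        targets = set(row.indices)

        for i, _k in enumerate(ks):
            hits = len(set(predictions[:_k]) & targets)
            precisions[i].append(hits / float(_k))
            recalls[i].append(hits / float(len(targets)))

    precisions = [np.array(x) for x in precisions]
    recalls = [np.array(x) for x in recalls]

    if scalar_k:
        # Scalar k: a tuple of two (num_users,) arrays.
        return precisions[0], recalls[0]

    # Iterable k: a tuple of two lists of (num_users,) arrays.
    return precisions, recalls
```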
OK, got it. I'll make the changes soon.
Having said that, I would also be happy for this to be only scalar.
spotlight/evaluation.py (Outdated)

    test = test.tocsr()

    if not isinstance(k, list):
This should check for more possible iterables: tuples, numpy arrays and so on.
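One way to handle that, as a hedged sketch rather than the exact check the PR adopted:

```python
import numpy as np

# Accept ints, lists, tuples and numpy arrays of cutoffs: treat anything
# that is not a scalar as an iterable of k values.
if np.isscalar(k):
    ks = [k]
else:
    ks = list(k)
```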
spotlight/evaluation.py (Outdated)

        targets = row.indices

        for _k in k:
            pred = np.asarray(predictions)[:_k]
Minor: predictions is already a numpy array.
Yes, you are correct.
spotlight/evaluation.py (Outdated)

        for _k in k:
            pred = np.asarray(predictions)[:_k]
            num_hit = len(set(pred).intersection(set(targets)))
np.in1d(targets, pred) may be faster?
I tested on my laptop (a 2012 MacBook) and found that the set-intersection version is slightly faster than the np.in1d version (1 s vs. 1.2 s) on the ML100k dataset.
Here is the code:
num_hit = sum(np.in1d(pred, targets))
Sounds good; set it is.
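For reference, both variants count the same thing; the arrays below are made up purely for illustration:

```python
import numpy as np

pred = np.array([10, 3, 7, 42, 5])    # illustrative top-k item ids
targets = np.array([3, 42, 99])       # illustrative held-out items

hits_set = len(set(pred).intersection(set(targets)))
hits_in1d = int(np.in1d(pred, targets).sum())   # np.isin in newer numpy

assert hits_set == hits_in1d == 2
```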
tests/factorization/test_implicit.py (Outdated)

@@ -10,7 +10,7 @@
from spotlight.factorization.implicit import ImplicitFactorizationModel
from spotlight.factorization.representations import BilinearNet
from spotlight.layers import BloomEmbedding

from spotlight.evaluation import precision_recall_at_k
Would you mind adding a new separate test file for evaluation metrics?
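A minimal shape such a file could take; the dataset helper, split function, hyperparameters and asserts here are illustrative assumptions, not the tests that ended up in the PR:

```python
# tests/test_evaluation.py (sketch)
import numpy as np

from spotlight.cross_validation import random_train_test_split
from spotlight.datasets import movielens
from spotlight.evaluation import precision_recall_at_k
from spotlight.factorization.implicit import ImplicitFactorizationModel


def test_precision_recall_at_k():
    interactions = movielens.get_movielens_dataset('100K')
    train, test = random_train_test_split(
        interactions, random_state=np.random.RandomState(42))

    model = ImplicitFactorizationModel(n_iter=1)
    model.fit(train)

    # Assumes the scalar-k behaviour discussed above: a tuple of arrays.
    precision, recall = precision_recall_at_k(model, test, k=10)

    assert precision.shape == recall.shape
    assert ((0 <= precision) & (precision <= 1)).all()
    assert ((0 <= recall) & (recall <= 1)).all()
```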
    Parameters
    ----------

    model: fitted instance of a recommender model
We should make this accept train interactions as well in the same way other metrics do.
This is so that we can disregard the model giving high scores to known interactions. This reflects real-world system implementations.
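Concretely, the masking could reuse the trick the existing metrics use; a sketch with a hypothetical helper, where FLOAT_MAX is a large sentinel like the one already defined at the top of evaluation.py, and train is assumed to already be in CSR form:

```python
import numpy as np

FLOAT_MAX = np.finfo(np.float32).max


def _top_k_excluding_train(model, user_id, train, k):
    # Scores are negated so that argsort ranks the best items first; setting a
    # user's known training items to FLOAT_MAX pushes them out of the top k.
    predictions = -model.predict(user_id)

    if train is not None:
        predictions[train[user_id].indices] = FLOAT_MAX

    return predictions.argsort()[:k]
```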
spotlight/evaluation.py (Outdated)

    -------

    (Precision@k, Recall@k): numpy array of shape (num_users,)
        A tuple of Precisions@k and Recalls@k for each user in test.
We should be clearer about what this returns. I suggest a tuple of lists of arrays (or a tuple of arrays in the case of a scalar k), so that the i-th element of each list is the (num_users,) array of precision/recall at k[i].
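Something along those lines for the docstring, as a sketch of the suggested wording rather than the merged text:

```
Returns
-------

(precision, recall): numpy arrays of shape (num_users,)
    A tuple of (precision, recall).  If k is a scalar, each element is a
    (num_users,) array; if k is an iterable, each element is a list of
    (num_users,) arrays whose i-th entry is precision/recall at k[i].
```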
Thanks for the improvements! There are still one or two things I'd like changed. I hope it's OK if I make a PR to your branch in the next couple of days?
Not a problem! Take your time, and I can add more metrics (MAP, AUC) after that.
This commit does the following:
- simplify the code
- improve the docs
- some efficiency improvements in eliminating train interactions
Simplify and improve tests.
🚀
I've added two new ranking metrics: precision@k and recall@k.
There is still some work to do: