add precision@k and recall@k #58

Merged 8 commits into maciejkula:master on Oct 14, 2017

Conversation

graytowne (Contributor):

I added two new ranking metrics: precision@k and recall@k (a minimal sketch of both follows after this list).

There is still some work left to do:

  1. When evaluating recommender models, one should ignore all already-rated items (from the training data), as they won't appear again in the future (test) data.
  2. More metrics, like MAP@k and AUC, still need to be implemented.
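For context, precision@k is the fraction of the top-k recommended items that appear in the user's test interactions, and recall@k is the fraction of the user's test items recovered within the top k. A minimal per-user sketch (the helper name and data are illustrative, not the PR's final code):

```python
import numpy as np

def precision_recall_single_user(ranked_items, test_items, k):
    """Precision@k and recall@k for one user.

    ranked_items: item ids ordered best-first by the model.
    test_items: held-out item ids the user actually interacted with.
    """
    top_k = ranked_items[:k]
    num_hit = len(set(top_k) & set(test_items))
    precision = num_hit / k
    recall = num_hit / len(test_items) if len(test_items) else 0.0
    return precision, recall

print(precision_recall_single_user(np.array([3, 1, 4, 0, 5]),
                                   np.array([4, 2]), k=3))
# -> (0.333..., 0.5): one of the top 3 items is a test item,
#    and half of the test items were recovered.
```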

@@ -96,6 +96,60 @@ def sequence_mrr_score(model, test):
return np.array(mrrs)


def precision_recall_at_k(model, test, k):
maciejkula (Owner):

Could we make k an optional argument (with a default value)?

Another participant:

If yes, I guess the only natural default would be None and return the P@k/R@k for the whole ranking (although the function name would become a little misleading).

This is why I usually call the function just precision/recall, and put the "at"s only in the variable names. But I guess the current name is what most people expect to find in the library because the literature chose to put the k in the metric name...

maciejkula (Owner):

I think one name that would be consistent with the existing naming convention would be precision_recall_score.

    The model to evaluate.
test: :class:`spotlight.interactions.Interactions`
    Test interactions.
k: int or array of int,
maciejkula (Owner):

I'm not sure where I sit on making this a list:

  • yes, it's more efficient;
  • but it is also more complex.

Could we maybe make it so that if a scalar is passed a scalar is also returned?

graytowne (Contributor, Author):

I think having two options makes it more confusing. How about changing it to only a scalar, or only a list? I originally designed it for lists because, in research, people usually plot figures showing how performance changes as k varies.

maciejkula (Owner):

I think supporting both the list and scalar versions is a nice compromise.

We should set k=10 as the default value. Then:

  • if a list is passed, we return a tuple of lists of arrays;
  • if a scalar is passed (or the default value is kept), we return a tuple of arrays.
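A sketch of the dispatch described above (function and argument names are illustrative; the merged implementation may differ). The idea is to normalize k to a list internally, then unwrap the result for scalar input:

```python
import numpy as np

def precision_recall(rankings, test_sets, k=10):
    """rankings: per-user arrays of item ids, best-first.
    test_sets: per-user sets of held-out test items."""
    was_scalar = np.isscalar(k)
    ks = [k] if was_scalar else list(k)

    precisions, recalls = [], []
    for _k in ks:
        hits = np.array([len(set(r[:_k]) & t)
                         for r, t in zip(rankings, test_sets)])
        sizes = np.array([max(len(t), 1) for t in test_sets])
        precisions.append(hits / _k)   # one (num_users,) array per k
        recalls.append(hits / sizes)

    if was_scalar:
        return precisions[0], recalls[0]  # tuple of arrays
    return precisions, recalls            # tuple of lists of arrays
```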

graytowne (Contributor, Author):

OK, got it. I'll make the changes soon.

maciejkula (Owner):

Having said that, I would also be happy for this to be only scalar.


test = test.tocsr()

if not isinstance(k, list):
maciejkula (Owner):

This should check for more possible iterables: tuples, numpy arrays and so on.
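One way to cover tuples, numpy arrays, and other sequences without enumerating every iterable type (a sketch of the idea, not necessarily the merged fix) is to test for scalars instead:

```python
import numpy as np

def as_k_list(k):
    """Normalize k: a scalar becomes [k]; any sequence
    (list, tuple, numpy array, range, ...) becomes a list."""
    return [k] if np.isscalar(k) else list(k)

print(as_k_list(5))              # [5]
print(as_k_list((1, 5, 10)))     # [1, 5, 10]
print(as_k_list(np.arange(3)))   # a plain list of numpy ints
```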

targets = row.indices

for _k in k:
    pred = np.asarray(predictions)[:_k]
maciejkula (Owner):

Minor: predictions is already a numpy array.

graytowne (Contributor, Author):

Yes, you are correct.


for _k in k:
    pred = np.asarray(predictions)[:_k]
    num_hit = len(set(pred).intersection(set(targets)))
maciejkula (Owner):

np.in1d(targets, pred) may be faster?

graytowne (Contributor, Author):

I tested on my laptop (a 2012 MacBook) and found that the set-intersection version is slightly faster than the np.in1d version (1s vs. 1.2s) on the ML100k dataset.

graytowne (Contributor, Author):

Here is the code:

num_hit = sum(np.in1d(pred, targets))
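For anyone re-running the comparison, a micro-benchmark along these lines (sizes are illustrative, roughly matching ML100k's 1682 items):

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
pred = rng.permutation(1682)[:50]      # top-k recommendations
targets = rng.permutation(1682)[:20]   # held-out test items

t_set = timeit.timeit(
    lambda: len(set(pred).intersection(set(targets))), number=100000)
t_in1d = timeit.timeit(
    lambda: np.in1d(pred, targets).sum(), number=100000)
print('set intersection: {:.2f}s, np.in1d: {:.2f}s'.format(t_set, t_in1d))
```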

maciejkula (Owner):

Sounds good, set it is.

@@ -10,7 +10,7 @@
from spotlight.factorization.implicit import ImplicitFactorizationModel
from spotlight.factorization.representations import BilinearNet
from spotlight.layers import BloomEmbedding

from spotlight.evaluation import precision_recall_at_k
maciejkula (Owner):

Would you mind adding a new separate test file for evaluation metrics?
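A minimal shape such a test file could take (a sketch: it leans on spotlight's existing dataset and model helpers, and assumes the final metric is exposed as precision_recall_score):

```python
# tests/test_evaluation.py (hypothetical)
import numpy as np

from spotlight.cross_validation import random_train_test_split
from spotlight.datasets.movielens import get_movielens_dataset
from spotlight.evaluation import precision_recall_score
from spotlight.factorization.implicit import ImplicitFactorizationModel


def test_precision_recall():
    interactions = get_movielens_dataset('100K')
    train, test = random_train_test_split(interactions)

    model = ImplicitFactorizationModel(n_iter=1)
    model.fit(train)

    precision, recall = precision_recall_score(model, test, train, k=10)

    assert precision.shape == recall.shape
    assert np.all((precision >= 0) & (precision <= 1))
    assert np.all((recall >= 0) & (recall <= 1))
```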

Parameters
----------

model: fitted instance of a recommender model
maciejkula (Owner):

We should make this accept train interactions as well in the same way other metrics do.

maciejkula (Owner):

This is so that we can disregard the model giving high scores to known interactions. This reflects real-world system implementations.
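The usual pattern (a sketch with illustrative names) is to assign already-rated items the worst possible score before ranking, so they can never enter the top k:

```python
import numpy as np

FLOAT_MAX = np.finfo(np.float32).max

def rank_items(scores, rated=None):
    """Return item ids ranked best-first; items rated in training
    (if given) are pushed to the very end of the ranking."""
    negated = -scores.astype(np.float64)  # argsort sorts ascending
    if rated is not None:
        negated[rated] = FLOAT_MAX        # worst possible position
    return np.argsort(negated)

scores = np.array([0.9, 0.1, 0.8, 0.4])
print(rank_items(scores))                       # [0 2 3 1]
print(rank_items(scores, rated=np.array([0])))  # [2 3 1 0]
```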

Returns
-------

(Precision@k, Recall@k): numpy array of shape (num_users,)
    A tuple of Precisions@k and Recalls@k for each user in test.
maciejkula (Owner):

We should be clearer about what this returns. I suggest a tuple of lists of arrays (or a tuple of arrays in the case of scalar k), so that the i-th element of each list is the (num_users,) array of precision/recall at k[i].

@maciejkula (Owner):

Thanks for the improvements! There are still one or two things I'd like changed. I hope it's OK if I make a PR to your branch in the next couple of days?

@graytowne (Contributor, Author):

Not a problem! Take your time, and I can add more metrics (MAP, AUC) after that.

maciejkula and others added 2 commits October 13, 2017 22:01
This commit does the following:
- simplify the code
- improve the docs
- some efficiency improvements in eliminating train interactions
@maciejkula (Owner):

🚀

@maciejkula maciejkula merged commit 2db4845 into maciejkula:master Oct 14, 2017
@beeva-ramiromanso mentioned this pull request on Nov 13, 2017