Retrieval evaluation

Retrieval is the process of returning the most relevant documents for a specific user query.

We have already introduced several solutions and algorithms for performing retrieval. But how can we evaluate retrieval objectively? Let's look at some common evaluation measures.

In this pill, I just want to list the most common retrieval metrics. You can find in-depth explanations in the suggested resources.

Retrieval evaluation measures

(Image by Pinecone, Laura Carnevali and James Briggs)

Order-unaware metrics

  • Precision@K:

    it quantifies the fraction of the top-K results that are relevant; K is the number of items returned by the system. A minimal computation sketch for Precision@K and Recall@K follows this list.

  • Recall@K:

    it measures the fraction of all relevant documents in the collection that appear among the top-K results.

    👍 Recall is simple to interpret,

    👎 but it does not take the position of results into account.
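
The following is a minimal sketch of Precision@K and Recall@K, assuming binary relevance judgments; the document ids and function names are illustrative, not part of any specific library.

```python
# Minimal sketch: Precision@K and Recall@K with binary relevance judgments.
# `retrieved` is the ranked list of document ids returned by the system,
# `relevant` is the set of all documents judged relevant for the query.

def precision_at_k(retrieved, relevant, k):
    top_k = retrieved[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / k

def recall_at_k(retrieved, relevant, k):
    top_k = retrieved[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

# Example: 2 of the top-3 results are relevant, out of 4 relevant docs overall.
retrieved = ["d7", "d2", "d9", "d1", "d5"]
relevant = {"d2", "d9", "d4", "d8"}
print(precision_at_k(retrieved, relevant, k=3))  # 2/3 ≈ 0.67
print(recall_at_k(retrieved, relevant, k=3))     # 2/4 = 0.50
```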

Order-aware metrics

  • Mean Reciprocal Rank (MRR):

    this metric takes into account the position of the first correctly retrieved document: for each query, it computes the reciprocal of the rank of the first relevant result, then averages over queries. Minimal sketches of the order-aware metrics follow this list.

    👍 MRR is useful when only the rank of the first relevant result matters (for example, in chatbots);

    👎 it's not a good metric if there can be multiple relevant results.

  • Mean Average Precision@K (MAP@K):

    it takes into account the rank of every relevant item retrieved: for each query, Precision@k is averaged over the positions of the relevant results in the top K, and the result is then averaged across queries.

    👍 MAP@K is suitable when we expect to return multiple relevant documents; it is very popular for evaluating recommender systems and search engines.

  • Normalized Discounted Cumulative Gain@K (NDCG@K):

    it is a popular, more recent retrieval metric. In this case, documents are not simply considered relevant or not relevant: instead, a more nuanced (graded) relevance scale is adopted.

    It is simply explained in the scikit-learn docs as follows: "Sum the true scores ranked in the order induced by the predicted scores, after applying a logarithmic discount (=incorrect order penalty). Then divide by the best possible score (Ideal DCG, obtained for a perfect ranking) to obtain a score between 0 and 1."
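
As a minimal sketch of MRR, assuming binary relevance (the helper names and query/document ids are illustrative):

```python
# Minimal sketch: Mean Reciprocal Rank (MRR) with binary relevance.
# `results_per_query` maps each query to its ranked result list,
# `relevant_per_query` maps each query to its set of relevant documents.

def reciprocal_rank(retrieved, relevant):
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank   # only the first relevant result counts
    return 0.0                  # no relevant result retrieved

def mean_reciprocal_rank(results_per_query, relevant_per_query):
    rr = [reciprocal_rank(results_per_query[q], relevant_per_query[q])
          for q in results_per_query]
    return sum(rr) / len(rr)

# Example: first relevant result at rank 2 for q1 and rank 1 for q2.
results = {"q1": ["d3", "d1", "d2"], "q2": ["d5", "d4"]}
relevant = {"q1": {"d1"}, "q2": {"d5", "d6"}}
print(mean_reciprocal_rank(results, relevant))  # (0.5 + 1.0) / 2 = 0.75
```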
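
A minimal sketch of MAP@K under the same binary-relevance assumption; normalizing by min(number of relevant documents, K) is one common convention:

```python
# Minimal sketch: Mean Average Precision@K (MAP@K) with binary relevance.

def average_precision_at_k(retrieved, relevant, k):
    hits, precisions = 0, []
    for rank, doc in enumerate(retrieved[:k], start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)  # Precision@rank at each relevant hit
    # Normalize by the number of relevant docs it was possible to retrieve in the top K.
    denom = min(len(relevant), k)
    return sum(precisions) / denom if denom else 0.0

def map_at_k(results_per_query, relevant_per_query, k):
    aps = [average_precision_at_k(results_per_query[q], relevant_per_query[q], k)
           for q in results_per_query]
    return sum(aps) / len(aps)

# Example: relevant docs at ranks 1 and 3 -> AP@3 = (1/1 + 2/3) / 2 ≈ 0.83
results = {"q1": ["d2", "d7", "d9"]}
relevant = {"q1": {"d2", "d9"}}
print(map_at_k(results, relevant, k=3))  # ≈ 0.83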
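
Since the quoted explanation comes from the scikit-learn docs, here is a minimal usage sketch of scikit-learn's ndcg_score; the graded relevance labels and scores are illustrative:

```python
# Minimal sketch: NDCG@K with graded relevance, using scikit-learn's ndcg_score.
from sklearn.metrics import ndcg_score

# One query (one row). True graded relevance of 4 candidate documents
# (e.g. 3 = highly relevant, 0 = not relevant) and the scores the retriever assigned.
true_relevance = [[3, 2, 0, 1]]
predicted_scores = [[2.1, 0.4, 1.7, 0.9]]  # induces the ranking d1, d3, d4, d2

print(ndcg_score(true_relevance, predicted_scores, k=3))
```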

Resources