
bug: fix MRR and MAP calculations #7841

Merged: 7 commits into main from fix-mrr-map-evaluators on Jun 25, 2024

Conversation

Amnah199
Contributor

Related Issues

Proposed Changes:

Fixed errors in MRREvaluator:

  • Corrected the MRR calculation logic (see the sketch after this list)
  • Added a break to exit the loop once the condition is fulfilled
  • Optimized the loops
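
For reference, the corrected MRR computation boils down to something like the sketch below. It is illustrative only: it works on plain strings instead of Haystack Document objects, and the function name is made up for the example.

def mean_reciprocal_rank(ground_truth_contents, retrieved_contents):
    """One list of document contents per question; returns the mean reciprocal rank."""
    individual_scores = []
    for ground_truth, retrieved in zip(ground_truth_contents, retrieved_contents):
        score = 0.0
        for rank, content in enumerate(retrieved):
            if content in ground_truth:
                score = 1 / (rank + 1)
                break  # stop at the first relevant document
        individual_scores.append(score)
    return sum(individual_scores) / len(ground_truth_contents)

# gold ["A"], retrieved ["B", "A", "C"] -> first hit at rank 2 -> 1/2
print(mean_reciprocal_rank([["A"]], [["B", "A", "C"]]))  # 0.5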

Fixed errors in MAPEvaluator:

  • Corrected the MAP calculation logic (see the sketch after this list)
  • Optimized the loops
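
Similarly, the corrected MAP computation follows this sketch (again on plain strings, illustrative only; it mirrors the per-query average-precision logic discussed in the review below):

def mean_average_precision(ground_truth_contents, retrieved_contents):
    """One list of document contents per question; returns the mean average precision."""
    individual_scores = []
    for ground_truth, retrieved in zip(ground_truth_contents, retrieved_contents):
        relevant_documents = 0
        average_precision_numerator = 0.0
        for rank, content in enumerate(retrieved):
            if content in ground_truth:
                relevant_documents += 1
                # precision@k for this hit, where k = rank + 1
                average_precision_numerator += relevant_documents / (rank + 1)
        average_precision = 0.0
        if relevant_documents > 0:
            average_precision = average_precision_numerator / relevant_documents
        individual_scores.append(average_precision)
    return sum(individual_scores) / len(ground_truth_contents)

# gold ["A", "C"], retrieved ["A", "B", "C"] -> (1/1 + 2/3) / 2 = 0.8333...
print(mean_average_precision([["A", "C"]], [["A", "B", "C"]]))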

Fixed errors in the pytest file test_document_map.py:

  • Corrected the expected values in test_run_with_complex_data()

How did you test it?

Ran unit tests for both evaluators

Notes for the reviewer

Verify the correctness of the calculated scores for an example.
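
For instance, a spot check along these lines should reproduce a hand-computed value (the module path, run() signature, and output shape are assumed from the files and variable names touched in this PR, so treat it as a sketch rather than guaranteed API):

from haystack import Document
from haystack.components.evaluators.document_mrr import DocumentMRREvaluator

evaluator = DocumentMRREvaluator()
result = evaluator.run(
    ground_truth_documents=[[Document(content="Paris")]],
    retrieved_documents=[[Document(content="Berlin"), Document(content="Paris")]],
)
# The first relevant document sits at rank 2, so the expected MRR is 1/2 = 0.5.
print(result)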

Checklist

@Amnah199 Amnah199 requested a review from a team as a code owner June 11, 2024 14:06
@Amnah199 Amnah199 requested review from masci and removed request for a team June 11, 2024 14:06
@Amnah199 Amnah199 added the ignore-for-release-notes label (PRs with this flag won't be included in the release notes) Jun 11, 2024
@coveralls
Collaborator

coveralls commented Jun 11, 2024

Pull Request Test Coverage Report for Build 9467114407

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 5 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.002%) to 89.801%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| components/evaluators/document_map.py | 2 | 93.33% |
| components/evaluators/document_mrr.py | 3 | 90.0% |
Totals Coverage Status
Change from base Build 9451690597: -0.002%
Covered Lines: 6850
Relevant Lines: 7628

💛 - Coveralls

@Amnah199 Amnah199 requested a review from a team as a code owner June 11, 2024 15:26
@Amnah199 Amnah199 requested review from dfokina and removed request for a team June 11, 2024 15:26
@github-actions github-actions bot added the type:documentation label (Improvements on the docs) Jun 11, 2024
@Amnah199 Amnah199 removed the ignore-for-release-notes label Jun 11, 2024
@coveralls
Collaborator

coveralls commented Jun 11, 2024

Pull Request Test Coverage Report for Build 9468451282

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 56 unchanged lines in 3 files lost coverage.
  • Overall coverage increased (+0.01%) to 89.813%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| components/evaluators/document_map.py | 2 | 93.33% |
| components/evaluators/document_mrr.py | 3 | 90.0% |
| core/pipeline/pipeline.py | 51 | 65.48% |
Totals Coverage Status
Change from base Build 9451690597: 0.01%
Covered Lines: 6859
Relevant Lines: 7637

💛 - Coveralls

@ju-gu
Member

ju-gu commented Jun 12, 2024

Thanks for taking care of this! I just double-checked some examples against the results from v1 and it looks fine :)

@coveralls
Collaborator

coveralls commented Jun 14, 2024

Pull Request Test Coverage Report for Build 9515678439

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 55 unchanged lines in 3 files lost coverage.
  • Overall coverage decreased (-0.2%) to 89.645%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| components/evaluators/document_map.py | 2 | 93.33% |
| components/evaluators/document_mrr.py | 2 | 92.31% |
| core/pipeline/pipeline.py | 51 | 65.48% |
Totals Coverage Status
Change from base Build 9451690597: -0.2%
Covered Lines: 6900
Relevant Lines: 7697

💛 - Coveralls

masci
masci previously requested changes Jun 14, 2024
Contributor

@masci masci left a comment

You can simplify the code a bit by using list comprehensions.
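
For illustration, this is the kind of simplification meant here, using a minimal stand-in for the Document dataclass; the comprehension form is what the evaluators end up using:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Document:  # minimal stand-in for haystack's Document
    content: Optional[str] = None

ground_truth = [Document("A"), Document(None), Document("C")]

# Explicit loop:
contents = []
for doc in ground_truth:
    if doc.content is not None:
        contents.append(doc.content)

# Equivalent list comprehension:
contents = [doc.content for doc in ground_truth if doc.content is not None]
print(contents)  # ['A', 'C']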

haystack/components/evaluators/document_map.py
haystack/components/evaluators/document_mrr.py
@masci masci requested a review from anakin87 June 21, 2024 11:23
@masci masci dismissed their stale review June 21, 2024 11:24

handing over to @anakin87

@coveralls
Collaborator

coveralls commented Jun 21, 2024

Pull Request Test Coverage Report for Build 9615764716

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 298 unchanged lines in 44 files lost coverage.
  • Overall coverage increased (+0.2%) to 89.968%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| components/builders/answer_builder.py | 1 | 98.21% |
| components/builders/chat_prompt_builder.py | 1 | 98.41% |
| components/converters/utils.py | 1 | 95.24% |
| components/evaluators/document_map.py | 1 | 96.15% |
| components/evaluators/document_mrr.py | 1 | 95.45% |
| components/preprocessors/document_cleaner.py | 1 | 98.82% |
| components/preprocessors/document_splitter.py | 1 | 98.63% |
| components/routers/metadata_router.py | 1 | 95.65% |
| components/websearch/searchapi.py | 1 | 96.3% |
| components/converters/tika.py | 2 | 91.18% |
Totals Coverage Status
Change from base Build 9451690597: 0.2%
Covered Lines: 6717
Relevant Lines: 7466

💛 - Coveralls

Member

@anakin87 anakin87 left a comment

I still found opportunities for improvement that could make our lives easier in the future (I hope)

-if ground_document.content in retrieved_document.content:
-    score = 1 / (rank + 1)
-    break
+ground_truth_content = [doc.content for doc in ground_truth if doc.content is not None]
Member

Suggested change
-ground_truth_content = [doc.content for doc in ground_truth if doc.content is not None]
+ground_truth_contents = [doc.content for doc in ground_truth if doc.content is not None]

Comment on lines 72 to 87

             average_precision = 0.0
             relevant_documents = 0

-            for rank, retrieved_document in enumerate(retrieved):
-                if retrieved_document.content is None:
-                    continue
-
-                if ground_document.content in retrieved_document.content:
-                    relevant_documents += 1
-                    average_precision += relevant_documents / (rank + 1)
-            if relevant_documents > 0:
-                score = average_precision / relevant_documents
+            ground_truth_content = [doc.content for doc in ground_truth if doc.content is not None]
+            for rank, retrieved_document in enumerate(retrieved):
+                if retrieved_document.content is None:
+                    continue
+
+                if retrieved_document.content in ground_truth_content:
+                    relevant_documents += 1
+                    average_precision += relevant_documents / (rank + 1)
+            if relevant_documents > 0:
+                score = average_precision / relevant_documents
             individual_scores.append(score)

-        score = sum(individual_scores) / len(retrieved_documents)
+        score = sum(individual_scores) / len(ground_truth_documents)
Member

Not your fault but I had a hard time understanding the algorithm.

  • I think we could benefit from a better choice of variable names.
  • Adding a comment in the code with the formula/link from which we drew inspiration could also help

Resources I took inspiration from: #1, #2

Something like this:

            average_precision_numerator = 0.0
            relevant_documents = 0

            ground_truth_contents = [doc.content for doc in ground_truth if doc.content is not None]
            for rank, retrieved_document in enumerate(retrieved):
                if retrieved_document.content is None:
                    continue

                if retrieved_document.content in ground_truth_contents:
                    relevant_documents += 1
                    precision_at_k = relevant_documents / (rank + 1)
                    average_precision_numerator += precision_at_k

            average_precision = 0
            if relevant_documents > 0:
                average_precision = average_precision_numerator / relevant_documents
            individual_scores.append(average_precision)

        score = sum(individual_scores) / len(ground_truth_documents)

(untested)

WDYT?

Contributor Author

@Amnah199 Amnah199 Jun 24, 2024

Agreed, I also struggled to understand the function initially. More descriptive variable names would help.

Contributor Author

Also, I think precision_at_k is unnecessary and would just be extra storage?

Member

I agree that it is not necessary, but in my opinion it makes the code more readable.

Feel free to make any changes you think appropriate.
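
For context, the two styles differ only in a temporary float, so the choice is purely about readability (illustrative values, not code from the PR):

relevant_documents, rank = 3, 4  # example values for one loop iteration

# With the intermediate variable (arguably more readable):
precision_at_k = relevant_documents / (rank + 1)
print(precision_at_k)  # 0.6

# Without it (same result; the temporary costs a single float, which is negligible):
print(relevant_documents / (rank + 1))  # 0.6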

@coveralls
Collaborator

coveralls commented Jun 24, 2024

Pull Request Test Coverage Report for Build 9648193977

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 298 unchanged lines in 44 files lost coverage.
  • Overall coverage increased (+0.2%) to 89.968%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| components/builders/answer_builder.py | 1 | 98.21% |
| components/builders/chat_prompt_builder.py | 1 | 98.41% |
| components/converters/utils.py | 1 | 95.24% |
| components/evaluators/document_map.py | 1 | 96.15% |
| components/evaluators/document_mrr.py | 1 | 95.45% |
| components/preprocessors/document_cleaner.py | 1 | 98.82% |
| components/preprocessors/document_splitter.py | 1 | 98.63% |
| components/routers/metadata_router.py | 1 | 95.65% |
| components/websearch/searchapi.py | 1 | 96.3% |
| components/converters/tika.py | 2 | 91.18% |
Totals Coverage Status
Change from base Build 9451690597: 0.2%
Covered Lines: 6717
Relevant Lines: 7466

💛 - Coveralls

Member

@anakin87 anakin87 left a comment

Great!

Please incorporate these small suggestions and then merge.

@@ -10,7 +10,7 @@
 @component
 class DocumentMAPEvaluator:
     """
-    A Mean Average Precision (MAP) evaluator for documents.
+    A Mean Average Precision (MAP) evaluator for documents. For details, please refer to the [resource](https://www.pinecone.io/learn/offline-evaluation/).
Member

Suggested change
-    A Mean Average Precision (MAP) evaluator for documents. For details, please refer to the [resource](https://www.pinecone.io/learn/offline-evaluation/).
+    A Mean Average Precision (MAP) evaluator for documents.

I would not put this link in the docstring, but I would put it as a comment at the beginning of the run method code to help those who will be working with the code in the future.
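
For illustration, the suggested placement would look roughly like this (signature abbreviated and body elided; a sketch, not the actual file contents):

from haystack import component


@component
class DocumentMAPEvaluator:
    """
    A Mean Average Precision (MAP) evaluator for documents.
    """

    def run(self, ground_truth_documents, retrieved_documents):
        # Metric definition and terminology: https://www.pinecone.io/learn/offline-evaluation/
        ...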

@@ -10,7 +10,7 @@
 @component
 class DocumentMRREvaluator:
     """
-    Evaluator that calculates the mean reciprocal rank of the retrieved documents.
+    Evaluator that calculates the mean reciprocal rank of the retrieved documents. For details, please refer to the [resource](https://www.pinecone.io/learn/offline-evaluation/).
Member

Suggested change
-    Evaluator that calculates the mean reciprocal rank of the retrieved documents. For details, please refer to the [resource](https://www.pinecone.io/learn/offline-evaluation/).
+    Evaluator that calculates the mean reciprocal rank of the retrieved documents.

Same as above

@coveralls
Collaborator

coveralls commented Jun 25, 2024

Pull Request Test Coverage Report for Build 9659678165

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 298 unchanged lines in 44 files lost coverage.
  • Overall coverage increased (+0.2%) to 89.968%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| components/builders/answer_builder.py | 1 | 98.21% |
| components/builders/chat_prompt_builder.py | 1 | 98.41% |
| components/converters/utils.py | 1 | 95.24% |
| components/evaluators/document_map.py | 1 | 96.15% |
| components/evaluators/document_mrr.py | 1 | 95.45% |
| components/preprocessors/document_cleaner.py | 1 | 98.82% |
| components/preprocessors/document_splitter.py | 1 | 98.63% |
| components/routers/metadata_router.py | 1 | 95.65% |
| components/websearch/searchapi.py | 1 | 96.3% |
| components/converters/tika.py | 2 | 91.18% |
Totals Coverage Status
Change from base Build 9451690597: 0.2%
Covered Lines: 6717
Relevant Lines: 7466

💛 - Coveralls

@coveralls
Collaborator

coveralls commented Jun 25, 2024

Pull Request Test Coverage Report for Build 9659996000

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 297 unchanged lines in 43 files lost coverage.
  • Overall coverage increased (+0.2%) to 89.968%

| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| components/builders/answer_builder.py | 1 | 98.21% |
| components/builders/chat_prompt_builder.py | 1 | 98.41% |
| components/converters/utils.py | 1 | 95.24% |
| components/evaluators/document_map.py | 1 | 96.15% |
| components/preprocessors/document_cleaner.py | 1 | 98.82% |
| components/preprocessors/document_splitter.py | 1 | 98.63% |
| components/routers/metadata_router.py | 1 | 95.65% |
| components/websearch/searchapi.py | 1 | 96.3% |
| components/converters/tika.py | 2 | 91.18% |
| components/converters/txt.py | 2 | 90.0% |
Totals Coverage Status
Change from base Build 9451690597: 0.2%
Covered Lines: 6717
Relevant Lines: 7466

💛 - Coveralls

@Amnah199 Amnah199 merged commit fc011d7 into main Jun 25, 2024
17 checks passed
@Amnah199 Amnah199 deleted the fix-mrr-map-evaluators branch June 25, 2024 10:07
Labels
topic:tests, type:documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MAP and MRR wrong for multiple gold documents
5 participants