
Feat: add support for scoring function #1594

Merged: 9 commits into main, Dec 15, 2024
Conversation

@Samoed (Collaborator) commented Dec 14, 2024

Checklist

  • Run tests locally to make sure nothing is broken using make test.
  • Run the formatter to format the code using make lint.

Currently, there is no way to specify the scoring function except by passing it as a kwarg, but ColBERT models require it.

Ref #1592 (comment)
Closes #1589
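For background, ColBERT-style late-interaction models score a query against a document with MaxSim over token embeddings rather than a single cosine similarity, which is why a pluggable scoring function is needed. A minimal sketch of MaxSim (illustrative only, not mteb's actual implementation):

```python
import torch

def max_sim(query_embeddings: torch.Tensor, doc_embeddings: torch.Tensor) -> torch.Tensor:
    """MaxSim: for each query token, take the best-matching document token,
    then sum over query tokens.

    query_embeddings: (num_query_tokens, dim)
    doc_embeddings: (num_doc_tokens, dim)
    """
    # Token-level similarity matrix: (num_query_tokens, num_doc_tokens)
    token_scores = query_embeddings @ doc_embeddings.T
    # Best document token per query token, summed into a single score
    return token_scores.max(dim=1).values.sum()
```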

@KennethEnevoldsen (Contributor) left a comment:

I really appreciate this PR, but I believe this should be done using the model-implemented similarity metric:

def similarity(
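The idea is that the similarity lives on the model (or its wrapper), so evaluators call it instead of hardcoding cosine. A hedged sketch of what such a method might look like (names are illustrative, not mteb's actual API):

```python
import torch

class DenseWrapper:
    """Illustrative wrapper: a dense bi-encoder that defaults to cosine similarity."""

    def similarity(self, query_emb: torch.Tensor, corpus_emb: torch.Tensor) -> torch.Tensor:
        # Normalizing both sides makes a matmul yield pairwise cosine
        # scores of shape (num_queries, num_docs).
        q = torch.nn.functional.normalize(query_emb, dim=-1)
        c = torch.nn.functional.normalize(corpus_emb, dim=-1)
        return q @ c.T
```

A ColBERT wrapper would override `similarity` with MaxSim, and evaluators would not need to know the difference.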

@Samoed (Collaborator, Author) commented Dec 15, 2024

I've aligned RetrievalEvaluator with the other evaluators. It now uses model.similarity.

@KennethEnevoldsen (Contributor) left a comment:

I like this a lot better. Should we move it to v2.0.0?

mteb/evaluation/evaluators/RetrievalEvaluator.py (outdated, resolved)
Comment on lines 177 to +181:

```diff
 if is_nan.sum() > 0:
     logger.warning(
         f"Found {is_nan.sum()} NaN values in the similarity scores. Replacing NaN values with -1."
     )
-    cos_scores[is_nan] = -1
+    similarity_scores[is_nan] = -1
```
Contributor:

I might have actually removed this over in v2.0.0 (it is not needed)

```diff
@@ -112,21 +100,12 @@ def search(
     corpus: dict[str, dict[str, str]],
     queries: dict[str, str | list[str]],
     top_k: int,
-    score_function: str,
```
Contributor:

breaking? should we move to v2.0.0?

Collaborator (Author):

I don't think this is very breaking: in the implementation on the main branch, score_function can only be passed via evaluator.run, and I doubt many users have specified it.

@Samoed (Collaborator, Author) commented Dec 15, 2024

This PR could be moved to 2.0, but the change mainly benefits ColBERT and I don't think it will break anything. However, if you believe it's breaking, I can move it to 2.0.

@KennethEnevoldsen (Contributor):

So technically breaking, in practice not breaking. I agree with that. Let's keep it as it is and merge this in.

@Samoed (Collaborator, Author) commented Dec 15, 2024

I also found that BitextMiningEvaluator uses cosine as its score function. I think this should be changed as well, since PairClassification, STSEvaluator, and SummarizationEvaluator can already use model.similarity.
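The dispatch those evaluators rely on can be sketched as follows (illustrative only): use the model's own similarity when it defines one, otherwise fall back to cosine.

```python
import torch

def pairwise_scores(model, emb1: torch.Tensor, emb2: torch.Tensor) -> torch.Tensor:
    """Use the model's own similarity if it defines one, else default to cosine."""
    if hasattr(model, "similarity"):
        return model.similarity(emb1, emb2)
    # Cosine fallback: normalize, then matmul gives pairwise scores.
    q = torch.nn.functional.normalize(emb1, dim=-1)
    c = torch.nn.functional.normalize(emb2, dim=-1)
    return q @ c.T
```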

@KennethEnevoldsen (Contributor):

Would love to add that as well. Feel free to add it in a separate PR and merge this one, though.

@Samoed Samoed merged commit 8e6ee46 into main Dec 15, 2024
10 checks passed
@Samoed Samoed deleted the add_support_for_score_funstion branch December 15, 2024 23:32
Samoed added a commit that referenced this pull request Dec 22, 2024
* feat: add new arctic v2.0 models (#1574)

* feat: add new arctic v2.0 models

* chore: make lint

* 1.24.0

Automatically generated by python-semantic-release

* fix: Add namaa MrTydi reranking dataset (#1573)

* Add dataset class and file requirements

* pass tests

* make lint changes

* adjust meta data and remove load_data

---------

Co-authored-by: Omar Elshehy <omarelshehy@Omars-MacBook-Pro.local>

* Update tasks table

* 1.24.1

Automatically generated by python-semantic-release

* fix: Eval langs not correctly passed to monolingual tasks (#1587)

* fix SouthAfricanLangClassification.py

* add check for langs

* lint

* 1.24.2

Automatically generated by python-semantic-release

* feat: Add ColBert (#1563)

* feat: add max_sim operator for IR tasks to support multi-vector models

* docs: add doc for Model2VecWrapper.__init__(...)

* feat: add ColBERTWrapper to models & add ColBERTv2

* fix: resolve issues

* fix: resolve issues

* Update README.md

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

* Update README.md

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Update README.md

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Update mteb/evaluation/evaluators/RetrievalEvaluator.py

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Update README.md

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* README.md: rm subset

* doc: update example for Late Interaction

* get colbert running without errors

* fix: pass is_query to pylate

* fix: max_sim add pad_sequence

* feat: integrate Jinja templates for ColBERTv2 and add model prompt handling

* feat: add revision & prompt_name

* doc: pad_sequence

* rm TODO jina colbert v2

* doc: warning: higher resource usage for MaxSim

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* 1.25.0

Automatically generated by python-semantic-release

* doc: colbert add score_function & doc section (#1592)

* doc: colbert add score_function & doc section

* doc: Update README.md

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* doc: Update README.md

Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

---------

Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Feat: add support for scoring function (#1594)

* add support for scoring function

* lint

* move similarity to wrapper

* remove score function

* lint

* remove from InstructionRetrievalEvaluator

* Update mteb/evaluation/evaluators/RetrievalEvaluator.py

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* remove score function from README.md

---------

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

* Add new models nvidia, gte, linq (#1436)

* Add new models nvidia, gte, linq
* add warning for gte-Qwen and nvidia models re: instruction used in docs as well
---------
Co-authored-by: isaac-chung <chungisaac1217@gmail.com>

* Leaderboard: Refined plots (#1601)

* Added embedding size guide to performance-size plot, removed shading on radar chart

* Changed plot names to something more descriptive

* Made plots failsafe

* fix: Leaderboard refinements (#1603)

* Added explanation of aggregate measures

* Added download button to result tables

* Task info gets sorted by task name

* Added custom, shareable links for each benchmark

* Moved explanation of aggregate metrics to the summary tab

* 1.25.1

Automatically generated by python-semantic-release

* Feat: Use similarity scores if available (#1602)

* Use similarity scores if available

* lint

* Add NanoBEIR Datasets (#1588)

* add NanoClimateFeverRetrieval task, still requires some debugging
* move task to correct place in init file
* add all Nano datasets and results
* format code
* Update mteb/tasks/Retrieval/eng/tempCodeRunnerFile.py
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
* pin revision to commit and add datasets to benchmark.py
* create new benchmark for NanoBEIR
* add revision when loading datasets
* lint
---------
Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>
Co-authored-by: isaac-chung <chungisaac1217@gmail.com>

* Update tasks table

* Feat: Evaluate missing languages (#1584)

* init
* fix tests
* update mock retrieval
* update tests
* use subsets instead of langs
* Apply suggestions from code review
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
* fix tests
* add to readme
* rename subset in readme
---------
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>

* Add IBM Granite Embedding Models (#1613)

* add IBM granite embedding models
* lint formatting
* add adapted_from and superseded_by to ModelMeta

* fix: disable co2_tracker for API models (#1614)

* 1.25.2

Automatically generated by python-semantic-release

* fix: set `use_instructions` to True in models using prompts (#1616)

feat: set `use_instructions` to True in models using prompts

* 1.25.3

Automatically generated by python-semantic-release

* update RetrievalEvaluator.py

* update imports

* update imports and metadata

* fix tests

* fix tests

* fix output path for retrieval

* fix similarity function

---------

Co-authored-by: Daniel Buades Marcos <daniel.buades@clinia.com>
Co-authored-by: github-actions <github-actions@github.com>
Co-authored-by: Omar Elshehy <41394057+omarelshehy@users.noreply.github.com>
Co-authored-by: Omar Elshehy <omarelshehy@Omars-MacBook-Pro.local>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sam <40773225+sam-hey@users.noreply.github.com>
Co-authored-by: sam021313 <40773225+sam021313@users.noreply.github.com>
Co-authored-by: Isaac Chung <chungisaac1217@gmail.com>
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>
Co-authored-by: Alexey Vatolin <vatolinalex@gmail.com>
Co-authored-by: Márton Kardos <power.up1163@gmail.com>
Co-authored-by: KGupta10 <92774828+KGupta10@users.noreply.github.com>
Co-authored-by: Aashka Trivedi <aashka.trivedi@gmail.com>
Successfully merging this pull request may close these issues:

- The score_function is cos_sim() by default, when evaluating ColBert model in Retrieval task (#1589)