
Releases: embeddings-benchmark/mteb

1.18.0 (2024-10-28)

Feature

  • feat: update English benchmarks and mark MMTEB benchmarks as beta (#1341)

  • Added summEvalv2

  • Update docs with new MTEB_EN_MAIN rename (61371dd)
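
A minimal sketch of finding the renamed benchmark, assuming the get_benchmarks() helper exported from mteb; the exact display names (including the beta-flagged MMTEB entries) are best read off this output rather than guessed:

```python
import mteb

# List every registered benchmark; the renamed English benchmark and the
# beta-flagged MMTEB benchmarks appear here under their current names.
for benchmark in mteb.get_benchmarks():
    print(benchmark.name)
```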

1.17.0 (2024-10-26)

Feature

  • feat: Update metadata for all models (#1316)

  • Added model meta

  • format

  • fixed metadata

  • Metadata update for voyage models

  • Update mteb/models/cohere_models.py

Co-authored-by: Roman Solomatin <samoed.roman@gmail.com>

  • Added corrections from review

  • fix spelling error


Co-authored-by: Roman Solomatin <samoed.roman@gmail.com> (f8fed9b)
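
A minimal sketch of reading the refreshed metadata, assuming the mteb.get_model_meta helper; the model name below is one of the Cohere entries touched here and is illustrative:

```python
import mteb

# Look up the registered metadata for a single model; fields such as
# revision and release_date were part of this metadata pass.
meta = mteb.get_model_meta("Cohere/Cohere-embed-english-v3.0")
print(meta.name, meta.revision, meta.release_date)
```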

Unknown

  • WIP: Leaderboard UI improvements (#1320)

  • Fixed typos in task_results

  • Added Tailwind, reorganized layout and fixed scrolling

  • Ran linting

  • Removed faux benchmark

  • Updated layout

  • Changed table number format

  • Table now highlights the highest values in bold

  • Added rank to table, removed organization from model_name

  • Added mean rank to table

  • Ran linting (5af36c5)

  • Cache the embeddings when requested (#1307)

  • add caching

  • update test to use close

  • change from json to pkl

  • fix for Windows

  • cleanup on Windows again

  • infer dimension

  • move cachewrapper

  • add wrapper

  • fix

  • updates

  • fix tests

  • fix lint

  • lint

  • add test (650e8b8)
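
The caching idea from #1307 is roughly the following; a minimal sketch, not the mteb implementation: embeddings are pickled to disk (hence the json-to-pkl switch above) under a key derived from the input texts, so repeated runs reuse them.

```python
import hashlib
import pickle
from pathlib import Path


class CachedEncoder:
    """Wrap any encoder and persist its embeddings to disk."""

    def __init__(self, model, cache_dir: str = "embedding_cache"):
        self.model = model
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def encode(self, sentences: list[str], **kwargs):
        # Key the cache on the exact input batch.
        key = hashlib.sha256("\n".join(sentences).encode()).hexdigest()
        path = self.cache_dir / f"{key}.pkl"
        if path.exists():  # cache hit: skip the model entirely
            return pickle.loads(path.read_bytes())
        embeddings = self.model.encode(sentences, **kwargs)
        path.write_bytes(pickle.dumps(embeddings))
        return embeddings
```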

  • Update tasks table (4a04042)

  • Add multilingual mFollowIR dataset (#1308)

  • add mFollowIR

  • paper name

  • edit warning->info

  • convert to parquet

  • lint (b580b95)
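
Once registered, the new task runs like any other; a sketch, assuming a SentenceTransformers model and that "mFollowIR" (taken from the PR title) is the registered task name, which should be confirmed with mteb.get_tasks():

```python
import mteb
from sentence_transformers import SentenceTransformer

# "mFollowIR" follows the PR title; check the registry for the exact name.
tasks = mteb.get_tasks(tasks=["mFollowIR"])
model = SentenceTransformer("intfloat/multilingual-e5-small")
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")
```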

1.16.5 (2024-10-25)

Fix

  • fix: Add implementations of common reranker models (#1309)

  • init

  • revert

  • revert

  • add metadata

  • lint

  • add reqs

  • change to float16

  • benchmark lint fix (f5f90d3)
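
These models are cross-encoder rerankers, which score (query, passage) pairs jointly rather than embedding each side independently. The pattern, sketched with a public cross-encoder outside mteb (the model name is illustrative; the PR itself loads weights in float16 to save memory):

```python
from sentence_transformers import CrossEncoder

# A cross-encoder reads query and passage together and outputs a
# relevance score for the pair.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = model.predict([
    ("what is mteb?", "MTEB is a massive benchmark for text embeddings."),
    ("what is mteb?", "The weather is nice today."),
])
print(scores)  # the relevant passage should score higher
```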

1.16.4 (2024-10-25)

Fix

  • fix: Re-upload dataset to hub to avoid using script upload (#1322)

  • fix dataset upload

  • add linting (f00a262)
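
Re-uploading the data as plain files means it now loads without executing a dataset script; a sketch (the repository id below is hypothetical, not the one from the PR):

```python
from datasets import load_dataset

# No trust_remote_code is needed once the dataset is stored as plain
# data files instead of a loading script. Hypothetical repo id:
ds = load_dataset("mteb/example-dataset", split="test")
```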

1.16.3 (2024-10-24)

Fix

  • fix: remove duplicate multilingual (2f14519)

1.16.2 (2024-10-24)

Fix

  • fix: Add Slovak Hate Speech and Offensive Language Dataset (#1274)

  • Add Slovak Hate Speech and Offensive Language Dataset

This commit introduces the Slovak Hate Speech and Offensive Language Database to MTEB. The dataset includes posts from a social network, annotated by humans for hate speech and offensive content. Additionally, the corresponding task has been added to the tasks.md table to reflect this update.

  • Applied requested changes:
  • Updated __init__.py to include the new SlovakHateSpeechClassification task.
  • Modified SlovakHateSpeechClassification.py as per review suggestions to enhance functionality and readability.
  • resolve linting issues by running make lint (f3d8014)

Unknown

  • WIP: Leaderboard UI improvements (#1312)

  • Fixed typos in task_results

  • Added Tailwind, reorganized layout and fixed scrolling

  • Ran linting (bd5ee9e)

  • Update tasks table (0d86753)

1.16.1 (2024-10-22)

Fix

  • fix: Add Retrieval SK Quad dataset for Slovak search evaluation (#1276)

  • Add Retrieval SK Quad dataset for Slovak search evaluation

This commit introduces the Retrieval SK Quad dataset, designed to assess Slovak search performance. The dataset is derived from SK-QuAD and includes questions with their best answers categorized post-annotation. This addition provides a significant resource for advancing Slovak language search evaluation and supporting further research and development.

  • Add Retrieval SK Quad dataset for Slovak search evaluation (follow-up)

Added the requested changes to the SKQuadRetrieval.py file.

  • add task to init

  • add missing task metadata


Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (fc53498)

1.16.0 (2024-10-21)

Feature

  • feat: Use prompts instead of encode_corpus and encode_queries (#1278)

  • add prompt per task type

  • fix prompt

  • upd test

  • lint

  • fix test

  • fix DeprecatedSummarizationEvaluator

  • fix prompts

  • add test

  • lint

  • logger info

  • use task type only in model_encode

  • lint

  • update interface

  • add prompt types to docs

  • fix test

  • mock tasks

  • mock task registry

  • remove last task_type

  • fix tests

  • lint

  • fix test

  • fix

  • use wrapper and new prompts

  • fix tests

  • lint

  • fix test

  • remove conftest

  • validate task to prompt_name

  • override model prompts

  • task to prompt name optional

  • fix tests

  • fix models

  • remove task_to_prompt_name

  • remove from mteb init

  • update docs

  • load existing model prompts if model_prompts is None

  • fix

  • lint

  • change wrapper loader

  • add wrapper class

  • lint

  • add wrapper file

  • update logging

  • upd logging

  • refactor reranking

  • lint

  • remove prints (2a61821)
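
The upshot of this change: instead of implementing encode_queries/encode_corpus, a model exposes a single encode that receives the task name and a prompt type. A minimal sketch, assuming the PromptType enum from mteb.encoder_interface; the prompt strings and vector size are illustrative stand-ins:

```python
from __future__ import annotations

import numpy as np

from mteb.encoder_interface import PromptType


class PromptedModel:
    """Selects a prompt based on the prompt type mteb passes in."""

    prompts = {
        PromptType.query: "Represent the query for retrieval: ",
        PromptType.passage: "Represent the passage for retrieval: ",
    }

    def encode(
        self,
        sentences: list[str],
        *,
        task_name: str,
        prompt_type: PromptType | None = None,
        **kwargs,
    ) -> np.ndarray:
        prefix = self.prompts.get(prompt_type, "")
        sentences = [prefix + s for s in sentences]
        # Stand-in for a real model: random vectors keep the sketch runnable.
        return np.random.rand(len(sentences), 384)
```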

Unknown

  • Leaderboard (#1235)

  • Add leaderboard dev

  • Renamed MTEBResults to TaskResult

  • Moved model and model meta loading utilities into overview.py

  • Added get_model_metas to retrieve filtered metadata for models

  • Restructured results object and made it into a class instead of a dict

  • Added utilities for filtering models on BenchmarkResults objects

  • Added to_table utility function to BenchmarkResults

  • Added serialization utilities to BenchmarkResults

  • Attempted fixing tests

  • Added get_model_metas to init

  • Added get_benchmarks to init and made it return all benchmarks by default

  • Added get_benchmarks to init

  • Made tasks hashable

  • Added task filtering based on task objects on BenchmarkResults

  • Added BenchmarkResults to init

  • Added additional arguments to get_scores on two classes

  • Made get_scores smarter on BenchmarkResult

  • Added basic multilingual benchmark

  • Modified benchmark to be able to easily access results

  • Added useful properties and filtering functions to BenchmarkResults

  • Added minimal functioning example

  • Added smarter table, task-list updating and tried fixing dropdown scrolling

  • Made restrict_results into a private function

Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com>

  • Removed old leaderboard scripts

  • Hardcoded max and min model size

  • Removed redundant utils file

  • Ran linting

  • added leaderboard dependencies as optional

  • Fixed union type error on Python 3.9

  • Removed references to Dict in task aggregation

  • Fixed name errors in _restrict_task_results

  • Fixed _restrict_task_results

  • Made hf_subsets={'default'} when the task is monolingual in _restrict_task_results

  • Task dropdown now gets filtered based on the other criteria

  • Ran linting again

  • Introduced hotfix for reranking test

  • Added BenchmarkResults to all in init

  • Fixed the validate_and_filter_scores method and replaced _restrict_task_results with it


Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsen@gmail.com> (094f922)
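
A small sketch of the new objects, assuming mteb.load_results() returns the BenchmarkResults container described above; the helper names are the ones introduced in these notes:

```python
import mteb

# Load published results into the new BenchmarkResults container and
# fetch model metadata via the new get_model_metas helper.
results = mteb.load_results()
metas = mteb.get_model_metas()
print(type(results).__name__, "models with metadata:", len(metas))
```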

1.15.8 (2024-10-20)

Fix

  • fix: Remove non-existent eval split of CMNLI (#1294)

fix eval_splits of CMNLI (5b4b262)
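
The declared splits live on the task's metadata and can be checked directly; a sketch ("CMNLI" is the task name used in C-MTEB):

```python
import mteb

# After the fix, eval_splits lists only splits that actually exist.
task = mteb.get_tasks(tasks=["CMNLI"])[0]
print(task.metadata.eval_splits)
```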

1.15.7 (2024-10-16)

Fix

  • fix: Add metadata dict to QBQTC in C-MTEB (#1292)

  • fix QBQTC in C-MTEB

  • make lint


Co-authored-by: Isaac Chung <chungisaac1217@gmail.com> (4a88a1d)