GitHub - thisisclement/STS-Benchmark-SentEval: STS-Benchmark is a helper library that evaluates Sentence Transformer Models for Semantic Textual Similarity tasks.

STS Benchmark Evaluator

STS Benchmark Evaluator is a helper library that evaluates Sentence Transformer models for Semantic Textual Similarity Tasks.

The evaluation uses the STS-Benchmark test set. It should also work for the other SemEval STS datasets.
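As a rough illustration of the input, the sketch below reads STS-B style rows. It assumes the tab-separated layout in which the STS-Benchmark is commonly distributed, with the gold score in the fifth column and the two sentences in the sixth and seventh. The repository's own loader is not shown here and may differ. The function name `read_sts_rows` and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io


def read_sts_rows(handle):
    """Yield (gold_score, sentence1, sentence2) from an STS-B style TSV stream.

    Assumed column layout (an assumption, not confirmed by this repository):
    genre, file, year, id, score, sentence1, sentence2
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 7:
            continue  # skip short or malformed lines
        yield float(row[4]), row[5], row[6]


# Made-up rows for illustration only; they are not real STS-B entries.
sample = (
    "genre-a\tfile-a\t2017\t0001\t2.500\t"
    "A man is playing a guitar.\tA person plays a guitar.\n"
    "genre-a\tfile-a\t2017\t0002\t0.400\t"
    "A dog runs on the beach.\tThe stock market fell today.\n"
)

rows = list(read_sts_rows(io.StringIO(sample)))
for score, first, second in rows:
    print(score, first, second, sep=" | ")
```

For a real file, pass an open file handle (for example `open(path, encoding="utf-8", newline="")`) instead of the in-memory string.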

How to run

Install dependencies and needed libraries

pipenv install

For a worked example, see the notebook run_stsbenchmark.ipynb.

Benchmark Results

Model	STS-B Dataset	Spearman Score
paraphrase-multilingual-mpnet-base-v2	English STS-B	0.8682219047835629
paraphrase-multilingual-MiniLM-L12-v2	English STS-B	0.8441678301403761
paraphrase-multilingual-mpnet-base-v2	Malay STS-B	0.6341583902856095
paraphrase-multilingual-mpnet-base-v2	Vietnamese STS-B	0.6265472133925664
paraphrase-multilingual-mpnet-base-v2	Thai STS-B	0.6192682373584009
paraphrase-multilingual-MiniLM-L12-v2	Malay STS-B	0.6118353309379856
paraphrase-multilingual-mpnet-base-v2	Chinese STS-B	0.6052726127430685
paraphrase-multilingual-MiniLM-L12-v2	Vietnamese STS-B	0.6037541386963938
paraphrase-multilingual-MiniLM-L12-v2	Chinese STS-B	0.6032880514028351
paraphrase-multilingual-MiniLM-L12-v2	Thai STS-B	0.6000013617198022
paraphrase-multilingual-mpnet-base-v2	Tagalog STS-B	0.36567875871165906
paraphrase-multilingual-MiniLM-L12-v2	Tagalog STS-B	0.3394533385140273

The multilingual models perform relatively well on Malay, Vietnamese, Thai and Chinese (Spearman scores of roughly 0.60 to 0.63), though well below their English scores (0.84 to 0.87). They perform poorly on Tagalog STS tasks (roughly 0.34 to 0.37). This might be due to the lack of parallel data for Tagalog.
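For readers unfamiliar with the metric, the following is a minimal sketch of how a Spearman score of this kind can be computed: take the cosine similarity of each sentence pair's embeddings, then correlate the ranks of those similarities with the ranks of the gold scores. It is written in plain Python under stated assumptions and is not the repository's implementation. The helper names and the toy two-dimensional "embeddings" are hypothetical; in practice the vectors would come from a sentence embedding model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy vectors standing in for sentence embeddings (illustration only).
pairs = [
    ([1.0, 0.0], [1.0, 0.1]),
    ([1.0, 0.0], [0.5, 0.5]),
    ([1.0, 0.0], [0.0, 1.0]),
]
gold = [4.8, 2.5, 0.2]  # made-up human similarity scores on a 0 to 5 scale

predicted = [cosine_similarity(u, v) for u, v in pairs]
score = spearman(gold, predicted)
print(score)
```

Because Spearman compares ranks only, the gold scores (0 to 5) and the cosine similarities (-1 to 1) do not need to be on the same scale.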

Citation

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
