Evaluation of word2vec models against semantic similarity datasets #1047

akutuzov · 2016-12-15T15:42:40Z

We have long had analogy evaluation (also known as analogical inference) of word2vec models in Gensim. However, another type of evaluation is widespread in distributional semantics: taking word pairs ranked by their semantic similarity (see SimLex-999 and other datasets) and measuring how well those human similarity scores correlate with the similarities produced by the model.

This PR adds the self.evaluation function to perform such evaluation against arbitrary datasets.
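To make the idea concrete, here is a minimal, dependency-free sketch of this kind of evaluation. It is an illustration only, not the code in this PR: the names `evaluate_pairs`, `TOY_VECTORS` and `TOY_PAIRS` are hypothetical, the vectors and human scores are made up, and silently skipping out-of-vocabulary pairs is an assumption rather than documented behaviour.

```python
import math


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def cosine(u, v):
    """Cosine similarity of two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def evaluate_pairs(vectors, pairs):
    """Correlate model similarities with human scores (hypothetical helper).

    vectors: dict mapping word -> list of floats
    pairs:   list of (word1, word2, human_score)
    Returns (spearman_rho, pairs_used, pairs_skipped); pairs containing an
    out-of-vocabulary word are skipped (an assumption of this sketch).
    """
    gold, predicted, skipped = [], [], 0
    for w1, w2, score in pairs:
        if w1 not in vectors or w2 not in vectors:
            skipped += 1
            continue
        gold.append(score)
        predicted.append(cosine(vectors[w1], vectors[w2]))
    return spearman(gold, predicted), len(gold), skipped


# Made-up toy data, not taken from SimLex-999 or any real dataset.
TOY_VECTORS = {
    "cat": [1.0, 0.1, 0.0],
    "dog": [0.9, 0.2, 0.0],
    "car": [0.0, 1.0, 0.2],
    "bus": [0.1, 0.9, 0.3],
    "tea": [0.0, 0.1, 1.0],
}
TOY_PAIRS = [
    ("cat", "dog", 9.0),
    ("car", "bus", 8.5),
    ("cat", "car", 2.0),
    ("dog", "tea", 1.0),
    ("cat", "yak", 5.0),  # "yak" has no vector, so this pair is skipped
]

rho, used, skipped = evaluate_pairs(TOY_VECTORS, TOY_PAIRS)
print("Spearman rho: %.3f (pairs used: %d, skipped: %d)" % (rho, used, skipped))
```

Rank correlation is used here because human scores and cosine similarities live on different scales; only the ordering of the pairs is compared.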


piskvorky · 2016-12-15T22:55:53Z

Thanks @akutuzov , that looks useful!

But what's with all those commits? Most look unrelated, and some look downright scary (like f3f2a52).

Also, we'll have to change the name evaluate to something more specific -- how about evaluate_word_pairs?

akutuzov · 2016-12-16T00:56:51Z

Thanks @piskvorky
I am certainly not against renaming; done.
As for the extra commits, I am trying to understand why GitHub has tied this PR to my previous one (#538).
Only the last few commits, starting from e11909f, make sense in the context of this PR. In fact, only two files are changed.

akutuzov · 2016-12-16T05:46:13Z

This is crazy.
So, can you squash all these commits into one, or should I just start another PR from scratch?

tmylk · 2016-12-19T17:58:55Z

@akutuzov thanks for the feature. Could we please add some simple unit tests for this new feature?

akutuzov · 2016-12-19T18:15:56Z

@tmylk what can those be? Evaluating against a toy dataset? Should it follow the same structure as testAccuracy in https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/test/test_word2vec.py#L370?

Also, what should we do with the old unneeded commits in this PR? As I've said, I can probably start a new one from scratch if it is not possible to just squash them all into one on the Gensim side.

tmylk · 2016-12-19T18:21:39Z

That test is not a good example. It is not a test of accuracy but a test of KeyedVectors. A good test would train a model on the Lee corpus and give it a single pair to evaluate, as in the sanity test.

There is another point. Having the small and canonical questions-words.txt in the repo helps a lot of people test the accuracy of their models. So we should add a semantic similarity dataset if it is less than 1 MB.

Don't worry about commits, I will squash them.
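As a rough sketch of the shape such a single-pair sanity test could take (not the test that was eventually added): `StubModel` and `agrees_on_order` are hypothetical names, the similarity values are invented, and the stub stands in for a model trained on the Lee corpus, assuming only that the model exposes a `similarity(w1, w2)` method.

```python
import unittest


class StubModel:
    """Hypothetical stand-in for a trained model; only similarity() is needed."""

    def __init__(self, table):
        # table maps an unordered word pair to an invented similarity score.
        self._table = table

    def similarity(self, w1, w2):
        return self._table[frozenset((w1, w2))]


def agrees_on_order(model, closer_pair, farther_pair):
    """True if the model scores closer_pair strictly higher than farther_pair."""
    return model.similarity(*closer_pair) > model.similarity(*farther_pair)


class TestWordPairSanity(unittest.TestCase):
    def setUp(self):
        # In a real test this would be a model trained on a small corpus.
        self.model = StubModel({
            frozenset(("night", "day")): 0.8,
            frozenset(("night", "tax")): 0.1,
        })

    def test_related_pair_scores_higher(self):
        self.assertTrue(
            agrees_on_order(self.model, ("night", "day"), ("night", "tax"))
        )

    def test_similarity_is_symmetric(self):
        self.assertEqual(
            self.model.similarity("day", "night"),
            self.model.similarity("night", "day"),
        )
```

Checking a relative ordering rather than an exact score keeps the test stable when training is not fully deterministic.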

akutuzov · 2016-12-19T18:34:05Z

OK, I will add a test, then.

tmylk · 2016-12-22T01:14:45Z

Thanks for the PR! Merging to add it to this year's release. Tests and a dataset should be in a separate PR.

akutuzov · 2016-12-22T08:51:16Z

Cool, thanks!
I will implement tests in a few days.

tmylk and others added 30 commits November 5, 2015 19:07

Merge branch 'release-0.12.3rc1'

1c63c9a

Merge branch 'release-0.12.3'

280a488

Merge branch 'release-0.12.3'

ddeb002

Update CHANGELOG.txt

f2ac3a9

Update CHANGELOG.txt

cf09e8c

cbow_mean default changed from 0 to 1.

b8b8f57

Hyperparameters' default values are aligned with Mikolov's word2vec.

6456cbc

Merge remote-tracking branch 'upstream/master' into develop

966a4b0

Conflicts: CHANGELOG.txt

Fix for piskvorky#538: cbow_mean default changed from 0 to 1.

d9ec7e4

Update changelog

76d2df7

(main) defaults aligned to Mikolov's word2vec.

0b6f45b

Merge remote-tracking branch 'upstream/develop' into develop

7fb5f18

Conflicts: CHANGELOG.txt gensim/models/word2vec.py

word2vec (main) now mimics command-line arguments for Mikolov's word2vec.

bc7a447

Fix for piskvorky#538

e689b4f

Fix for piskvorky#538 (tabs and spaces).

a5274ab

Fix for piskvorky#538 (tests).

5c32ca8

For piskvorky#538: slightly relaxed sanity check demands (because now default vector size is 100, not 200).

ac889b3

Fixes as per @gojomo comments.

92087c0

Test fixes due to negative sampling becoming default behavior.

06785b5

Commented out tests which work for HS only.

3ac5fd4

Fix for piskvorky#538.

e0ac3d2

Yet another fix.

0aad977

Merge remote-tracking branch 'upstream/develop' into develop

1db616b

Conflicts: gensim/models/word2vec.py

Merging.

e4eb8ba

Fix for CBOW test.

ab25344

Merge remote-tracking branch 'upstream/develop' into develop

6b3f01d

Changelog mention of piskvorky#538

2bf45d3

Fix for CBOW negative sampling tests.

1a579ec

Merge remote-tracking branch 'upstream/develop' into develop

78372bf

Factoring out word2vec _main__ into gensim/scripts

0c10fa6

tmylk and others added 15 commits June 9, 2016 19:48

Release version typo fix

9c74b40

Merge branch 'release-0.13.0rc1'

7b30025

Merge branch 'release-0.13.0'

de79c8e

Merge branch 'release-0.13.1'

d4f9cc5

Merge remote-tracking branch 'upstream/master' into develop

e0627c6

Conflicts: CHANGELOG.txt gensim/models/word2vec.py gensim/scripts/word2vec_standalone.py

Finalizing.

b8b30c2

'fisrt_push'

f3f2a52

Initial shippable release

873f184

Merge remote-tracking branch 'upstream/develop' into develop

68a3e86

Conflicts: CHANGELOG.md README.md gensim/models/word2vec.py tutorials.md

Evaluation function to measure model correlation with human similarity judgments datasets.

498474d

Updating semantic similarity evaluation.

ce64d5a

Scipy stats import

0936971

Evaluation function to measure model correlation with human similarity judgments datasets.

e11909f

Merge branch 'develop' of https://github.com/akutuzov/gensim into develop

5f38818

Remove unneccessary.

b4b8d14

Changing the neame of the word pairs evaluation function.

2429dc4

piskvorky assigned tmylk Dec 16, 2016

piskvorky added the feature label Dec 16, 2016

Merge branch 'develop' into develop

ad6b268

tmylk merged commit baf0f16 into piskvorky:develop Dec 22, 2016

akutuzov mentioned this pull request Dec 27, 2016

Tests for the evaluate_word_pairs function #1061

Merged
