Tests for the evaluate_word_pairs function #1061
Merged
62 commits
1c63c9a
Merge branch 'release-0.12.3rc1'
tmylk 280a488
Merge branch 'release-0.12.3'
tmylk ddeb002
Merge branch 'release-0.12.3'
tmylk f2ac3a9
Update CHANGELOG.txt
tmylk cf09e8c
Update CHANGELOG.txt
tmylk b8b8f57
cbow_mean default changed from 0 to 1.
akutuzov 6456cbc
Hyperparameters' default values are aligned with Mikolov's word2vec.
akutuzov 966a4b0
Merge remote-tracking branch 'upstream/master' into develop
akutuzov d9ec7e4
Fix for #538: cbow_mean default changed from 0 to 1.
akutuzov 76d2df7
Update changelog
akutuzov 0b6f45b
(main) defaults aligned to Mikolov's word2vec.
akutuzov 7fb5f18
Merge remote-tracking branch 'upstream/develop' into develop
akutuzov bc7a447
word2vec (main) now mimics command-line arguments for Mikolov's word2…
akutuzov e689b4f
Fix for #538
akutuzov a5274ab
Fix for #538 (tabs and spaces).
akutuzov 5c32ca8
Fix for #538 (tests).
akutuzov ac889b3
For #538: slightly relaxed sanity check demands (because now default …
akutuzov 92087c0
Fixes as per @gojomo comments.
akutuzov 06785b5
Test fixes due to negative sampling becoming default behavior.
akutuzov 3ac5fd4
Commented out tests which work for HS only.
akutuzov e0ac3d2
Fix for #538.
akutuzov 0aad977
Yet another fix.
akutuzov 1db616b
Merge remote-tracking branch 'upstream/develop' into develop
akutuzov e4eb8ba
Merging.
akutuzov ab25344
Fix for CBOW test.
akutuzov 6b3f01d
Merge remote-tracking branch 'upstream/develop' into develop
akutuzov 2bf45d3
Changelog mention of #538
akutuzov 1a579ec
Fix for CBOW negative sampling tests.
akutuzov 78372bf
Merge remote-tracking branch 'upstream/develop' into develop
akutuzov 0c10fa6
Factoring out word2vec _main__ into gensim/scripts
akutuzov 8a3d58b
Use logger instead of logging.
akutuzov c5249b9
Made Changelog less verbose about word2vec defaults changed.
akutuzov a40e624
Fixes to word2vec_standalone.py as per Radim's comments.
akutuzov dbd0eab
Alpha argument. with different defaults for CBOW ans skipgram.
akutuzov b61287a
resolve merge conflict in Changelog
tmylk 3ade404
Merge branch 'release-0.12.4' with #596
tmylk 9e6522e
Merge branch 'release-0.13.0'
tmylk 87c4e9c
Merge branch 'release-0.13.0'
tmylk 9c74b40
Release version typo fix
tmylk 7b30025
Merge branch 'release-0.13.0rc1'
tmylk de79c8e
Merge branch 'release-0.13.0'
tmylk d4f9cc5
Merge branch 'release-0.13.1'
tmylk e0627c6
Merge remote-tracking branch 'upstream/master' into develop
akutuzov b8b30c2
Finalizing.
akutuzov f3f2a52
'fisrt_push'
Nowow 873f184
Initial shippable release
Nowow 68a3e86
Merge remote-tracking branch 'upstream/develop' into develop
akutuzov 498474d
Evaluation function to measure model correlation with human similarit…
akutuzov ce64d5a
Updating semantic similarity evaluation.
akutuzov 0936971
Scipy stats import
akutuzov e11909f
Evaluation function to measure model correlation with human similarit…
akutuzov 5f38818
Merge branch 'develop' of https://github.com/akutuzov/gensim into dev…
akutuzov b4b8d14
Remove unneccessary.
akutuzov 2429dc4
Changing the neame of the word pairs evaluation function.
akutuzov ad6b268
Merge branch 'develop' into develop
tmylk fddbc0a
Merge remote-tracking branch 'upstream/develop' into develop
akutuzov 910a511
Wordsim353 dataset added.
akutuzov 54e0ba2
Fixed bug in evaluate_word_pairs.
akutuzov 41f8f8e
Tests for evaluate_word_pairs function.
akutuzov 9dfbac5
Atrributing Wordsim353 dataset.
akutuzov 5899610
Merge remote-tracking branch 'upstream/develop' into develop
akutuzov 11c9afb
Test for out-of-vocabulary pairs in evaluate_word_pairs.
akutuzov
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,355 @@ | ||
# The WordSimilarity-353 Test Collection (http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/) | ||
# Word 1 Word 2 Human (mean) | ||
love sex 6.77 | ||
tiger cat 7.35 | ||
tiger tiger 10.00 | ||
book paper 7.46 | ||
computer keyboard 7.62 | ||
computer internet 7.58 | ||
plane car 5.77 | ||
train car 6.31 | ||
telephone communication 7.50 | ||
television radio 6.77 | ||
media radio 7.42 | ||
drug abuse 6.85 | ||
bread butter 6.19 | ||
cucumber potato 5.92 | ||
doctor nurse 7.00 | ||
professor doctor 6.62 | ||
student professor 6.81 | ||
smart student 4.62 | ||
smart stupid 5.81 | ||
company stock 7.08 | ||
stock market 8.08 | ||
stock phone 1.62 | ||
stock CD 1.31 | ||
stock jaguar 0.92 | ||
stock egg 1.81 | ||
fertility egg 6.69 | ||
stock live 3.73 | ||
stock life 0.92 | ||
book library 7.46 | ||
bank money 8.12 | ||
wood forest 7.73 | ||
money cash 9.15 | ||
professor cucumber 0.31 | ||
king cabbage 0.23 | ||
king queen 8.58 | ||
king rook 5.92 | ||
bishop rabbi 6.69 | ||
Jerusalem Israel 8.46 | ||
Jerusalem Palestinian 7.65 | ||
holy sex 1.62 | ||
fuck sex 9.44 | ||
Maradona football 8.62 | ||
football soccer 9.03 | ||
football basketball 6.81 | ||
football tennis 6.63 | ||
tennis racket 7.56 | ||
Arafat peace 6.73 | ||
Arafat terror 7.65 | ||
Arafat Jackson 2.50 | ||
law lawyer 8.38 | ||
movie star 7.38 | ||
movie popcorn 6.19 | ||
movie critic 6.73 | ||
movie theater 7.92 | ||
physics proton 8.12 | ||
physics chemistry 7.35 | ||
space chemistry 4.88 | ||
alcohol chemistry 5.54 | ||
vodka gin 8.46 | ||
vodka brandy 8.13 | ||
drink car 3.04 | ||
drink ear 1.31 | ||
drink mouth 5.96 | ||
drink eat 6.87 | ||
baby mother 7.85 | ||
drink mother 2.65 | ||
car automobile 8.94 | ||
gem jewel 8.96 | ||
journey voyage 9.29 | ||
boy lad 8.83 | ||
coast shore 9.10 | ||
asylum madhouse 8.87 | ||
magician wizard 9.02 | ||
midday noon 9.29 | ||
furnace stove 8.79 | ||
food fruit 7.52 | ||
bird cock 7.10 | ||
bird crane 7.38 | ||
tool implement 6.46 | ||
brother monk 6.27 | ||
crane implement 2.69 | ||
lad brother 4.46 | ||
journey car 5.85 | ||
monk oracle 5.00 | ||
cemetery woodland 2.08 | ||
food rooster 4.42 | ||
coast hill 4.38 | ||
forest graveyard 1.85 | ||
shore woodland 3.08 | ||
monk slave 0.92 | ||
coast forest 3.15 | ||
lad wizard 0.92 | ||
chord smile 0.54 | ||
glass magician 2.08 | ||
noon string 0.54 | ||
rooster voyage 0.62 | ||
money dollar 8.42 | ||
money cash 9.08 | ||
money currency 9.04 | ||
money wealth 8.27 | ||
money property 7.57 | ||
money possession 7.29 | ||
money bank 8.50 | ||
money deposit 7.73 | ||
money withdrawal 6.88 | ||
money laundering 5.65 | ||
money operation 3.31 | ||
tiger jaguar 8.00 | ||
tiger feline 8.00 | ||
tiger carnivore 7.08 | ||
tiger mammal 6.85 | ||
tiger animal 7.00 | ||
tiger organism 4.77 | ||
tiger fauna 5.62 | ||
tiger zoo 5.87 | ||
psychology psychiatry 8.08 | ||
psychology anxiety 7.00 | ||
psychology fear 6.85 | ||
psychology depression 7.42 | ||
psychology clinic 6.58 | ||
psychology doctor 6.42 | ||
psychology Freud 8.21 | ||
psychology mind 7.69 | ||
psychology health 7.23 | ||
psychology science 6.71 | ||
psychology discipline 5.58 | ||
psychology cognition 7.48 | ||
planet star 8.45 | ||
planet constellation 8.06 | ||
planet moon 8.08 | ||
planet sun 8.02 | ||
planet galaxy 8.11 | ||
planet space 7.92 | ||
planet astronomer 7.94 | ||
precedent example 5.85 | ||
precedent information 3.85 | ||
precedent cognition 2.81 | ||
precedent law 6.65 | ||
precedent collection 2.50 | ||
precedent group 1.77 | ||
precedent antecedent 6.04 | ||
cup coffee 6.58 | ||
cup tableware 6.85 | ||
cup article 2.40 | ||
cup artifact 2.92 | ||
cup object 3.69 | ||
cup entity 2.15 | ||
cup drink 7.25 | ||
cup food 5.00 | ||
cup substance 1.92 | ||
cup liquid 5.90 | ||
jaguar cat 7.42 | ||
jaguar car 7.27 | ||
energy secretary 1.81 | ||
secretary senate 5.06 | ||
energy laboratory 5.09 | ||
computer laboratory 6.78 | ||
weapon secret 6.06 | ||
FBI fingerprint 6.94 | ||
FBI investigation 8.31 | ||
investigation effort 4.59 | ||
Mars water 2.94 | ||
Mars scientist 5.63 | ||
news report 8.16 | ||
canyon landscape 7.53 | ||
image surface 4.56 | ||
discovery space 6.34 | ||
water seepage 6.56 | ||
sign recess 2.38 | ||
Wednesday news 2.22 | ||
mile kilometer 8.66 | ||
computer news 4.47 | ||
territory surface 5.34 | ||
atmosphere landscape 3.69 | ||
president medal 3.00 | ||
war troops 8.13 | ||
record number 6.31 | ||
skin eye 6.22 | ||
Japanese American 6.50 | ||
theater history 3.91 | ||
volunteer motto 2.56 | ||
prejudice recognition 3.00 | ||
decoration valor 5.63 | ||
century year 7.59 | ||
century nation 3.16 | ||
delay racism 1.19 | ||
delay news 3.31 | ||
minister party 6.63 | ||
peace plan 4.75 | ||
minority peace 3.69 | ||
attempt peace 4.25 | ||
government crisis 6.56 | ||
deployment departure 4.25 | ||
deployment withdrawal 5.88 | ||
energy crisis 5.94 | ||
announcement news 7.56 | ||
announcement effort 2.75 | ||
stroke hospital 7.03 | ||
disability death 5.47 | ||
victim emergency 6.47 | ||
treatment recovery 7.91 | ||
journal association 4.97 | ||
doctor personnel 5.00 | ||
doctor liability 5.19 | ||
liability insurance 7.03 | ||
school center 3.44 | ||
reason hypertension 2.31 | ||
reason criterion 5.91 | ||
hundred percent 7.38 | ||
Harvard Yale 8.13 | ||
hospital infrastructure 4.63 | ||
death row 5.25 | ||
death inmate 5.03 | ||
lawyer evidence 6.69 | ||
life death 7.88 | ||
life term 4.50 | ||
word similarity 4.75 | ||
board recommendation 4.47 | ||
governor interview 3.25 | ||
OPEC country 5.63 | ||
peace atmosphere 3.69 | ||
peace insurance 2.94 | ||
territory kilometer 5.28 | ||
travel activity 5.00 | ||
competition price 6.44 | ||
consumer confidence 4.13 | ||
consumer energy 4.75 | ||
problem airport 2.38 | ||
car flight 4.94 | ||
credit card 8.06 | ||
credit information 5.31 | ||
hotel reservation 8.03 | ||
grocery money 5.94 | ||
registration arrangement 6.00 | ||
arrangement accommodation 5.41 | ||
month hotel 1.81 | ||
type kind 8.97 | ||
arrival hotel 6.00 | ||
bed closet 6.72 | ||
closet clothes 8.00 | ||
situation conclusion 4.81 | ||
situation isolation 3.88 | ||
impartiality interest 5.16 | ||
direction combination 2.25 | ||
street place 6.44 | ||
street avenue 8.88 | ||
street block 6.88 | ||
street children 4.94 | ||
listing proximity 2.56 | ||
listing category 6.38 | ||
cell phone 7.81 | ||
production hike 1.75 | ||
benchmark index 4.25 | ||
media trading 3.88 | ||
media gain 2.88 | ||
dividend payment 7.63 | ||
dividend calculation 6.48 | ||
calculation computation 8.44 | ||
currency market 7.50 | ||
OPEC oil 8.59 | ||
oil stock 6.34 | ||
announcement production 3.38 | ||
announcement warning 6.00 | ||
profit warning 3.88 | ||
profit loss 7.63 | ||
dollar yen 7.78 | ||
dollar buck 9.22 | ||
dollar profit 7.38 | ||
dollar loss 6.09 | ||
computer software 8.50 | ||
network hardware 8.31 | ||
phone equipment 7.13 | ||
equipment maker 5.91 | ||
luxury car 6.47 | ||
five month 3.38 | ||
report gain 3.63 | ||
investor earning 7.13 | ||
liquid water 7.89 | ||
baseball season 5.97 | ||
game victory 7.03 | ||
game team 7.69 | ||
marathon sprint 7.47 | ||
game series 6.19 | ||
game defeat 6.97 | ||
seven series 3.56 | ||
seafood sea 7.47 | ||
seafood food 8.34 | ||
seafood lobster 8.70 | ||
lobster food 7.81 | ||
lobster wine 5.70 | ||
food preparation 6.22 | ||
video archive 6.34 | ||
start year 4.06 | ||
start match 4.47 | ||
game round 5.97 | ||
boxing round 7.61 | ||
championship tournament 8.36 | ||
fighting defeating 7.41 | ||
line insurance 2.69 | ||
day summer 3.94 | ||
summer drought 7.16 | ||
summer nature 5.63 | ||
day dawn 7.53 | ||
nature environment 8.31 | ||
environment ecology 8.81 | ||
nature man 6.25 | ||
man woman 8.30 | ||
man governor 5.25 | ||
murder manslaughter 8.53 | ||
soap opera 7.94 | ||
opera performance 6.88 | ||
life lesson 5.94 | ||
focus life 4.06 | ||
production crew 6.25 | ||
television film 7.72 | ||
lover quarrel 6.19 | ||
viewer serial 2.97 | ||
possibility girl 1.94 | ||
population development 3.75 | ||
morality importance 3.31 | ||
morality marriage 3.69 | ||
Mexico Brazil 7.44 | ||
gender equality 6.41 | ||
change attitude 5.44 | ||
family planning 6.25 | ||
opera industry 2.63 | ||
sugar approach 0.88 | ||
practice institution 3.19 | ||
ministry culture 4.69 | ||
problem challenge 6.75 | ||
size prominence 5.31 | ||
country citizen 7.31 | ||
planet people 5.75 | ||
development issue 3.97 | ||
experience music 3.47 | ||
music project 3.63 | ||
glass metal 5.56 | ||
aluminum metal 7.83 | ||
chance credibility 3.88 | ||
exhibit memorabilia 5.31 | ||
concert virtuoso 6.81 | ||
rock jazz 7.59 | ||
museum theater 7.19 | ||
observation architecture 4.38 | ||
space world 6.53 | ||
preservation world 6.19 | ||
admission ticket 7.69 | ||
shower thunderstorm 6.31 | ||
shower flood 6.03 | ||
weather forecast 8.34 | ||
disaster area 6.25 | ||
governor office 6.34 | ||
architecture century 3.78 |
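
For reviewers who want to sanity-check the file format, here is a minimal sketch of how the added dataset could be read: lines starting with `#` are comments, and every other line holds two words followed by the mean human score. The helper name `parse_word_pairs` is hypothetical and is not part of this PR or of gensim; splitting on any whitespace is an assumption made so the sketch handles both tab- and space-separated lines.

```python
def parse_word_pairs(lines):
    """Parse 'word1 word2 score' lines, skipping comments and blank lines.

    Hypothetical helper for illustration only; not the code under review.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # header/comment lines carry no pair
        word1, word2, score = line.split()
        pairs.append((word1, word2, float(score)))
    return pairs


# A few rows copied from the diff above.
sample = [
    "# Word 1 Word 2 Human (mean)",
    "tiger cat 7.35",
    "tiger tiger 10.00",
    "book paper 7.46",
]
pairs = parse_word_pairs(sample)
```

Each entry in `pairs` is a `(word1, word2, score)` tuple, which is the shape a correlation check against model similarities would need.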
could we please test for `oov_ratio` in `correlation[2]` too?
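
A rough sketch of the kind of check being asked for, under stated assumptions: the out-of-vocabulary ratio is taken here to be the percentage of pairs in which at least one word is missing from the model vocabulary. The function `oov_ratio` below is a hypothetical stand-in written for illustration; it is not the implementation in this PR, and the vocabulary is made up.

```python
def oov_ratio(pairs, vocab):
    """Percentage of pairs with at least one word missing from vocab.

    Hypothetical stand-in; the definition of the ratio is an assumption.
    """
    if not pairs:
        return 0.0
    missing = sum(1 for w1, w2, _ in pairs if w1 not in vocab or w2 not in vocab)
    return 100.0 * missing / len(pairs)


# Pairs taken from the dataset in this PR; the vocabulary is invented.
pairs = [
    ("tiger", "cat", 7.35),
    ("king", "cabbage", 0.23),
    ("money", "cash", 9.15),
    ("vodka", "gin", 8.46),
]
vocab = {"tiger", "cat", "king", "money", "cash"}

ratio = oov_ratio(pairs, vocab)

# The shape of assertion a test could make on the third returned value.
assert 0.0 <= ratio <= 100.0
```

With this toy vocabulary, two of the four pairs contain an unknown word, so a range assertion like the one above would pass while still catching a ratio that is negative or above 100.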