Accuracy is 0 when negative sampling is disabled #2778

ChristianAngel · 2020-03-30T18:36:58Z

Problem description

When Word2Vec is trained on the text8 dataset with negative=0 (negative sampling disabled), the accuracy drops to 0 when evaluated on questions-words.txt.

Steps/code/corpus to reproduce

Minimal reproducible example:

import gensim.downloader
from gensim.models import Word2Vec

def evaluate(model):
    # accuracy() returns one dict per analogy section; the last entry is the overall total
    globalStats = model.wv.accuracy("questions-words.txt")
    numberCorrect = len(globalStats[-1]['correct'])
    return numberCorrect

dataset = gensim.downloader.load("text8")
model1 = Word2Vec(dataset, size=300, workers=1, negative=5)
model2 = Word2Vec(dataset, size=300, workers=1, negative=0)

print("Number correct with negative sampling:", evaluate(model1))
print("Number correct without negative sampling:", evaluate(model2))
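As a hedged illustration, a pre-flight check like the one below would reject the model2 configuration before any training time is spent. `validate_training_config` is a hypothetical helper, not part of gensim. It encodes the rule that word2vec needs at least one output layer to produce a training signal: hierarchical softmax (hs=1) or negative sampling (negative > 0).

```python
def validate_training_config(hs, negative):
    """Hypothetical guard (not part of gensim) for word2vec output-layer settings.

    Raises ValueError when neither hierarchical softmax nor negative
    sampling is enabled, since the model would then receive no gradient.
    """
    if negative < 0:
        raise ValueError(f"negative must be >= 0, got {negative}")
    if not hs and negative == 0:
        raise ValueError(
            "hs=0 and negative=0 leaves the model with no output layer; "
            "set negative > 0 (e.g. 5) or hs=1"
        )


# model1's settings pass; hs defaults to 0 in gensim 3.x Word2Vec
validate_training_config(hs=0, negative=5)
# Enabling hierarchical softmax is the other valid way to disable negative sampling
validate_training_config(hs=1, negative=0)
```

Calling `validate_training_config(hs=0, negative=0)`, which matches model2, raises a ValueError instead of silently producing untrained vectors. The helper takes plain values rather than a model object so it does not depend on any gensim internals.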

Output:

Number correct with negative sampling: 4031
Number correct without negative sampling: 0

questions-words.txt was downloaded from https://github.com/nicholas-leonard/word2vec/blob/master/questions-words.txt

Versions

Linux-3.10.0-862.2.3.el7.x86_64-x86_64-with-centos-7.5.1804-Core
Python 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)
[GCC 7.3.0]
NumPy 1.16.4
SciPy 1.3.0
gensim 3.8.1
FAST_VERSION 1


gojomo · 2020-04-03T00:24:01Z

Duplicate of #1983 - but the only thing missing is a warning/error. This is a nonsensical configuration: if you disable negative without also enabling hs, the model has no output layer and therefore no source of backprop training. (Either negative must be nonzero or hs must be enabled for anything useful to happen. As with the original word2vec.c code released by Google, there's no non-sparse training mode.) Training will complete instantly, and the logging output will be nonsense.
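To make "no source of backprop training" concrete, here is a toy, pure-Python sketch of a single skip-gram update, loosely modeled on the structure of the original word2vec.c: an error accumulator (called neu1e there) is filled only inside the hierarchical-softmax and negative-sampling branches, and only afterwards is it added to the word's input vector. The names `train_pair` and `vector_moved` are hypothetical; the hierarchical-softmax branch is omitted, an exact sigmoid replaces word2vec.c's lookup table, and this is not gensim's implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def train_pair(v_in, out_vecs, target, noise_ids, negative, hs=False, alpha=0.025):
    """Toy skip-gram update for one (center, context) pair; mutates vectors in place.

    Returns the delta applied to the input vector v_in.
    """
    if hs:
        raise NotImplementedError("hierarchical softmax is omitted from this sketch")
    neu1e = [0.0] * len(v_in)  # error accumulator, as in word2vec.c
    if negative > 0:
        # The true context word gets label 1; noise words get label 0
        # (a noise draw equal to the target is skipped, as word2vec.c does).
        samples = [(target, 1.0)] + [(w, 0.0) for w in noise_ids[:negative] if w != target]
        for word, label in samples:
            out = out_vecs[word]
            g = (label - sigmoid(dot(v_in, out))) * alpha
            for i in range(len(v_in)):
                neu1e[i] += g * out[i]  # uses the output weight before its update
                out[i] += g * v_in[i]   # update the output-layer weight
    # With negative == 0 and hs off, neu1e is still all zeros here,
    # so the input vector keeps its random initialization.
    for i in range(len(v_in)):
        v_in[i] += neu1e[i]
    return neu1e


def vector_moved(negative, dim=4, vocab=10, seed=0):
    """Run one toy update and report whether the input vector changed."""
    rng = random.Random(seed)

    def rand_vec():
        return [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)]

    v_in = rand_vec()
    before = list(v_in)
    # Demo choice: output weights are randomized so one step shows an effect.
    out_vecs = [rand_vec() for _ in range(vocab)]
    noise = [rng.randrange(vocab) for _ in range(negative)]
    train_pair(v_in, out_vecs, target=3, noise_ids=noise, negative=negative)
    return v_in != before
```

`vector_moved(5)` reports a changed vector, while `vector_moved(0)` leaves it untouched. In word2vec.c itself the negative-sampling output weights start at zero, so input vectors only begin to move once those weights have been updated; the demo randomizes both tables purely so the contrast is visible after a single step.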

gojomo closed this as completed Apr 3, 2020
