[WIP] Added function "predict_output_word" to predict the output word given the context words. Fixes issue #863. #1209
Conversation
Please add unit tests and a note in CHANGELOG.md
@tmylk Sure. Also, I wanted to confirm whether this function is meant to be implemented only for the hierarchical softmax scheme. If we are only implementing it for hierarchical softmax, then we should add a corresponding check.
Hierarchical-softmax mode is non-default, and in my experience less commonly-used. Also, this code currently interprets the individual output-slots of `syn1` as if they correspond one-to-one with vocabulary words, which isn't the case in hierarchical-softmax mode (there they correspond to interior nodes of the Huffman tree). I'd suggest instead that the negative-sampling case be clearly and properly supported – as that has the easier interpretation (a single slot in `syn1neg` per vocabulary word).
@gojomo Thanks a lot for clarifying this. So I'll change the current implementation of the function to serve the negative sampling scheme first, and then figure out how to report probabilities for the hierarchical softmax case.
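For reference, a rough sketch of what the negative-sampling prediction path could look like, assuming the attribute names that appear in the review snippets below (`wv.syn0`, `wv.vocab`, `wv.index2word`, `syn1neg`, `cbow_mean`); this is an illustration of the approach, not necessarily the code that was merged:

```python
import numpy as np

def predict_output_word_sketch(model, context_words, topn=10):
    """Sketch: CBOW-style center-word prediction under negative sampling."""
    if not model.negative:
        raise RuntimeError("Only implemented for the negative sampling scheme")

    # keep only context words that are in the model's vocabulary
    word_vocabs = [model.wv.vocab[w] for w in context_words if w in model.wv.vocab]
    if not word_vocabs:
        return None

    word2_indices = [v.index for v in word_vocabs]

    # combine the context vectors the same way CBOW training does (sum or mean)
    l1 = np.sum(model.wv.syn0[word2_indices], axis=0)
    if model.cbow_mean:
        l1 /= len(word2_indices)

    # under negative sampling each row of syn1neg corresponds to one vocabulary
    # word, so a softmax over the scores gives per-word probabilities
    scores = np.dot(l1, model.syn1neg.T)
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()

    top = np.argsort(probs)[::-1][:topn]
    return [(model.wv.index2word[i], float(probs[i])) for i in top]
```

Called as `predict_output_word_sketch(model, ['context', 'words'])`, it would return `(word, probability)` pairs sorted by decreasing probability.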
gensim/models/word2vec.py
Outdated
```python
if word2_indices and self.cbow_mean:
    l1 /= len(word2_indices)

if self.negative:
```
Please raise an exception:

```python
if not self.negative:
    raise RuntimeError("We have currently only implemented for negative sampling")
```
gensim/models/word2vec.py
Outdated
```python
word2_indices.append(word.index)

l1 = np_sum(self.wv.syn0[word2_indices], axis=0)
if word2_indices and self.cbow_mean:
```
If `word_vocabs` is empty, then return None with a warning.
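I.e., something like this inside the function (a sketch; the exact warning message is illustrative):

```python
import warnings

if not word_vocabs:
    warnings.warn("All the input context words are out-of-vocabulary for the current model.")
    return None
```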
gensim/models/word2vec.py
Outdated
```python
word2_indices = []
for pos, word in enumerate(word_vocabs):
    word2_indices.append(word.index)
```
Please use a list comprehension.
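For example, equivalent to the loop above:

```python
word2_indices = [word.index for word in word_vocabs]
```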
Tests fixed by smart_open update

Please add a unit test and a note in Changelog
```
@@ -35,35 +37,35 @@ Improvements:
* Phrases and Phraser allow a generator corpus (ELind77 [#1099](https://github.com/RaRe-Technologies/gensim/pull/1099))
* Ignore DocvecsArray.doctag_syn0norm in save. Fix #789 (@accraze,[#1053](https://github.com/RaRe-Technologies/gensim/pull/1053))
* Fix bug in LsiModel that occurs when id2word is a Python 3 dictionary. (@cvangysel,[#1103](https://github.com/RaRe-Technologies/gensim/pull/1103))
* Fix broken link to paper in readme (@bhargavvader,[#1101](https://github.com/RaRe-Technologies/gensim/pull/1101))
* Lazy formatting in evaluate_word_pairs (@akutuzov,[#1084](https://github.com/RaRe-Technologies/gensim/pull/1084))
```
@tmylk please check -- or even better, introduce an automated check -- that makes sure there's no trailing whitespace in commits.
Because it then leads to confusing diffs like this one, when someone (correctly!) removes the trailing whitespace later on.
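One minimal form such an automated check could take (a sketch, not necessarily what the project adopted; the script name and invocation are hypothetical):

```python
# check_trailing_whitespace.py -- exit non-zero if any given file contains trailing whitespace
import sys

bad = []
for path in sys.argv[1:]:
    with open(path) as f:
        for lineno, line in enumerate(f, 1):
            if line.rstrip("\n") != line.rstrip():
                bad.append((path, lineno))

for path, lineno in bad:
    print("trailing whitespace: %s:%d" % (path, lineno))

sys.exit(1 if bad else 0)
```

It could be run in CI as, e.g., `python check_trailing_whitespace.py $(git ls-files '*.py')`.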
@tmylk I have made changes to CHANGELOG.md and also added a unit test as suggested by you earlier.
Thanks for the new feature. It would be good to add it to https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/word2vec.ipynb
@tmylk Sure. I will update the IPython Notebook as well.
@chinmayapancholi13

```python
plan1 = ["pick-up-B", "stack-B-A", "pick-up-D", "stack-D-C"]
plan2 = ["unstack-B-A", "put-down-B", "unstack-D-C", "put-down-D"]
plan3 = ["pick-up-B", "stack-B-A", "pick-up-C", "stack-C-B", "pick-up-D", "stack-D-C"]
plan4 = ["unstack-D-C", "put-down-D", "unstack-C-B", "put-down-C", "unstack-B-A", "put-down-B"]

from gensim.models import word2vec

raw_sentences = plan1 + plan2 + plan3 + plan4
sentences = [s.split() for s in raw_sentences]
model = word2vec.Word2Vec(sentences, min_count=1, size=10, workers=4)

# pick-up-B OOO unstack-D-C put-down-D OOO stack-C-B OOO OOO
# pick-up-B stack-B-A unstack-D-C put-down-D pick-up-C stack-C-B pick-up-D stack-D-C
a = model.predict_output_word(['put-down-D', 'stack-C-B'])
print(a)
# weird???
# [('put-down-B', 0.083333336), ('stack-B-A', 0.083333336), ('unstack-C-B', 0.083333336), ('pick-up-C', 0.083333336), ('stack-C-B', 0.083333336), ('unstack-B-A', 0.083333336), ('put-down-D', 0.083333336), ('stack-D-C', 0.083333336), ('pick-up-B', 0.083333336), ('pick-up-D', 0.083333336)]
```
@exoticknight Here `raw_sentences` ends up being a flat list of individual tokens (`plan1 + plan2 + ...` concatenates the token lists), so after `s.split()` each "sentence" seen by Word2Vec is a single word. The model therefore never sees any real context during training, which is why every word gets the same probability (1/12 for the 12-word vocabulary). Each plan is already a tokenized sentence, so you can pass the plans to Word2Vec directly, as sketched below.
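A sketch of the corrected setup, reusing the plan lists from the snippet above (the `window` value is only illustrative):

```python
from gensim.models import word2vec

# each plan is already a list of tokens, i.e. one tokenized "sentence"
sentences = [plan1, plan2, plan3, plan4]
model = word2vec.Word2Vec(sentences, min_count=1, size=10, window=4, workers=1)

print(model.predict_output_word(['put-down-D', 'stack-C-B'], topn=3))
```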
@chinmayapancholi13 Thanks, that explains it.
@exoticknight No problem! Let me know if you face any other problems. I'd be happy to help. :)
@chinmayapancholi13 Hi, I checked the code and comments. From my understanding, the implementation is for CBOW. So is it right that, given 'emergency', 'beacon', 'received' (from the tutorial), the output is the center word either between 'emergency' and 'beacon' or between 'beacon' and 'received'? Because I didn't see your discussion at first, I used the implementation to predict the next word given a list of words.
@yzexeter Hey! Yes, this implementation is for CBOW, as mentioned in the original issue. This means that the list of words given as input to the function is treated as the context window, and the function reports the probabilities of possible center (target) words for that context.
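For instance, a hypothetical call, assuming `model` was trained on the tutorial corpus with `negative > 0`:

```python
# returns (word, probability) pairs for likely center words of this context
model.predict_output_word(['emergency', 'beacon', 'received'], topn=5)
```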
@chinmayapancholi13 Thank you for your explanation. As it is based on CBOW, what if I use skip-gram to train the model (sg=1)? It doesn't output a warning, so I assume it works. Does the prediction under skip-gram have another meaning, or is it still the center word of the context list?
@yzexeter In CBOW, we train our model to predict the center (target) word correctly given the context words. On the other hand, in the case of skip-gram, we train our model to predict the context words correctly given a particular word as input.
@chinmayapancholi13 Thank you. This explains my results after I used a skip-gram model to predict the output. Will this be further implemented for skip-gram as well? I will keep track of future modifications of the implementation.
@yzexeter Hey! Sorry for the late response. As I mentioned above, the input format of this function (i.e. a list of context words) doesn't really cohere with the skip-gram model (which predicts things the other way around, i.e. predicts context words given the central word). I guess there could be a separate function for the skip-gram model for doing this, but I don't have any plans right now to extend the same function to do that. :)
@chinmayapancholi13 the difference between "focus word predicts all context window words" and "all context window words are used to predict focus word" ultimately isn't that significant - in the end, all the exact same "input word -> predicted word" pairs are used for training, just in a slightly different order. (IIRC, the word2vec paper describes it one way, but the Google word2vec.c code does it the other way, because they found slightly better CPU cache utilization patterns & thus bulk performance the way the code does it.)

A skip-gram predict-word function would need the exact same context-window input - but would presumably calculate the individual predictions for every context word, then average all those predictions – even more expensive, by a factor of the number of context words.

(Either the CBOW or SG predictions should perhaps also simulate the distance-weighting that occurs during training. During training passes, windows aren't actually the full `window` words on each side: the effective window is randomly shrunk to somewhere between 1 and `window` words, which in effect weights words nearer the target more heavily.)
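For illustration, a rough sketch of that window shrinking (gensim draws the reduction with its own RNG during training; the values here are only illustrative):

```python
import numpy as np

window = 5
rng = np.random.default_rng(0)

# for each target position, training shrinks the window by a random amount,
# so words closer to the target are included in training pairs more often
reduced = int(rng.integers(0, window))   # in [0, window - 1]
effective_window = window - reduced      # in [1, window]
print(effective_window)
```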
@chinmayapancholi13 No problem. Thanks for your help. :)
This PR adds a function `predict_output_word` to the class `Word2Vec`, which runs the trained model and reports the probability values of the possible output words. This fixes #863.