Fix misleading Doc2Vec.docvecs comment (#2472)
* Fix misleading Doc2Vec.docvecs comment

The existing doc-comment was confused and misleading: it implied that `Doc2Vec` handles word senses by giving a single word token different word-vectors in different contexts. (See <https://stackoverflow.com/questions/55939511/word-vectors-from-a-whole-doc2vec-model-vs-word-vectors-from-a-particular-docum/55941468#55941468> for an example of a confused user.) `Doc2Vec` doesn't do that, so this change makes the comment matter-of-fact about accessing vectors via `.docvecs`.
gojomo authored and mpenkov committed May 4, 2019
1 parent 460dc1c commit cb3a4a7
Showing 1 changed file with 5 additions and 10 deletions.
15 changes: 5 additions & 10 deletions gensim/models/doc2vec.py
@@ -447,18 +447,13 @@ class Doc2Vec(BaseWordEmbeddingsModel):
         directly to query those embeddings in various ways. See the module level docstring for examples.
     docvecs : :class:`~gensim.models.keyedvectors.Doc2VecKeyedVectors`
-        This object contains the paragraph vectors. Remember that the only difference between this model and
-        :class:`~gensim.models.word2vec.Word2Vec` is that besides the word vectors we also include paragraph embeddings
-        to capture the paragraph.
+        This object contains the paragraph vectors learned from the training data. There will be one such vector
+        for each unique document tag supplied during training. They may be individually accessed using the tag
+        as an indexed-access key. For example, if one of the training documents used a tag of 'doc003':
-        In this way we can capture the difference between the same word used in a different context.
-        For example we now have a different representation of the word "leaves" in the following two sentences ::
-            1. Manos leaves the office every day at 18:00 to catch his train
-            2. This season is called Fall, because leaves fall from the trees.
+        .. sourcecode:: pycon
-        In a plain :class:`~gensim.models.word2vec.Word2Vec` model the word would have exactly the same representation
-        in both sentences, in :class:`~gensim.models.doc2vec.Doc2Vec` it will not.
+            >>> model.docvecs['doc003']
     vocabulary : :class:`~gensim.models.doc2vec.Doc2VecVocab`
         This object represents the vocabulary (sometimes called Dictionary in gensim) of the model.