Update Doc2Vec documentation: how tags are assigned in `corpus_file` mode (#2320)

* add clarification regarding tags of documents in corpus_file mode for Doc2Vec

* based on -> equal to
persiyanov authored and menshikh-iv committed Jan 8, 2019
1 parent 9c5215a commit e0bfb3f
Showing 1 changed file with 6 additions and 3 deletions.
gensim/models/doc2vec.py: 9 changes (6 additions, 3 deletions)
@@ -487,7 +487,8 @@ def __init__(self, documents=None, corpus_file=None, dm_mean=None, dm=1, dbow_wo
corpus_file : str, optional
Path to a corpus file in :class:`~gensim.models.word2vec.LineSentence` format.
You may use this argument instead of `sentences` to get performance boost. Only one of `sentences` or
- `corpus_file` arguments need to be passed (or none of them).
+ `corpus_file` arguments need to be passed (or none of them). Documents' tags are assigned automatically
+ and are equal to line number, as in :class:`~gensim.models.doc2vec.TaggedLineDocument`.
dm : {1,0}, optional
Defines the training algorithm. If `dm=1`, 'distributed memory' (PV-DM) is used.
Otherwise, `distributed bag of words` (PV-DBOW) is employed.
@@ -761,7 +762,8 @@ def train(self, documents=None, corpus_file=None, total_examples=None, total_wor
corpus_file : str, optional
Path to a corpus file in :class:`~gensim.models.word2vec.LineSentence` format.
You may use this argument instead of `sentences` to get performance boost. Only one of `sentences` or
- `corpus_file` arguments need to be passed (not both of them).
+ `corpus_file` arguments need to be passed (not both of them). Documents' tags are assigned automatically
+ and are equal to line number, as in :class:`~gensim.models.doc2vec.TaggedLineDocument`.
total_examples : int, optional
Count of sentences.
total_words : int, optional
@@ -1140,7 +1142,8 @@ def build_vocab(self, documents=None, corpus_file=None, update=False, progress_p
corpus_file : str, optional
Path to a corpus file in :class:`~gensim.models.word2vec.LineSentence` format.
You may use this argument instead of `sentences` to get performance boost. Only one of `sentences` or
- `corpus_file` arguments need to be passed (not both of them).
+ `corpus_file` arguments need to be passed (not both of them). Documents' tags are assigned automatically
+ and are equal to a line number, as in :class:`~gensim.models.doc2vec.TaggedLineDocument`.
update : bool
If true, the new words in `sentences` will be added to model's vocab.
progress_per : int
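The updated docstrings state that in `corpus_file` mode each document's tag equals its line number, as with `TaggedLineDocument`. The sketch below mimics that assignment using only the standard library, so it runs without gensim installed. `TaggedDoc` and `iter_tagged_lines` are hypothetical names, not gensim API, and the sketch assumes 0-based line numbers, which is how `TaggedLineDocument` numbers its lines.

```python
from collections import namedtuple
import os
import tempfile

# Hypothetical stand-in for gensim's TaggedDocument: a (words, tags) pair.
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])


def iter_tagged_lines(path):
    """Yield one TaggedDoc per line of a LineSentence-style file.

    Each line holds one document as whitespace-separated tokens. The tag is
    the 0-based line number (an assumption mirroring TaggedLineDocument).
    """
    with open(path, encoding="utf-8") as fin:
        for line_no, line in enumerate(fin):
            yield TaggedDoc(words=line.split(), tags=[line_no])


if __name__ == "__main__":
    # Write a tiny three-document corpus to a temporary file.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="utf-8"
    ) as f:
        f.write("the cat sat\nthe dog barked\nbirds sing\n")
        corpus_path = f.name
    try:
        for doc in iter_tagged_lines(corpus_path):
            print(doc.tags, doc.words)
    finally:
        os.remove(corpus_path)
```

Running it prints `[0] ['the', 'cat', 'sat']`, then `[1] ['the', 'dog', 'barked']`, then `[2] ['birds', 'sing']`. The practical consequence is that a document's position in the file is its only identity, so reordering the file changes which tag each document receives.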
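Because `corpus_file` mode assigns tags from line numbers, documents that carry their own IDs need a side mapping from ID to line number, recorded at the moment the corpus file is written. The helper below is a hypothetical sketch, not part of gensim. It assumes each document is a non-empty list of tokens containing no whitespace, so that one document occupies exactly one line, and it relies on Python 3.7+ dicts preserving insertion order.

```python
import os
import tempfile


def write_corpus_file(docs, path):
    """Write tokenized documents to `path`, one space-joined document per line.

    `docs` maps an external document ID to its list of tokens. Returns a dict
    from external ID to line number, i.e. the tag the document would receive
    if `path` is passed as `corpus_file` (assuming 0-based line tags).
    """
    id_to_tag = {}
    with open(path, "w", encoding="utf-8") as fout:
        for line_no, (doc_id, tokens) in enumerate(docs.items()):
            fout.write(" ".join(tokens) + "\n")
            id_to_tag[doc_id] = line_no
    return id_to_tag


if __name__ == "__main__":
    docs = {"doc-a": ["the", "cat", "sat"], "doc-b": ["birds", "sing"]}
    fd, corpus_path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        id_to_tag = write_corpus_file(docs, corpus_path)
        print(id_to_tag)  # {'doc-a': 0, 'doc-b': 1}
    finally:
        os.remove(corpus_path)
```

Keeping the mapping next to the file, rather than re-deriving it later, avoids silent mismatches if the corpus file is regenerated in a different order.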