Refactor documentation for gensim.similarities.docsim #1910
@@ -216,13 +216,31 @@ def __init__(self, input=None, dictionary=None, metadata=False, character_filter

Examples
--------
>>> #TODO Example with inheritance
>>> from gensim.corpora.textcorpus import TextCorpus
>>> from gensim import corpora
>>> from gensim.test.utils import datapath
>>> from gensim import utils
>>>
>>> corpus = TextCorpus(datapath('head500.noblanks.cor.bz2'))
>>> for bow in corpus:
...     pass
>>> class CorpusMiislita(corpora.TextCorpus):
>>>     stoplist = set('for a of the and to in on'.split())
>>>
>>>     def get_texts(self):
>>>         for doc in self.getstream():
>>>             yield [word for word in utils.to_unicode(doc).lower().split()
>>>                    if word not in CorpusMiislita.stoplist]
>>>
>>>     def __len__(self):
>>>         if 'length' not in self.__dict__:
No need to log anything here; this should be a simple, small example.
>>>             logger.info("caching corpus size (calculating number of documents)")
>>>             self.length = sum(1 for _ in self.get_texts())
>>>         return self.length
>>>
>>> corpus = CorpusMiislita(datapath('head500.noblanks.cor.bz2'))
>>> corpus.get_texts()
<generator object get_texts at 0x7fa932f397d0>
Bad output; can you show a concrete line of the dataset instead?
>>> corpus.__len__()
please
250

"""
self.input = input
Some issues with formatting.
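
For reference, here is a minimal sketch of how the subclassing example might look once the comments above are addressed: no logging, and a concrete document shown instead of a generator repr. It is only an illustration, not the code that was merged. `LineCorpus` is a hypothetical stand-in for gensim's `TextCorpus` so the sketch runs without gensim or the `head500.noblanks.cor.bz2` test file, and the two sample sentences are made up.

```python
class LineCorpus:
    """Hypothetical stand-in for gensim's TextCorpus (assumption, not gensim code).

    It only provides getstream(), which yields one raw document per line.
    """

    def __init__(self, lines):
        self.lines = list(lines)

    def getstream(self):
        for line in self.lines:
            yield line


class CorpusMiislita(LineCorpus):
    # Words dropped from every document.
    stoplist = frozenset('for a of the and to in on'.split())

    def get_texts(self):
        # Lowercase, split on whitespace, and filter out stop words.
        for doc in self.getstream():
            yield [word for word in doc.lower().split() if word not in self.stoplist]

    def __len__(self):
        # Count documents once and cache the result on the instance.
        if 'length' not in self.__dict__:
            self.length = sum(1 for _ in self.get_texts())
        return self.length


# Made-up sample documents standing in for the real dataset.
corpus = CorpusMiislita([
    "Human machine interface for lab abc computer applications",
    "A survey of user opinion of computer system response time",
])

first_document = next(iter(corpus.get_texts()))  # a concrete document, not a generator repr
number_of_documents = len(corpus)                # built-in len() instead of corpus.__len__()
```

Calling `len(corpus)` and `next(iter(corpus.get_texts()))` keeps the example short while still showing real output; in an actual docstring the class body would also use `...` continuation prompts rather than `>>>`.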