Remove outdated bz2 examples from tutorials #1867
>>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

>>> # load id->word mapping (the dictionary)
>>> id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt')
>>> # load corpus iterator
>>> mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm')
>>> # mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2')) # use this if you compressed the TFIDF output
Removing this line doesn't sound right: we still support bz2! Just remove the (superfluous) bz2.BZ2File wrapper. Ditto below.
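A minimal sketch of why the wrapper can be superfluous, assuming the reader picks a decompressor from the filename extension. `open_by_extension` is a hypothetical helper written for this illustration, not part of gensim's API, and the tiny Matrix Market payload is made up for the demo.

```python
import bz2
import os
import tempfile


def open_by_extension(path):
    """Hypothetical helper: open `path` for reading bytes,
    decompressing transparently when the name ends in .bz2."""
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")


# Write the same made-up payload once uncompressed and once as bz2.
payload = b"%%MatrixMarket matrix coordinate real general\n1 1 1\n1 1 1.0\n"
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "tiny.mm")
bz2_path = plain_path + ".bz2"
with open(plain_path, "wb") as fout:
    fout.write(payload)
with bz2.open(bz2_path, "wb") as fout:
    fout.write(payload)

# The caller passes only a filename in both cases; no explicit wrapper needed.
with open_by_extension(plain_path) as fin:
    plain_data = fin.read()
with open_by_extension(bz2_path) as fin:
    bz2_data = fin.read()
```

With a reader built this way, the tutorial line only needs the `.bz2` filename; whether gensim does exactly this internally is an assumption here.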
@piskvorky fixed in #1868
Why don't we support file-like objects in
@piskvorky I see code for support
from gensim.corpora import MmCorpus
import bz2
f = bz2.BZ2File("testcorpus.mm.bz2")
print(f.closed)  # 0
corpus = MmCorpus(f)
print(f.closed)  # 1 ???
For this reason, if we try to read from this object afterwards, we receive the exception reported on the mailing list.
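One plausible mechanism for the closed file can be reproduced with the standard library alone. This is a sketch under the assumption that the reader wraps its input in a `with` block; `count_lines` is a hypothetical stand-in, not gensim code.

```python
import bz2
import os
import tempfile


def count_lines(fileobj):
    """Hypothetical reader: consumes a file-like object inside a `with`
    block, which closes the caller's object on exit."""
    with fileobj as lines:
        return sum(1 for _ in lines)


# Create a small bz2 file to read back.
path = os.path.join(tempfile.mkdtemp(), "tiny.txt.bz2")
with bz2.open(path, "wb") as fout:
    fout.write(b"a\nb\nc\n")

f = bz2.BZ2File(path)
closed_before = f.closed   # still open here
n_lines = count_lines(f)
closed_after = f.closed    # the `with` block closed the caller's file

# Any further read on the same object now fails.
try:
    f.read()
    second_read_failed = False
except ValueError:
    second_read_failed = True
```

A reader that accepts file-like objects would usually leave them open for the caller, or reopen by name on each pass, so the same object can be iterated more than once.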
@piskvorky UPD: I found the reason for this behavior: in this line, we are using This is a bug anyway (because internally we use
* Revert "Remove outdated `bz2` + `MmCorpus` examples from tutorials (piskvorky#1867)"
  This reverts commit 5342153.
* remove bz2 wrapper
* remove bz2 wrapper[2]
`MmReader` supports only a filename as input (not a file-like object), but the old documentation (`wiki.rst` / `dist_lsi.rst`) also used a file-like object as input. The current PR removes this outdated usage from the examples.
Based on a mailing list post.