Remove outdated bz2 examples from tutorials #1867

Merged
1 commit merged on Feb 1, 2018
docs/src/dist_lsi.rst: 3 changes (1 addition, 2 deletions)

@@ -120,14 +120,13 @@ Distributed LSA on Wikipedia
First, download and prepare the Wikipedia corpus as per :doc:`wiki`, then load
the corpus iterator with::

- >>> import logging, gensim, bz2
+ >>> import logging, gensim
>>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

>>> # load id->word mapping (the dictionary)
>>> id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt')
>>> # load corpus iterator
>>> mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm')
- >>> # mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2')) # use this if you compressed the TFIDF output
Owner (piskvorky):

Removing this line doesn't sound right -- we still support bz2!

Just remove the (superfluous) bz2.BZ2File wrapper. Ditto below.
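To illustrate the reviewer's point, here is a minimal sketch of the two ways to load the bz2-compressed Matrix Market file. The explicit wrapper is the form being removed above; the direct-path form is what the reviewer suggests, assuming MmCorpus accepts a filename to a compressed file (gensim opens filenames via smart_open, which decompresses .bz2 transparently)::

>>> import bz2, gensim
>>> # explicit wrapper, as in the old tutorial line -- works, but is superfluous
>>> mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2'))
>>> # what the reviewer suggests: pass the compressed path directly and let
>>> # gensim open it itself (smart_open handles the .bz2 decompression)
>>> mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm.bz2')
>>> print(mm)
MmCorpus(3199665 documents, 100000 features, 495547400 non-zero entries)

Either way the loaded corpus is the same; only the wrapper around the file handle differs.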

Contributor (author):

@piskvorky fixed in #1868


>>> print(mm)
MmCorpus(3199665 documents, 100000 features, 495547400 non-zero entries)
docs/src/wiki.rst: 6 changes (2 additions, 4 deletions)

@@ -38,14 +38,13 @@ Latent Semantic Analysis

First let's load the corpus iterator and dictionary, created in the second step above::

- >>> import logging, gensim, bz2
+ >>> import logging, gensim
>>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

>>> # load id->word mapping (the dictionary), one of the results of step 2 above
>>> id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt')
>>> # load corpus iterator
>>> mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm')
- >>> # mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2')) # use this if you compressed the TFIDF output (recommended)

>>> print(mm)
MmCorpus(3931787 documents, 100000 features, 756379027 non-zero entries)
@@ -93,14 +92,13 @@ Latent Dirichlet Allocation

As with Latent Semantic Analysis above, first load the corpus iterator and dictionary::

- >>> import logging, gensim, bz2
+ >>> import logging, gensim
>>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

>>> # load id->word mapping (the dictionary), one of the results of step 2 above
>>> id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt')
>>> # load corpus iterator
>>> mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm')
- >>> # mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2')) # use this if you compressed the TFIDF output

>>> print(mm)
MmCorpus(3931787 documents, 100000 features, 756379027 non-zero entries)