Commit

Remove outdated bz2 + MmCorpus examples from tutorials (piskvorky…
menshikh-iv authored Feb 1, 2018
1 parent 3159fa8 commit 5342153
Showing 2 changed files with 3 additions and 6 deletions.
docs/src/dist_lsi.rst (3 changes: 1 addition, 2 deletions)
@@ -120,14 +120,13 @@ Distributed LSA on Wikipedia
First, download and prepare the Wikipedia corpus as per :doc:`wiki`, then load
the corpus iterator with::

- >>> import logging, gensim, bz2
+ >>> import logging, gensim
>>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

>>> # load id->word mapping (the dictionary)
>>> id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt')
>>> # load corpus iterator
>>> mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm')
- >>> # mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2')) # use this if you compressed the TFIDF output

>>> print(mm)
MmCorpus(3199665 documents, 100000 features, 495547400 non-zero entries)
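The commented-out line dropped by this commit wrapped the compressed TF-IDF file in `bz2.BZ2File` by hand. Whether a particular gensim version accepts a `.bz2` path or file object directly is worth checking in that version's `MmCorpus` documentation. The sketch below uses only the standard library and is not gensim's code: it shows the general idea of choosing a decompressor from the file extension instead of wrapping the file manually. `open_maybe_compressed` is a hypothetical helper name.

```python
import bz2
import os
import tempfile


def open_maybe_compressed(path, mode="rt"):
    """Open `path` for reading, decompressing on the fly if it ends in `.bz2`.

    Hypothetical helper for illustration only; it is not a gensim API.
    """
    if path.endswith(".bz2"):
        # bz2.open accepts text modes such as "rt" and returns a text stream.
        return bz2.open(path, mode)
    return open(path, mode)


# Demo: write the same tiny Matrix Market text plain and bz2-compressed,
# then read the first line of each back through the same helper.
text = "%%MatrixMarket matrix coordinate real general\n2 3 2\n1 1 0.5\n2 3 1.0\n"
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "tiny.mm")
with open(plain_path, "w") as f:
    f.write(text)
with bz2.open(plain_path + ".bz2", "wt") as f:
    f.write(text)

for path in (plain_path, plain_path + ".bz2"):
    with open_maybe_compressed(path) as f:
        print(os.path.basename(path), "->", f.readline().strip())
```

Callers then pass a single path and never need to know whether the file is compressed.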
docs/src/wiki.rst (6 changes: 2 additions, 4 deletions)
@@ -38,14 +38,13 @@ Latent Semantic Analysis

First let's load the corpus iterator and dictionary, created in the second step above::

- >>> import logging, gensim, bz2
+ >>> import logging, gensim
>>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

>>> # load id->word mapping (the dictionary), one of the results of step 2 above
>>> id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt')
>>> # load corpus iterator
>>> mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm')
- >>> # mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2')) # use this if you compressed the TFIDF output (recommended)

>>> print(mm)
MmCorpus(3931787 documents, 100000 features, 756379027 non-zero entries)
@@ -93,14 +92,13 @@ Latent Dirichlet Allocation

As with Latent Semantic Analysis above, first load the corpus iterator and dictionary::

- >>> import logging, gensim, bz2
+ >>> import logging, gensim
>>> logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)

>>> # load id->word mapping (the dictionary), one of the results of step 2 above
>>> id2word = gensim.corpora.Dictionary.load_from_text('wiki_en_wordids.txt')
>>> # load corpus iterator
>>> mm = gensim.corpora.MmCorpus('wiki_en_tfidf.mm')
- >>> # mm = gensim.corpora.MmCorpus(bz2.BZ2File('wiki_en_tfidf.mm.bz2')) # use this if you compressed the TFIDF output

>>> print(mm)
MmCorpus(3931787 documents, 100000 features, 756379027 non-zero entries)
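The counts shown by `print(mm)` in these tutorials (documents, features, non-zero entries) appear to correspond to the size line of the Matrix Market coordinate file, assuming gensim's layout of documents as rows and vocabulary features as columns. The stdlib-only sketch below reads that header without loading the matrix body. It is not gensim's reader, and `read_mm_header` is a hypothetical name; the printed line only imitates the tutorial's output format.

```python
import io


def read_mm_header(lines):
    """Return (num_docs, num_features, num_nnz) from Matrix Market text lines.

    Hypothetical helper for illustration; gensim's own reader does much more.
    Assumes coordinate format with documents as rows and features as columns.
    """
    lines = iter(lines)
    banner = next(lines)
    if not banner.startswith("%%MatrixMarket"):
        raise ValueError("not a Matrix Market file: %r" % banner[:40])
    for line in lines:
        # Skip comment lines (starting with '%') and blank lines before the size line.
        if line.startswith("%") or not line.strip():
            continue
        num_docs, num_features, num_nnz = (int(x) for x in line.split())
        return num_docs, num_features, num_nnz
    raise ValueError("missing size line")


sample = io.StringIO(
    "%%MatrixMarket matrix coordinate real general\n"
    "% a comment line, skipped by the reader\n"
    "2 3 2\n"
    "1 1 0.5\n"
    "2 3 1.0\n"
)
print("MmCorpus(%d documents, %d features, %d non-zero entries)" % read_mm_header(sample))
```

Because only the first few lines are read, this kind of check stays cheap even for a multi-gigabyte Wikipedia corpus.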
