[MRG] Fix similarity bug in NMSLIB indexer #2821

Closed
wants to merge 22 commits into from
3 changes: 3 additions & 0 deletions .travis.yml
@@ -34,6 +34,9 @@ matrix:
- python: '3.6'
env: TOXENV="py36-linux"

- python: '2.7'
env: TOXENV="py27-linux"


install:
- pip install tox
69 changes: 69 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,75 @@ Changes

## :warning: 3.8.x will be the last gensim version to support Py2.7. Starting with 4.0.0, gensim will only support Py3.5 and above

## 3.8.3, 2020-05-03

This is primarily a bugfix release to bring back Py2.7 compatibility to gensim 3.8.

### :red_circle: Bug fixes

* Bring back Py27 support (PR [#2812](https://github.com/RaRe-Technologies/gensim/pull/2812), __[@mpenkov](https://github.com/mpenkov)__)
* Fix wrong version reported by setup.py (Issue [#2796](https://github.com/RaRe-Technologies/gensim/issues/2796))
* Fix missing C extensions (Issues [#2794](https://github.com/RaRe-Technologies/gensim/issues/2794) and [#2802](https://github.com/RaRe-Technologies/gensim/issues/2802))

### :+1: Improvements

* Wheels for Python 3.8 (__[@menshikh-iv](https://github.com/menshikh-iv)__)
* Prepare for removal of deprecated `lxml.etree.cElementTree` (PR [#2777](https://github.com/RaRe-Technologies/gensim/pull/2777), __[@tirkarthi](https://github.com/tirkarthi)__)

### :books: Tutorial and doc improvements

* Update test instructions in README (PR [#2814](https://github.com/RaRe-Technologies/gensim/pull/2814), __[@piskvorky](https://github.com/piskvorky)__)

### :warning: Deprecations (will be removed in the next major release)

* Remove
    - `gensim.models.FastText.load_fasttext_format`: use load_facebook_vectors to load embeddings only (faster, less CPU/memory usage, does not support training continuation) and load_facebook_model to load full model (slower, more CPU/memory intensive, supports training continuation)
    - `gensim.models.wrappers.fasttext` (obsoleted by the new native `gensim.models.fasttext` implementation)
    - `gensim.examples`
    - `gensim.nosy`
    - `gensim.scripts.word2vec_standalone`
    - `gensim.scripts.make_wiki_lemma`
    - `gensim.scripts.make_wiki_online`
    - `gensim.scripts.make_wiki_online_lemma`
    - `gensim.scripts.make_wiki_online_nodebug`
    - `gensim.scripts.make_wiki` (all of these obsoleted by the new native `gensim.scripts.segment_wiki` implementation)
    - "deprecated" functions and attributes

* Move
    - `gensim.scripts.make_wikicorpus` ➡ `gensim.scripts.make_wiki.py`
    - `gensim.summarization` ➡ `gensim.models.summarization`
    - `gensim.topic_coherence` ➡ `gensim.models._coherence`
    - `gensim.utils` ➡ `gensim.utils.utils` (old imports will continue to work)
    - `gensim.parsing.*` ➡ `gensim.utils.text_utils`

## 3.8.2, 2020-04-10

### :red_circle: Bug fixes

* Pin `smart_open` version for compatibility with Py2.7

### :warning: Deprecations (will be removed in the next major release)

* Remove
    - `gensim.models.FastText.load_fasttext_format`: use load_facebook_vectors to load embeddings only (faster, less CPU/memory usage, does not support training continuation) and load_facebook_model to load full model (slower, more CPU/memory intensive, supports training continuation)
    - `gensim.models.wrappers.fasttext` (obsoleted by the new native `gensim.models.fasttext` implementation)
    - `gensim.examples`
    - `gensim.nosy`
    - `gensim.scripts.word2vec_standalone`
    - `gensim.scripts.make_wiki_lemma`
    - `gensim.scripts.make_wiki_online`
    - `gensim.scripts.make_wiki_online_lemma`
    - `gensim.scripts.make_wiki_online_nodebug`
    - `gensim.scripts.make_wiki` (all of these obsoleted by the new native `gensim.scripts.segment_wiki` implementation)
    - "deprecated" functions and attributes

* Move
    - `gensim.scripts.make_wikicorpus` ➡ `gensim.scripts.make_wiki.py`
    - `gensim.summarization` ➡ `gensim.models.summarization`
    - `gensim.topic_coherence` ➡ `gensim.models._coherence`
    - `gensim.utils` ➡ `gensim.utils.utils` (old imports will continue to work)
    - `gensim.parsing.*` ➡ `gensim.utils.text_utils`

## 3.8.1, 2019-09-23

### :red_circle: Bug fixes
11 changes: 6 additions & 5 deletions appveyor.yml
@@ -31,6 +31,11 @@ environment:
PYTHON_ARCH: "64"
TOXENV: "py36-win"

- PYTHON: "C:\\Python27-x64"
PYTHON_VERSION: "2.7.17"
PYTHON_ARCH: "64"
TOXENV: "py27-win"

init:
- "ECHO %PYTHON% %PYTHON_VERSION% %PYTHON_ARCH%"
- "ECHO \"%APPVEYOR_SCHEDULED_BUILD%\""
@@ -53,11 +58,7 @@ install:
- "SET PATH=%PYTHON%;%PYTHON%\\Scripts;%PATH%"
- "python -m pip install -U pip tox"

# Next line only to demo that py3.8-on-Appveyor *could* install Cython from wheel just fine,
# despite mysterious following attempt/failure to build-and-use Cython on that one Appveyor config.
# Delete when py3.8-on-Appveyor starts working normally,
# see comment at <https://github.com/RaRe-Technologies/gensim/pull/2715#issuecomment-569457589>
- "python -m pip install Cython==0.29.14 numpy==1.18.0"
- "python -m pip install Cython==0.29.14"

# Check that we have the expected versions and architecture
- "pip --version"
6 changes: 3 additions & 3 deletions docs/src/_matutils.rst
@@ -1,8 +1,8 @@
:mod:`_matutils` -- Cython matutils
===================================
:mod:`_matutils` -- Compiled extension for math utils
=====================================================

.. automodule:: gensim._matutils
:synopsis: Cython math utils
:synopsis: Compiled extension for math utils
:members:
:inherited-members:
:undoc-members:
5 changes: 3 additions & 2 deletions docs/src/apiref.rst
@@ -50,6 +50,7 @@ Modules:
models/_fasttext_bin
models/phrases
models/poincare
viz/poincare
models/coherencemodel
models/basemodel
models/callbacks
@@ -72,7 +73,8 @@ Modules:
models/base_any2vec
similarities/docsim
similarities/termsim
similarities/index
similarities/annoy
similarities/nmslib
sklearn_api/atmodel
sklearn_api/d2vmodel
sklearn_api/hdp
@@ -111,4 +113,3 @@ Modules:
summarization/summariser
summarization/syntactic_unit
summarization/textcleaner
viz/poincare
4 changes: 2 additions & 2 deletions docs/src/auto_examples/core/run_similarity_queries.ipynb
@@ -15,7 +15,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"\nSimilarity Queries\n==================\n\nDemonstrates querying a corpus for similar documents.\n\n"
"\nSimilarity Queries\n==================\n\nDemonstrates querying a corpus for similar documents.\n"
]
},
{
@@ -190,7 +190,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.7.3"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion docs/src/auto_examples/core/run_similarity_queries.py.md5
@@ -1 +1 @@
a3eaf7347874a32d1d25a455753206dc
54804120deb345715247f0eed42b5e0e
51 changes: 31 additions & 20 deletions docs/src/auto_examples/core/run_similarity_queries.rst
@@ -1,10 +1,12 @@
.. note::
:class: sphx-glr-download-link-note
.. only:: html

.. note::
:class: sphx-glr-download-link-note

Click :ref:`here <sphx_glr_download_auto_examples_core_run_similarity_queries.py>` to download the full example code
.. rst-class:: sphx-glr-example-title
Click :ref:`here <sphx_glr_download_auto_examples_core_run_similarity_queries.py>` to download the full example code
.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_core_run_similarity_queries.py:
.. _sphx_glr_auto_examples_core_run_similarity_queries.py:


Similarity Queries
@@ -25,6 +27,7 @@ Demonstrates querying a corpus for similar documents.




Creating the Corpus
-------------------

@@ -78,6 +81,7 @@ if you completed it, feel free to skip to the next section.




Similarity interface
--------------------

@@ -111,6 +115,7 @@ LSI space:




For the purposes of this tutorial, there are only two things you need to know about LSI.
First, it's just another transformation: it transforms vectors from one space to another.
Second, the benefit of LSI is that enables identifying patterns and relationships between terms (in our case, words in a document) and topics.
@@ -142,7 +147,8 @@ no random-walk static ranks, just a semantic extension over the boolean keyword

.. code-block:: none

[(0, 0.4618210045327158), (1, 0.07002766527900064)]
[(0, 0.4618210045327162), (1, -0.07002766527900038)]




@@ -173,6 +179,7 @@ might also be indexing a different corpus altogether.




.. warning::
The class :class:`similarities.MatrixSimilarity` is only appropriate when the whole
set of vectors fits into memory. For example, a corpus of one million documents
@@ -198,6 +205,7 @@ Index persistency is handled via the standard :func:`save` and :func:`load` func




This is true for all similarity indexing classes (:class:`similarities.Similarity`,
:class:`similarities.MatrixSimilarity` and :class:`similarities.SparseMatrixSimilarity`).
Also in the following, `index` can be an object of any of these. When in doubt,
@@ -230,6 +238,7 @@ To obtain similarities of our query document against the nine indexed documents:




Cosine measure returns similarities in the range `<-1, 1>` (the greater, the more similar),
so that the first document has a score of 0.99809301 etc.
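
As a rough illustration of why the scores fall in that range, the sketch below computes cosine similarity for small dense vectors in plain Python. The function name `cosine_similarity` and the zero-vector convention are assumptions made for this example; they are not gensim's implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length dense vectors.

    Hypothetical helper for illustration only, not part of gensim.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption for this sketch: treat a zero vector as unrelated.
        return 0.0
    return dot / (norm_u * norm_v)


# Same direction gives a score near 1, opposite direction near -1,
# and orthogonal vectors give 0.
same = cosine_similarity([1.0, 2.0], [2.0, 4.0])
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Only the direction of the vectors matters here, not their length, which is why scaling a document vector leaves its score unchanged.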

@@ -254,15 +263,16 @@ order, and obtain the final answer to the query `"Human computer interaction"`:

.. code-block:: none

(2, 0.9984453) Human machine interface for lab abc computer applications
(0, 0.998093) A survey of user opinion of computer system response time
(3, 0.9865886) The EPS user interface management system
(1, 0.93748635) System and human system engineering testing of EPS
(4, 0.90755945) Relation of user perceived response time to error measurement
(8, 0.050041765) The generation of random binary unordered trees
(7, -0.09879464) The intersection graph of paths in trees
(6, -0.10639259) Graph minors IV Widths of trees and well quasi ordering
(5, -0.12416792) Graph minors A survey
0.9984453 The EPS user interface management system
0.998093 Human machine interface for lab abc computer applications
0.9865886 System and human system engineering testing of EPS
0.93748635 A survey of user opinion of computer system response time
0.90755945 Relation of user perceived response time to error measurement
0.050041765 Graph minors A survey
-0.09879464 Graph minors IV Widths of trees and well quasi ordering
-0.10639259 The intersection graph of paths in trees
-0.12416792 The generation of random binary unordered trees
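
The ranking step can be sketched in plain Python as follows. The scores in `sims` are made-up values for illustration, one per indexed document, and the variable names are assumptions rather than the tutorial's exact code. Pairing each score with its position before sorting keeps the document id attached to its score.

```python
# Hypothetical similarity scores; list position is the document id.
sims = [0.2, 0.9, -0.1, 0.5]

# Pair each score with its document id, then sort by score, highest first.
ranked = sorted(enumerate(sims), key=lambda item: -item[1])

for doc_id, score in ranked:
    print(doc_id, score)
```

Sorting the scores on their own and then zipping them with the documents in corpus order would attach scores to the wrong documents, so the id is carried through the sort.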




@@ -319,17 +329,18 @@ on large datasets easily, and to facilitate prototyping of new algorithms for re

.. code-block:: none

/Volumes/work/workspace/gensim_misha/docs/src/gallery/core/run_similarity_queries.py:194: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
/Volumes/work/workspace/gensim/trunk/docs/src/gallery/core/run_similarity_queries.py:194: UserWarning: Matplotlib is currently using agg, which is a non-GUI backend, so cannot show the figure.
plt.show()





.. rst-class:: sphx-glr-timing

**Total running time of the script:** ( 0 minutes 0.663 seconds)
**Total running time of the script:** ( 0 minutes 1.563 seconds)

**Estimated memory usage:** 6 MB
**Estimated memory usage:** 37 MB


.. _sphx_glr_download_auto_examples_core_run_similarity_queries.py:
@@ -342,13 +353,13 @@ on large datasets easily, and to facilitate prototyping of new algorithms for re



.. container:: sphx-glr-download
.. container:: sphx-glr-download sphx-glr-download-python

:download:`Download Python source code: run_similarity_queries.py <run_similarity_queries.py>`



.. container:: sphx-glr-download
.. container:: sphx-glr-download sphx-glr-download-jupyter

:download:`Download Jupyter notebook: run_similarity_queries.ipynb <run_similarity_queries.ipynb>`

15 changes: 10 additions & 5 deletions docs/src/auto_examples/core/sg_execution_times.rst
@@ -5,9 +5,14 @@

Computation times
=================
**00:00.844** total execution time for **auto_examples_core** files:
**00:01.563** total execution time for **auto_examples_core** files:

- **00:00.844**: :ref:`sphx_glr_auto_examples_core_run_topics_and_transformations.py` (``run_topics_and_transformations.py``)
- **00:00.000**: :ref:`sphx_glr_auto_examples_core_run_core_concepts.py` (``run_core_concepts.py``)
- **00:00.000**: :ref:`sphx_glr_auto_examples_core_run_corpora_and_vector_spaces.py` (``run_corpora_and_vector_spaces.py``)
- **00:00.000**: :ref:`sphx_glr_auto_examples_core_run_similarity_queries.py` (``run_similarity_queries.py``)
+--------------------------------------------------------------------------------------------------------------+-----------+---------+
| :ref:`sphx_glr_auto_examples_core_run_similarity_queries.py` (``run_similarity_queries.py``) | 00:01.563 | 37.4 MB |
+--------------------------------------------------------------------------------------------------------------+-----------+---------+
| :ref:`sphx_glr_auto_examples_core_run_core_concepts.py` (``run_core_concepts.py``) | 00:00.000 | 0.0 MB |
+--------------------------------------------------------------------------------------------------------------+-----------+---------+
| :ref:`sphx_glr_auto_examples_core_run_corpora_and_vector_spaces.py` (``run_corpora_and_vector_spaces.py``) | 00:00.000 | 0.0 MB |
+--------------------------------------------------------------------------------------------------------------+-----------+---------+
| :ref:`sphx_glr_auto_examples_core_run_topics_and_transformations.py` (``run_topics_and_transformations.py``) | 00:00.000 | 0.0 MB |
+--------------------------------------------------------------------------------------------------------------+-----------+---------+