2vec saveload fixes #11

piskvorky · 2020-09-07T13:05:44Z

Merged current develop, resolved conflicts + a few minor docstring edits in 49b35b7.

…e the positive_definite parameter, and extend normalization capabilities of the inner_product method (#2783)

* Deprecate SparseTermSimilarityMatrix's positive_definite parameter
* Reference paper on efficient implementation of soft cosine similarity
* Add example with Annoy indexer to SparseTermSimilarityMatrix
* Add example of obtaining word embeddings from SparseTermSimilarityMatrix
* Reduce space complexity of SparseTermSimilarityMatrix construction
  Build matrix using arrays and bitfields rather than DOK sparse format
  This work is based on the following blog post by @maciejkula: https://maciejkula.github.io/2015/02/22/incremental-construction-of-sparse-matrices/
* Fix a typo in the soft cosine similarity Jupyter notebook
* Add human-readable string representation for TermSimilarityIndex
* Avoid sparse term similarity matrix computation when nonzero_limit <= 0
* Extend normalization in the inner_product method
  Support the `maintain` vector normalization scheme.
  Support separate vector normalization schemes for queries and documents.
* Remove a note in the docstring of SparseTermSimilarityMatrix
* Rerun continuous integration tests
* Use ==/!= to compare constant literals
* Add human-readable string representation for TermSimilarityIndex (cont.)
* Prod flake8 with a coding style violation in a docstring
* Collapse two lambdas into one internal function
* Revert "Prod flake8 with a coding style violation in a docstring"
  This reverts commit 6557b84.
* Avoid str.format()
* Slice SparseTermSimilarityMatrix.inner_product tests by input types
* Remove similarity_type_code local variable
* Remove starting underscore from local function name
* Save indentation level and define populate_buffers function
* Extract SparseTermSimilarityMatrix constructor body to _create_source
* Extract NON_NEGATIVE_NORM_ASSERTION_MESSAGE to a module-level constant
* Extract cell assignment logic to cell_full local function
* Split variable swapping into three separate statements
* Extract normalization from the body of SparseTermSimilarityMatrix.inner_product
* Wrap overlong line
* Add test_inner_product_zerovector_zerovector and test_inner_product_zerovector_vector tests
* Further split test_inner_product into 63 test cases
* Raise ValueError when dictionary is empty
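
The "arrays rather than DOK sparse format" item in the commit message above can be illustrated with a minimal, standard-library-only sketch. This is not the gensim implementation: the class name `IncrementalCOO` and its methods are hypothetical, and the bitfield bookkeeping mentioned in the commit is left out. It only shows the general idea of appending entries to compact typed arrays and converting to CSR once at the end, instead of growing a dictionary keyed by (row, column).

```python
from array import array


class IncrementalCOO:
    """Hypothetical helper: accumulate sparse entries in compact typed arrays."""

    def __init__(self, n_rows, n_cols):
        self.shape = (n_rows, n_cols)
        self.rows = array("i")  # row index of each stored entry
        self.cols = array("i")  # column index of each stored entry
        self.data = array("d")  # value of each stored entry

    def append(self, i, j, value):
        """Store one entry; each append only grows three flat typed arrays."""
        n_rows, n_cols = self.shape
        if not (0 <= i < n_rows and 0 <= j < n_cols):
            raise IndexError("entry (%d, %d) is out of bounds" % (i, j))
        self.rows.append(i)
        self.cols.append(j)
        self.data.append(value)

    def to_csr(self):
        """Return (indptr, indices, data) in CSR layout via a counting sort by row."""
        n_rows = self.shape[0]
        # Count entries per row, then turn the counts into row start offsets.
        indptr = [0] * (n_rows + 1)
        for i in self.rows:
            indptr[i + 1] += 1
        for r in range(n_rows):
            indptr[r + 1] += indptr[r]
        next_slot = indptr[:-1]  # sliced copy: next free output slot per row
        indices = array("i", [0]) * len(self.cols)
        data = array("d", [0.0]) * len(self.data)
        # Scatter each entry into its row's segment, keeping insertion order.
        for i, j, v in zip(self.rows, self.cols, self.data):
            slot = next_slot[i]
            indices[slot] = j
            data[slot] = v
            next_slot[i] += 1
        return indptr, indices, data


builder = IncrementalCOO(3, 3)
builder.append(2, 0, 0.5)
builder.append(0, 1, 1.0)
builder.append(2, 2, 0.25)
indptr, indices, data = builder.to_csr()
```

With a numerical stack available, the three arrays could instead be handed to a sparse-matrix constructor; the sketch avoids that dependency so it runs on its own.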

xh2 and others added 30 commits July 24, 2020 14:09

Make docs clearer on alpha parameter in LDA model

03c8bb9

Merge pull request #1 from xh2/patch-1

7791b74

Make docs clearer on `alpha` parameter in LDA model

Update Hoffman paper link

4e1b09c

rm whitespace

25005c5

Update gensim/models/ldamodel.py

f34956c

Update gensim/models/ldamodel.py

7d0ef9e

Merge pull request #2896 from xh2/bugfix/lda-doc-alpha

a662e8d

Make docs clearer on `alpha` parameter in LDA model

Update gensim/models/ldamodel.py

78778a9

Merge pull request #2897 from xh2/bugfix/hoffman-paper-link

344c4ab

[MRG] Docs: Update Hoffman paper link for Online LDA

re-applying changes from #2821

b70c826

migrating + regenerating changed docs

a81e547

fix forgotten iteritems

78fe1c4

remove extra model.wv

a0e40ca

split overlong doc line

4cf4da0

get rid of six in doc2vec

161ad55

increase test timeout for Visdom server

31d2b87

add 32/64 bits report

bc95bcb

add deprecations for init_sims()

c834e06

remove vectors_norm + add link to migration guide to deprecation warnings

172e37f

rename vectors_norm everywhere, update tests, regen docs

3919b68

put back no-op property setter of deprecated vectors_norm

d40f685

fix typo

872c8ed

fix flake8

4c1b3f7

disable Keras tests

b39eec2

- failing with weird errors on py3.7+3.8, see https://travis-ci.org/github/RaRe-Technologies/gensim/jobs/713448950#L862

Merge pull request #2899 from RaRe-Technologies/pr2821

d5556ea

[MRG] Fix similarity bug in NMSLIB indexer + documentation fixes

test showing FT failure as W2V

f2fd045

set .vectors even when ngrams off

7ab1501

Update gensim/test/test_fasttext.py

ce16168

Update gensim/test/test_fasttext.py

779fe46

refresh docs for run_annoy tutorial

9289c3b

piskvorky and others added 9 commits August 3, 2020 10:28

Merge pull request #2910 from RaRe-Technologies/rerun_tutorial

4b7e372

[MRG] Refresh docs for run_annoy tutorial

Fix doc2vec crash for large sets of doc-vectors (#2907)

28a2110

Fix AttributeError in WikiCorpus (#2901)

817cac9

* bug fix: wikicorpus getstream from data file-path; replace fname with input
* refactor: use property decorator for input

Co-authored-by: jshah02 <jenisnehal.shah@factset.com>

intensify cbow+hs tests; bulk testing method

fc4b97f

use increment operator

030e650

Co-authored-by: Radim Řehůřek <me@radimrehurek.com>

Change num_words to topn in dtm_coherence (#2926)

6e0d00b

Merge branch 'develop' into 2vec_saveload_fixes

d524fa4

docstring fixes

49b35b7

piskvorky mentioned this pull request Sep 7, 2020

[WIP] 2Vec SaveLoad improvements piskvorky/gensim#2892

Closed

get rid of python2 constructs

3f972a6

gojomo merged commit b5794ee into gojomo:2vec_saveload_fixes Sep 8, 2020

gojomo added a commit that referenced this pull request Sep 8, 2020

Revert "2vec saveload fixes (#11)"

171e55a

This reverts commit b5794ee.

piskvorky mentioned this pull request Sep 9, 2020

[MRG] *2Vec SaveLoad improvements piskvorky/gensim#2939

Merged
