Fix epsilon according to dtype in LdaModel #1770
Conversation
gensim/models/ldamodel.py
Outdated
@@ -286,6 +286,9 @@ def __init__(self, corpus=None, num_topics=100, id2word=None,
        >>> lda = LdaModel(corpus, num_topics=50, alpha='auto', eval_every=5)  # train asymmetric alpha from data

        """
        if dtype not in {np.float16, np.float32, np.float64}:
            raise ValueError("Incorrect 'dtype', please choice one of numpy.float16, numpy.float32 or numpy.float64")
`choice` => `choose`
gensim/models/ldamodel.py
Outdated
@@ -498,7 +501,8 @@ def inference(self, chunk, collect_sstats=False):
        # The optimal phi_{dwk} is proportional to expElogthetad_k * expElogbetad_w.
        # phinorm is the normalizer.
        # TODO treat zeros explicitly, instead of adding 1e-100?
        phinorm = np.dot(expElogthetad, expElogbetad) + 1e-100
        eps = 1e-100 if self.dtype == np.float64 else (1e-35 if self.dtype == np.float32 else 1e-5)
Easier to read and maintain as a mapping?
dtype_to_eps = {
np.float64: 1e-100,
np.float32: 1e-35,
np.float16: 1e-5,
}
And then also `if dtype not in dtype_to_eps: ...`
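The reviewer's suggestion can be sketched as follows. This is a minimal illustration of the proposed mapping, not the actual gensim implementation; the helper `get_eps` is hypothetical and only demonstrates how one dict can serve both validation and epsilon lookup:

```python
import numpy as np

# Mapping from supported dtype to a safely representable epsilon,
# as proposed in the review (values taken from the diff above).
dtype_to_eps = {
    np.float64: 1e-100,
    np.float32: 1e-35,
    np.float16: 1e-5,
}


def get_eps(dtype):
    """Return the epsilon for `dtype`, rejecting unsupported dtypes up front."""
    if dtype not in dtype_to_eps:
        raise ValueError(
            "Incorrect 'dtype', please choose one of numpy.float16, "
            "numpy.float32 or numpy.float64"
        )
    return dtype_to_eps[dtype]
```

With this shape, the set of allowed dtypes and their epsilons live in one place, so the `__init__` check and the `inference` lookup cannot drift apart.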
gensim/models/ldamodel.py
Outdated
@@ -498,7 +501,13 @@ def inference(self, chunk, collect_sstats=False):
        # The optimal phi_{dwk} is proportional to expElogthetad_k * expElogbetad_w.
        # phinorm is the normalizer.
        # TODO treat zeros explicitly, instead of adding 1e-100?
Should update the comment too: `1e-100` => `epsilon`.
gensim/models/ldamodel.py
Outdated
@@ -498,7 +501,13 @@ def inference(self, chunk, collect_sstats=False):
        # The optimal phi_{dwk} is proportional to expElogthetad_k * expElogbetad_w.
        # phinorm is the normalizer.
        # TODO treat zeros explicitly, instead of adding 1e-100?
        phinorm = np.dot(expElogthetad, expElogbetad) + 1e-100
        dtype_to_eps = {
Better defined at module level, so `__init__` can make use of the allowed keys? Right now the same information lives in three places, which is not DRY.
This is a small fix for #1656, big thanks to @rmalouf for catching this potential bug: #1656 (comment)

What's done:
- validate `dtype` in `__init__` (allow only `numpy.float16`, `numpy.float32`, `numpy.float64`)
- `eps` now depends on `dtype` (instead of the hardcoded `1e-100`)

CC: @piskvorky
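To see why the dtype-dependent epsilon matters, here is a small standalone illustration (not gensim code): `1e-100` is below the smallest representable `float32` subnormal, so it underflows to zero and no longer protects `phinorm` against division by zero, while `1e-35` remains representable:

```python
import numpy as np

# 1e-100 underflows to 0.0 in float32 (smallest subnormal is ~1.4e-45),
# so adding it would not guard against a zero denominator.
underflowed = np.float32(1e-100)

# 1e-35 is within float32's normal range (~1.18e-38 and up), so it
# survives the cast and keeps the denominator strictly positive.
representable = np.float32(1e-35)

print(underflowed)        # underflows to zero
print(representable > 0)  # stays positive
```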