Fix epsilon according to dtype in LdaModel #1770
Conversation
gensim/models/ldamodel.py
Outdated
@@ -286,6 +286,9 @@ def __init__(self, corpus=None, num_topics=100, id2word=None,
        >>> lda = LdaModel(corpus, num_topics=50, alpha='auto', eval_every=5)  # train asymmetric alpha from data

        """
        if dtype not in {np.float16, np.float32, np.float64}:
            raise ValueError("Incorrect 'dtype', please choice one of numpy.float16, numpy.float32 or numpy.float64")
`choice` => `choose`
gensim/models/ldamodel.py
Outdated
@@ -498,7 +501,8 @@ def inference(self, chunk, collect_sstats=False):
        # The optimal phi_{dwk} is proportional to expElogthetad_k * expElogbetad_w.
        # phinorm is the normalizer.
        # TODO treat zeros explicitly, instead of adding 1e-100?
        phinorm = np.dot(expElogthetad, expElogbetad) + 1e-100
        eps = 1e-100 if self.dtype == np.float64 else (1e-35 if self.dtype == np.float32 else 1e-5)
Easier to read and maintain as a mapping?
dtype_to_eps = {
np.float64: 1e-100,
np.float32: 1e-35,
np.float16: 1e-5,
}
And then also `if dtype not in dtype_to_eps: ...`
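The reviewer's suggestion can be sketched as follows. This is a minimal illustration of the proposed mapping, not the actual gensim implementation; the helper `get_eps` is hypothetical and only demonstrates how one dict can serve both validation and epsilon lookup:

```python
import numpy as np

# Mapping from supported dtype to a safely representable epsilon,
# as proposed in the review (values taken from the diff above).
dtype_to_eps = {
    np.float64: 1e-100,
    np.float32: 1e-35,
    np.float16: 1e-5,
}


def get_eps(dtype):
    """Return the epsilon for `dtype`, rejecting unsupported dtypes up front."""
    if dtype not in dtype_to_eps:
        raise ValueError(
            "Incorrect 'dtype', please choose one of numpy.float16, "
            "numpy.float32 or numpy.float64"
        )
    return dtype_to_eps[dtype]
```

With this shape, the set of allowed dtypes and their epsilons live in one place, so the `__init__` check and the `inference` lookup cannot drift apart.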
gensim/models/ldamodel.py
Outdated
@@ -498,7 +501,13 @@ def inference(self, chunk, collect_sstats=False):
        # The optimal phi_{dwk} is proportional to expElogthetad_k * expElogbetad_w.
        # phinorm is the normalizer.
        # TODO treat zeros explicitly, instead of adding 1e-100?
Should update the comment too: `1e-100` => `epsilon`.
gensim/models/ldamodel.py
Outdated
@@ -498,7 +501,13 @@ def inference(self, chunk, collect_sstats=False):
        # The optimal phi_{dwk} is proportional to expElogthetad_k * expElogbetad_w.
        # phinorm is the normalizer.
        # TODO treat zeros explicitly, instead of adding 1e-100?
        phinorm = np.dot(expElogthetad, expElogbetad) + 1e-100
        dtype_to_eps = {
Better defined at module level, so `__init__` can make use of the allowed keys? Right now the same information lives in three places, which is not DRY.
This is a small fix for #1656, big thanks to @rmalouf for catching this potential bug: #1656 (comment)

What's done:
- validate `dtype` in `__init__` (allow only `numpy.float16`, `numpy.float32`, `numpy.float64`)
- `eps` now depends on `dtype` (instead of the hardcoded `1e-100`)

CC: @piskvorky
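To see why the dtype-dependent epsilon matters, here is a small standalone illustration (not gensim code): `1e-100` is below the smallest representable `float32` subnormal, so it underflows to zero and no longer protects `phinorm` against division by zero, while `1e-35` remains representable:

```python
import numpy as np

# 1e-100 underflows to 0.0 in float32 (smallest subnormal is ~1.4e-45),
# so adding it would not guard against a zero denominator.
underflowed = np.float32(1e-100)

# 1e-35 is within float32's normal range (~1.18e-38 and up), so it
# survives the cast and keeps the denominator strictly positive.
representable = np.float32(1e-35)

print(underflowed)        # underflows to zero
print(representable > 0)  # stays positive
```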