-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added malletmodel2ldamodel transformation function to mallet wrapper #766
Conversation
------- | ||
model_gensim : LdaModel instance; copied gensim LdaModel | ||
""" | ||
model_gensim = LdaModel(id2word=mallet_model.id2word, num_topics=mallet_model.num_topics, alpha=mallet_model.alpha, iterations=100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, is there no way to pick up the number of iterations from the mallet model? I notice you've kept it as the default 100
right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes even I thought of doing that but I instead just followed what @piskvorky wrote in the mailing list. Both are giving similar results though. I guess I'll switch this to mallet_model.iterations
for better resemblance to the mallet model.
tm2 = ldamallet.malletmodel2ldamodel(tm1) | ||
for document in corpus: | ||
self.assertAlmostEqual(tm1[document][0][1], tm2[document][0][1], places=1) | ||
self.assertAlmostEqual(tm1[document][1][1], tm2[document][1][1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@piskvorky I'm not too sure about this test. Values sometimes are differing at the first place too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try more accurate gamma_threshold
and iterations
. But I don't think the results can really be made identical -- the inference algorithms are completely different.
I think the best we can hope for is a sanity check -- topics in same order, on some (model, query_document) pair where it's clear what the topic order should be, and where any deviation in topic order is clearly an error.
@piskvorky I've addressed the comments. Could you please check? |
model_gensim = LdaModel(id2word=mallet_model.id2word, num_topics=mallet_model.num_topics, | ||
alpha=mallet_model.alpha, iterations=mallet_model.iterations) | ||
model_gensim = LdaModel( | ||
id2word=mallet_model.id2word, num_topics=mallet_model.num_topics, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PEP8: Hanging indent of 4 spaces.
@piskvorky I've made the changes. Could you please check? |
This is concerning this thread in the mailing list. @piskvorky @tmylk could you please check?