Added malletmodel2ldamodel transformation function to mallet wrapper #766

devashishd12 · 2016-06-30T05:51:36Z

This PR follows up on a thread on the mailing list. @piskvorky @tmylk could you please check?

bhargavvader · 2016-06-30T05:59:26Z

gensim/models/wrappers/ldamallet.py

+    -------
+    model_gensim : LdaModel instance; copied gensim LdaModel
+    """
+    model_gensim = LdaModel(id2word=mallet_model.id2word, num_topics=mallet_model.num_topics, alpha=mallet_model.alpha, iterations=100)


Just curious, is there no way to pick up the number of iterations from the mallet model? I notice you've kept it as the default 100 right now.

Yes, I thought of doing that too, but I just followed what @piskvorky wrote on the mailing list. Both give similar results, though. I'll switch this to mallet_model.iterations so the converted model more closely resembles the mallet model.

devashishd12 · 2016-06-30T06:16:38Z

gensim/test/test_ldamallet_wrapper.py

+        tm2 = ldamallet.malletmodel2ldamodel(tm1)
+        for document in corpus:
+            self.assertAlmostEqual(tm1[document][0][1], tm2[document][0][1], places=1)
+            self.assertAlmostEqual(tm1[document][1][1], tm2[document][1][1])


@piskvorky I'm not too sure about this test. The values sometimes differ in the first decimal place too.

Try a stricter gamma_threshold and more iterations. But I don't think the results can really be made identical -- the inference algorithms are completely different.

I think the best we can hope for is a sanity check -- topics in same order, on some (model, query_document) pair where it's clear what the topic order should be, and where any deviation in topic order is clearly an error.
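A minimal sketch of the sanity check described above, assuming each model returns a document's topics as a list of `(topic_id, probability)` pairs. The helper name `topic_order` and the sample values are hypothetical, not taken from the PR; the point is only that the test compares topic ranking rather than exact probabilities.

```python
def topic_order(doc_topics):
    """Return topic ids sorted from most to least probable."""
    ranked = sorted(doc_topics, key=lambda pair: pair[1], reverse=True)
    return [topic_id for topic_id, _ in ranked]


# Hypothetical outputs of the two models for the same query document.
# The probabilities differ, but the dominant topic is unambiguous.
mallet_topics = [(0, 0.85), (1, 0.15)]
gensim_topics = [(0, 0.78), (1, 0.22)]

# The check passes when both models rank the topics identically.
same_order = topic_order(mallet_topics) == topic_order(gensim_topics)
```

Comparing rankings tolerates the numeric drift between the two inference algorithms, so it only fails when the converted model actually disagrees about which topic dominates.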

devashishd12 · 2016-07-02T20:25:01Z

@piskvorky I've addressed the comments. Could you please check?

piskvorky · 2016-07-03T05:37:16Z

gensim/models/wrappers/ldamallet.py

-    model_gensim = LdaModel(id2word=mallet_model.id2word, num_topics=mallet_model.num_topics,
-                            alpha=mallet_model.alpha, iterations=mallet_model.iterations)
+    model_gensim = LdaModel(
+                id2word=mallet_model.id2word, num_topics=mallet_model.num_topics,


PEP8: Hanging indent of 4 spaces.
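For illustration, a hanging indent of 4 spaces might look like the sketch below. `build_model` is a hypothetical stand-in for the constructor call in the diff, used only so the snippet runs on its own; the argument values are made up.

```python
def build_model(id2word, num_topics, alpha, iterations):
    """Hypothetical stand-in for a model constructor; returns its arguments."""
    return {
        "id2word": id2word,
        "num_topics": num_topics,
        "alpha": alpha,
        "iterations": iterations,
    }


# Hanging indent: no arguments after the opening parenthesis, and the
# continuation lines are indented 4 spaces relative to the statement.
model = build_model(
    id2word={0: "word"}, num_topics=2,
    alpha=0.5, iterations=100,
)
```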

devashishd12 · 2016-07-06T05:58:50Z

@piskvorky I've made the changes. Could you please check?

Added mallet to ldamodel transformation function

c36dfc1

bhargavvader reviewed Jun 30, 2016

devashishd12 changed the title ~~Added malletmode2ldamodel transformation function to mallet wrapper~~ Added malletmodel2ldamodel transformation function to mallet wrapper Jun 30, 2016

changed 100 iterations to mallet_model.iterations.

6c3a0c9

devashishd12 reviewed Jun 30, 2016

modified tests, added gamma_thresh and iter params.

d4b9d45

piskvorky reviewed Jul 3, 2016

Made changes in documentation, indentation.

5d9149a

tmylk merged commit fb8e4e5 into piskvorky:develop Jul 6, 2016

devashishd12 deleted the mallet_to_model branch July 7, 2016 10:55
