Fix docstrings for `gensim.models.AuthorTopicModel` #1907

souravsingh · 2018-02-14T18:46:12Z

The PR aims to fix the docstrings for Author-topic model.

Fixes the documentation of Author-Topic model.


menshikh-iv

Good start. Please fix the current problems and write docstrings for the rest of this file.

menshikh-iv · 2018-02-15T04:27:24Z

gensim/models/atmodel.py

+
+    Parameters
+    ----------
+    doc2author: dict


dict of ??? here and everywhere

menshikh-iv · 2018-02-15T04:27:36Z

gensim/models/atmodel.py

+
+    Parameters
+    ----------
+    corpus: list of list of str


maybe iterable of ...?

menshikh-iv · 2018-02-15T04:28:06Z

gensim/models/atmodel.py

+        ----------
+        eta: float
+            Dirichlet topic parameter for sparsity.
+        lambda_shape: float


incorrect type

menshikh-iv · 2018-02-15T04:28:38Z

gensim/models/atmodel.py

@@ -107,96 +134,69 @@ def construct_author2doc(doc2author):


 class AuthorTopicModel(LdaModel):
-    """
-    The constructor estimates the author-topic model parameters based


Need an end-to-end usage example in the docstring (i.e. I can copy-paste it and it should work).

Done at the end of the docstring as a separate section.

menshikh-iv · 2018-02-15T04:29:13Z

gensim/models/atmodel.py

+        ----------
+        num_topic: int, optional
+            Number of topics to be extracted from the training corpus.
+


No empty lines are needed between parameter definitions (here and everywhere).

menshikh-iv · 2018-02-15T04:29:40Z

gensim/models/atmodel.py

@@ -68,6 +68,18 @@ class AuthorTopicState(LdaState):
    """


Missing docstring for the module (examples of usage, links to related papers, etc.)

I have added a usage example in the docstring.

What about related papers/links/etc.? Adding one more example is almost always a good idea.

I have added the paper on Author topic model to the top of the script.

menshikh-iv · 2018-02-20T04:47:40Z

@souravsingh hi, do you have a schedule for this file (a date by which you plan to finish)?

souravsingh · 2018-02-20T05:22:03Z

I will make some final updates by the end of today.

menshikh-iv · 2018-02-20T05:37:58Z

You still have many "TODO" items. Good luck, I hope you succeed today @souravsingh 👍

souravsingh · 2018-02-20T13:10:18Z

I have pushed the changes for the rest of the functions. I would need a review for the changes before the patch can be merged.

menshikh-iv · 2018-02-21T07:59:03Z

gensim/models/atmodel.py

+        Example
+        -------
+        >>> import numpy as np
+        >>> from gensim.models import AuthorTopicModel
        >>> model = AuthorTopicModel(corpus, num_topics=100, author2doc=author2doc, id2word=id2word)  # train model


This example doesn't work (try copy-pasting it into a console and running it); it must work.

menshikh-iv · 2018-02-21T07:59:23Z

gensim/models/atmodel.py

+
+        Parameters
+        ----------
+        expElogthetad: numpy.ndarray


Where is the parameter description?

I wasn't sure what to write for the description, which is why I left it blank.

menshikh-iv · 2018-02-21T07:59:39Z

gensim/models/atmodel.py

@@ -379,6 +381,26 @@ def inference(self, chunk, author2doc, doc2author, rhot, collect_sstats=False, c
        Avoids computing the `phi` variational parameter directly using the
        optimization presented in **Lee, Seung: Algorithms for non-negative matrix factorization, NIPS 2001**.

+        Parameters
+        ----------
+        chunk: int


chunk: int -> chunk : int (here and everywhere)

menshikh-iv · 2018-02-21T07:59:47Z

gensim/models/atmodel.py

+        ----------
+        chunk: int
+            The chunk numer of the sparse document vector on which inference needs to be done.
+        author2doc: dict


dict of ???

Is it a dict of str and str?

dict of (type, type); check the tests to see which types should be here.
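To make the "dict of ???" question concrete, here is a minimal sketch of the two mappings and how one can be derived from the other. It assumes `author2doc` maps author names (str) to lists of document indices (int), which is how later revisions in this thread annotate it; the `doc2author` type follows by symmetry and is an assumption. `invert_author2doc` is a hypothetical helper written for illustration, not gensim's own function.

```python
def invert_author2doc(author2doc):
    """Invert author name -> document ids into document id -> author names.

    Illustrative helper only; this is not part of gensim's API.
    """
    doc2author = {}
    for author, doc_ids in author2doc.items():
        for doc_id in doc_ids:
            # Each document may have several authors, so collect them in a list.
            doc2author.setdefault(doc_id, []).append(author)
    return doc2author


author2doc = {"john": [0, 1], "jane": [1, 2]}  # dict of (str, list of int)
doc2author = invert_author2doc(author2doc)     # dict of (int, list of str), by assumption
print(doc2author)  # {0: ['john'], 1: ['john', 'jane'], 2: ['jane']}
```

Writing the literal values down like this is usually enough to settle which element types belong in the docstring.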

menshikh-iv · 2018-02-21T08:00:28Z

gensim/models/atmodel.py

@@ -332,6 +323,11 @@ def extend_corpus(self, corpus):
        are added in the process. If serialization is not used, the corpus, as a list
        of documents, is simply extended.

+        Parameters
+        ----------
+        corpus: list of list of str


Are you sure?

menshikh-iv · 2018-02-21T08:01:14Z

gensim/models/atmodel.py

@@ -68,6 +68,18 @@ class AuthorTopicState(LdaState):
    """


What about related papers/links/etc.? Adding one more example is almost always a good idea.

menshikh-iv · 2018-02-22T00:24:55Z

@souravsingh please ping me when the PR is ready

souravsingh · 2018-02-22T18:28:24Z

@menshikh-iv The patch is ready for review. I will fix the flake8 issues once I take care of the comments.

menshikh-iv · 2018-02-22T23:51:35Z

gensim/models/atmodel.py

@@ -58,7 +55,7 @@
 class AuthorTopicState(LdaState):
    """
    NOTE: distributed mode not available yet in the author-topic model. This AuthorTopicState
-    object is kept so that when the time comes to imlement it, it will be easier.
+    object is kept so that when the time comes to implement it, it will be easier.


Need to use

    Notes
    -----
    ....

instead of `NOTE:` and similar things.

menshikh-iv · 2018-02-22T23:52:46Z

gensim/models/atmodel.py


        Do not call this method directly, instead use `model[author_names]`.

+        Parameters
+        ----------
+        author_names : str


incorrect type

menshikh-iv · 2018-02-22T23:53:12Z

gensim/models/atmodel.py

@@ -950,10 +1085,22 @@ def __getitem__(self, author_names, eps=None):
        Return topic distribution for input author as a list of
        (topic_id, topic_probabiity) 2-tuples.

-        Ingores topics with probaility less than `eps`.
+        Ignores topics with probaility less than `eps`.

        Do not call this method directly, instead use `model[author_names]`.


This should be wrapped in Warnings section
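As a rough illustration of the two section requests in this part of the review (`Notes` instead of `NOTE:`, and a `Warnings` section for the "do not call directly" remark), the sketch below uses a toy class. `ToyAuthorModel` and its stored weights are invented for the example and are not gensim's implementation.

```python
class ToyAuthorModel:
    """Toy stand-in that maps authors to topic weights (not gensim's class).

    Notes
    -----
    Distributed mode is not modelled here; the class only stores precomputed weights.

    """

    def __init__(self, weights):
        self.weights = weights  # dict of (str, list of float)

    def get_author_topics(self, author_name, eps=0.01):
        """Get the topic distribution of an author as a list of (int, float)."""
        return [(i, w) for i, w in enumerate(self.weights[author_name]) if w >= eps]

    def __getitem__(self, author_name):
        """Get the topic distribution of an author.

        Warnings
        --------
        Do not call this method directly, use `model[author_name]` instead.

        """
        return self.get_author_topics(author_name)


model = ToyAuthorModel({"jane": [0.9, 0.1, 0.0]})
print(model["jane"])  # [(0, 0.9), (1, 0.1)]
```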

menshikh-iv · 2018-02-22T23:53:44Z

gensim/models/atmodel.py

+
+        Example
+        -------
+        >>> author_vecs = [model.get_author_topics(author) for author in model.id2author.values()]


This example doesn't work; you should define all variables and make all imports first.

A simple test for an example: I copy-paste it into a REPL, run it, and it works successfully.
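The copy-paste test described here can also be automated with the standard library's `doctest` module. The sketch below is a generic illustration on a toy function (`count_tokens` is made up for the example); it does not exercise gensim itself.

```python
import doctest


def count_tokens(tokens):
    """Count how many times each token occurs.

    Examples
    --------
    >>> count_tokens(["topic", "author", "topic"])
    {'topic': 2, 'author': 1}

    """
    counts = {}
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts


# Parse the examples out of the docstring and run them with only the names
# the example itself is allowed to rely on, as a fresh REPL session would.
parser = doctest.DocTestParser()
test = parser.get_doctest(
    count_tokens.__doc__, {"count_tokens": count_tokens}, "count_tokens", None, 0
)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results)
```

An example that relies on an undefined `corpus` or a missing import would show up here as a failed example rather than slipping into the rendered documentation.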

menshikh-iv · 2018-02-22T23:54:16Z

gensim/models/atmodel.py

+
+        Returns
+        -------
+        list of 2-tuples


should be list of (some_type_1, some_type_2), here and everywhere

menshikh-iv · 2018-02-22T23:55:32Z

gensim/models/atmodel.py

+        ----------
+        chunk : int
+            The chunk numer of the sparse document vector on which inference needs to be done.
+        author2doc : dict of {str: list of ints}


use () instead of {} in type definition, here and everywhere

menshikh-iv · 2018-02-22T23:56:31Z

gensim/models/atmodel.py

-        For other parameter settings, see :class:`AuthorTopicModel` constructor.
+        Parameters
+        ----------
+        corpus : iterable of iterable of (int, int)


Better to use iterable of list of (int, number), here and everywhere
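For readers unfamiliar with the shape behind "iterable of list of (int, number)", the following sketch builds such a corpus by hand. `to_bow`, `stream_corpus` and `token2id` are hypothetical names for this illustration; in practice a dictionary object would supply the ids, and weights need not be integer counts.

```python
from collections import Counter


def to_bow(tokens, token2id):
    """Convert tokens to a sorted list of (token_id, count) pairs (hand-rolled stand-in)."""
    counts = Counter(token2id[token] for token in tokens if token in token2id)
    return sorted(counts.items())


def stream_corpus(texts, token2id):
    """Yield one bag-of-words document at a time, so the corpus is an iterable, not a list."""
    for tokens in texts:
        yield to_bow(tokens, token2id)


token2id = {"author": 0, "model": 1, "topic": 2}
texts = [["topic", "model", "topic"], ["author", "topic"]]
corpus = list(stream_corpus(texts, token2id))  # materialized here only for printing
print(corpus)  # [[(1, 1), (2, 2)], [(0, 1), (2, 1)]]
```

Each document is a list of pairs, while the corpus as a whole only needs to be iterable, which is the distinction the suggested annotation draws.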

menshikh-iv · 2018-02-22T23:57:17Z

gensim/models/atmodel.py

+        Parameters
+        ----------
+        expElogthetad: numpy.ndarray
+            Value of variational distribution q(theta|gamma).


Better to use LaTeX for this (it will be rendered in the documentation).

What would be the representation for this in LaTeX?

@souravsingh look into original paper + :math: in sphinx

Would it be like this: q(\theta|\gamma)?

@souravsingh maybe, I'm not sure, check original paper.

menshikh-iv · 2018-02-22T23:57:58Z

gensim/models/atmodel.py

        >>> model = AuthorTopicModel(corpus, num_topics=100, author2doc=author2doc, id2word=id2word)  # train model
-        >>> model.update(corpus2)  # update the author-topic model with additional documents
+        >>> model.update(corpus, author2doc)  # update the author-topic model with additional documents


The example must work; corpus, author2doc and id2word are undefined here.

Also, the comment says "update with additional", but you use exactly the same corpus here, which is confusing.

menshikh-iv · 2018-02-28T07:14:24Z

@souravsingh please ping me when you're ready

menshikh-iv · 2018-03-08T07:37:40Z

@souravsingh how is it going? When do you plan to finish the PR?

souravsingh · 2018-03-08T18:37:23Z

@menshikh-iv Ready for review

menshikh-iv · 2018-03-09T05:30:57Z

gensim/models/atmodel.py

@@ -19,15 +19,12 @@
 The model is closely related to Latent Dirichlet Allocation. The AuthorTopicModel class
 inherits the LdaModel class, and its usage is thus similar.

-Distributed computation and multiprocessing is not implemented at the moment, but may be
-coming in the future.
+The model was introduced by Rosen-Zvi and co-authors in 2004 and is described in [1]_


Please don't use [1] references (this doesn't work properly with one of our Sphinx plugins). Instead, use something like the following, here and everywhere:

The model was introduced by Rosen-Zvi and co-authors in 2004 and is described in `The Author-Topic Model for Authors and Documents <https://arxiv.org/abs/1207.4169>`_

Also, it's a good idea to add a usage example to the current docstring (at the head of the file).

menshikh-iv · 2018-03-09T05:34:34Z

gensim/models/atmodel.py

+        ----------
+        eta: float
+            Dirichlet topic parameter for sparsity.
+        lambda_shape: int


Incorrect type: this is (int, int), the shape of a 2D array.

menshikh-iv · 2018-03-09T05:35:31Z

gensim/models/atmodel.py

+
+    Parameters
+    ----------
+    corpus: list of list of str


iterable of list of str

menshikh-iv · 2018-03-09T05:35:56Z

gensim/models/atmodel.py

+    ----------
+    corpus: list of list of str
+        Corpus of documents.
+    author2doc: dict of (str: list of int)


Should look like dict of (str, list of int), here and everywhere (for all dict of ...).
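Pulling together the formatting conventions requested in this review (`name : type` with spaces around the colon, no blank lines between parameters, `dict of (str, list of int)`, and `list of (int, float)` for returns), a docstring might look like the sketch below. `top_topics` and its arguments are a toy function made up for the illustration, not part of atmodel.py.

```python
def top_topics(author2doc, author, topic_weights, eps=0.01):
    """Get the topics of one author whose weight is at least `eps` (toy function).

    Parameters
    ----------
    author2doc : dict of (str, list of int)
        Mapping of author names to the ids of their documents.
    author : str
        Name of the author to look up.
    topic_weights : dict of (str, list of float)
        Per-author topic weights, indexed by topic id.
    eps : float, optional
        Topics with a weight below this value are dropped.

    Returns
    -------
    list of (int, float)
        Pairs of topic id and weight.

    """
    if author not in author2doc:
        raise ValueError("unknown author: %r" % author)
    return [(topic_id, w) for topic_id, w in enumerate(topic_weights[author]) if w >= eps]


print(top_topics({"john": [0, 1]}, "john", {"john": [0.7, 0.005, 0.295]}))
# [(0, 0.7), (2, 0.295)]
```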

menshikh-iv · 2018-03-09T05:37:25Z

gensim/models/atmodel.py

-
-    Model persistency is achieved through its `load`/`save` methods.
-    """
+    """The constructor estimates the author-topic model parameters based on a training corpus."""


Better to move example here (instead of __init__)

menshikh-iv · 2018-03-09T05:38:00Z

gensim/models/atmodel.py

+        ... }
+        >>> corpus = mmcorpus.MmCorpus(datapath('testcorpus.mm'))
+        >>> model = AuthorTopicModel(corpus, author2doc=author2doc, id2word=dictionary, num_topics=4, passes=100)  # train model
+        >>> model.update(corpus, author2doc)  # update the author-topic model with additional documents


OK, but how do you apply the model? This is the main use case.

menshikh-iv · 2018-03-09T05:38:28Z

gensim/models/atmodel.py

@@ -306,6 +330,7 @@ def __init__(self, corpus=None, num_topics=100, id2word=None, author2doc=None, d
            self.update(corpus, author2doc, doc2author, chunks_as_numpy=use_numpy)

    def __str__(self):
+        """Return a string representation of AuthorTopicModel class."""


OK, where is the Returns section?

menshikh-iv · 2018-03-09T05:40:21Z

gensim/models/atmodel.py

+        Parameters
+        ----------
+        expElogthetad: numpy.ndarray
+            Value of variational distribution .. math:: q(theta|gamma).


This is an incorrect use of math (it doesn't render).
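A likely reason the snippet above does not render is that `.. math::` is a block directive, which Sphinx only recognizes at the start of its own paragraph; inside a sentence the inline `:math:` role is the usual choice. The sketch below shows the inline form in a raw docstring (raw, so the backslashes survive). The formula is copied from this thread and has not been checked against the paper; `describe_variational_term` is a placeholder name.

```python
def describe_variational_term():
    r"""Document a parameter with inline math (illustrative docstring only).

    Parameters
    ----------
    expElogthetad : numpy.ndarray
        Value of the variational distribution :math:`q(\theta|\gamma)`.

    """


# The raw-string prefix keeps `\t` in `\theta` from being read as a tab character.
print(describe_variational_term.__doc__)
```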

menshikh-iv · 2018-03-09T05:41:03Z

gensim/models/atmodel.py

+
+        Returns
+        -------
+        list of (topic_id, topic_probability) as a 2-tuple


list of (int, float)

menshikh-iv · 2018-03-13T13:27:33Z

ping @souravsingh, how is it going?

menshikh-iv · 2018-03-14T04:27:34Z

gensim/models/atmodel.py

+>>> model = AuthorTopicModel(corpus, author2doc=author2doc, id2word=dictionary, num_topics=4, passes=100)  # train model
+>>> model.update(corpus, author2doc)  # update the author-topic model with additional documents
+>>> author_vecs = [model.get_author_topics(author) for author in model.id2author.values()]
+>>> print(author_vecs) #Prints top authors


These aren't the top authors; these are all authors.

menshikh-iv · 2018-03-14T04:28:17Z

gensim/models/atmodel.py

+        ----------
+        chunk : int
+            The chunk numer of the sparse document vector on which inference needs to be done.
+        author2doc : dict of (strm list of ints)


ints -> int
strm -> str,

here and everywhere

menshikh-iv · 2018-03-14T04:28:45Z

gensim/models/atmodel.py

+
+        Returns
+        -------
+        list of (int, float) as a 2-tuple


list of (int, float)

menshikh-iv · 2018-03-14T04:29:31Z

gensim/models/atmodel.py

@@ -19,14 +19,31 @@
 The model is closely related to Latent Dirichlet Allocation. The AuthorTopicModel class


These should be references to classes, like

:class:`~gensim.models...`

menshikh-iv · 2018-03-14T04:37:19Z

gensim/models/atmodel.py

-
-A tutorial can be found at
-https://github.com/RaRe-Technologies/gensim/tree/develop/docs/notebooks/atmodel_tutorial.ipynb.
+The model was introduced by Rosen-Zvi and co-authors in 2004 and is described in 


Also, it's a good idea to add a basic description here:

what the use cases for this model are

how it works (in general)

menshikh-iv · 2018-04-03T06:03:33Z

Thank you @souravsingh

piskvorky

Requesting changes.

piskvorky · 2018-04-03T07:48:13Z

gensim/models/atmodel.py

+        chunksize : int, optional
+            Controls the size of the mini-batches.
+        passes : int, optional
+            Number of times the model makes a pass over the entire training data.


training data => training corpus (consistency helps with clarity)

piskvorky · 2018-04-03T07:50:39Z

gensim/models/atmodel.py

+            Threshold value of gamma(topic difference between consecutive two topics)
+            until which the iterations continue.
+        serialized : bool, optional
+            Indicates whether the input corpora to the model are simple lists


What are "input corpora"? Where do they come from?

What does "or saved to the hard-drive" mean? If false, "input corpora" are saved to a hard-drive? Why are we saving input?

The documentation of these two parameters (serialized and serialization_path) is confusing. Who should use this and why?

This is important because the docstring example at the top actually uses these parameters (that's how I found them, I wanted to know what it is).

piskvorky · 2018-04-03T07:51:02Z

gensim/models/atmodel.py

@@ -306,14 +309,21 @@ def __init__(self, corpus=None, num_topics=100, id2word=None, author2doc=None, d
            self.update(corpus, author2doc, doc2author, chunks_as_numpy=use_numpy)

    def __str__(self):
+        """Get a string representation of object.


Remove this docstring completely (no added value, just adds fluff).

piskvorky · 2018-04-03T07:53:37Z

gensim/models/atmodel.py

+
+        Raises
+        ------
+        AssertionError


AssertionError is for programmer mistakes. Bad user input is either ValueError or TypeError.

Question for the author of this code, not for me.
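The distinction drawn here can be sketched as follows: input validation raises `ValueError` or `TypeError`, which callers can catch and which stay active under `python -O`, whereas `assert` statements are stripped in optimized mode. `check_author_inputs` and its messages are hypothetical and only illustrate the pattern; this is not the code in atmodel.py.

```python
def check_author_inputs(author2doc=None, doc2author=None):
    """Validate user-supplied author mappings (hypothetical helper for illustration)."""
    if author2doc is None and doc2author is None:
        # Bad user input: raise ValueError rather than relying on `assert`.
        raise ValueError("at least one of author2doc or doc2author must be specified")
    if author2doc is not None and not isinstance(author2doc, dict):
        # Wrong kind of object altogether: TypeError is the conventional choice.
        raise TypeError("author2doc must be a dict, got %s" % type(author2doc).__name__)
    return True


try:
    check_author_inputs()
except ValueError as err:
    message = str(err)
print(message)  # at least one of author2doc or doc2author must be specified
```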

Add docustrings for Author-topic model

7f699bf

souravsingh changed the title ~~Add docustrings for Author-topic model~~ Add docstrings for Author-topic model Feb 14, 2018

menshikh-iv suggested changes Feb 15, 2018

Make PEP8 fixes

fc82f9a

Add docstrings for rest of the functions.

5e81b7e

Make updates to the docstrings

5017968

menshikh-iv suggested changes Feb 21, 2018

Make fixes to docstrings

d4d13fd

menshikh-iv suggested changes Feb 22, 2018

souravsingh added 2 commits February 27, 2018 20:49

Update atmodel.py

8e4459e

Make update to example

98087f0

Make final edits

0dbf168

menshikh-iv suggested changes Mar 9, 2018

menshikh-iv changed the title ~~Add docstrings for Author-topic model~~ Fix docstrings for gensim.models.AuthorTopicModel Mar 9, 2018

Update atmodel.py

c7e4db8

menshikh-iv suggested changes Mar 14, 2018

souravsingh added 2 commits March 23, 2018 00:19

Update atmodel.py

0eb9184

Merge branch 'develop' into add-doc

6262647

menshikh-iv added 5 commits April 2, 2018 20:06

fix PEP8

2626d57

fix atmodel[1]

e8e91b7

fix atmodel[2]

6ed5d10

fix atmodel[3]

db7484d

fix atmodel[4]

82a9e40

menshikh-iv merged commit c396ad9 into piskvorky:develop Apr 3, 2018

piskvorky reviewed Apr 3, 2018
