[WIP] Refactor documentation API Reference for gensim.summarization #1709
Great start! Thanks @yurkai, let's continue your work.
gensim/summarization/textcleaner.py
Outdated
"""Text Cleaner

This module contains functions and processors used for processing text,
extracting sentences from text, working with acronyms and abbreviations.
Need to add examples/highlights/motivation here (after you finish with docstrings in this file).
gensim/summarization/textcleaner.py
Outdated
RE_SENTENCE = re.compile(r'(\S.+?[.!?])(?=\s+|$)|(\S.+?)(?=[\n]|$)', re.UNICODE)
"""str: special separator used in abbreviations."""
RE_SENTENCE = re.compile(r'(\S.+?[.!?])(?=\s+|$)|(\S.+?)(?=[\n]|$)', re.UNICODE) # backup (\S.+?[.!?])(?=\s+|$)|(\S.+?)(?=[\n]|$)
"""SRE_Pattern: pattern to split text to sentences."""
Problem with building here
@yurkai here is a good example of how to document it: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/parsing/preprocessing.py#L21
gensim/summarization/textcleaner.py
Outdated
def split_sentences(text):
    """Splits and returns list of sentences from given text. It preserves
An `Examples` section would be nice (here and everywhere).
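For instance, a docstring with an `Examples` section could look roughly like the sketch below. It reuses the `RE_SENTENCE` pattern from the diff; the function name `split_sentences_sketch` is hypothetical, and the body is a simplified stand-in that skips any abbreviation handling the real `split_sentences` may do.

```python
import re

# Pattern copied from the diff under review.
RE_SENTENCE = re.compile(r'(\S.+?[.!?])(?=\s+|$)|(\S.+?)(?=[\n]|$)', re.UNICODE)


def split_sentences_sketch(text):
    """Split `text` into sentences using `RE_SENTENCE`.

    Hypothetical, simplified helper for illustration only.

    Parameters
    ----------
    text : str
        Input text.

    Returns
    -------
    list of str
        Sentences found in `text`, in order of appearance.

    Examples
    --------
    >>> split_sentences_sketch("Hello world. How are you?")
    ['Hello world.', 'How are you?']

    """
    # Each regex match is one sentence (or one unterminated line).
    return [match.group() for match in RE_SENTENCE.finditer(text)]
```

The doctest-style example doubles as a quick check when the docs are built with doctests enabled.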
gensim/summarization/textcleaner.py
Outdated
Returns
-------
str:
This doesn't match the actual return type.
gensim/summarization/textcleaner.py
Outdated
Input text.
separator : str
The separator between words to be replaced.
regexs : str
The type doesn't match.
gensim/summarization/textcleaner.py
Outdated
----------
words : list
List of words.
separator : str
`str` -> `str, optional`
gensim/summarization/textcleaner.py
Outdated
words : list
List of words.
separator : str
The separator bertween elements. Blank set as default.
Blank? I see `" "`, not `""`.
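To make the point concrete, here is a minimal sketch of how the parameter could be documented, assuming the function is a thin wrapper around `str.join` with a single-space default (the default is taken from the comment above; the body is an assumption, not the code under review).

```python
def join_words(words, separator=" "):
    """Concatenate `words` with `separator` between consecutive items.

    Parameters
    ----------
    words : list of str
        Words to join.
    separator : str, optional
        String placed between words. Defaults to a single space (" "),
        not an empty string.

    Returns
    -------
    str
        The joined string.

    """
    # Assumed implementation: delegate to str.join.
    return separator.join(words)
```

Marking the parameter as `str, optional` and naming the real default avoids the "blank" ambiguity.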
gensim/summarization/textcleaner.py
Outdated
----------
text : list
Input text.
deacc : bool
`bool, optional` - here and everywhere
gensim/summarization/textcleaner.py
Outdated
Parameters
----------
text : list
type doesn't match
gensim/summarization/textcleaner.py
Outdated
Parameters
----------
text : list
type doesn't match
This is a review of a small part only (I'll do a full review later).
gensim/summarization/commons.py
Outdated
Parameters
----------
sequence : list
`list` of what?
gensim/summarization/commons.py
Outdated
Returns
-------
Graph
Created graph.
Graph produced by sequence
gensim/summarization/commons.py
Outdated
Returns
-------
Graph
Use a concrete link to the type, like :class:`~gensim. ... ..., here and everywhere (for "gensim-defined" types).
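As a hedged illustration of the linked-type convention, a `Returns` entry with a Sphinx cross-reference might look like the sketch below. The dotted path `gensim.summarization.graph.Graph` is assumed from the import shown in the diff; the `Graph` class and the function body here are minimal stand-ins written only so the example runs, not the code under review.

```python
class Graph(object):
    """Minimal stand-in for the graph class, for illustration only."""

    def __init__(self):
        self._nodes = {}

    def has_node(self, node):
        return node in self._nodes

    def add_node(self, node):
        self._nodes[node] = []

    def nodes(self):
        return list(self._nodes)


def build_graph(sequence):
    """Create a graph with one node per unique item of `sequence`.

    Parameters
    ----------
    sequence : list of hashable
        Items to use as nodes.

    Returns
    -------
    :class:`~gensim.summarization.graph.Graph`
        Graph produced by `sequence`.

    """
    graph = Graph()
    for item in sequence:
        # Skip duplicates so each item appears as a node only once.
        if not graph.has_node(item):
            graph.add_node(item)
    return graph
```

With the `:class:` role, Sphinx renders the return type as a link to the class documentation, and the leading `~` shortens the displayed name to `Graph`.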
gensim/summarization/commons.py
Outdated
from gensim.summarization.graph import Graph


def build_graph(sequence):
    """Creates and returns graph with given sequence of values.
What "type" of graph is this (directed, etc.)?
gensim/summarization/commons.py
Outdated
@@ -15,6 +51,15 @@ def build_graph(sequence):


def remove_unreachable_nodes(graph):
    """Removes unreachable nodes (nodes with no edges). Works inplace.
`. Works inplace.` -> `, inplace.`
gensim/summarization/graph.py
Outdated
@param node: Node identifier
Parameters
----------
node : str or float
float?
gensim/summarization/graph.py
Outdated
@attention: While nodes can be of any type, it's strongly recommended
"""Adds given node to the graph.

Note
`Note` -> `Warning`
gensim/summarization/keywords.py
Outdated
----------
text : str
Sequence of values.
ratio : float
This is `optional` too.
gensim/summarization/keywords.py
Outdated
If no "words" option is selected, the number of sentences is
reduced by the provided ratio, else, the ratio is ignored.
words : list
.
Need to add descriptions to parameters
gensim/summarization/bm25.py
Outdated
@@ -3,20 +3,72 @@
#
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html

"""This module contains function of computing BM25 scores for documents in
corpus and helper class `BM25` used in calculations.
A link to the BM25 algorithm is missing (Wikipedia, for example).
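A short example in the module docstring would also help readers who do not follow the link. The sketch below scores tokenized documents against a query with the standard Okapi BM25 formula. It is an independent illustration, not this module's implementation: the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, and the IDF variant with `+ 1` inside the logarithm (which keeps IDF non-negative) are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(corpus, query, k1=1.5, b=0.75):
    """Score every document in `corpus` against `query` with Okapi BM25.

    Hypothetical helper: `corpus` is a non-empty list of token lists and
    `query` is a list of tokens. Returns one float per document.
    """
    n_docs = len(corpus)
    # Average document length, used for length normalization.
    avgdl = float(sum(len(doc) for doc in corpus)) / n_docs

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in corpus:
        df.update(set(doc))

    scores = []
    for doc in corpus:
        tf = Counter(doc)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Assumed IDF variant; the +1 keeps the value non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Documents that do not contain any query term score exactly zero, and repeated occurrences of a term raise the score with diminishing returns controlled by `k1`.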
Nice work @yurkai 👍, pay attention to my commits and continue your work 🥇
Fix #1668.