Running example, results not same as yours #46
Comments
Hi! By default we use newlines to divide into sentences. Can you try removing the line breaks from the input text?
What do you mean breaks? Like replacing dots with '\n' or replacing '\b' with '\n'?
Removing "\n" from the text altogether.
But then won't things like this: A turn to 'AB' in some user-made text? Would it be better to turn '\n' to '.' and remove any duplicate dots (... to .)?
Doing the above (replacing newlines with '.' and allowing only one consecutive dot) breaks it. Now summarize(text) and summarize with ratio=0.2 are empty (print nothing), and summarize with words=10 prints "Document summarization is another." The keywords are still the same.
It seems to add dots in places it shouldn't, because the input has newlines in some unnecessary places, but if I remove the newlines, words get fused into one word when they shouldn't be...
Just removing newlines ("\n" to "") results in the same output as turning "\n" to ".".
Except that the keywords now have this "technologyis search", which is obviously an artifact of removing newlines: An example of the use of summarization technology
It gives a good result when I manually correct the text to remove unneeded newlines, but the summary with words=10 is empty now. Could it be that it can't produce a summary that short?
Our summaries consist of the most relevant sentences in a given text. The task of splitting a text into sentences is not solved, so we make a best effort using this regex. That regex treats different lines (i.e.: a piece of text with The other behavior does seem like a bug. It could be that the summarizer misbehaves when the Thank you!
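For anyone hitting the same thing, here is a minimal sketch of a preprocessing step, assuming the goal is simply to undo hard line wraps before passing the text to summarize. It replaces each newline inside a paragraph with a single space rather than deleting it, so "technology" and "is" do not fuse into "technologyis". The helper name unwrap_lines is hypothetical and not part of the library, and treating blank lines as paragraph breaks is an assumption of this sketch.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines with one space; keep blank-line paragraph breaks.

    Hypothetical helper for illustration only, not part of the summarizer.
    """
    # Assumption: one or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # str.split() with no arguments splits on any whitespace run, so
    # newlines and repeated spaces both collapse to a single space.
    return "\n".join(" ".join(p.split()) for p in paragraphs)


wrapped = """An example of the use of summarization technology
is search engines such as Google. Document summarization is another."""

# Deleting newlines outright fuses the words on either side of the break.
fused = wrapped.replace("\n", "")

# Replacing them with a space keeps the words apart.
unwrapped = unwrap_lines(wrapped)

print(fused)
print(unwrapped)
```

The unwrapped string can then be passed to summarize in place of the original text. Whether that reproduces the example output exactly is not something this sketch checks.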
With input of
text = """Automatic summarization is the process of reducing a text document with a
computer program in order to create a summary that retains the most important points
of the original document. As the problem of information overload has grown, and as
the quantity of data has increased, so has interest in automatic summarization.
Technologies that can make a coherent summary take into account variables such as
length, writing style and syntax. An example of the use of summarization technology
is search engines such as Google. Document summarization is another."""
print(summarize(text))
I get
Automatic summarization is the process of reducing a text document with a
Document summarization is another.
Which is much derpier and not the same as the result you got (in the example):
Automatic summarization is the process of reducing a text document with a computer
program in order to create a summary that retains the most important points of the
original document.