'NoneType' object is not iterable when using summarize #1531

nabergh · 2017-08-14T13:13:46Z

When using gensim.summarization.summarize, TypeError: 'NoneType' object is not iterable is occasionally thrown when summarizing random complaints in this dataset of consumer complaints. I'm not sure what it is about the strings that causes this error to be thrown but I feel the error should not be thrown regardless. Here is a self-contained example:

from gensim.summarization import summarize
text = "To whom it may concern, Only after weeks of using my new Wal-Mart master " \
    "card was I given the option to opt out of personal information sharing including :" \
    " Social Security #, income, account balance, payment history, credit history and credit" \
    " scores. According to the supervising operator at the wal-mart credit card call center my" \
    " personal information had already been shared. Furthermore, the letter informing me of my " \
    " rights was apart from my billing statement. Seemingly hidden between some flyers in the same" \
    " envelope. I almost threw it in the recycle bin without seeing it. If this is not illegal it " \
    "certainly appears to be an attempt to deceive me and disseminate my private information without " \
    "my permission. Thank you for your time and effort. Sincerely, XXXX XXXX"
summary = summarize(text)

I pasted the stack trace below but I'm sure you'll see it when running the above example.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-15020c19bb47> in <module>()
      1 from gensim.summarization import summarize
      2 text = "To whom it may concern, Only after weeks of using my new Wal-Mart master "     "card was I given the option to opt out of personal information sharing including :"     " Social Security #, income, account balance, payment history, credit history and credit"     " scores. According to the supervising operator at the wal-mart credit card call center my"     " personal information had already been shared. Furthermore, the letter informing me of my "     " rights was apart from my billing statement. Seemingly hidden between some flyers in the same"     " envelope. I almost threw it in the recycle bin without seeing it. If this is not illegal it "     "certainly appears to be an attempt to deceive me and disseminate my private information without "    "my permission. Thank you for your time and effort. Sincerely, XXXX XXXX"
----> 3 summary = summarize(text)

/usr/local/lib/python3.6/site-packages/gensim/summarization/summarizer.py in summarize(text, ratio, word_count, split)
    213 
    214     # Extracts the most important sentences with the selected criterion.
--> 215     extracted_sentences = _extract_important_sentences(sentences, corpus, most_important_docs, word_count)
    216 
    217     # Sorts the extracted sentences by apparition order in the original text.

/usr/local/lib/python3.6/site-packages/gensim/summarization/summarizer.py in _extract_important_sentences(sentences, corpus, important_docs, word_count)
    112 
    113 def _extract_important_sentences(sentences, corpus, important_docs, word_count):
--> 114     important_sentences = _get_important_sentences(sentences, corpus, important_docs)
    115 
    116     # If no "word_count" option is provided, the number of sentences is

/usr/local/lib/python3.6/site-packages/gensim/summarization/summarizer.py in _get_important_sentences(sentences, corpus, important_docs)
     87     hashable_corpus = _build_hasheable_corpus(corpus)
     88     sentences_by_corpus = dict(zip(hashable_corpus, sentences))
---> 89     return [sentences_by_corpus[tuple(important_doc)] for important_doc in important_docs]
     90 
     91 

TypeError: 'NoneType' object is not iterable

I'm assuming this is a bug but there's no example in the gensim ~~documentation~~ api page so I'm not entirely sure if I'm using the function correctly.

The text was updated successfully, but these errors were encountered:

piskvorky · 2017-08-15T12:41:36Z

Thanks for reporting.

@menshikh-iv let's link to the examples from the documentation (I think there was a blog post or notebook for summarization, at the very least).

menshikh-iv · 2017-08-17T11:16:59Z

@nabergh Docs: ipython tutorial and blogpost.

But you use it correctly, looks like a bug, @olavurmortensen please look at this problem.

olavurmortensen · 2017-08-17T13:28:52Z

@menshikh-iv I wrote that tutorial on summarize, but haven't had anything to do with the code. Can't help you here.

menshikh-iv · 2017-08-17T14:11:55Z

Ok, ping @fedelopez77 @fbarrios

menshikh-iv · 2017-08-18T08:11:38Z

John Mercer sent me an email

hey there :) this happens when there is period without space after it, e.g. text = "To whom it may concern.weeks of using my new Wal-Mart master card. was I given the option to opt out of personal information sharing including."
summary = summarize(text)

and is fixed when adding space:

text = "To whom it may concern. weeks of using my new Wal-Mart master card. was I given the option to opt out of personal information sharing including."
summary = summarize(text)

@nabergh please check this

nabergh · 2017-08-21T14:23:31Z

I just checked this out and it seems like John is correct. If there is a period followed by no space in some cases the same error is thrown, which can be fixed by adding a space after the period. Although I do not agree with this design decision (I think a more informative error should be thrown at the very least), this case does not pertain to the example I posted above and does not solve this issue.

piskvorky · 2017-08-21T18:40:03Z

Looks like a bug to me, not a design decision.

Although probably related to the bug in this ticket, is my guess, and a good lead.

diegospd · 2017-09-04T17:55:04Z

I''m confirming this bug which became apparent with the following text which doesn't have a dot followed by no space.

Digital rights (and wrongs)
Attempting to grasp the many conflicts and proposed safeguards for intellectual
	property is extremely difficult. Legal, political, economic, and
	cultural issues-both domestic and international-loom large, almost
	dwarfing the daunting technological challenges. Solutions devised by
	courts and legislatures and regulatory agencies are always late out of
	the blocks and fall ever farther behind. Recently proposed legislation
	only illustrates the depth and complexity of the problem

In the meantime I'm using this function to workaround this bug

from gensim.summarization.textcleaner import clean_text_by_sentences
from gensim.summarization.summarizer import _build_corpus, summarize_corpus

def _summarize_if_possible(text):
    sentences = clean_text_by_sentences(text)
    if len(sentences) < 10:
        return text
    corpus = _build_corpus(sentences)
    most_important_docs = summarize_corpus(corpus, ratio=1)
    if most_important_docs is None:
        return text
    return summarize(text, word_count=600)

Here's another self contained example

from gensim.summarization import summarize

text = 'Digital rights (and wrongs)\n'
text += 'Attempting to grasp the many conflicts and proposed '
text += 'safeguards for intellectual\n\t'
text += 'property is extremely difficult. Legal, political, '
text += 'economic, and\n\tcultural issues-both domestic and '
text += 'international-loom large, almost\n\tdwarfing the '
text += 'daunting technological challenges. Solutions devised by\n'
text += '\tcourts and legislatures and regulatory agencies are '
text += 'always late out of\n\tthe blocks and fall ever farther '
text += 'behind. Recently proposed legislation\n\t'
text += 'only illustrates the depth and complexity of the problem\n'

assert len(text) == 514
summarize(text)

fbarrios · 2017-09-05T16:45:50Z

I'll look into it during the day. Hopefully I can propose a PR fixing this by today or tomorrow :)

fbarrios · 2017-09-05T19:50:36Z

The NoneType issue was due to this return: https://github.com/RaRe-Technologies/gensim/blob/develop/gensim/summarization/summarizer.py#L163

This all happened simply because the summarizer module doesn't have enough material to work with (a warning was in fact beign issued). I created #1570 to fix the NoneType error.

fbarrios · 2017-09-05T20:00:36Z

By the way @diegospd, the summarizer module is splitting your text in a way that you probably don't want. You should remove the newlines so TextRank can make a proper summary:

In [1]: from gensim.summarization.textcleaner import split_sentences
Slow version of gensim.models.doc2vec is being used

In [2]: text = 'Digital rights (and wrongs)\n'
   ...: text += 'Attempting to grasp the many conflicts and proposed '
   ...: text += 'safeguards for intellectual\n\t'
   ...: text += 'property is extremely difficult. Legal, political, '
   ...: text += 'economic, and\n\tcultural issues-both domestic and '
   ...: text += 'international-loom large, almost\n\tdwarfing the '
   ...: text += 'daunting technological challenges. Solutions devised by\n'
   ...: text += '\tcourts and legislatures and regulatory agencies are '
   ...: text += 'always late out of\n\tthe blocks and fall ever farther '
   ...: text += 'behind. Recently proposed legislation\n\t'
   ...: text += 'only illustrates the depth and complexity of the problem\n'
   ...: 

In [3]: sentences = split_sentences(text)

In [4]: len(sentences)
Out[4]: 11

In [5]: sentences
Out[5]: 
['Digital rights (and wrongs)',
 'Attempting to grasp the many conflicts and proposed safeguards for intellectual',
 'property is extremely difficult.',
 'Legal, political, economic, and',
 'cultural issues-both domestic and international-loom large, almost',
 'dwarfing the daunting technological challenges.',
 'Solutions devised by',
 'courts and legislatures and regulatory agencies are always late out of',
 'the blocks and fall ever farther behind.',
 'Recently proposed legislation',
 'only illustrates the depth and complexity of the problem']

In [6]: text = text.replace("\n", "")

In [7]: sentences = split_sentences(text)

In [8]: sentences
Out[8]: 
['Digital rights (and wrongs)Attempting to grasp the many conflicts and proposed safeguards for intellectual\tproperty is extremely difficult.',
 'Legal, political, economic, and\tcultural issues-both domestic and international-loom large, almost\tdwarfing the daunting technological challenges.',
 'Solutions devised by\tcourts and legislatures and regulatory agencies are always late out of\tthe blocks and fall ever farther behind.',
 'Recently proposed legislation\tonly illustrates the depth and complexity of the problem']

In [9]: len(sentences)
Out[9]: 4

diegospd · 2017-09-06T17:28:37Z

Thanks @fbarrios! I was having a hard time figuring out why my texts had so many sentences. I think there should be warnings in the documentation about not being able to handle texts with less than 10 sentences and that newlines in the middle of sentences will confuse the summarizer.

I'm cleaning my texts with this in order to replace \n,\r,\t with a space, and then reducing long sequences of spaces to a single one. Maybe you could add this preprocessing before calling clean_text_by_sentences

import re
text = re.sub(r'\n|\r|\t', ' ', text)
text = re.sub(r'\s+', ' ', text)

fbarrios · 2017-09-06T18:24:13Z

At the moment the logger issues a warning in those cases.
Can you please create a new issue for including the preprocessing?

diegospd · 2017-09-07T23:17:16Z

I created issue #1575

…red a warning will be logged and the original text will be returned as if it had been succesfully summarized

nabergh · 2017-11-21T15:14:19Z

For those reading this thread, the result of running the original code in the newest release of gensim (3.1.0) is for gensim.summarization.summarize to return an empty string. I'm not sure if this is intentional but if it was, I don't even know what the point of it is. It is essentially the same as one putting the original code in a try/catch block and doesn't summarize the text or elucidate why the text can't be summarized.

fbarrios · 2017-11-21T15:29:13Z

@nabergh The rationale behind that decision was to be consistent with the return types.
The NoneType error happened because there weren't enough sentences to process. Also now a logging warning is issued, so what do you think the expected behavior should be?

piskvorky · 2017-11-21T15:39:25Z

Maybe return the entire original text? Or raise a ValueError explaining the reason?

I'm also not sure what the expected behaviour is. What is your suggestion @nabergh ?

piskvorky · 2017-11-21T15:42:03Z

Of course, ideally we'd want to summarize even the shorter text passages. @fbarrios how central is the length limitation to the algorithm, how was it chosen? What happens if it is relaxed?

menshikh-iv · 2017-11-22T08:34:31Z

Ping @fedelopez77, can your answer for #1531 (comment) please? What do you think about behavior (how it should look like)?

nabergh · 2017-11-22T15:17:29Z

Hmm I don't see the warning when running the code I posted in the interactive shell or in a Jupyter notebook. Do I have to enable warnings somehow?

@fbarrios I like both of @piskvorky's suggestions for expected behavior. The original text is a better summary than an empty string so I (and I assume most users) would expect that first. If this only happens for very short texts, the original text is probably short enough to be a summary anyway.

The documentation for the summarizer says "The input must be longer than INPUT_MIN_LENGTH". How long is INPUT_MIN_LENGTH and is this the length you are currently using to decide whether to return a summary or not?

Again, thanks for working on this.

piskvorky assigned menshikh-iv Aug 15, 2017

menshikh-iv added the bug Issue described a bug label Aug 17, 2017

diegospd pushed a commit to diegospd/gensim that referenced this issue Sep 8, 2017

Temporary workaround for bug piskvorky#1531. When the bug is encounte…

53bab04

…red a warning will be logged and the original text will be returned as if it had been succesfully summarized

menshikh-iv closed this as completed in 02ba343 Sep 18, 2017

zsef123 mentioned this issue Sep 24, 2017

Error when most_important_docs in summarizer.py is None #1597

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

'NoneType' object is not iterable when using summarize #1531

'NoneType' object is not iterable when using summarize #1531

nabergh commented Aug 14, 2017 •

edited

Loading

piskvorky commented Aug 15, 2017

menshikh-iv commented Aug 17, 2017

olavurmortensen commented Aug 17, 2017

menshikh-iv commented Aug 17, 2017

menshikh-iv commented Aug 18, 2017

nabergh commented Aug 21, 2017 •

edited

Loading

piskvorky commented Aug 21, 2017

diegospd commented Sep 4, 2017 •

edited

Loading

fbarrios commented Sep 5, 2017

fbarrios commented Sep 5, 2017

fbarrios commented Sep 5, 2017

diegospd commented Sep 6, 2017 •

edited

Loading

fbarrios commented Sep 6, 2017 •

edited

Loading

diegospd commented Sep 7, 2017

nabergh commented Nov 21, 2017 •

edited

Loading

fbarrios commented Nov 21, 2017

piskvorky commented Nov 21, 2017 •

edited

Loading

piskvorky commented Nov 21, 2017 •

edited

Loading

menshikh-iv commented Nov 22, 2017

nabergh commented Nov 22, 2017

'NoneType' object is not iterable when using summarize #1531

'NoneType' object is not iterable when using summarize #1531

Comments

nabergh commented Aug 14, 2017 • edited Loading

piskvorky commented Aug 15, 2017

menshikh-iv commented Aug 17, 2017

olavurmortensen commented Aug 17, 2017

menshikh-iv commented Aug 17, 2017

menshikh-iv commented Aug 18, 2017

nabergh commented Aug 21, 2017 • edited Loading

piskvorky commented Aug 21, 2017

diegospd commented Sep 4, 2017 • edited Loading

fbarrios commented Sep 5, 2017

fbarrios commented Sep 5, 2017

fbarrios commented Sep 5, 2017

diegospd commented Sep 6, 2017 • edited Loading

fbarrios commented Sep 6, 2017 • edited Loading

diegospd commented Sep 7, 2017

nabergh commented Nov 21, 2017 • edited Loading

fbarrios commented Nov 21, 2017

piskvorky commented Nov 21, 2017 • edited Loading

piskvorky commented Nov 21, 2017 • edited Loading

menshikh-iv commented Nov 22, 2017

nabergh commented Nov 22, 2017

nabergh commented Aug 14, 2017 •

edited

Loading

nabergh commented Aug 21, 2017 •

edited

Loading

diegospd commented Sep 4, 2017 •

edited

Loading

diegospd commented Sep 6, 2017 •

edited

Loading

fbarrios commented Sep 6, 2017 •

edited

Loading

nabergh commented Nov 21, 2017 •

edited

Loading

piskvorky commented Nov 21, 2017 •

edited

Loading

piskvorky commented Nov 21, 2017 •

edited

Loading