Traceback on parsing email without info which email caused it. #132

Open
mwdg opened this issue Nov 25, 2020 · 4 comments
Labels
🐛 bug Something isn't working, or a fix is proposed

Comments

@mwdg

mwdg commented Nov 25, 2020

I'm deduping a large maildir.

Command line:

mdedup -n -S 0 -C 0 -a move-selected -e maildir -E ~/mail/Maildir/.GMAllMailNonDupe -s select-oldest ~/mail/Maildir/

Output ends with:

Check mail differences are below the thresholds.
Select all mails sharing the oldest 1299180509 timestamp...
warning: Skip set: all 5 mails within were selected. The strategy criterion was not able to discard some.
◼ 7 mails sharing hash faad932b06e1bed2520658105bcae3d483acf05fd605db23f365adc6
Check mail differences are below the thresholds.
Traceback (most recent call last):
  File "/usr/local/bin/mdedup", line 8, in <module>
    sys.exit(mdedup())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/mail_deduplicate/cli.py", line 385, in mdedup
    dedup.select_all()
  File "/usr/local/lib/python3.9/site-packages/mail_deduplicate/deduplicate.py", line 417, in select_all
    candidates = duplicates.select_candidates()
  File "/usr/local/lib/python3.9/site-packages/mail_deduplicate/deduplicate.py", line 277, in select_candidates
    selected = apply_strategy(self.conf.strategy, self)
  File "/usr/local/lib/python3.9/site-packages/mail_deduplicate/strategy.py", line 262, in apply_strategy
    return set(method(duplicates))
  File "/usr/local/lib/python3.9/site-packages/mail_deduplicate/strategy.py", line 49, in select_oldest
    f"Select all mails sharing the oldest {duplicates.oldest_timestamp} "
  File "/usr/local/lib/python3.9/site-packages/boltons/cacheutils.py", line 610, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/usr/local/lib/python3.9/site-packages/mail_deduplicate/deduplicate.py", line 156, in oldest_timestamp
    return min(map(attrgetter("timestamp"), self.pool))
  File "/usr/local/lib/python3.9/site-packages/boltons/cacheutils.py", line 610, in __get__
    value = obj.__dict__[self.func.__name__] = self.func(obj)
  File "/usr/local/lib/python3.9/site-packages/mail_deduplicate/mail.py", line 118, in timestamp
    value = email.utils.mktime_tz(email.utils.parsedate_tz(value))
  File "/usr/local/Cellar/python@3.9/3.9.0_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/email/_parseaddr.py", line 185, in mktime_tz
    if data[9] is None:
TypeError: 'NoneType' object is not subscriptable

It looks to me like there's a defective email, but how am I supposed to figure out which one?
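
For reference, the last two frames of the traceback can be reproduced with the standard library alone: `email.utils.parsedate_tz` returns `None` for a value it cannot parse, and `email.utils.mktime_tz` then fails when it indexes that `None`. Below is a minimal sketch of a guarded version. The helper name `safe_timestamp` is hypothetical and is not part of mail-deduplicate. It also assumes the value being parsed is the mail's `Date` header, which the traceback does not state explicitly.

```python
import email.utils


def safe_timestamp(value):
    """Return a Unix timestamp for a date header value, or None if unparsable.

    Hypothetical helper for illustration; not mail-deduplicate's own code.
    """
    if not value:
        # Missing or empty header: nothing to parse.
        return None
    parsed = email.utils.parsedate_tz(value)  # None when the value is not a valid date
    if parsed is None:
        return None
    return email.utils.mktime_tz(parsed)


# The unguarded call chain from the traceback raises a TypeError on bad input:
try:
    email.utils.mktime_tz(email.utils.parsedate_tz("not a date"))
except TypeError as exc:
    print(f"TypeError: {exc}")
```

A caller that receives `None` from such a helper could log the path of the offending mail before skipping it, which is the information missing from the traceback above.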

@kdeldycke kdeldycke added the bug label Dec 7, 2020
@leggewie
Contributor

leggewie commented Jan 23, 2021

Hello and thank you for your report. Do you not get error messages in Phase 1 like I do in #155? What is the output of mdedup --version?

The -H switch might be worth trying: it only computes the hashes. You could then grep its output for the hash faad932b06e1bed2520658105bcae3d483acf05fd605db23f365adc6 reported above to find out which files are affected.
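
An approach that does not depend on mdedup at all is to scan the maildir with Python's standard `mailbox` module and list every mail whose date cannot be parsed. The sketch below assumes the crash is caused by an unparsable or missing `Date` header. The function names `is_parsable` and `find_unparsable_dates` are made up for this example. The reported key should correspond to the start of the mail's file name in the maildir, before the `:2,` flags suffix.

```python
import email.utils
import mailbox
import sys


def is_parsable(raw_date):
    """Return True if the Date header value can be parsed by email.utils."""
    if raw_date is None:
        return False
    try:
        # str() also covers Header objects returned for non-ASCII header values.
        return email.utils.parsedate_tz(str(raw_date)) is not None
    except (IndexError, TypeError, ValueError):
        return False


def find_unparsable_dates(maildir_path):
    """Yield (folder, key, raw_date) for each mail with a bad or missing Date."""
    root = mailbox.Maildir(maildir_path, create=False)
    # Scan the top-level mailbox first, then each Maildir++ subfolder.
    boxes = [("", root)]
    boxes += [(name, root.get_folder(name)) for name in root.list_folders()]
    for folder, box in boxes:
        for key, msg in box.iteritems():
            raw_date = msg.get("Date")
            if not is_parsable(raw_date):
                yield folder, key, raw_date


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_bad_dates.py ~/mail/Maildir/
    for folder, key, raw_date in find_unparsable_dates(sys.argv[1]):
        print(f"{folder or '(top level)'}\t{key}\t{raw_date!r}")
```

On a mailbox of this size the scan reads every message once, so it may take a while. It only reads and never modifies the maildir.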

@mwdg
Author

mwdg commented Jan 25, 2021

I tried adding the -H switch to the same command line (hope that's valid):

mdedup -n -H -S 0 -C 0 -a move-selected -e maildir -E ~/mail/Maildir/.GMAllMailNonDupe -s select-oldest ~/mail/Maildir/

It crashed with this:

...
15 mails found.
76 mails found.
3 mails found.
0 mails found.
0 mails found.
Mail sources [####################################] 1/1

● Phase #1 - Compute hashes and group duplicates
Use [date, from, to, subject, mime-version, content-type, content-disposition, user-agent, x-priority, message-id] headers to compute hashes.
Hashed mails [####################################] 166405/166405
Traceback (most recent call last):
  File "/usr/local/bin/mdedup", line 8, in <module>
    sys.exit(mdedup())
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/mail_deduplicate/cli.py", line 380, in mdedup
    click.echo(mail.pretty_headers)
AttributeError: 'MaildirDedupMail' object has no attribute 'pretty_headers'

The output of mdedup --version is:

mdedup 6.0.2 {'username': '-', 'guid': 'd6c8125a96344232225e5b1af003146', 'hostname': '-', 'hostfqdn': '-', 'uname': {'system': 'Darwin', 'node': '-', 'release': '19.6.0', 'version': 'Darwin Kernel Version 19.6.0: Tue Nov 10 00:10:30 PST 2020; root:xnu-6153.141.10~1/RELEASE_X86_64', 'machine': 'x86_64', 'processor': 'i386'}, 'linux_dist_name': '', 'linux_dist_version': '', 'cpu_count': 4, 'fs_encoding': 'utf-8', 'ulimit_soft': 256, 'ulimit_hard': 9223372036854775807, 'cwd': '-', 'umask': '0o2', 'python': {'argv': '-', 'bin': '-', 'version': '3.9.1 (default, Jan 8 2021, 17:17:17) [Clang 12.0.0 (clang-1200.0.32.28)]', 'compiler': 'Clang 12.0.0 (clang-1200.0.32.28)', 'build_date': 'Jan 8 2021 17:17:17', 'version_info': [3, 9, 1, 'final', 0], 'features': {'openssl': 'OpenSSL 1.1.1i 8 Dec 2020', 'expat': 'expat_2.2.8', 'sqlite': '3.34.0', 'tkinter': '8.6', 'zlib': '1.2.11', 'unicode_wide': True, 'readline': True, '64bit': True, 'ipv6': True, 'threading': True, 'urandom': True}}, 'time_utc': '2021-01-25 02:44:22.370920', 'time_utc_offset': -8.0, '_eco_version': '1.0.1'}

@mwdg
Author

mwdg commented Jan 25, 2021

For the original problem, I'm not getting any errors or warnings in Phase 1:

● Phase #1 - Compute hashes and group duplicates
Use [date, from, to, subject, mime-version, content-type, content-disposition, user-agent, x-priority, message-id] headers to compute hashes.
Hashed mails [####################################] 166405/166405

@leggewie
Contributor

leggewie commented Nov 7, 2021

Hello and sorry for the delayed response. Would you mind updating your installation to the latest release (for example via pip3 install -U mail-deduplicate) and letting us know if you can still reproduce the issue there?

@kdeldycke kdeldycke added 🐛 bug Something isn't working, or a fix is proposed and removed bug labels Nov 23, 2022