-
Notifications
You must be signed in to change notification settings - Fork 1
/
TODO
194 lines (153 loc) · 8.11 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
- After Module::Install support CPAN::Meta 2.0, specify gpl_2 as the license.
Suggested by Paul Howarth <paul@city-fan.org>
- After Module::Install support CPAN::Meta 2.0, add info for repository URL
and bug tracker. Suggested by Paul Howarth <paul@city-fan.org>
- Support for searches on email threads. (Thanks to Zack Brown
<zbrown@tumblerings.org> for the excellent feature idea and a prototype
implementation.)
- implement unimplemented code
- update anonymize_mailbox to anonymize more of the header, and handle
attachments.
- Add mbx format support (email request)
https://sourceforge.net/p/grepmail/feature-requests/10/
- Improve unique to avoid X-MimeOLE, X-MozillaStatus, and
Content-Type/boundary differences. (Maybe just hash on the received
headers?) Skye Otten <skizye@fastmail.fm>
- Interpret transfer encodings to search the actual content of emails (SF
102207)
- IMAP URLs (SF 894486)
- grepmailrc
- override system paths to decompressor programs
- set flags
- Decode different transfer encodings before checking the pattern
- Then support charset-independent patterns and mailboxes
https://sourceforge.net/p/grepmail/bugs/34/
- Newest 25 messages. (Beth Scaer <bscaer@yahoo.com>)
- grepmail -D --help doesn't display debug info.
- -D should print module versions
- Support ^M as a line terminator. Requested by Giovanni Bechis
(https://sourceforge.net/users/gbechis).
https://sourceforge.net/tracker/?func=detail&aid=1789911&group_id=2207&atid=352207
- Emit an X-UID header with an IMAP URL. Requested by Bart Schaefer
(https://sourceforge.net/users/barts).
https://sourceforge.net/tracker/index.php?func=detail&aid=894486&group_id=2207&atid=352207
- Michael D. Schleif <mds@helices.org> suggested grepmail have support for
compressed mail directories.
------------------------------------------------------------------------------
From Egmont Koblinger (https://sourceforge.net/users/egmont):
- grepmail should operate at a higher level of semantics, interpreting the
mbox format to extract the real text of the email for searching.
Details, including sample code is here:
https://sourceforge.net/tracker/?func=detail&aid=1058206&group_id=2207&atid=352207
Hi,
I'm new to grepmail and tried it to see whether it fits my needs.
Unfortunately it doesn't really. My problem is that grepmail performs the grep
on the low level mailbox files, this way it is not *much* more than a simple
grep, though definitely it is more.
Basically I mean there are two problems: neither transfer encoding nor
character set encodings are taken into accont.
Mailbox format is a very complicated format which converts between human
readable texts and byte sequences. The same human readable content can usually
be converted into mailbox format more ways: there are more character set
conversions (iso- 8859-1, iso-8859-2, koi8-r, utf-8...) and more transfer
encodings (8- bit, base64, quoted-printable...) to choose from.
I think that greping the low level mailbox doesn't make much sense, since I
really don't care what charset or transfer encoding a message is encoded in.
I'd like to grep in the _meaning_, the interpretation of each particular
message. I'd like to find all letters where the sender typed a certain
human-readable word.
If a word in message body is split over multiple lines, for example, it is
encoded in quoted-printable the mailbox file and there's an equal sign at the
end of a line such as this:
... this is an examp=
le to show what I'm talking about
then I see the "example" word in my mail client without being wrapped, but
`grepmail example' doesn't find it matching.
Similarly, `grepmail 8859' matches every letter encoded in some of the
iso-8859-* charsets, but hey, I don't care what encoding a message uses, I'm
interested in whether the real content of what my friend typed includes the
number 8859.
And greping for accented letters is nearly impossible as each letter uses
different character set encodings, but I don't want to find a particular byte
sequence, I want to find a particular sequence of human-visible characters, no
matter what encodings they use.
So this is what IMHO grepmail should use:
- convert the pattern it founds on the command line from my locale to utf-8,
- for each message found in the mailbox, separately, first decode it from its
transfer encoding (quoted-printable, base64...) and then iconv it from its
charset to utf-8, specially taking care of the complicated encodings used in
the header fields such as Subjec,
- and after all these it should perform the greping, and for matching messages
it should print their original (unconverted) raw mailbox version.
------------------------------------------------------------------------------
- Revise -E functionality. (Thanks to Vadim <vadim@genego.com> for the
feedback and suggestions)
Dear David,
in the README of your 'grepmail' I found that you are looking for
feedback on the '-E' option. So I decided do comment on it.
I like this feature and I want to use all the power of perl
here, so there is no need for alternatives. But I think something
can be improved. First of all the variable names '$email_header'
'$email_body' and '$email' are too long for command-line option. I
would prefer to use for example '$h', '$b' and '$a' instead.
I would also suggest to add an option (say -A) similar to perl's
'-M' to allow including any perl module, so the complex conditions
like
grepmail -E 'use Foo::Bar; ...'
may be simplified to
grepmail -AFoo::Bar -E '...'
Also I think that modification and compilation of user's pattern
for every e-mail (in function Do_Complex_Pattern_Matching) is not
an efficient way to allow complex matching. I would suggest to
compile user's pattern once into anonymous subroutine and call that
subroutine for each e-mail. Probably it's a good idea to use
Data::Alias module to avoid extra copying. Let me summarize in
pseudocode:
use Data::Alias;
our $h; # $a and $b are always present
#...
my $sub_ref;
if ( $opts{'E'} ) {
$sub_ref = eval "sub { $opts{'E'} }" or die "...";
}
# Before calling Do_Complex_Pattern_Matching
alias local $a = $email;
alias local $h = $email_header;
alias local $b = $email_body;
... = Do_Complex_Pattern_Matching(...);
# In Do_Complex_Pattern_Matching
if( $sub_ref ) {
$matchesEmail = $sub_ref->();
}
Thank you for 'grepmail' and I hope you will find something useful
in my feedback.
Vadim.
------------------------------------------------------------------------------
- Add UTF mime support. (Thanks to Peter Jakobi <jakobi@acm.org> for the
feature suggestion and patch.)
See email and grepmail.jakobi*
Hi,
given the recent increase in utf mime mails, I was a bit annoyed at grepmail's
lack of support.
And living in Germany, the few umlauts make it far more likely to
write/receive/archive mimed-email. From header =?...?= to real mimes.
And encountering old latin1 umlauts in source, filenames and stuff on an utf-8
system is nice. Nearly as nice as not having set LC_COLLATE and wondering why
[a-z]* suddenly likes to match uppercase on a new ubuntu box.
Anyway, annoyance breeds code, and code lacks testing. I'd love if you could
take a look at this, as I don't have enough interesting test cases on my own.
Worse, as it works for me more or less now, the annoyance level is too low to
properly maintain and bulletproof the patch.
The basic idea is: grep a mangled mail version on -z, but return the valid one
on output. Furthermore, I'm not that much interested in anything but German
umlauts and the most common European one, and I want to grep for words
containing them in Plain-Ascii (i.e. the eMail is mangled to Ue instead of
uppercase U diaeris), where latin1==ascii==utf-8, so this kind of allows
skipping the whole thorny $LANG issue during grep. As the mangled version
isn't for output, this allowed to add on a basic de-html and paragraph mode to
ease grepping. My few test cases were in simple mode, but as I'm outside the
grep functions, -E queries should also behave.
See attachments; patch is against ubuntu7.10beta's take on the ancient stable
version of grepmail.
what do you think?
------------------------------------------------------------------------------