Removes ">From" and "From " lines from mail headers.
Thunderbird adds invalid mail headers to its local folders. Cyrus IMAP is strict about them. This script walks through all files in the given directories and removes any line that starts with ">From" or "From " (note the space and no colon).
Requires Python 2.5+.
```python
#!/usr/bin/env python
"""Removes ">From" and "From " lines from mail headers.

Thunderbird adds invalid mail headers. Cyrus IMAP is strict about them. This
script walks through all files in the given directories and removes any line
that starts with ">From" or "From " (no colon).

Real "From:" lines must have the colon.
"""
from __future__ import with_statement

from sys import argv, exit, stderr
from os import listdir
from os.path import abspath, normpath, join, isfile, basename
from tempfile import mkdtemp
from contextlib import contextmanager
from shutil import rmtree, move


class FixMailHeadersError(Exception):
    """Base class for all exceptions in this module.

    This differs from the standard Exception class in that sub-classes
    should define a _default_message attribute with the default message to
    present to the user.

    If no message is given at instantiation time, then the default message is
    used automatically.

    Also, note that only one argument (the message) can be given when
    instantiating these exceptions.
    """
    _default_message = u'Kaboom!'  # override me!

    def __init__(self, message=None):
        """Set the given message, or the default message if none is given."""
        super(FixMailHeadersError, self).__init__(message if message is not
                                                  None else self._default_message)


class InsufficientDirectories(FixMailHeadersError):
    """At least one directory must be given to clean_headers for it to work."""
    _default_message = u'You must specify at least one mail directory to scan.'


def get_file_paths(*dir_paths):
    """Yields all non-hidden, non-backup files in each given directory."""
    for dir_path in dir_paths:
        dir_path = abspath(normpath(dir_path))
        for file_name in listdir(unicode(dir_path)):
            file_path = join(dir_path, file_name)
            # Skip subdirectories, hidden files and our own "~" backups.
            if (isfile(file_path) and not file_name.startswith(u'.') and not
                    file_name.endswith(u'~')):
                yield file_path


@contextmanager
def make_temp_dir():
    """Context manager that creates and removes a temporary directory.

    All contents are also removed.
    """
    temp_dir_path = mkdtemp()
    try:
        yield temp_dir_path
    finally:
        rmtree(temp_dir_path)


def filter_file(in_file_path, temp_dir_path, filter=None):
    """Runs each line of the file through the given filter.

    The original file is backed up with a "~" added to the end of the file
    name.

    During processing, a temporary file with the same name as the file to
    process is created in the given temporary directory. It only replaces the
    original file if there are no errors.
    """
    temp_file_path = join(temp_dir_path, basename(in_file_path))
    with open(in_file_path, 'r') as in_file:
        with open(temp_file_path, 'w') as temp_file:
            for line in in_file:
                if filter is None or filter(line):
                    temp_file.write(line)
    # Only reached if reading and writing succeeded.
    move(in_file_path, in_file_path + u'~')
    move(temp_file_path, in_file_path)


def _default_filter(line):
    """Default bad header filter.

    Filters out lines that start with ">From" or "From " (no colon).
    """
    line = line.strip()
    return (False if line.startswith('>From') or line.startswith('From ') else
            True)


def clean_headers(dir_paths, filter=_default_filter):
    """Remove bad header lines from all mail files in all given directories.

    Bad header lines are lines that start with ">From" or "From " (no colon) as
    created by Thunderbird. :(

    You can override this behaviour by providing your own filter callable. It
    should accept a text line as the only argument and return True to keep the
    line and False to omit it.

    Directories are *not* recursed. You must specify each directory
    explicitly.

    This is the function to call if you are using this module as a library
    rather than a command-line script.

    An exception is raised if no directory paths are given.

    This is a generator. It first yields the file it's going to process next,
    and then on the next iteration, processes that file and yields the next
    file name to process. This way you can provide feedback to the user before
    each file is processed.

    You can cause a file to be skipped by sending a true value into the
    generator instead of just calling next().
    """
    if not dir_paths:
        raise InsufficientDirectories()
    with make_temp_dir() as temp_dir_path:
        for file_path in get_file_paths(*dir_paths):
            # next() sends None, so the file is processed; send(True) skips it.
            if not (yield file_path):
                filter_file(file_path, temp_dir_path, filter)


def main(dir_paths):
    """Main function called when running as a command-line script.

    Progress and errors are printed to stdout and stderr, respectively.
    """
    try:
        for file_path in clean_headers(dir_paths):
            print file_path
    except FixMailHeadersError, error:
        print >>stderr, error
        exit(-1)


if __name__ == '__main__':
    main(argv[1:])
```
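The yield-then-process protocol described in the clean_headers docstring can be tried out without touching any mail. Below is a minimal Python 3 sketch of just that protocol; process_files and its process callback are hypothetical stand-ins for clean_headers and filter_file, and no files are read or written.

```python
def process_files(paths, process):
    """Yield each path before processing it; skip it if a true value is sent.

    Hypothetical stand-in for clean_headers: `process` plays the role of
    filter_file.
    """
    for path in paths:
        # Whatever the caller sends arrives here as the value of `yield`.
        # next() and a plain for loop send None, so the path is processed.
        if not (yield path):
            process(path)


processed = []
gen = process_files(['a', 'b', 'c'], processed.append)
print(next(gen))        # announces 'a' (not processed yet)
print(gen.send(True))   # skips 'a', then announces 'b'
print(next(gen))        # processes 'b', then announces 'c'
for _ in gen:           # processes 'c'; the loop body never runs
    pass
print(processed)        # ['b', 'c']
```

This is why main() can simply use a for loop: every file gets processed, and each name is printed just before its file is rewritten.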
I recently migrated all my mail to Kolab, which uses Cyrus IMAP. Some messages would not copy over, however; the error was "Message contains invalid header". I used to use Thunderbird, and according to http://kb.mozillazine.org/Message_contains_invalid_header, it inserts invalid headers in the form of ">From" and "From " (note the trailing space and no colon). Normally this is not a problem, but these lines are not RFC-compliant and Cyrus is very strict.
So I wrote this script to process all my mail locally (maildir format in Kmail/Kontact). Then I was able to copy all my messages over to the IMAP server.
This is my first cookbook script. It may be a bit over-engineered, but I believe it is good, clean code that shows off generators, context managers, and decent docstrings.
Feel free to use it however you want. I consider it public domain, provided "AS IS"; use at your own risk. The standard "I disclaim everything" applies.
Hope it helps! :)
Notes:
- Original messages are backed up in the same folder with a "~" added to the end of the file name. If you are working in a maildir folder and the changes worked, delete the backups; otherwise your mail client will (or could) show duplicate messages.
- Real "From:" lines have a colon and are not touched.
- Directories are not recursed. You must specify each directory explicitly.
- All files are processed, so plan ahead if your mail folder has non-mail files in it.
- Tested in maildir directories, but should work on directories of mbox files too (not tested though). YMMV.
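Before deleting the "~" backups mentioned in the first note, it can help to list them first. A minimal Python 3 sketch; find_backups is a hypothetical helper, not part of the script, and it only lists files without deleting anything. Like the script, it does not recurse into subdirectories.

```python
import os


def find_backups(*dir_paths):
    """Return the "~" backup files left behind in the given directories.

    Hypothetical helper: nothing is deleted, so the list can be reviewed
    before removing anything. Subdirectories are not searched.
    """
    backups = []
    for dir_path in dir_paths:
        for name in sorted(os.listdir(dir_path)):
            path = os.path.join(dir_path, name)
            # Only regular files whose names end with "~" count as backups.
            if os.path.isfile(path) and name.endswith('~'):
                backups.append(path)
    return backups
```

Once the cleaned messages look right in your mail client, each listed path could be removed with os.remove.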
Example usage:
fix_mail_headers.py ~/.kde/share/apps/kmail/mail/folder1/cur ~/.kde/share/apps/kmail/mail/folder2/cur
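Because directories are not recursed, the example passes each cur directory explicitly. A minimal Python 3 sketch for gathering them under a root, assuming a maildir layout in which each folder keeps messages in a subdirectory named cur (maildir folders also have new and tmp, which the example above does not pass). find_cur_dirs is a hypothetical helper, not part of the script.

```python
import os


def find_cur_dirs(root):
    """Return every directory named "cur" below root, sorted.

    Hypothetical helper: assumes the maildir "cur" naming convention.
    """
    found = []
    # os.walk visits root and every directory beneath it.
    for dir_path, _dir_names, _file_names in os.walk(root):
        if os.path.basename(dir_path) == 'cur':
            found.append(dir_path)
    return sorted(found)
```

The returned paths can then be given to fix_mail_headers.py as separate arguments.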
The diagnosis is correct, but the current script is more destructive than I found necessary. Only lines of the form:
>From - Day Mon DD hh:mm:ss YYYY
needed to be deleted in my case. A ">From" in the message body is not a problem and is often the start of an ordinary line of body text.
In place of the return statement in _default_filter, try this less destructive version first:
return (False if line.startswith('>From - ') else True)
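The narrower rule from this comment can be checked in isolation. A Python 3 sketch of a predicate that drops only lines matching the full ">From - Day Mon DD hh:mm:ss YYYY" form; the regular expression and the name keep_line are assumptions for illustration, and the pattern is stricter than the plain startswith('>From - ') test. It follows the same contract as the filter argument of clean_headers: True keeps the line, False drops it.

```python
import re

# Hypothetical pattern for the separator line quoted above: ">From - "
# followed by weekday, month, day (space-padded or not), time and year.
MOZILLA_FROM = re.compile(
    r'^>From - [A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}\s*$')


def keep_line(line):
    """Return True to keep a line, False to drop it."""
    return MOZILLA_FROM.match(line) is None


lines = [
    '>From - Mon Jan  5 09:30:00 2009\n',
    'From: alice@example.com\n',
    '>From the archives, a line of body text.\n',
]
print([line for line in lines if keep_line(line)])
```

Only the first sample line is dropped; the real "From:" header and the body line starting with ">From" both survive.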
Note that the following UNIX command:
grep -v '^>From - ' mboxfile >mboxfile.nofrom
appears to work just as well as the script on a single mailbox file, in this example mboxfile.
WARNING: the script on this page is destructive: it also deletes lines in the body of emails that start with ">From" or "From "!
After browsing through the mail data file, I noticed that the unwanted >From headers were always preceded by an X-Mozilla-... header, so I constructed a sed command to remove only those >From lines:
You need to run this command on each of the mail folder files in your profile directory that contain incorrect headers. Pick the file without a file extension, not the .msf file. Sed will filter the file into a new mail folder file, which will be picked up automatically the next time you start Thunderbird.
To find your Thunderbird profile folder, see http://kb.mozillazine.org/Profile_folder_-_Thunderbird and then go to /Mail/Local Folders inside it.
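The sed command from this comment is not reproduced here. As a hedged alternative, a Python 3 sketch of the same idea: drop a >From line only when the line immediately before it starts with "X-Mozilla-". The function names, and the assumption that the unwanted line comes directly after the X-Mozilla- header, are mine; check them against your own mail files first. Like the sed approach, it writes a filtered copy and leaves the source file untouched.

```python
def drop_mozilla_from_lines(lines):
    """Yield byte lines, skipping a ">From" line that directly follows a line
    starting with "X-Mozilla-" (assumed layout, see above)."""
    previous = b''
    for line in lines:
        if not (line.startswith(b'>From') and previous.startswith(b'X-Mozilla-')):
            yield line
        previous = line


def filter_mail_file(src_path, dst_path):
    """Write a filtered copy of src_path to dst_path.

    Binary mode keeps the original bytes and line endings unchanged, so
    non-UTF-8 messages do not cause decoding errors.
    """
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        dst.writelines(drop_mozilla_from_lines(src))
```

A >From line anywhere else, such as in a message body, is left alone.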
In other regex engines you would probably be able to use this: