
Removes ">From" and "From " lines from mail headers.

Thunderbird adds invalid mail headers to its local folders. Cyrus IMAP is strict about them. This script walks through all files in the given directories and removes any line that starts with ">From" or "From " (note the trailing space and no colon).

Requires Python 2.5+ (Python 2 only: the code uses 2.x syntax such as the print statement).
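The rule is easy to misread, so here is a small standalone sketch that applies the same test as _default_filter to a few sample lines and lists the ones it would drop. It is written for Python 3, unlike the recipe. would_remove and preview_removed are hypothetical names, not part of the recipe, and nothing is modified on disk.

```python
def would_remove(line):
    """Same test as the recipe's _default_filter, inverted: True means "drop"."""
    stripped = line.strip()  # the recipe strips first, so indentation is ignored
    return stripped.startswith('>From') or stripped.startswith('From ')


def preview_removed(lines):
    """Return (line_number, text) pairs for lines the default rule would drop.

    Hypothetical dry-run helper; it only reports, it never rewrites anything.
    """
    return [(number, line.rstrip('\n'))
            for number, line in enumerate(lines, start=1)
            if would_remove(line)]


sample = [
    'From - Mon Jan 05 10:00:00 2009\n',   # "From " with a space: dropped
    'X-Mozilla-Status: 0001\n',            # kept
    '>From - Mon Jan 05 10:00:00 2009\n',  # starts with ">From": dropped
    'From: alice@example.com\n',           # real header with a colon: kept
    'Fromage is French for cheese.\n',     # no space after "From": kept
    '  From here on, the body text.\n',    # indented body line: dropped after strip()
]

for number, text in preview_removed(sample):
    print(number, text)
```

Because the recipe strips each line before testing it, the indented body line in the sample is dropped too, which is one way ordinary body text can be lost.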

Python, 161 lines
#!/usr/bin/env python

"""Removes ">From" and "From " lines from mail headers.

Thunderbird adds invalid mail headers.  Cyrus IMAP is strict about them.  This
script walks through all files in the given directories and removes any line
that starts with ">From" or "From " (no colon).

Real "From:" lines must have the colon.

"""


from __future__ import with_statement
from sys import argv, exit, stderr
from os import listdir
from os.path import abspath, normpath, join, isfile, basename
from tempfile import mkdtemp
from contextlib import contextmanager
from shutil import rmtree, move


class FixMailHeadersError(Exception):

    """Base class for all exceptions in this module.

    This is different from the standard Exception class in that sub-classes
    should define a _default_message attribute with the default message to
    present to the user.

    If no message is given at instantiation time, then the default message is
    used automatically.

    Also, note that only one argument (the message) can be given when
    instantiating these exceptions.

    """

    _default_message = u'Kaboom!' # override me!

    def __init__(self, message=None):
        """Set the given message or the default message if one is not given."""
        super(FixMailHeadersError, self).__init__(message if message is not
          None else self._default_message)


class InsufficientDirectories(FixMailHeadersError):

    """At least one directory must be given to clean_headers for it to work."""

    _default_message = u'You must specify at least one mail directory to scan.'


def get_file_paths(*dir_paths):
    """Yields all non-hidden, non-backup files in each given directory."""
    for dir_path in dir_paths:
        dir_path = abspath(normpath(dir_path))
        for file_name in listdir(unicode(dir_path)):
            file_path = join(dir_path, file_name)
            if (isfile(file_path) and not file_name.startswith(u'.') and not
              file_name.endswith(u'~')):
                yield file_path


@contextmanager
def make_temp_dir():
    """Context manager that creates and removes a temporary directory.

    All contents are also removed.

    """
    temp_dir_path = mkdtemp()
    try:
        yield temp_dir_path
    finally:
        rmtree(temp_dir_path)


def filter_file(in_file_path, temp_dir_path, filter=None):
    """Runs each line of the file through the given filter.

    The original file is backed up with a "~" added to the end of the file
    name.

    During processing, a temporary file with the same name as the file to
    process is created in the given temporary directory.  It only replaces the
    original file if there are no errors.

    """
    temp_file_path = join(temp_dir_path, basename(in_file_path))
    with open(in_file_path, 'r') as in_file:
        with open(temp_file_path, 'w') as temp_file:
            for line in in_file:
                if filter is None or filter(line):
                    temp_file.write(line)
    move(in_file_path, in_file_path + u'~')
    move(temp_file_path, in_file_path)


def _default_filter(line):
    """Default bad header filter.

    Filters out lines that start with ">From" or "From " (no colon).

    """
    line = line.strip()
    return (False if line.startswith('>From') or line.startswith('From ') else
      True)


def clean_headers(dir_paths, filter=_default_filter):
    """Remove bad header lines from all mail files in all given directories.

    Bad header lines are lines that start with ">From" or "From " (no colon) as
    created by Thunderbird. :(

    You can override this behaviour by providing your own filter callable.  It
    should accept a text line as the only argument and return True to keep the
    line and False to omit it.

    Directories are *not* recursed.  You must specify each directory
    explicitly.

    This is the function to call if you are using this module as a library
    rather than a command-line script.

    An exception is raised if no directory paths are given.

    This is a generator.  It first yields the file it's going to process next,
    and then on the next iteration, processes that file and yields the next
    file name to process.  This way you can provide feedback to the user before
    each file is processed.

    You can cause a file to be skipped by sending a true value into the
    generator instead of just calling next().

    """
    if not dir_paths:
        raise InsufficientDirectories()
    with make_temp_dir() as temp_dir_path:
        for file_path in get_file_paths(*dir_paths):
            if not (yield file_path):
                filter_file(file_path, temp_dir_path, filter)


def main(dir_paths):
    """Main function called when running as a command-line script.

    Progress and errors are printed to stdout and stderr, respectively.

    """
    try:
        for file_path in clean_headers(dir_paths):
            print file_path
    except FixMailHeadersError, error:
        print >>stderr, error
        exit(-1)


if __name__ == '__main__':
    main(argv[1:])
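The clean_headers docstring describes a generator you can steer with send(). Below is a minimal sketch of that protocol, written for Python 3, with a stand-in process callback in place of filter_file so it runs without touching any files. walk_and_process and process are hypothetical names. Sending a true value skips the path that was just yielded; send() then returns the next path.

```python
def walk_and_process(paths, process):
    """Yield each path before processing it; skip it if a true value is sent."""
    for path in paths:
        skip = yield path      # None when driven by next() or a for loop
        if not skip:
            process(path)


processed = []
gen = walk_and_process(['a.eml', 'b.eml', 'c.eml'], processed.append)

current = next(gen)            # 'a.eml' is announced but not processed yet
try:
    while True:
        # Ask to skip 'b.eml'; everything else is processed.
        current = gen.send(current == 'b.eml')
except StopIteration:
    pass

print(processed)  # ['a.eml', 'c.eml']
```

A plain for loop always resumes the generator with None, so every file is processed; that is what main() relies on.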

I recently migrated to Kolab, which uses Cyrus IMAP, for all my mail. Some messages would not copy over, however. The error was "Message contains invalid header". I used to use Thunderbird, and according to http://kb.mozillazine.org/Message_contains_invalid_header, it inserts invalid headers in the form of ">From" and "From " (note the trailing space and no colon). Normally this is not a problem, but they are not RFC-compliant and Cyrus is very strict.

So... I wrote this script to process all my mail locally (maildir format in Kmail/Kontact). Then I was able to copy all my messages over to the IMAP server.

This is my first cookbook script. It may be a bit over-engineered, but I believe it is good, clean code that shows off generators, context managers, and decent docstrings.

Feel free to use it however you want. I consider it public domain, provided "AS IS"; use at your own risk. The standard "I disclaim everything" applies.

Hope it helps! :)

Notes:

  • Original messages are backed up in the same folder with a "~" added to the end of the file name. Once you have confirmed the changes worked, delete the backups if you are working in a maildir folder; otherwise you may get duplicate messages in your mail client.
  • Real "From:" lines have a colon and are not touched.
  • Directories are not recursed. You must specify each directory explicitly.
  • All files are processed, so plan ahead if your mail folder has non-mail files in it.
  • Tested in maildir directories. It has not been tested on directories of mbox files; note that mbox files use "From " lines to separate messages, so the default filter would remove those separators. YMMV.
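If you want help with the clean-up step from the backup note, a sketch like the following lists "~" backups whose original file still exists and only deletes them when dry_run is False. find_backups and remove_backups are hypothetical helpers written for Python 3; check your cleaned messages before running it for real.

```python
import os


def find_backups(dir_path):
    """Return backup paths ("name~") whose original "name" still exists.

    Only the given directory is scanned, mirroring the recipe (no recursion).
    """
    backups = []
    for name in sorted(os.listdir(dir_path)):
        if not name.endswith('~'):
            continue
        backup = os.path.join(dir_path, name)
        original = os.path.join(dir_path, name[:-1])
        if os.path.isfile(backup) and os.path.isfile(original):
            backups.append(backup)
    return backups


def remove_backups(dir_path, dry_run=True):
    """Report each backup; delete it only when dry_run is False."""
    for backup in find_backups(dir_path):
        print(('would remove ' if dry_run else 'removing ') + backup)
        if not dry_run:
            os.remove(backup)
```

Backups without a matching original are left alone, since deleting those would lose the only copy of a message.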

Example usage:

fix_mail_headers.py ~/.kde/share/apps/kmail/mail/folder1/cur ~/.kde/share/apps/kmail/mail/folder2/cur
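Because directories are not recursed, listing every folder by hand gets tedious. A sketch like this one (Python 3; find_maildir_subdirs is a hypothetical helper) walks a mail root with os.walk and collects every directory with a given name, which you could then pass to fix_mail_headers.py. It defaults to cur, as in the example command; whether new should be included depends on your setup, so the names are a parameter.

```python
import os


def find_maildir_subdirs(root, names=('cur',)):
    """Return every directory under root whose base name is in names.

    The recipe does not recurse, so this collects the directories to pass
    to it explicitly.
    """
    found = []
    for dir_path, dir_names, _file_names in os.walk(root):
        dir_names.sort()  # sorting in place makes os.walk visit in a stable order
        if os.path.basename(dir_path) in names:
            found.append(dir_path)
    return found
```

For example, find_maildir_subdirs(os.path.expanduser('~/.kde/share/apps/kmail/mail')) would return paths like the two given on the command line above (os.path.expanduser is needed because Python does not expand "~" the way the shell does).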

2 comments

Mr Trilby 12 years, 11 months ago

The diagnosis is correct but the current script is more destructive than I found necessary. Only lines of the form:
>From - Day Mon DD hh:mm:ss YYYY
Needed to be deleted in my case. >From in the message body is not a problem and is often a line of body text.

In place of line 107 (the first line of the return statement in _default_filter; line 108 stays as it is), try this less destructive version first:
return (False if line.startswith('>From - ') else

Note that the following UNIX command:
grep -v '^>From - ' mboxfile >mboxfile.nofrom
appears to work just as well as the script on a single mailbox file, in this example mboxfile.
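The narrower rule can also be written as a filter callable and passed as the filter argument of clean_headers, which per its docstring should return True to keep a line. This sketch goes a step further than the one-line change and drops only lines matching the whole ">From - Day Mon DD hh:mm:ss YYYY" shape; the regular expression is an assumption about that date format (three-letter English abbreviations, 24-hour time, day padded with a zero or a space). MOZILLA_FROM and keep_line are hypothetical names. Unlike _default_filter it does not strip the line first, so indented body text is always kept. The function itself also runs under Python 2.

```python
import re

# Assumed shape of the unwanted lines, e.g. ">From - Mon Jan 05 10:00:00 2009".
MOZILLA_FROM = re.compile(
    r'^>From - [A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}\s*$')


def keep_line(line):
    """Filter for clean_headers(): keep everything except >From - date lines."""
    return MOZILLA_FROM.match(line) is None
```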

DennisV 7 years, 3 months ago

WARNING: the script on this page is destructive; it deletes text lines in the body of emails!

After browsing through the mail data file I noticed that the unwanted >From headers were always preceded by a X-Mozilla-... header, so I constructed a SED command to remove only those >From lines:

sed ':begin;$!N;s/^\(X-Mozilla-.*\)\n>From .*$/\1/;tbegin;P;D' < "My Mail Folder" > "My Mail Folder Fixed"

You need to run this command on each of the mail folder files in your profile directory that contain incorrect headers. Pick the file without a file extension, not the .msf file. Sed will filter the file to a new mail folder file, which will be picked up automatically the next time you start Thunderbird.

To find your Thunderbird profile folder: http://kb.mozillazine.org/Profile_folder_-_Thunderbird In your profile folder go to /Mail/Local Folders.

In other regex engines you would probably be able to use this:

s/(X-Mozilla-Status[^\n]+)\n>From [^\n]+/$1/gs
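For readers who prefer Python to sed, here is a sketch of the same idea as the sed command: a ">From " line is dropped only when the line kept just before it starts with "X-Mozilla-" (the broader prefix from the sed version, not the X-Mozilla-Status one in the last regex). Like the sed loop, a run of such lines after one header is removed entirely. It works on bytes so the rest of the mail folder passes through unchanged, and it writes to a new file rather than editing in place. drop_mozilla_from_lines and fix_mail_folder are hypothetical names; this is written for Python 3 and is not a tested replacement for the command.

```python
def drop_mozilla_from_lines(lines):
    """Yield byte lines, skipping any b'>From ' line that directly follows an
    X-Mozilla- header line (runs of such lines are all skipped, as with sed).
    """
    previous_kept = b''
    for line in lines:
        if line.startswith(b'>From ') and previous_kept.startswith(b'X-Mozilla-'):
            continue  # drop it; previous_kept is still the X-Mozilla- header
        previous_kept = line
        yield line


def fix_mail_folder(src_path, dst_path):
    """Copy src_path to dst_path without the unwanted >From lines.

    Binary mode keeps every other byte of the mail folder unchanged.
    """
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        dst.writelines(drop_mozilla_from_lines(src))
```

For example, fix_mail_folder('My Mail Folder', 'My Mail Folder Fixed') mirrors the sed invocation above.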