For some people, thunderbird lacks of a powerfull automatic detach function that keeps the link to the detached file from the mail itself (like Eudora has). Some plugins exists but there is now way to detach a document and replace it with a link to it's new location. For security reasons, the developper team doesn't want to allow the opening of a local file from thunderbird. When you have to keep large history of your emails, this may become a real problem for the size of your box and the facilities to automate some tasks on files (you can't use programs to see if you have doubles and so on). If, like me, you want to keep your email box as slim as possible by detaching attached documents and keep links to the detached files, this may be for you.
The trick is that the file is detached and replaced by an html one in which you'll find the link to the file. The html file will be opened in your browser where the link to your local filesystem will operate.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 | # -*- coding: iso-8859-1 -*-
""" author : Jean-Christophe Clavier
For some people, thunderbird lacks of a powerfull automatic detach function that keeps the link
to the detached file from the mail itself (like Eudora has).
Some plugins exists but there is now way to detach a document and replace it with a link to it's
new location.
For security reasons, the developper team doesn't want to allow the opening of a local file from
thunderbird.
When you have to keep large history of your emails, this may become a real problem for the size
of your box and the facilities to automate some tasks on files (you can't use programs to see if
you have doubles and so on).
If, like me, you want to keep your email box as slim as possible by detaching attached documents
and keep links to the detached files, this may be for you.
The trick is that the file is detached and replaced by an html one in which you'll find the link
to the file. The html file will be opened in your browser where the link to your local
filesystem will operate.
To keep it possible to change your detached files directory location, I have chosen to make the
links with javascript (<a href="javascript:detachFile(file)">...). The javascript file name and
location is a parameter of this script but will have to stay in the place you've decided.
This way, the javascript must stay the same place but all the detached files may move without
breaking the links with the mails (you'll just have to change the javascript)
The best would be to read some environment variable from an embeded javascript so that the js
script would not be fixed in some place forever but i don't know how to do that. If someone has
an idea, i would be glad to know how
The prm class is used to store the parameters
"""
import os, os.path, pprint, fnmatch, re, time, sys, traceback
import email, email.Utils, mimetypes, email.MIMEText
import quopri
class prm(object):
""" Used to simply store parameters
this could be in some kind of ini file but i wanted to keep this short
"""
TEST=False # If True, the module just lists directories without detaching docs
jsfile="C:\\Documents and Settings\\__User__\\Application Data\\Thunderbird\\Profiles\\__User__\\detach.js" # The JS script used to open files
# (I use an external jsscript to ease the work in case one wants to change the detached files directory:
# only this script is to change)
maildir="C:\\Documents and Settings\\__User__\\Application Data\\Thunderbird\\Profiles\\__User__\\Mail\\pop.__Fai__.fr" # The root dir of the mails
outdir="C:\\Some\\Path\\You\\Want" # The root dir where to put the modified Thunderbird files
attachedFilesDir="C:\\Some\\Other\\Path\\To\\Your\\attached\\Files\\Dir" # Where to put the attached files
httpAttachedFilesDir="file:///C|/Some/Other/Path/To/Your/attached/Files/Dir/" # What to use in your links (you may use a http server if you like)
# This will be used in the javascript function
includeFiles=[] # We want to detach docs only for these files (empty=all files)
excludeFiles=["Trash"] # We don't want to detach docs for these files
includeDirs=[] # We want to detach docs only for these dirs (empty=all dirs)
excludeDirs=["Trash.sbd"] # We don't want to detach docs for these dirs
logfilename="gestmail.log"
class gestMail(object):
""" Processes the mail directory
"""
def __init__(self):
self.lf=file(prm.logfilename,'w')
def __del__(self):
self.lf.close()
def writeJS(self):
jsScript="""
function openFile(file) {
window.open("file:///%s/" + file, "_self")
}""" % prm.attachedFilesDir.replace("\\", "/").replace(":", "|")
f=file(prm.jsfile,'w')
f.write(jsScript)
f.close()
def writelog(self, *msg):
sep=""
for m in msg:
print m.decode('latin1').encode('cp850', 'replace'), # display in M$ Windows command.
self.lf.write("%s%s" % (sep, str(m)))
sep=" "
print ""
self.lf.write("\n")
def getfiles(self, dir):
""" Walk throught the dir where mails are stored
"""
for (dirpath, dirnames, filenames) in os.walk(dir):
subdir=self.getSubdir(dir, dirpath)
if (prm.includeDirs==[] or subdir in prm.includeDirs) and subdir not in prm.excludeDirs:
for file in [filenames[i] for i in range(len(filenames)) if not fnmatch.fnmatch(filenames[i], '*.msf')]:
if (prm.includeFiles==[] or file in prm.includeFiles) and file not in prm.excludeFiles:
self.writelog("-"*78)
self.writelog(os.path.join(subdir, file))
yield (dirpath, file)
else:
for e in dirnames[:]:
dirnames.pop()
def getSubdir(self, rootdir, currentdir):
subdir=currentdir[len(rootdir):]
if len(subdir) > 0 and subdir[0] in ("\\", "/"):
subdir=subdir[1:]
return subdir
def getHtmlLinksFileContent(self, filenames):
""" returns the template of the html doc with the links
"""
links=''
for i, filename in enumerate(filenames):
try:
if len(filenames) > 1:
if i==0:
links+='<p><a href="javascript:openFile(\'%s\')">%s</a> - Existe déjà</p>\n' % (filename, filename)
else:
links+='<p>Renommé en <a href="javascript:openFile(\'%s\')">%s</a></p>\n' % (filename, filename)
else:
links+='<p><a href="javascript:openFile(\'%s\')">%s</a></p>\n' % (filename, filename)
except:
self.writelog(filenames, filename)
raise
txt="""
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
<script type="text/javascript" src="file:///%s"></script>
</head>
<body>
%s
</body>
</html>
""" % (prm.jsfile.replace(":","|"), links)
return txt
def decode_Q(self, s):
""" Decodes the file names when special characters are used (thanks to wacky)
"""
try:
qstart = s.index("=?") # Find the start of the Q-encoded string
qend = s.index("?=", qstart + 2) # Find the end of it
qenc_start = s.index("?", qstart + 2) # Find the encoding start
qenc_end = s.index("?", qenc_start + 1) # Find the encoding end
if s[qenc_start:qenc_end+1].upper() == "?Q?": # Q-encoded?
enc = s[qstart+2:qenc_start] # Get the character encoding
deQ = s[:qstart] # Get everything before the Q-encoded portion
Qstring = s[qenc_end+1:qend] # Get the Q-encoded portion
deQ += quopri.decodestring(Qstring, True).decode(enc) # Decode it
deQ += s[qend+2:] # Get everything after the Q-encoded portion
return self.decode_Q(deQ) # De-Q-code any further Q-encoded portions
else:
raise Exception("String is not Q-encoded: %s" % s) # It blew up
except ValueError: # Index not found in string <s>; not Q-encoded
return s # Returns a Unicode string
def detachFile(self, att_name, filecontent, date=None):
""" Writes the file and return the corresponding html
The html file will store a link to the detached file
If the file already exists, we compute a new name and create a link to the computed name file
and to the file that has the same name...
date is the date to use for the detached file
"""
att_name=os.path.split(att_name)[1] # sometimes, the name of the attached file contains the whole path which is not wanted
if att_name.find('=?') != -1: # If the filename is encoded, we decode it
filename=quopri.decodestring(self.decode_Q(att_name).encode('latin1')) # and encode it in latin1 (you may change this to utf-8)
else:
filename=att_name
filelist=[filename] #
# we write the attached file
# Sometimes, due to formating, we find \n, \r or \t characters in the filenames. So we cut them off
ofname=os.path.join(prm.attachedFilesDir, filename.strip().replace('\n','').replace('\r','').replace('?','').expandtabs(0))
if os.path.exists(ofname):
i=1
fname=ofname
sfname=os.path.splitext(fname)
while os.path.exists(fname):
fname="%s[%d]%s" % (sfname[0], i, sfname[1])
i+=1
filelist.append(os.path.split(fname)[-1])
ofname=fname # the outfile name is the new one
of=open(ofname,'wb')
of.write(filecontent)
of.close()
if date:
os.utime(ofname, (date, date))
return self.getHtmlLinksFileContent(filelist)
def processMessage(self, m):
""" walk through the parts of multiparts MIME messages
"""
self.writelog(" - process mail : ", re.compile("=\?.*?Q\?").sub("", quopri.decodestring(m.get_all("Subject")[0])))
for i in m.walk():
if i.is_multipart():
continue
if i.get_content_maintype() == 'text':
pl = i.get_payload(decode=False)
continue
att_name = i.get_filename(None) # get the filename
if not att_name:
# This part is relative to an attached file without name
ext = mimetypes.guess_extension(i.get_content_type())
att_name = 'makeitup%s' % ext
pl = i.get_payload(decode=False) # Get the content of this part of the message without decoding it
else:
# This part is relative to an attached file
ext=att_name.split(".")[-1]
if ext != "html": # we dont detach html files because our links will be contained in html files
# So, if whe use this script more than once, we don't want to detach html links files
pl = i.get_payload(decode=True) # We get the content of this part of the message and decode it
date=self.getDate(m.get_all("Date")[0])
htmlcontent=self.detachFile(att_name, pl, date)
# We replace the header to mention this is an html file
i.set_type("text/html")
try:
i.replace_header("Content-Transfer-Encoding", "7bit")
except KeyError:
self.writelog("ERROR - Content-Transfer-Encoding")
traceback.print_exc(file=self.lf)
i.replace_header("Content-Disposition", 'inline; filename="%s.html"' % att_name)
i.del_param("name")
# We replace the original included file by the html file containing the links
i.set_payload(htmlcontent)
return str(m)
def getDate(self, date):
signe=re.compile(" [+-]")
fuseau=signe.split(date.replace("GMT","").strip())
tdate=list(time.strptime(fuseau[0], "%a, %d %b %Y %H:%M:%S"))
try:
decalage=int(fuseau[1])/100
#print date, "*%s*" % signe.search(date).group(0)
if signe.search(date).group(0)==" +":
tdate[3]+=decalage
elif signe.search(date).group(0)==" -":
tdate[3]-=decalage
except IndexError:
pass
# newdate=time.strftime("%d/%m/%Y %H:%M:%S", tdate)
newdate=time.mktime(tdate)
return newdate
def process(self):
# we open all the files containing emails
try:
for path, filename in self.getfiles(prm.maildir):
if not prm.TEST:
f=file(os.path.join(path,filename),'r')
# we separate all the messages contained in the file
lmsg=f.read().split("From - ")
# we turn the messages into strings
msgs=[email.message_from_string("From - " + msg) for msg in lmsg[1:]]
# we write the out files
curdir=os.path.join(prm.outdir, self.getSubdir(prm.maildir, path))
if not os.path.exists(curdir):
os.makedirs(curdir)
curfile=os.path.join(curdir, filename)
of=file(curfile,'w')
for msg in msgs:
try:
of.write("%s\n" % self.processMessage(msg))
except:
of.write(str(msg))
self.writelog("ERROR while detaching - File is kept in the email")
traceback.print_exc(file=self.lf)
of.close()
except:
self.writelog("Fatal ERROR")
traceback.print_exc(file=self.lf)
raise
if __name__ == "__main__":
reload(sys)
sys.setdefaultencoding('latin1') # Encoding problems have been very painful
# This is not very clean and may cause trouble for you, i don't know
# If someone knows a better way to manage encoding in this module, i would
# be glad to know too...
g=gestMail()
g.writeJS()
g.process()
|
This snippet works well but some points may be improved : - I had many problems with encoding and haven't tested many configurations so this may not work very well. If someone sees possible improvement on the encoding, I would be glad to know how. - The location of the detached files may be changed as this location is pointed at in a javascript file. But the location of this javascript file is not to be changed to keep the links valid. I don't know how I can read some data (environment variable, local file...) with javascript on the client side so i'm stuck with that too... - Any other thing you may see
Anyway, you don't take many risks if you decide to test it as the detaching operation generates a copy of the thunderbird files... You have the possibility to run this script, see if the result is ok for you (by copying some mail files in the thunderbird mails location) and choose to overwrite the thunderbird files or not...
Feedbacks are welcome :)