Fix mbox files after importing EML into TB using ImportExportTools (Python)
2010-05-02T13:21:00-07:00 Denis Barmenkov
<p style="color: grey"> Python recipe 577214 by <a href="/recipes/users/57155/">Denis Barmenkov</a> (<a href="/recipes/tags/eml/">eml</a>, <a href="/recipes/tags/from/">from</a>, <a href="/recipes/tags/import/">import</a>, <a href="/recipes/tags/importexporttools/">importexporttools</a>, <a href="/recipes/tags/mbox/">mbox</a>, <a href="/recipes/tags/tb/">tb</a>, <a href="/recipes/tags/thunderbird/">thunderbird</a>). Revision 2. </p>
<p>I found a bug when importing EML files into Thunderbird (TB) with the ImportExportTools add-on: for each imported EML file, a 'From' line is added to the mbox, followed by the contents of the EML file. TB itself writes a correct 'From' line for messages fetched from mail servers:</p>
<pre class="prettyprint"><code>From - Tue Apr 27 19:42:22 2010
</code></pre>
<p>ImportExportTools formats this line incorrectly. I suppose it calls some system function with a default format specifier, because this is what I saw in the mbox file:</p>
<pre class="prettyprint"><code>From - Sat May 01 2010 15:07:31 GMT+0400 (Russian Daylight Time)
</code></pre>
<p>So there are two errors: 1) the 'time year' sequence is reversed into 'year time'; 2) extra text with the GMT offset and the time zone name is appended.</p>
<p>This prevents the mbox file from being parsed with, for example, the Python standard library, because it uses a hardcoded regexp to match the From line (file lib/, class UnixMailbox):</p>
<pre class="prettyprint"><code>_fromlinepattern = r"From \s*[^\s]+\s+\w\w\w\s+\w\w\w\s+\d?\d\s+" \
                   r"\d?\d:\d\d(:\d\d)?(\s+[^\s]+)?\s+\d\d\d\d\s*$"
</code></pre>
<p>The attached script fixes the incorrect From lines so that these mbox files can be parsed with the Python standard library.</p>
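<p>The attached script is not reproduced here. As a rough illustration of the fix described above, the following is a minimal sketch and not the recipe's actual code: it assumes the malformed lines always look like the ImportExportTools sample (weekday, month, day, year, time, then a GMT offset and optional zone name). The names <code>BAD_FROM</code>, <code>fix_from_line</code> and <code>fix_mbox</code> are hypothetical.</p>

```python
import re

# Assumed shape of the malformed separator, taken from the sample above:
# "From - Sat May 01 2010 15:07:31 GMT+0400 (Russian Daylight Time)"
BAD_FROM = re.compile(
    r"^From - (\w{3}) (\w{3}) (\d{1,2}) (\d{4}) "
    r"(\d{1,2}:\d{2}:\d{2}) GMT[+-]\d{4}.*$"
)


def fix_from_line(line):
    """Swap 'year time' back to 'time year' and drop the GMT/zone tail."""
    match = BAD_FROM.match(line)
    if match is None:
        return line  # not a malformed separator: leave it untouched
    weekday, month, day, year, clock = match.groups()
    # Keep whatever line ending the original line had.
    ending = line[len(line.rstrip("\r\n")):]
    return "From - %s %s %s %s %s%s" % (weekday, month, day, clock, year, ending)


def fix_mbox(src_path, dst_path):
    """Copy src_path to dst_path, rewriting malformed 'From' lines."""
    # latin-1 maps every byte to one character, so the message bodies
    # round-trip unchanged; newline="" preserves the line endings.
    with open(src_path, "r", encoding="latin-1", newline="") as src, \
         open(dst_path, "w", encoding="latin-1", newline="") as dst:
        for line in src:
            dst.write(fix_from_line(line))
```

<p>The sketch writes to a separate output file rather than editing in place, so the original mbox stays available as a backup. A body line that happens to match the same pattern would also be rewritten, so the result is worth checking before replacing the original file.</p>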