Emails, email text archives and various other Internet-related applications often embed Base64-encoded data. The encoded string is usually split across many lines and is not easily converted back to the original file. The following one-liner makes such a conversion a breeze.
import sys, base64

if len(sys.argv) < 3:
    # Too few arguments: print usage and exit with a non-zero status
    print("""Usage: %s in_b64_enc_file out_dec_file
in_b64_enc_file - The Base64 encoded file to be converted
out_dec_file - The output decoded file
""" % sys.argv[0])
    sys.exit(1)

# base64.decode() reads the input line by line and writes the decoded bytes
base64.decode(open(sys.argv[1], 'rb'), open(sys.argv[2], 'wb'))
The other day, I had to extract a Base64-encoded, gzipped tarball from a Mailman raw text archive. The encoded text looks like this:
name="foo.tgz"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="foo.tgz"

H4sIAHBEyj4AA+y9a2Mbx5Eomq/Br2iD0QFgg+BbTGhRuxRJydhQJJek7PhKOsgQGJJjAhhk
ZiCKSXx++61Xv+YBgrIsZ7OY3VjgTD+qq6urq6qrq9JhlN2EycrvfsVndXVzdXtrC/5dXd1+
+pT+XdvcpH/l+R1+WN/cXF9dW/vd6tr66vb679TWrwmUfqZpFiRK/W6S3gRXQTijXJikXwKg
L/ukMv/d4/OLvaOjX4UOHjH/a1sb8H1tY31tYzH/X+LJz/9BEn0Ik/14fNWZ3H+mPlbXVlef
4hxXzP/6U1jz2xtPt9ZXn67BxK+ubWFxtfqZ+p/5/C+f/1pt/+T4ZfdV7/XJwZujQ7WrGq/v
.....
.....
sf2eaage4wvBjOQ/dHTkP9RSHP+Fi5Ar/7WA6uQ/VJ38H13ZdUXXBRdccMEFF1xwwQUXXHDB
BRdccMEFF2YI/wNPSNJQAHAcAA==
I needed to convert this back into foo.tgz. One way is to copy the entire block into a text file, remove all the line breaks manually, then read the resulting string from the file and decode it. This is impractical, because the encoded text typically runs to thousands of lines.
A single line of Python does the same job without the tedium. Copy and paste the entire Base64-encoded text into a text file and save it as foo.txt. The line below takes care of the rest.
open("foo.tgz", 'wb').write(base64.b64decode(b''.join(line.strip() for line in open("foo.txt", 'rb'))))
foo.txt is opened and read line by line. Each line is stripped of its trailing line break, the stripped lines are joined into a single string, and that string is decoded. The decoded bytes are then written to foo.tgz. As in the script above, base64 must already be imported.
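To see why removing the line breaks matters, here is a minimal sketch that joins the stripped lines before decoding. It assumes Python 3, and the helper name decode_joined_lines is hypothetical. With its default settings, base64.b64decode silently discards non-alphabet characters such as newlines. With validate=True, however, those newlines raise binascii.Error unless the lines are joined first.

```python
import base64
import binascii


def decode_joined_lines(encoded):
    """Decode Base64 text that has been wrapped across several lines.

    `encoded` is the raw content of the saved text file, as bytes.
    Every line is stripped of its line break before the pieces are
    joined, so even strict decoding sees only Base64 characters.
    """
    joined = b"".join(line.strip() for line in encoded.splitlines())
    return base64.b64decode(joined, validate=True)


payload = bytes(range(256))
encoded = base64.encodebytes(payload)  # wrapped at 76 characters per line

try:
    # Strict decoding of the still-wrapped text rejects the newlines
    base64.b64decode(encoded, validate=True)
except binascii.Error:
    print("strict decoding fails on the line breaks")

assert decode_joined_lines(encoded) == payload
```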
Note that this approach reads the entire content into memory before decoding, which could be a problem for very large files. In practice, however, the problem domain is usually email attachments and the like, where file sizes are rather small.
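If memory use ever does matter, the decoding can be done incrementally. The sketch below assumes Python 3, and the name stream_decode is hypothetical. It decodes one line at a time with binascii.a2b_base64. This relies on each line holding a whole number of 4-character Base64 groups, as MIME encoders produce. The standard library's base64.decode works in essentially the same line-by-line way.

```python
import base64
import binascii
import io


def stream_decode(infile, outfile):
    """Decode Base64 text from infile to outfile, one line at a time.

    Only one line is held in memory at once, so memory use does not grow
    with the file size. Assumes (as MIME encoders do) that each line holds
    a whole number of 4-character Base64 groups, so every line can be
    decoded on its own.
    """
    for line in infile:
        line = line.strip()
        if line:  # skip blank lines
            outfile.write(binascii.a2b_base64(line))


# Usage with in-memory files; real code would pass open(..., 'rb') and open(..., 'wb')
data = bytes(range(256)) * 100
src = io.BytesIO(base64.encodebytes(data))  # wrapped at 76 characters per line
dst = io.BytesIO()
stream_decode(src, dst)
assert dst.getvalue() == data
```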
Alexander Semenov pointed out that base64.decode(...) does the trick without the need to concatenate the lines ourselves. See the comments below.
Here is the version using just the library calls:
base64.decode(open(sys.argv[1], 'rb'), open(sys.argv[2], 'wb'))
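Semenov's point is easy to check without touching the disk. The following sketch assumes Python 3 and uses in-memory io.BytesIO objects in place of real files. It wraps some arbitrary bytes into 76-character lines with base64.encode, then decodes them back with base64.decode, with no manual joining of the lines.

```python
import base64
import io

original = b"pretend this is foo.tgz\n" * 20

# base64.encode() writes the Base64 text in lines of 76 characters,
# much like the attachment in a mail archive.
encoded = io.BytesIO()
base64.encode(io.BytesIO(original), encoded)

# base64.decode() reads its input line by line, so the line breaks
# need no special handling.
encoded.seek(0)
decoded = io.BytesIO()
base64.decode(encoded, decoded)

assert decoded.getvalue() == original
```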
I hope this is a useful addition to your handy one-liners. It certainly is for me.
base64.decode
Did you try base64.decode(file(sys.argv[1],'rb'), file(sys.argv[2],'wb'))?
Cool! As a matter of fact I didn't. I sort of assumed decode only works with strings without line breaks. Well, I was wrong.
I eschewed the use of the file(...) construct for better compatibility with older versions.
I will modify the recipe and give you credit for the change.
Cheers, Sarat
Thanks! Worked for me with some emails botched by a Mac system that arrived at my machine titled "noname".