Emails, email text archives, and many other Internet-related applications often embed Base64-encoded data. The encoded string is usually split across many lines, which makes it awkward to convert back to the original file. The following script, which boils down to a one-liner, makes such a conversion a breeze.
import sys, base64

if len(sys.argv) < 3:
    # Not enough arguments: show how the script is meant to be called.
    print("""Usage: %s in_b64_enc_file out_dec_file

    in_b64_enc_file - The Base64 encoded file to be converted
    out_dec_file - The output decoded file
    """ % sys.argv[0])
    sys.exit(0)

# Decode the first file argument into the second one.
base64.decode(open(sys.argv[1], 'rb'), open(sys.argv[2], 'wb'))
The other day, I had to extract a Base64 encoded gzipped tar-ball from a Mailman raw text archive. The encoded text looks like this:
name="foo.tgz"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="foo.tgz"

H4sIAHBEyj4AA+y9a2Mbx5Eomq/Br2iD0QFgg+BbTGhRuxRJydhQJJek7PhKOsgQGJJjAhhk
ZiCKSXx++61Xv+YBgrIsZ7OY3VjgTD+qq6urq6qrq9JhlN2EycrvfsVndXVzdXtrC/5dXd1+
+pT+XdvcpH/l+R1+WN/cXF9dW/vd6tr66vb679TWrwmUfqZpFiRK/W6S3gRXQTijXJikXwKg
L/ukMv/d4/OLvaOjX4UOHjH/a1sb8H1tY31tYzH/X+LJz/9BEn0Ik/14fNWZ3H+mPlbXVlef
4hxXzP/6U1jz2xtPt9ZXn67BxK+ubWFxtfqZ+p/5/C+f/1pt/+T4ZfdV7/XJwZujQ7WrGq/v
.....
.....
sf2eaage4wvBjOQ/dHTkP9RSHP+Fi5Ar/7WA6uQ/VJ38H13ZdUXXBRdccMEFF1xwwQUXXHDB
BRdccMEFF2YI/wNPSNJQAHAcAA==
I needed to convert this into foo.tgz. One way is to copy the entire thing into a text file, remove all the line breaks manually, read the resulting string from the file, and decode it. This is impractical, since such files typically contain thousands of lines.
The script, however, does exactly that, once the encoded content is saved to a file, without the tedium. Copy and paste the entire Base64 encoded text into a text file and save it as foo.txt. The line below takes care of the rest.
open("foo.tgz", 'wb').write(base64.b64decode(b''.join(open("foo.txt", 'rb').readlines())))
foo.txt is opened and all lines are read into a list. The lines are joined into a single string, and the decoder skips the line breaks that remain in it. The decoded content is then written to the output file.
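To see why the line breaks do no harm, here is a small sketch, assuming Python 3 and base64.b64decode with its default settings. It decodes the first two lines of the sample above once with the line breaks and once without, and checks that the result starts with the gzip magic bytes. The variable names are only for illustration.

```python
import base64

# First two lines of the encoded sample from this post, line breaks included.
with_breaks = (
    b"H4sIAHBEyj4AA+y9a2Mbx5Eomq/Br2iD0QFgg+BbTGhRuxRJydhQJJek7PhKOsgQGJJjAhhk\n"
    b"ZiCKSXx++61Xv+YBgrIsZ7OY3VjgTD+qq6urq6qrq9JhlN2EycrvfsVndXVzdXtrC/5dXd1+\n"
)
without_breaks = with_breaks.replace(b"\n", b"")

# By default b64decode skips characters outside the Base64 alphabet,
# which includes the newline at the end of each line.
decoded = base64.b64decode(with_breaks)
same_result = decoded == base64.b64decode(without_breaks)

# A gzip stream begins with the two magic bytes 0x1f 0x8b.
looks_like_gzip = decoded[:2] == b"\x1f\x8b"

print(same_result, looks_like_gzip)
```

Both checks should come out true, which is also a cheap way to confirm that the pasted text really is a gzipped file before decoding all of it.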
Note that this one-liner reads the entire content into memory before decoding. This could be a problem for very large files. However, the typical use case is email attachments and similar data, where file sizes are rather small.
Alexander Semenov pointed out that base64.decode(...) does the trick without the need to concatenate the lines ourselves. See the comments below.
Here is the version using just the library calls:
base64.decode(open(sys.argv[1], 'rb'), open(sys.argv[2], 'wb'))
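The same library call can be wrapped in a small helper that also closes both files. This is only a sketch, assuming Python 3; decode_file is a hypothetical name, not part of the base64 module. Because base64.decode works through its input one line at a time, it also sidesteps the memory concern mentioned earlier.

```python
import base64

def decode_file(src_path, dst_path):
    """Decode the Base64 text in src_path and write the raw bytes to dst_path.

    decode_file is a hypothetical helper name used for illustration.
    """
    # The with-block closes both files even if decoding fails part-way.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # base64.decode reads src line by line and writes each decoded
        # chunk to dst, so the whole file is never held in memory at once.
        base64.decode(src, dst)
```

With the encoded text saved as foo.txt, calling decode_file("foo.txt", "foo.tgz") would produce the archive.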
I hope this is a useful addition to your handy one-liners. It certainly is for me.