Python 2.6 and 3.0 make it practical to implicitly convert hexadecimal Unicode escape sequences sent to stdout or stderr (or other text files) back to the original Unicode characters.
# WARNING: This recipe currently breaks display of the representation of
# strings containing other string escape sequences such as '\n'. I can't find
# a way to get ASPN to hide the recipe from public view until I can figure out
# a way to fix it though :(
# NOTE: This recipe is written to work with Python 3.0
# It would likely require changes to work on Python 2.6, and won't work at all
# on earlier 2.x versions
import sys, io

# With the new IO module, it's easy to create a variant of an
# existing IO class
class ParseUnicodeEscapes(io.TextIOWrapper):
    def write(self, text):
        super().write(text.encode('latin-1').decode('unicode_escape'))

# To replace sys.stdout/stderr, we first collect the necessary
# constructor arguments from the current streams
stdout_args = (sys.stdout.buffer, sys.stdout.encoding, sys.stdout.errors,
               None, sys.stdout.line_buffering)
stderr_args = (sys.stderr.buffer, sys.stderr.encoding, sys.stderr.errors,
               None, sys.stderr.line_buffering)

# Once we replace the streams, any '\uXXXX' sequences written to
# sys.stdout or sys.stderr will be replaced with the corresponding
# Unicode characters
sys.stdout = ParseUnicodeEscapes(*stdout_args)
sys.stderr = ParseUnicodeEscapes(*stderr_args)
For programmers working in languages written in non-ASCII scripts, such as Russian or Japanese, hexadecimal Unicode escape sequences in string representations can be quite inconvenient, especially at the interactive prompt or when dealing with file names that contain non-ASCII characters.
With the new Python-based IO system introduced in Python 3.0 (and backported to Python 2.6), it becomes fairly straightforward to insert a conversion in the output stream that replaces those Unicode escape sequences with the original Unicode characters.
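The recipe's own warning notes that decoding with 'unicode_escape' also expands unrelated escapes such as '\n'. One possible way around that, offered here only as a sketch and not as the recipe author's fix, is to expand nothing but the hexadecimal Unicode escapes with a regular expression. The names expand_unicode_escapes and ExpandUnicodeEscapes are hypothetical, and the code assumes Python 3. It is demonstrated on an in-memory stream rather than on sys.stdout:

```python
import io
import re

# Matches an escaped backslash (left untouched), a \uXXXX escape,
# or a \UXXXXXXXX escape. Other escapes such as \n never match.
_ESCAPE = re.compile(r'\\\\|\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})')


def expand_unicode_escapes(text):
    """Expand hexadecimal Unicode escapes, leaving all other text untouched."""
    def _replace(match):
        digits = match.group(1) or match.group(2)
        if digits is None:
            # An escaped backslash: keep it exactly as written.
            return match.group(0)
        code_point = int(digits, 16)
        # Leave surrogates and out-of-range values alone, since they
        # cannot be encoded when the stream writes them out.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            return match.group(0)
        return chr(code_point)
    return _ESCAPE.sub(_replace, text)


class ExpandUnicodeEscapes(io.TextIOWrapper):
    """Hypothetical variant of the recipe's class using the helper above."""
    def write(self, text):
        return super().write(expand_unicode_escapes(text))


# Demonstration on an in-memory buffer instead of the real stdout.
buffer = io.BytesIO()
stream = ExpandUnicodeEscapes(buffer, encoding='utf-8')
stream.write(r"'\u0442\u0435\u0441\u0442\n'")
stream.flush()
result = buffer.getvalue().decode('utf-8')  # the \n escape is still two characters
```

Such a class could presumably be installed over sys.stdout and sys.stderr with the same constructor arguments the recipe collects. Because the stream cannot tell where an escape came from, this remains a heuristic: any text that happens to contain a backslash followed by u and four hexadecimal digits is rewritten as well.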
Before:<pre>
>>> "тест"
'\u0442\u0435\u0441\u0442'
>>> f = open("тест")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/devel/py3k/Lib/io.py", line 212, in __new__
    return open(*args, **kwargs)
  File "/devel/py3k/Lib/io.py", line 151, in open
    closefd)
IOError: [Errno 2] No such file or directory: '\u0442\u0435\u0441\u0442'
>>> f = open("тест", "w")
>>> f.name
'\u0442\u0435\u0441\u0442'</pre>
After:<pre>
>>> "тест"
'тест'
>>> open("тест")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/devel/py3k/Lib/io.py", line 212, in __new__
    return open(*args, **kwargs)
  File "/devel/py3k/Lib/io.py", line 151, in open
    closefd)
IOError: [Errno 2] No such file or directory: 'тест'
>>> f = open("тест", "w")
>>> f.name
'тест'</pre>