Python 2.6 and 3.0 make it practical to implicitly convert hexadecimal Unicode escape sequences sent to stdout or stderr (or other text files) back to the original Unicode characters.

# WARNING: This recipe currently breaks display of the representation of strings containing other string escape sequences such as '\n'. I can't find a way to get ASPN to hide the recipe from public view until I can figure out a way to fix it though :(

# NOTE: This recipe is written to work with Python 3.0
# It would likely require changes to work on Python 2.6, and won't work at all
# on earlier 2.x versions

import sys, io

# With the new IO module, it's easy to create a variant of an
# existing IO class
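class ParseUnicodeEscapes(io.TextIOWrapper):
  def write(self, text):
    # 'backslashreplace' keeps characters outside Latin-1 from raising
    # UnicodeEncodeError: they are escaped, then decoded straight back.
    # Return the count from the base class, as callers of write() expect.
    return super().write(
        text.encode('latin-1', 'backslashreplace').decode('unicode_escape'))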

# To replace sys.stdout/stderr, we first collect the necessary
# constructor arguments from the current streams

stdout_args = (sys.stdout.buffer, sys.stdout.encoding, sys.stdout.errors,
               None, sys.stdout.line_buffering)
stderr_args = (sys.stderr.buffer, sys.stderr.encoding, sys.stderr.errors,
               None, sys.stderr.line_buffering)

# Once we replace the streams, any '\uXXXX' sequences written to
# sys.stdout or sys.stderr will be replaced with the corresponding
# Unicode characters
sys.stdout = ParseUnicodeEscapes(*stdout_args)
sys.stderr = ParseUnicodeEscapes(*stderr_args)
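
To see what the conversion inside write() does without replacing the real streams, the encode/decode round trip can be tried on its own. This is only an illustrative sketch: the helper name decode_escapes is hypothetical and not part of the recipe. It also shows the problem described in the warning at the top, since every backslash escape is decoded, not just the '\uXXXX' ones.

```python
def decode_escapes(text):
    # Latin-1 maps code points 0-255 to the same byte values, so this
    # round trip only reinterprets backslash escape sequences.
    return text.encode('latin-1').decode('unicode_escape')

# The escaped form shown in the "Before" session of this recipe
escaped = r"'\u0442\u0435\u0441\u0442'"
print(decode_escapes(escaped))        # the original Cyrillic text, quoted

# The known problem: other escapes such as \n are decoded as well,
# so this representation is printed across two lines
print(decode_escapes(r"'a\nb'"))
```

Because the second call turns the two characters backslash and n into a real newline, the representation of a string containing a newline no longer displays on one line.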

For programmers working in languages written in non-ASCII scripts, such as Russian or Japanese, the hexadecimal Unicode escape sequences that appear in string representations can be quite inconvenient, especially at the interactive prompt or when dealing with file names containing non-ASCII characters.

With the new Python-based IO system introduced in Python 3.0 (and backported to Python 2.6), it becomes fairly straightforward to insert a conversion in the output stream that replaces all those Unicode escape sequences with the original Unicode characters.
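
One possible way to narrow the conversion, so that only '\uXXXX' and '\UXXXXXXXX' sequences are touched and escapes such as '\n' are left alone, is a regular expression substitution inside write(). The following is a minimal sketch under that assumption, not a tested fix for the recipe: the class name ParseOnlyUnicodeEscapes and the helper names are hypothetical, and an in-memory io.BytesIO stands in for the real stdout buffer. It still has limits: an escaped backslash followed by 'u' would be converted incorrectly, and escapes for surrogates or out-of-range code points would raise an error.

```python
import io
import re

# Matches only \uXXXX and \UXXXXXXXX; \n, \t and similar escapes do not match.
_UNICODE_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})|\\U([0-9a-fA-F]{8})')

def _to_char(match):
    # Exactly one of the two groups is set; both hold hexadecimal digits.
    return chr(int(match.group(1) or match.group(2), 16))

class ParseOnlyUnicodeEscapes(io.TextIOWrapper):  # hypothetical name
    def write(self, text):
        return super().write(_UNICODE_ESCAPE.sub(_to_char, text))

# Demonstrate on an in-memory stream rather than the real sys.stdout
buffer = io.BytesIO()
stream = ParseOnlyUnicodeEscapes(buffer, encoding='utf-8')
stream.write(r"'\u0442\u0435\u0441\u0442' and 'a\nb'")
stream.flush()
result = buffer.getvalue().decode('utf-8')
print(result)
```

Here the Cyrillic text is restored while the backslash-n sequence in the second representation is written through unchanged.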

Before:

>>> "тест"
'\u0442\u0435\u0441\u0442'
>>> f = open("тест")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/devel/py3k/Lib/io.py", line 212, in __new__
    return open(*args, **kwargs)
  File "/devel/py3k/Lib/io.py", line 151, in open
    closefd)
IOError: [Errno 2] No such file or directory: '\u0442\u0435\u0441\u0442'
>>> f = open("тест", "w")
>>> f.name
'\u0442\u0435\u0441\u0442'

After:

>>> "тест"
'тест'
>>> open("тест")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/devel/py3k/Lib/io.py", line 212, in __new__
    return open(*args, **kwargs)
  File "/devel/py3k/Lib/io.py", line 151, in open
    closefd)
IOError: [Errno 2] No such file or directory: 'тест'
>>> f = open("тест", "w")
>>> f.name
'тест'
Created by Nick Coghlan on Wed, 16 Apr 2008 (PSF)