This is a simple recipe for converting a str that contains Unicode code point escapes as literal text (e.g. string = "\u5982\u679c\u7231") into a unicode string. The result is the same as writing the literal with a 'u' prefix, but unlike the prefix, it works on a variable holding such a string as well as on a constant.
    def u_converter(string="\u5982\u679c\u7231"):
        """
        Simple handler for converting a str that contains only Unicode
        code point escapes (that is, it has '\u' in the string but no 'u'
        prefix) to a unicode string.
        The result is the same as using the 'u' prefix, but unlike the
        prefix, it accepts a variable holding the escaped string as well
        as a constant.
        """
        # In a Python 2 str literal, "\u" is a literal backslash followed by 'u'.
        chars = string.split("\u")
        chinese = ''
        for char in chars:
            if len(char):
                try:
                    ncode = int(char, 16)
                except ValueError:
                    # Skip pieces that are not valid hexadecimal.
                    continue
                try:
                    uchar = unichr(ncode)
                except ValueError:
                    # Skip code points that unichr cannot represent.
                    continue
                chinese += uchar
        return chinese

    if __name__ == "__main__":
        pure_string = '\u9633\u5149\u707f\u70c2\u7684\u65e5\u5b50'
        print u_converter(pure_string)
Usually it is easy to decode an encoded string (say, 'gbk'-encoded) into a unicode string. Here, however, I want to convert a str that consists purely of Unicode code point escapes (that is, it contains '\u' followed by hexadecimal digits but has no 'u' prefix, e.g. "\u5982\u679c\u7231") into a unicode string. If the str is a constant, adding a 'u' prefix is enough. If the str is held in a variable, neither the 'u' prefix nor the unicode function applies; the value is treated as a plain string. The key is the unichr function.
You have just duplicated what the 'unicode_escape' codec does.
Thanks so much. I'm new to Python and suspected there might be a built-in function or an easier way to do this job, but after searching and asking this question in several forums, I got no explicit answer.
'unicode_escape' can indeed handle this. Thanks again~
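For readers who land here, the 'unicode_escape' approach mentioned above might look like the following minimal sketch. The helper name decode_escapes is hypothetical (it is not from the recipe), and the sketch assumes the input is ASCII-only text holding literal backslash-u sequences. The encode("ascii") step is there so the same line runs on Python 3, where it yields bytes; on Python 2, pure_string.decode('unicode_escape') alone should be enough.

```python
# -*- coding: utf-8 -*-

def decode_escapes(text):
    """Turn literal backslash-u sequences into the characters they name.

    Hypothetical helper for illustration; assumes `text` is ASCII-only.
    """
    # encode("ascii") produces bytes on Python 3 and leaves an ASCII str
    # unchanged on Python 2; the codec then interprets the escapes.
    return text.encode("ascii").decode("unicode_escape")


# The doubled backslashes keep the escapes as literal text in both
# Python 2 and Python 3, mimicking a string read from a file or network.
pure_string = "\\u9633\\u5149\\u707f\\u70c2\\u7684\\u65e5\\u5b50"
decoded = decode_escapes(pure_string)
print(len(decoded))
```

Unlike the recipe, the codec leaves ordinary ASCII text between escapes in place and raises an error on a malformed escape instead of silently skipping it.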
The comp.lang.python newsgroup is a friendly place to ask such questions. See http://www.python.org/community/lists/