ActiveState Code

Recipe 65211: Convert a string into a raw string


This function takes an arbitrary string and converts it into its raw string equivalent. Unfortunately, \x raises a ValueError, and I cannot figure out how to deal with it.

[2001-06-18: Completely reworked function for performance]

Python
escape_dict={'\a':r'\a',
           '\b':r'\b',
           '\f':r'\f',
           '\n':r'\n',
           '\r':r'\r',
           '\t':r'\t',
           '\v':r'\v',
           '\'':r'\'',
           '\"':r'\"',
           '\0':r'\0',
           '\1':r'\1',
           '\2':r'\2',
           '\3':r'\3',
           '\4':r'\4',
           '\5':r'\5',
           '\6':r'\6',
           '\7':r'\7',
           '\8':r'\8',
           '\9':r'\9'}

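def raw(text):
    """Returns a raw string representation of text"""
    new_string = ''
    for char in text:
        try:
            # Characters listed in escape_dict become their escape sequence.
            new_string += escape_dict[char]
        except KeyError:
            # Everything else is copied through unchanged.
            new_string += char
    return new_string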

Discussion

This is useful when a user inputs text and you want to work with the raw equivalent rather than the processed version. A good example is a regex that a user passes in from somewhere like the command line.
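The regex case can be sketched as follows. This is a minimal illustration, assuming Python 3 and a trimmed copy of escape_dict that holds only the letter escapes; the pattern and the sample sentence are made up for the example.

```python
import re

# Trimmed copy of the recipe's table: letter escapes only (an assumption
# made to keep this sketch short).
escape_dict = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
               '\r': r'\r', '\t': r'\t', '\v': r'\v'}

def raw(text):
    """Returns a raw string representation of text"""
    new_string = ''
    for char in text:
        try:
            new_string += escape_dict[char]
        except KeyError:
            new_string += char
    return new_string

# Written without the r prefix, so each '\b' is already a backspace
# character rather than the regex word-boundary marker.
pattern = '\bcat\b'

print(re.search(pattern, 'a cat sat'))       # None: no backspaces in the text
print(re.search(raw(pattern), 'a cat sat'))  # a match object for 'cat'
```

After raw() the pattern holds a backslash followed by b again, so the regex engine sees a word boundary.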

Its main limitation is that it does not deal with all octal escape characters. If anyone has any suggestions on how to handle them without resorting to typing out every octal escape character then please respond to this recipe with the solution.
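One possible way to avoid typing out every octal escape is to generate the table. The sketch below is an assumption, not the recipe's method: build_escape_dict and NAMED are hypothetical names, and the choice of three-digit octal for every control character is arbitrary. It cannot recover how the escape was originally spelled in the source; it only picks one unambiguous spelling.

```python
# Letter escapes keep their familiar spelling.
NAMED = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
         '\r': r'\r', '\t': r'\t', '\v': r'\v'}

def build_escape_dict():
    """Map every ASCII control character to an escape sequence."""
    # Start with a three-digit octal escape for every code below 32.
    table = {chr(code): '\\%03o' % code for code in range(32)}
    # Overwrite the entries that have a letter form.
    table.update(NAMED)
    return table

escape_dict = build_escape_dict()
print(escape_dict['\x1b'])  # \033
print(escape_dict['\n'])    # \n
```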

This function can be (and was originally) implemented as a huge if/elif/else. It can also have escape_dict contain all possible printable characters (thanks to string.printable[:-5]) and thus not have to deal with the try/except. But numbers don't lie, and profiling all three approaches showed this version to be the fastest.
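A comparison along these lines can be reproduced with timeit. The harness below is a rough sketch, assuming Python 3; raw_try, raw_full, ESCAPES, FULL and the sample text are names invented here, the if/elif/else variant is left out for brevity, and the timings will vary by machine and interpreter version.

```python
import string
import timeit

ESCAPES = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
           '\r': r'\r', '\t': r'\t', '\v': r'\v'}

def raw_try(text):
    """Small table plus try/except, as in the recipe."""
    new_string = ''
    for char in text:
        try:
            new_string += ESCAPES[char]
        except KeyError:
            new_string += char
    return new_string

# Table that also maps each printable character to itself, so the lookup
# never fails for printable input.
FULL = dict(ESCAPES)
FULL.update((char, char) for char in string.printable[:-5])

def raw_full(text):
    """Full table, no try/except."""
    new_string = ''
    for char in text:
        new_string += FULL[char]
    return new_string

sample = 'some text\twith a few\nescapes' * 100

# Both variants must agree before their timings mean anything.
assert raw_try(sample) == raw_full(sample)

for func in (raw_try, raw_full):
    seconds = timeit.timeit(lambda: func(sample), number=200)
    print(func.__name__, round(seconds, 4))
```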

Comments

  1. At 7:53 p.m. on 21 oct 2001, Scott David Daniels said:

    \0 through \9 are not great examples. The example dictionary should not contain codes like r'\0'. While the letter escapes work, you need to represent the numeric codes as full 3-digit octal (r'\000') or hex (r'\x00') values, or you will be surprised when the two-character string "\x007" is translated into a constant that becomes a bell character. '\8' and '\9' are really just the characters '8' and '9'. It is also a bad idea to have a dictionary constant with conflicting entries ('\7' and '\a').

    So, replace the last part of the escape_dict definition with:

    ...,
    '\"':r'\"',
    '\0':r'\000',
    '\1':r'\001',
    '\2':r'\002',
    '\3':r'\003',
    '\4':r'\004',
    '\5':r'\005',
    '\6':r'\006'}
    
  2. At 10:10 p.m. on 22 oct 2001, Brett Cannon (the author) said:

    Defeats Purpose. If I use your suggestion, I end up with the following result:

    >>> raw('\1\2\3')
    '\\001\\002\\003'
    >>> r'\1\2\3'
    '\\1\\2\\3'
    

    The whole point of this is to get the string back into the form it would have had if an r had been prefixed to the string when it was created. Unfortunately your solution breaks that.
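Both sides of this exchange can be checked directly. The sketch below assumes Python 3; evaluate is a hypothetical helper that interprets escaped text the way an ordinary (non-raw) string literal would, and it only handles text without quote characters.

```python
import ast

def evaluate(escaped):
    """Interpret escaped text as the body of a normal string literal."""
    return ast.literal_eval("'" + escaped + "'")

# The two-character string from comment 1: NUL followed by the digit 7.
text = '\x00' + '7'

short = text.replace('\x00', r'\0')    # the recipe's one-digit mapping
full = text.replace('\x00', r'\000')   # the suggested three-digit mapping

print(evaluate(short) == '\a')  # True: \07 is read as a single bell character
print(evaluate(full) == text)   # True: \000 followed by 7 round-trips

# The objection from comment 2: the three-digit spelling is not what an
# r-prefixed literal would have contained.
print(r'\1\2\3' == '\\001\\002\\003')  # False
```

The one-digit form matches the raw literal but is ambiguous when a digit follows; the three-digit form round-trips but changes the spelling.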
