
This function takes an arbitrary string and converts it into its raw string equivalent. Unfortunately, '\x' raises a ValueError, and I have not figured out how to deal with it.

[2001-06-18: Completely reworked function for performance]

Python, 28 lines
escape_dict={'\a':r'\a',
           '\b':r'\b',
           '\c':r'\c',
           '\f':r'\f',
           '\n':r'\n',
           '\r':r'\r',
           '\t':r'\t',
           '\v':r'\v',
           '\'':r'\'',
           '\"':r'\"',
           '\0':r'\0',
           '\1':r'\1',
           '\2':r'\2',
           '\3':r'\3',
           '\4':r'\4',
           '\5':r'\5',
           '\6':r'\6',
           '\7':r'\7',
           '\8':r'\8',
           '\9':r'\9'}

def raw(text):
    """Returns a raw string representation of text"""
    new_string=''
    for char in text:
        try: new_string+=escape_dict[char]
        except KeyError: new_string+=char
    return new_string

This is very useful when a user needs to input text and you want the raw equivalent to be used rather than the processed version. A good example is a user passing in a regex from somewhere like the command line.

Its main limitation is that it does not deal with all octal escape characters. If anyone has any suggestions on how to handle them without resorting to typing out every octal escape character then please respond to this recipe with the solution.

This function can be (and originally was) implemented as a huge if/elif/else. Alternatively, escape_dict could contain all possible printable characters (thanks to string.printable[:-5]), which would remove the need for the try/except. But numbers don't lie: profiling all three approaches showed this version to be the fastest.
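
A rough sketch of how such a comparison might be run today, assuming Python 3. It times the recipe's try/except lookup against two alternatives that were not part of the original profiling (dict.get and str.translate). The table ESCAPES is a trimmed, hypothetical version of escape_dict that keeps only the single-character letter escapes, and the function names and sample text are illustrative. Timings vary by machine and interpreter, so this does not reproduce the original measurements.

```python
import timeit

# Hypothetical trimmed table: only the single-character letter escapes.
ESCAPES = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
           '\r': r'\r', '\t': r'\t', '\v': r'\v'}

def raw_try(text):
    """The recipe's approach: dictionary lookup guarded by try/except."""
    out = ''
    for char in text:
        try:
            out += ESCAPES[char]
        except KeyError:
            out += char
    return out

def raw_get(text):
    """Same mapping, using dict.get with the character itself as default."""
    return ''.join(ESCAPES.get(char, char) for char in text)

# str.maketrans accepts a dict whose keys are single characters.
TRANSLATE_TABLE = str.maketrans(ESCAPES)

def raw_translate(text):
    """Same mapping, delegated to str.translate."""
    return text.translate(TRANSLATE_TABLE)

sample = 'col1\tcol2\nrow\a' * 100

if __name__ == '__main__':
    for func in (raw_try, raw_get, raw_translate):
        seconds = timeit.timeit(lambda: func(sample), number=200)
        print(f'{func.__name__}: {seconds:.4f}s')
```

All three functions return the same text for the same input, so whichever one measures fastest on a given interpreter can be swapped in without changing behavior.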

8 comments

Scott David Daniels 22 years, 5 months ago

\0 through \9 are not great examples. The example dictionary should not contain codes like r'\0'. While the letter escapes work, you need to represent the numeric codes as full 3-digit octal (r'\000') or hex (r'\x00') values, or you will be surprised when the two-character string '\x007' is translated into the text \07, which as a constant becomes a bell character. '\8' and '\9' are not escapes at all: each is a two-character string (a backslash followed by '8' or '9'), so it can never match a single character. It is also a bad idea to have a dictionary constant with conflicting entries ('\7' and '\a').

So, replace the last part of the escape_dict definition with:

...,
'\"':r'\"',
'\0':r'\000',
'\1':r'\001',
'\2':r'\002',
'\3':r'\003',
'\4':r'\004',
'\5':r'\005',
'\6':r'\006'}

Brett Cannon (author) 22 years, 5 months ago

Defeats Purpose. If I use your suggestion, I end up with the following result:

>>> raw('\1\2\3')
'\\001\\002\\003'
>>> r'\1\2\3'
'\\1\\2\\3'

The whole point of this is to get the string back into the form it would have had if an r prefix had been put on the literal when it was created. Unfortunately, your solution breaks that.
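
The disagreement above comes down to information that the string object no longer holds. A short sketch, assuming Python 3 (the variable names are illustrative): several spellings in source code compile to the same single character, so no function can tell which spelling was originally typed.

```python
# Five ways to write the bell character in source code. After compilation
# they are the same one-character string.
bell_spellings = ['\a', '\7', '\07', '\007', '\x07']
print(len(set(bell_spellings)))   # number of distinct strings

# The raw forms of those spellings are five different strings.
raw_spellings = [r'\a', r'\7', r'\07', r'\007', r'\x07']
print(len(set(raw_spellings)))

# The same applies to the example in the thread: '\1' and '\001' are equal,
# so a function receiving either one cannot know which raw form to produce.
same_character = '\1' == '\001'
```

Any mapping from characters back to escape text therefore has to pick one spelling per character, which is why both versions of the dictionary give a "wrong" answer for some literal.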

Nishanth Amuluru 14 years, 1 month ago

To convert a normal string that is generated during runtime into a raw string...

>>> plain_str = 'newline \n'
>>> raw_str = r'newline \n'
>>> plain_str == raw_str
False
>>> temp_str = "%r"%plain_str
>>> print temp_str
'newline \n'
>>> new_raw_str = temp_str[1:-1]
>>> new_raw_str == raw_str
True

Daniel Upton 14 years, 1 month ago

@Nishanth Thank you. I scoured the web for an hour looking for this answer. I knew that I had to convert the runtime string to a literal, just was unsure about how to do it.

Jed Alexander 13 years, 9 months ago

After doing some research to determine what the "%r" was doing in the format string above, this turns out to be really easy. The built-in function repr() does exactly this.
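
A minimal sketch of that equivalence, assuming Python 3 (the variable names are illustrative). It only shows that "%r" formatting and repr() agree for this input; it does not claim that either one recovers the original raw literal in general.

```python
plain_str = 'newline \n'

# "%r" formatting calls repr() on the value, so both give the same text.
via_format = ('%r' % plain_str)[1:-1]
via_repr = repr(plain_str)[1:-1]

print(via_format)
print(via_repr)
```

Slicing with [1:-1] simply drops the surrounding quote characters that repr() adds.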

Brian Stewart 12 years, 10 months ago

@Nishanth: that doesn't convert a plain string to a raw string--it converts to the "printable representation" of the plain string, and then chops off the quotes.

See this counter-example:

>>> plain = "\\a\a\a\b"
>>> raw = r"\\a\a\a\b"
>>> print plain
\a
>>> print raw
\\a\a\a\b
>>> plain == raw
False
>>> temp = "%r" % plain
>>> new = temp[1:-1]
>>> print new
\\a\x07\x07\x08
>>> print raw
\\a\a\a\b
>>> new == raw
False
>>>

I'm trying to figure out how to "rawify" strings containing backslashes that I'm reading from stdin and need to insert into a database. I'll continue looking and try to report back with what I find.

Any further help would be wonderful!
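
One point that may help here: escape processing happens only when Python compiles a string literal in source code. Text read at runtime already holds its backslashes as ordinary characters. The sketch below assumes Python 3, with io.StringIO standing in for sys.stdin and an in-memory sqlite3 database standing in for the real one; both are illustrative choices that do not come from the thread.

```python
import io
import sqlite3

# Simulated stdin: the user typed the seven characters  a\tb\\c
# (written here as a normal literal, so each backslash is doubled).
fake_stdin = io.StringIO('a\\tb\\\\c\n')
line = fake_stdin.readline().rstrip('\n')

# The text arrives exactly as typed: no tab was produced from \t.
print(len(line))
print(line == r'a\tb\\c')

# Passing the value as a query parameter stores it unchanged, with no
# manual escaping of the backslashes.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE lines (content TEXT)')
conn.execute('INSERT INTO lines (content) VALUES (?)', (line,))
stored = conn.execute('SELECT content FROM lines').fetchone()[0]
conn.close()
```

Whether a real database driver behaves the same way depends on that driver, so this is worth checking against its own parameter-binding documentation.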

Ratnadeep Debnath 12 years, 2 months ago

I use the following method to convert a python string (str or unicode) into a raw string:

def raw_string(s):
    if isinstance(s, str):
        s = s.encode('string-escape')
    elif isinstance(s, unicode):
        s = s.encode('unicode-escape')
    return s

Example usage:

import re

s = "This \\"
re.sub("this", raw_string(s), "this is a text")

This works great :)
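
The 'string-escape' codec and the unicode type exist only in Python 2. A minimal Python 3 sketch of the same idea follows, assuming the 'unicode_escape' codec is an acceptable substitute. It is a port for illustration, not the commenter's code, and it is only exercised here on backslashes and newlines.

```python
import re

def raw_string(s):
    """Python 3 sketch: write backslashes and control characters as escapes."""
    # unicode_escape produces ASCII-only bytes, so decoding as ASCII is safe.
    return s.encode('unicode_escape').decode('ascii')

s = 'This \\'                      # the five characters  This \
escaped = raw_string(s)            # the backslash is now doubled

# re.sub turns the doubled backslash in the replacement back into one.
result = re.sub('this', escaped, 'this is a text')
print(result)
```

Without the escaping step, the trailing lone backslash in the replacement string would make re.sub raise an error.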

Michal Gloc 11 years, 10 months ago

@Ratnadeep Debnath Not exactly. If I want to use '\1foo\2' it gets translated to '\x01foo\x02' which is not the same as r'\1foo\2', and doesn't work with re.sub() properly. Any solution to that?
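
A sketch of two ways around this with re.sub, assuming Python 3 (the patterns and sample text are made up for illustration). Once '\1' has been compiled without an r prefix it is already a control character, so the group reference has to be written raw in the source. When the goal is instead to insert text literally, a callable replacement avoids escape processing altogether.

```python
import re

text = 'john smith'
pattern = r'(\w+) (\w+)'

# Written as a raw literal, the backslashes reach re.sub intact and are
# interpreted as group references.
swapped = re.sub(pattern, r'\2, \1', text)
print(swapped)

# Without the r prefix, '\2' and '\1' are octal escapes for control
# characters; the group references are gone before re.sub ever runs.
non_raw = '\2, \1'
print(non_raw == '\x02, \x01')

# To insert text literally, backslashes included, pass a function: its
# return value is used as-is, with no escape or group processing.
literal = r'C:\new\table'
inserted = re.sub('PATH', lambda match: literal, 'dir=PATH')
print(inserted)
```

The callable form is the safer choice when the replacement text comes from user input, because nothing in it is treated as a group reference or an escape.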

Created by Brett Cannon on Thu, 14 Jun 2001 (PSF)