This function takes an arbitrary string and converts it into its raw string equivalent. Unfortunately, \x raises a ValueError (in an ordinary string literal it must be followed by two hex digits), and I cannot figure out how to deal with it.
[2001-06-18: Completely reworked function for performance]
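```python
# Map each character that a backslash escape produces back to that escape.
# Caveats raised in the comments below: '\c', '\8' and '\9' are not escape
# sequences, so those keys are two characters long and never match, and the
# '\7' entry overwrites '\a' because both are chr(7).
escape_dict = {'\a': r'\a',
               '\b': r'\b',
               '\c': r'\c',
               '\f': r'\f',
               '\n': r'\n',
               '\r': r'\r',
               '\t': r'\t',
               '\v': r'\v',
               '\'': r'\'',
               '\"': r'\"',
               '\0': r'\0',
               '\1': r'\1',
               '\2': r'\2',
               '\3': r'\3',
               '\4': r'\4',
               '\5': r'\5',
               '\6': r'\6',
               '\7': r'\7',
               '\8': r'\8',
               '\9': r'\9'}

def raw(text):
    """Returns a raw string representation of text"""
    new_string = ''
    for char in text:
        try:
            new_string += escape_dict[char]
        except KeyError:
            # Characters without an escape are copied unchanged.
            new_string += char
    return new_string
```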
This is useful when a user supplies text and you want its raw equivalent to be used rather than the processed version. A good example is a regex that the user passes in from somewhere like the command line.
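To make the regex case concrete, here is a minimal sketch assuming the pattern was typed as an ordinary (non-raw) string literal, so '\b' has already become a backspace character instead of a word boundary. It uses a trimmed copy of escape_dict (letter escapes only) so it runs on its own.

```python
import re

# Trimmed copy of the recipe's mapping: only the letter escapes are kept here.
escape_dict = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
               '\r': r'\r', '\t': r'\t', '\v': r'\v'}

def raw(text):
    """Turn escape-produced characters back into their escape sequences."""
    new_string = ''
    for char in text:
        try:
            new_string += escape_dict[char]
        except KeyError:
            new_string += char
    return new_string

cooked = '\bword\b'  # '\b' here is chr(8), a backspace, not a word boundary
print(re.search(cooked, 'a word here'))       # None: the pattern looks for backspaces
print(re.search(raw(cooked), 'a word here'))  # finds 'word' between word boundaries
```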
Its main limitation is that it does not handle all octal escape sequences. If anyone has a suggestion for handling them without typing out every octal escape by hand, please reply to this recipe with the solution.
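One way to avoid typing every octal escape by hand is to generate the entries. The sketch below is an assumption, not part of the recipe: it keeps the named escapes and fills every other control character (chr(0) to chr(31) and chr(127)) with a full three-digit octal escape, the form suggested in the comments below. It also swaps the try/except for dict.get.

```python
# Named escapes take priority over the generated octal forms.
named = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
         '\r': r'\r', '\t': r'\t', '\v': r'\v',
         '\'': r'\'', '"': r'\"'}

escape_dict = dict(named)
for code in list(range(32)) + [127]:
    # '\\%03o' % code yields e.g. a backslash plus '000' for chr(0), '177' for chr(127).
    escape_dict.setdefault(chr(code), '\\%03o' % code)

def raw(text):
    """Replace control characters and quotes with their escape sequences."""
    return ''.join(escape_dict.get(char, char) for char in text)

print(raw('\x00' + '7'))  # prints \0007
print(raw('tab\there'))   # prints tab\there
```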
This function can be (and originally was) implemented as a huge if/elif/else. Alternatively, escape_dict could contain every printable character (via string.printable[:-5]), which removes the need for the try/except. But numbers don't lie: profiling all three approaches showed this version to be the fastest.
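For comparison, the same lookup can be written with dict.get and ''.join, or with str.translate, both of which avoid the try/except and the repeated string concatenation. This is only a sketch: neither variant was part of the profiling described above, so no claim is made about which is fastest.

```python
escape_dict = {'\a': r'\a', '\b': r'\b', '\f': r'\f', '\n': r'\n',
               '\r': r'\r', '\t': r'\t', '\v': r'\v',
               '\'': r'\'', '"': r'\"'}

def raw(text):
    """Same mapping; a missing key falls back to the character itself."""
    return ''.join(escape_dict.get(char, char) for char in text)

# Alternative: build a translation table once and let str.translate do the loop.
# str.maketrans accepts a dict whose keys are single characters.
table = str.maketrans(escape_dict)

def raw_translate(text):
    return text.translate(table)
```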
\0 through \9 are not great examples. The example dictionary should not contain codes like r'\0'. While the letter escapes work, you need to represent the numeric codes as full three-digit octal (r'\000') or hex (r'\x00') values, or you will be surprised when the two-character string '\x007' is translated to r'\07', which reads back as a single bell character. '\8' and '\9' are not escapes at all: Python keeps the backslash, so they are two-character strings that can never match a single character. It is also a bad idea to have a dictionary constant with conflicting entries ('\7' and '\a' are the same character, so the later entry silently wins).
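The ambiguity with one-digit codes can be checked directly. This sketch uses only the '\0' entry and reads each result back as a string literal with ast.literal_eval: with the one-digit form the two input characters collapse into a single bell character, while the three-digit form keeps both.

```python
import ast

def raw_with(mapping, text):
    """Apply a character-to-escape mapping, leaving unmapped characters alone."""
    return ''.join(mapping.get(char, char) for char in text)

text = '\x00' + '7'  # two characters: NUL followed by the digit 7

short = raw_with({'\0': r'\0'}, text)   # one-digit octal, as in the recipe
full = raw_with({'\0': r'\000'}, text)  # three-digit octal, as suggested

# Evaluate each result as the body of a string literal.
print(repr(ast.literal_eval("'" + short + "'")))  # '\x07': a single bell character
print(repr(ast.literal_eval("'" + full + "'")))   # '\x007': NUL then '7'
```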
So, replace the last part of the escape_dict definition with:
Defeats the purpose. If I use your suggestion, I end up with the following result:
The whole point of this is to get the string back into the form it would have had if an r had been prefixed to the string literal when it was created. Unfortunately, your solution breaks that.
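The underlying difficulty is that several different literals produce the same character, so once a string exists there is no record of which spelling was typed. A quick check:

```python
# Four spellings of the same one-character string, chr(1).
spellings = ['\1', '\01', '\001', '\x01']
print(all(s == chr(1) for s in spellings))  # True

# Any function of the string alone therefore gives all of them the same
# output; it cannot return r'\1' for one spelling and r'\x01' for another.
print(len({repr(s) for s in spellings}))  # 1
```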
To convert a normal string that is generated during runtime into a raw string...
@Nishanth Thank you. I scoured the web for an hour looking for this answer. I knew that I had to convert the runtime string to a literal; I just wasn't sure how to do it.
After doing some research to determine what the "%r" was doing in the format string above, this turns out to be really easy. The built-in function repr() does exactly this.
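A minimal sketch of that approach, assuming the %r trick amounts to repr(s)[1:-1] (the original snippet is not reproduced here), shows where its output differs from what an r prefix would have given:

```python
def repr_raw(s):
    """Printable representation of s with the surrounding quotes removed."""
    return repr(s)[1:-1]

# Letter escapes come back as expected.
print(repr_raw('a\tb'))      # a\tb

# Numeric escapes come back in hex, not as typed.
print(repr_raw('\1foo\2'))   # \x01foo\x02, not \1foo\2

# A backslash that is already in the string gets doubled.
print(repr_raw(r'C:\dir'))   # C:\\dir
```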
@Nishanth: that doesn't convert a plain string to a raw string--it converts to the "printable representation" of the plain string, and then chops off the quotes.
see this counter-example:
I'm trying to figure out how to "rawify" strings containing backslashes that I'm reading from stdin and need to insert into a database. I'll continue looking and try to report back with what I find.
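One point that may help: text read from stdin is not run through Python's escape processing, so a backslash the user types arrives as a literal backslash. For a database, a parameterized query then stores it unchanged. This sketch assumes SQLite through the standard sqlite3 module and simulates stdin with io.StringIO; the table and column names are made up for illustration.

```python
import io
import sqlite3

# Simulated stdin holding a Windows path and a regex; the backslashes are
# real characters, exactly as a user would type them.
fake_stdin = io.StringIO('C:\\new\\table\n\\d+\\s*\n')

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE entries (value TEXT)')  # hypothetical table

for line in fake_stdin:
    # The ? placeholder passes the value as data, so no escaping is needed.
    conn.execute('INSERT INTO entries (value) VALUES (?)', (line.rstrip('\n'),))

rows = [row[0] for row in conn.execute('SELECT value FROM entries ORDER BY rowid')]
print(rows)
```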
Any further help would be wonderful!
I use the following method to convert a Python string (str or unicode) into a raw string:
Example usage:
This works great :)
@Ratnadeep Debnath Not exactly. If I want to use '\1foo\2' it gets translated to '\x01foo\x02' which is not the same as r'\1foo\2', and doesn't work with re.sub() properly. Any solution to that?
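The difference is easy to reproduce. With the raw literal, re.sub sees the backreferences \1 and \2; with the ordinary literal they have already become chr(1) and chr(2), which re.sub inserts verbatim. Note that '\8' and '\9' are not octal escapes and survive as backslash-digit, so only groups 1 to 7 are affected. The restore_backrefs helper below is hypothetical and is only safe if the replacement string is known not to contain genuine control characters chr(1) to chr(7).

```python
import re

pattern = r'(\w+) (\w+)'
text = 'hello world'

print(re.sub(pattern, r'\1foo\2', text))        # hellofooworld
print(repr(re.sub(pattern, '\1foo\2', text)))   # '\x01foo\x02': no groups substituted

def restore_backrefs(repl):
    """Hypothetical workaround: map chr(1)..chr(7) back to \\g<1>..\\g<7>.

    Assumes those control characters only appear because a backreference
    was written in a non-raw literal.
    """
    return ''.join('\\g<%d>' % ord(c) if 1 <= ord(c) <= 7 else c for c in repl)

print(re.sub(pattern, restore_backrefs('\1foo\2'), text))  # hellofooworld
```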