I used a URLOpener to get the HTML file from some web-sites for some parsing. However, the returned data file had ^M everywhere, and it was pretty annoying. Before parsing this file, I want to strip it of all occurences of this control character ^M. Of course, I can use dos2unix or similar tools to do that offline, but I wanna do it the pythonic way.
First, I need to find out the ascii value for '^M'.
>>> import curses.ascii
>>> ascii.ascii('^V^M')
'\r'
Then, I can just do a search and replace '\r' in any string.
>>> string.replace( str, '\r', '' )
In my code, I just have this line in the overriden method handle_data of my html parser class.
1 2 3 4 5 6 7 8 | import string
class Stripper( SGMLParser ) :
...
def handle_data( self, data ) :
data = string.replace( data, '\r', '' )
...
|