When comparing text generated on different platforms, the newlines are different. This recipe normalizes any string to use unix-style newlines.
This code is used in the TestOOB unit testing framework (http://testoob.sourceforge.net).
1 2 3 | def _normalize_newlines(string):
import re
return re.sub(r'(\r\n|\r|\n)', '\n', string)
|
I've tested this on POSIX and Windows. Anyone with an old Mac care to try it? :-)
Tags: text
Speed up by precompiling regular expression. On the expense of one more line (and the re module plus the regular expression inserted into your namespace), you can get some speed (On my PC, for the contents of a random python script, it finishes in a third of the time) by pulling almost everything out of the function. Of course, this works best if you use this function quite often.
don't use regular expressions when not really needed. It's even better to do two replace calls:
The last version is several times faster (of course this also depends on the string you convert).
Good point. When this function shows up in my profiler I'll probably do this.
Until it does, I prefer the greater readability -- in my eyes -- of not precompiling the expression.
The regular expression isn't there for special features. It's there for readability.
Replacing (\r\n|\r|\n) with whatever (I arbitrarily chose '\n') sits in my mind fairly well. And I understand that regex at a single glance.
I agree that using two replaces, noting that '\n' need not be replaced with '\n', is both efficient and clever.
I'll probably stay with the regex, though, because I find it easier to understand, and it isn't a performance hit in my application yet.