ASPN ActiveState Programmer Network  
ActiveState, a division of Sophos
Title: Normalizing newlines between windows/unix/macs
Submitter: Ori Peleg (other recipes)
Last Updated: 2005/07/02
Version no: 1.0
Category: Text


Description:

Text generated on different platforms uses different newline conventions ('\n' on Unix, '\r\n' on Windows, '\r' on classic Mac OS), which breaks naive comparisons. This recipe normalizes any string to use Unix-style newlines.

This code is used in the TestOOB unit testing framework (http://testoob.sourceforge.net).

Source:

def _normalize_newlines(string):
    import re
    return re.sub(r'(\r\n|\r|\n)', '\n', string)

Discussion:

I've tested this on POSIX and Windows. Anyone with an old Mac care to try it? :-)
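For anyone who wants to check the behaviour without an old Mac at hand, here is a small sketch that feeds the recipe's function strings written in each convention. The sample strings and the `samples` / `normalized` names are made up for illustration; they are not part of the recipe or of TestOOB.

```python
import re

def _normalize_newlines(string):
    # Same regex as the recipe: '\r\n' is listed first so a Windows pair
    # becomes one newline instead of two.
    return re.sub(r'(\r\n|\r|\n)', '\n', string)

# Hypothetical inputs, one per newline convention plus a mixed case.
samples = {
    'unix': 'one\ntwo\n',
    'windows': 'one\r\ntwo\r\n',
    'old_mac': 'one\rtwo\r',
    'mixed': 'one\r\ntwo\rthree\n',
}

normalized = {name: _normalize_newlines(text) for name, text in samples.items()}

for name in sorted(normalized):
    print(name, repr(normalized[name]))
```

The first three inputs should all come out as the same Unix-style string, which is the property needed when comparing output across platforms.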




Number of comments: 4

Speed up by precompiling regular expression, Andreas Kloss, 2005/07/20
At the expense of one more line (and of the re module and the compiled regular expression ending up in your namespace), you can gain some speed by pulling almost everything out of the function. On my PC, for the contents of a random Python script, it finishes in a third of the time. Of course, this pays off most if you call the function often.

import re
_newlines_re = re.compile(r'(\r\n|\r|\n)')
def _normalize_newlines(string):
    return _newlines_re.sub('\n', string)
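
If you want to measure the difference on your own machine, a rough sketch with timeit might look like the following. The test string, the repeat count, and the names `inline_sub` and `precompiled_sub` are assumptions chosen for illustration. Since the re module caches compiled patterns internally, the gap you see may be smaller or larger than the factor quoted above.

```python
import re
import timeit

# Hypothetical test input mixing all three newline styles.
text = 'line one\r\nline two\rline three\n' * 200

_newlines_re = re.compile(r'(\r\n|\r|\n)')

def inline_sub():
    # Passes the pattern string on every call, as in the recipe.
    return re.sub(r'(\r\n|\r|\n)', '\n', text)

def precompiled_sub():
    # Reuses the pattern object compiled once at import time.
    return _newlines_re.sub('\n', text)

inline_seconds = timeit.timeit(inline_sub, number=200)
precompiled_seconds = timeit.timeit(precompiled_sub, number=200)

print('inline: %.4fs  precompiled: %.4fs' % (inline_seconds, precompiled_seconds))
```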


Good point, Ori Peleg, 2005/08/05
When this function shows up in my profiler I'll probably do this. Until it does, I prefer the greater readability -- in my eyes -- of not precompiling the expression.

Don't use regular expressions when not really needed, Not specified, 2005/07/20
It's even better to do two replace calls:

#!/usr/bin/env python
import profile, re, random

s = "".join([random.choice(" \n\r") for i in range(10000)])

def use_re_sub():
    global r1
    for i in range(1000):
        r1 = re.sub(r'(\r\n|\r|\n)', '\n', s)

_newlines_re = re.compile(r'(\r\n|\r|\n)')
def use_re_compile():
    global r2
    for i in range(1000):
        r2 = _newlines_re.sub('\n', s)

def use_replace():
    global r3
    for i in range(1000):
        r3 = s.replace('\r\n', '\n').replace('\r', '\n')

profile.run('use_re_sub()')
profile.run('use_re_compile()')
profile.run('use_replace()')
assert r1 == r2 == r3
The last version is several times faster, although the exact ratio depends on the string you convert.

The regular expression isn't there for special features, Ori Peleg, 2005/08/05
It's there for readability. Replacing (\r\n|\r|\n) with whatever (I arbitrarily chose '\n') sits in my mind fairly well. And I understand that regex at a single glance.

I agree that using two replaces, noting that '\n' need not be replaced with '\n', is both efficient and clever.
I'll probably stay with the regex, though, because I find it easier to understand, and it isn't a performance hit in my application yet.
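
For readers weighing the two approaches, here is a minimal sketch of the replace-based variant wrapped as a function, together with the reversed call order to show why the '\r\n' replacement has to come first. The names `normalize_with_replace` and `wrong_order` and the sample string are hypothetical, introduced only for this illustration.

```python
def normalize_with_replace(text):
    # Collapse Windows pairs first, then convert any remaining lone '\r'.
    # '\n' needs no replacement because it is already the target.
    return text.replace('\r\n', '\n').replace('\r', '\n')

def wrong_order(text):
    # Converting lone '\r' first turns every '\r\n' into '\n\n',
    # so the second replace finds nothing left to collapse.
    return text.replace('\r', '\n').replace('\r\n', '\n')

sample = 'a\r\nb\rc\n'
right = normalize_with_replace(sample)
wrong = wrong_order(sample)

print(repr(right))
print(repr(wrong))
```

The regex sidesteps this ordering concern only because '\r\n' is the first alternative in the pattern, so the same care applies to both versions.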


