
You want to convert tabs in a string to the appropriate number of spaces, or vice versa.

Python, 43 lines
# of course, in Python, we do have a built-in tab expansion string method:
# method .expandtabs(tabsize=8) of string objects!  If we *didn't* have
# it, though, here's how we might make it ourselves...:

# string processing tends to be faster in a split/process/rejoin
# approach than by repeated overall-string transformations, so...:
def expand_with_re(astring, tablen=8):
    import re
    pieces = re.split(r'(\t)', astring)
    lensofar = 0
    for i in range(len(pieces)):
        if pieces[i]=='\t':
            pieces[i] = ' '*(tablen-lensofar%tablen)
        lensofar += len(pieces[i])
    return ''.join(pieces)

# note we used re.split, rather than plain string splitting, because
# re.split with a '(group)' in the re gives us the splitters too,
# which is quite handy here for us to massage the pieces list into
# our desired form for the final ''.join.  However, '\t'.split,
# "interleaving" the blank joiners, looks a bit better still:
def expand(astring, tablen=8):
    result = []
    for piece in astring.split('\t'):
        result.append(piece)
        result.append(' '*(tablen-len(piece)%tablen))
    return ''.join(result[:-1])

# for the 'unexpanding', though, the "joiners" (spaces) are
# really crucial, so let's go back to the re approach (and
# _here_ we don't have a built-in method of strings...!):
def unexpand(astring, tablen=8):
    import re
    pieces = re.split(r'( +)', astring)
    lensofar = 0
    for i in range(len(pieces)):
        thislen = len(pieces[i])
        if pieces[i][0]==' ':
            numblanks = (lensofar+thislen)%tablen
            numtabs = (thislen-numblanks+tablen-1)//tablen
            pieces[i] = '\t'*numtabs + ' '*numblanks
        lensofar += thislen
    return ''.join(pieces)
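
The expansion functions above can be checked against the built-in method. The following is a minimal sketch, not part of the original recipe: it repeats the split-based expand so the block runs on its own, and adds expand_lines, a hypothetical helper introduced here only for illustration. It assumes Python 3. The one difference it highlights is that str.expandtabs restarts the column count after each newline, while the recipe's functions treat the whole string as a single line.

```python
def expand(astring, tablen=8):
    """Split-based tab expansion, as in the recipe (single line only)."""
    result = []
    for piece in astring.split('\t'):
        result.append(piece)
        # pad up to the next multiple of tablen
        result.append(' ' * (tablen - len(piece) % tablen))
    return ''.join(result[:-1])   # drop the padding after the last piece


def expand_lines(text, tablen=8):
    """Hypothetical helper: expand each line separately so columns restart."""
    lines = text.splitlines(keepends=True)
    return ''.join(expand(line, tablen) for line in lines)


single = 'a\tbc\t\tdef'
multi = 'ab\tc\nd\te\n'
for width in (2, 4, 8):
    # on a single line the recipe agrees with the built-in method
    assert expand(single, width) == single.expandtabs(width)
    # across lines, agreement needs the per-line wrapper
    assert expand_lines(multi, width) == multi.expandtabs(width)

# without the wrapper, the second line is padded from the wrong column
assert expand(multi) != multi.expandtabs()
```

Expanding line by line is only one way to handle multi-line input; tracking the column and resetting it at each newline inside a single loop would work as well.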

Inspired by Recipe 1.7 in O'Reilly's Perl Cookbook. Again we see substantially the same power, packaged in a cryptic one-liner in Perl but in a few more-readable statements in Python. Python makes tab expansion (a very frequent task) available in the easiest way, as a string method, but omits 'unexpansion' (inserting tabs to stand for runs of spaces), a task that is not all that frequent and should maybe be discouraged (if you need to compress, there are far better ways :-). Perl places both expand and unexpand in a standard package (Text::Tabs). Either approach is defensible, of course.

1 comment

plok 11 years, 7 months ago

This comes almost eleven years later, but I would like to point out two problems with unexpand, at least when run with recent versions of Python:

1) It fails when astring starts with spaces (for example, " abc"), raising IndexError on line 38, the test pieces[i][0]==' '. The reason is that in these cases the list returned by re.split starts with an empty string: the separator, which is in a capturing group, matches at the very start of the string.

2) I may be overlooking something, but it seems not to work properly in some cases. For example, repr(unexpand('abcefghij' + ' ' * 9)) returns 'abcefghij\t  ', which has two spaces after the tab instead of the one we would expect (nine spaces minus the eight that were replaced with the tab).
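
Both points can be examined in isolation. The snippet below is a small sketch, assuming Python 3, that uses only re.split and str.expandtabs. The first part shows the empty strings that a capturing separator produces at either end of the list. The second part checks the column arithmetic behind point 2: a tab that starts at column 9 advances only to column 16, so it stands for seven spaces rather than eight, which suggests the two-space result may be the one that preserves alignment.

```python
import re

# A capturing group makes re.split return the separators as well.
# When the string starts or ends with a separator, the list starts or
# ends with an empty string, and indexing ''[0] raises IndexError.
leading = re.split(r'( +)', '  abc')
trailing = re.split(r'( +)', 'abc  ')
print(leading)    # ['', '  ', 'abc']
print(trailing)   # ['abc', '  ', '']

# startswith is safe on empty pieces, unlike piece[0]
space_runs = [p for p in leading if p.startswith(' ')]
assert space_runs == ['  ']

# Column check for point 2: nine characters, a tab, then two spaces
# expand back to nine characters followed by nine spaces.
assert 'abcefghij\t  '.expandtabs(8) == 'abcefghij' + ' ' * 9
# With a single space after the tab the line comes out one column short.
assert 'abcefghij\t '.expandtabs(8) == 'abcefghij' + ' ' * 8
```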

Another attempt at this, a slightly simplified version of recipe 1.15 in the cookbook:

import re

def unexpand(astring, tablen=8):
    # expand any existing tabs first, then split, keeping the runs of spaces
    pieces = re.split(r'( +)', astring.expandtabs(tablen))
    for i, piece in enumerate(pieces):
        thislen = len(piece)
        if piece.isspace():
            # floor division keeps numtabs an integer on Python 3
            numtabs = thislen // tablen
            numblanks = thislen % tablen
            pieces[i] = '\t' * numtabs + ' ' * numblanks
    return ''.join(pieces)
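
The version above converts each run of spaces by its length alone, so whether the result lines up with the original depends on the column at which the run starts. As a hedged alternative, here is a sketch of a column-aware variant. The name unexpand_by_column is made up for this example, and the code assumes Python 3 and a single line of input. It replaces a run with tabs only when the run reaches or crosses a tab stop, and it is checked by expanding the result again with str.expandtabs.

```python
import re

def unexpand_by_column(astring, tablen=8):
    """Hypothetical column-aware variant; assumes a single line of text."""
    pieces = re.split(r'( +)', astring.expandtabs(tablen))
    column = 0                      # column at which the current piece starts
    for i, piece in enumerate(pieces):
        end = column + len(piece)
        if piece.startswith(' '):   # safe on empty pieces, unlike piece[0]
            # number of tab stops this run of spaces reaches or crosses
            numtabs = end // tablen - column // tablen
            if numtabs:
                # tabs reach the last tab stop; spaces cover the remainder
                pieces[i] = '\t' * numtabs + ' ' * (end % tablen)
        column = end
    return ''.join(pieces)

samples = [' abc', 'abc  ', 'a  b',
           'abcefghij' + ' ' * 9, 'ab' + ' ' * 14 + 'c']
for s in samples:
    # expanding the result must give back the original spacing
    assert unexpand_by_column(s).expandtabs(8) == s
```

One design choice to be aware of: a single space that ends exactly on a tab stop is also turned into a tab. That keeps the round trip intact but saves nothing, so a stricter variant could leave such runs alone.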