ActiveState Code

Recipe 497010: Format a text block.


This function formats a block of text. The text is broken into tokens. (Whitespace is NOT preserved.) The tokens are reassembled at the specified level of indentation and line width. A string is returned.

Python
def format(text, indent=2, width=70):
    """
    Format a text block.
    
    This function formats a block of text. The text is broken into
    tokens. (Whitespace is NOT preserved.) The tokens are reassembled
    at the specified level of indentation and line width.  A string is
    returned.

    Arguments:
        `text`   -- the string to be reformatted.
        `indent` -- the integer number of spaces to indent by.
        `width`  -- the maximum width of formatted text (including indent).
    """
    width = width - indent
    out = []
    stack = [word for word in text.replace("\n", " ").split(" ") if word]
    while stack:
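        # Always take the first word, so a word longer than `width`
        # gets a line of its own instead of stalling the loop.
        line = stack.pop(0)
        while stack:
            if len(line) + len(" " + stack[0]) > width: break
            line += " " + stack.pop(0)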
        out.append(" "*indent + line)
    return "\n".join(out)

Comments

  1. At 2:02 p.m. on 31 Aug 2006, Ori Peleg said:

    Nice, but the standard-library module textwrap may be a better choice. It implements this and more: http://docs.python.org/lib/module-textwrap.html

    import textwrap
    def format(text, indent=2, width=70):
      return "\n".join( textwrap.wrap(text, width=width, initial_indent=" "*indent, subsequent_indent=" "*indent) )
    
  2. At 2:08 p.m. on 31 Aug 2006, Ori Peleg said:

    Silly me. textwrap.fill(text) is equivalent to "\n".join(textwrap.wrap(text)), so maybe

    import textwrap
    def format(text, indent=2, width=70):
        return textwrap.fill(text, width=width, initial_indent=" "*indent, subsequent_indent=" "*indent)
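
The equivalence claimed in this comment can be checked directly. This is a minimal sketch using only the standard textwrap module; the sample text and the width of 30 are arbitrary choices for illustration, not part of the recipe.

```python
import textwrap

# Arbitrary sample text and settings, chosen only for this illustration.
text = "The quick brown fox jumps over the lazy dog. " * 3
pad = " " * 2

# wrap() returns a list of lines; fill() returns them joined by newlines.
wrapped = textwrap.wrap(text, width=30, initial_indent=pad, subsequent_indent=pad)
filled = textwrap.fill(text, width=30, initial_indent=pad, subsequent_indent=pad)

print(filled == "\n".join(wrapped))
print(filled)
```

Note that `width` counts the indent here too, which matches the recipe's `width` argument (documented as "including indent").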
    
  3. At 2:13 p.m. on 31 Aug 2006, Ori Peleg said:

    A 'split' comment. Isn't

    stack = text.split()
    

    equivalent to

    stack = [word for word in text.replace("\n", " ").split(" ") if word]
    

    ?

  4. At 7:22 p.m. on 31 Aug 2006, Alexander Ross (the author) said:

    Oops. Yes, it is. Maybe I should read the manual, hm?
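
One caveat worth noting: the two expressions agree for text that contains only spaces and newlines, but `str.split()` with no argument also splits on tabs and other whitespace, which the recipe's expression does not. A small sketch of the difference follows; the helper names `tokens_recipe` and `tokens_split` are made up for this illustration.

```python
def tokens_recipe(text):
    # The recipe's expression: only newlines and spaces act as separators.
    return [word for word in text.replace("\n", " ").split(" ") if word]

def tokens_split(text):
    # str.split() with no argument splits on any run of whitespace.
    return text.split()

plain = "one  two\nthree"   # spaces and newlines only
tabbed = "one\ttwo three"   # contains a tab

print(tokens_recipe(plain) == tokens_split(plain))
print(tokens_recipe(tabbed))
print(tokens_split(tabbed))
```

For wrapping prose, the `text.split()` behaviour is usually what is wanted, since a tab left inside a token would throw off the line-length count.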

  5. At 9:30 p.m. on 31 Aug 2006, Simon Forman said:

    Yes, and the line "if len(line) + len(" " + stack[0]) > width:" is a little wonky. By adding a " " to stack[0] you're allocating a whole new string and then throwing it away just to take its length. You could have written "if len(line) + 1 + len(stack[0]) > width:" or, better yet, done something like this:

    def format(text, indent=2, width=70):
        width = width - indent
        out = []

        # Make a generator to yield words and lengths.
        # Add 1 to the lengths to account for spaces.
        gen = ((len(word) + 1, word) for word in text.split())

        line = []
        line_len = -1  # The first word on a line has no leading space.

        for wlength, word in gen:

            # Add word length (plus space length) to line length.
            line_len += wlength

            # Check if we've filled a line. An empty line always accepts
            # the word, so an over-long first word doesn't emit a blank line.
            if line_len > width and line:

                # Build and append one line.
                out.append(" " * indent + " ".join(line))

                # Set line and length to word and length.
                line = [word]
                line_len = wlength - 1

            # If not, keep adding words to the line list.
            else:
                line.append(word)

        # Flush the last, partially filled line; without this the
        # final words of the text would be silently dropped.
        if line:
            out.append(" " * indent + " ".join(line))

        return "\n".join(out)
    

    This avoids all the expensive string operations you used to build your lines, and the repeated calls to len(line) in your inner while loop.

    Of course, you really should have just used the textwrap module, as Ori says.
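
To see how closely a hand-rolled greedy wrapper tracks textwrap, the two can be compared side by side. This is only a sketch: `greedy_format` and `textwrap_format` are hypothetical names, and `greedy_format` is a compact rewrite in the spirit of the recipe rather than the author's exact code. The two should agree on plain text like the sample below. They can differ on hyphenated words and on words longer than the line, which textwrap breaks by default.

```python
import textwrap

def greedy_format(text, indent=2, width=70):
    # Hypothetical greedy wrapper: fill each line with as many words as fit.
    avail = width - indent
    out, line = [], ""
    for word in text.split():
        if line and len(line) + 1 + len(word) > avail:
            out.append(" " * indent + line)
            line = word
        else:
            line = line + " " + word if line else word
    if line:
        out.append(" " * indent + line)
    return "\n".join(out)

def textwrap_format(text, indent=2, width=70):
    # The same interface, delegating to the standard library.
    pad = " " * indent
    return textwrap.fill(text, width=width,
                         initial_indent=pad, subsequent_indent=pad)

sample = "the quick brown fox jumps over the lazy dog"
print(greedy_format(sample, indent=2, width=20))
print(greedy_format(sample, 2, 20) == textwrap_format(sample, 2, 20))
```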
