
textwrap is a very handy module, but it expects to be given one paragraph at a time. If you pass it an entire document, it still wraps the lines but treats the whole text as a single paragraph.
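The behaviour described above can be reproduced with a short sketch. The sample text, width, and indent below are made up for illustration; only `textwrap.fill` and its documented `width` and `initial_indent` parameters are used.

```python
import textwrap

# Two short paragraphs separated by a blank line (illustrative text).
doc = "First paragraph, short.\n\nSecond paragraph, also short."

# With the default replace_whitespace=True every newline is turned into a
# space before wrapping, so the blank line between the paragraphs is lost.
merged = textwrap.fill(doc, width=40, initial_indent="  ")
print(merged)

# The indent is applied once, at the very start of the text, and the second
# paragraph simply continues on the first wrapped line.
```

Because the text is treated as one paragraph, `initial_indent` is applied only once instead of once per paragraph.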

This recipe addresses that issue by overriding textwrap.TextWrapper.wrap with an implementation that splits a document into paragraphs and processes each one individually. This allows options such as initial_indent to apply to every paragraph, as expected.

"""Wrap textwrap.TextWrapper to properly handle multiple paragraphs"""

import textwrap
import re

class DocWrapper(textwrap.TextWrapper):
    """Wrap text in a document, processing each paragraph individually"""

    def wrap(self, text):
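        """Override textwrap.TextWrapper.wrap to process 'text' properly
        when multiple paragraphs are present"""
        # The capturing group makes split() return the separators too, so
        # the loop below sees paragraphs and separators alternately.
        para_edge = re.compile(r"(\n\s*\n)")
        paragraphs = para_edge.split(text)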
        wrapped_lines = []
        for para in paragraphs:
            if para.isspace():
                if not self.replace_whitespace:
                    # Do not take the leading and trailing newlines since
                    # joining the list with newlines (as self.fill will do)
                    # will put them back in.
                    if self.expand_tabs:
                        para = para.expandtabs()
                    wrapped_lines.append(para[1:-1])
                else:
                    # self.fill will end up putting in the needed newline to
                    # space out the paragraphs
                    wrapped_lines.append('')
            else:
                wrapped_lines.extend(textwrap.TextWrapper.wrap(self, para))
        return wrapped_lines



if __name__ == '__main__':
    import optparse
    import sys

    default_wrapper = DocWrapper()

    parser = optparse.OptionParser()
    parser.add_option('-w', '--width', dest="width",
            default=default_wrapper.width, type='int',
            help="maximum length of wrapped lines")
    parser.add_option('-t', '--tabs', dest="expand_tabs",
            default=False, action="store_true",
            help="expand tabs to spaces before wrapping")
    parser.add_option('-s', '--whitespace', dest='replace_whitespace',
            default=False, action="store_true",
            help="replace each whitespace character with a space "
                 "(paragraph separators collapse to one blank line)")

    options, args = parser.parse_args()

    if len(args) > 1:
        print("You may only specify a single file")
        sys.exit(1)

    if not args:
        text = sys.stdin.read()
    else:
        # A with-block closes the file for us and avoids the NameError that
        # a try/finally would raise on FILE.close() if open() itself failed.
        with open(args[0]) as infile:
            text = infile.read()
    wrapper = DocWrapper(width=options.width, expand_tabs=options.expand_tabs,
            replace_whitespace=options.replace_whitespace)
    print(wrapper.fill(text))

Finding where a paragraph begins and ends is simple enough; the real trick is handling the separation between paragraphs properly. You might want to preserve that whitespace exactly (for example, in a verbatim section of a TeX document), so the recipe keeps each separator intact when replace_whitespace is false and collapses it to a single blank line otherwise.
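To make the separator handling concrete, here is a small sketch that isolates the split-and-rejoin step from the wrapping itself. It uses the same regular expression as the recipe; the sample text and the variable names are illustrative, and the paragraphs are passed through unwrapped so that only the separator logic is visible.

```python
import re

# Same pattern as in the recipe; the capturing group keeps the separators
# in the list returned by split().
para_edge = re.compile(r"(\n\s*\n)")

text = "alpha\n\nbeta\n   \n\ngamma"
pieces = para_edge.split(text)
print(pieces)  # paragraphs and separators alternate

preserved = []   # what the recipe does when replace_whitespace is false
collapsed = []   # what the recipe does when replace_whitespace is true
for piece in pieces:
    if piece.isspace():
        # Drop the first and last newline: joining with "\n" restores them.
        preserved.append(piece[1:-1])
        # An empty item becomes exactly one blank line after the join.
        collapsed.append('')
    else:
        preserved.append(piece)
        collapsed.append(piece)

rebuilt_preserved = "\n".join(preserved)
rebuilt_collapsed = "\n".join(collapsed)
```

Stripping one newline from each end of a separator is what lets the final join reproduce the original spacing exactly, while the empty string yields a uniform single blank line.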

Possible improvements to this recipe include making the regex used for finding the separators between paragraphs an attribute of the class. More command-line arguments could also be added.
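The first improvement might look roughly like the sketch below. The class names `ConfigurableDocWrapper` and `BlankLineOnlyWrapper` and the attribute name `para_edge` are hypothetical, chosen for this example; the body of `wrap` follows the recipe, with the pattern read from the class instead of being compiled on each call.

```python
import re
import textwrap


class ConfigurableDocWrapper(textwrap.TextWrapper):
    """Hypothetical variant of the recipe's DocWrapper whose paragraph
    separator pattern is a class attribute that subclasses can override."""

    # The pattern must keep its capturing group so that split() returns
    # the separators along with the paragraphs.
    para_edge = re.compile(r"(\n\s*\n)")

    def wrap(self, text):
        wrapped_lines = []
        for para in self.para_edge.split(text):
            if para.isspace():
                if not self.replace_whitespace:
                    if self.expand_tabs:
                        para = para.expandtabs()
                    wrapped_lines.append(para[1:-1])
                else:
                    wrapped_lines.append('')
            else:
                wrapped_lines.extend(textwrap.TextWrapper.wrap(self, para))
        return wrapped_lines


class BlankLineOnlyWrapper(ConfigurableDocWrapper):
    """Example override: only truly empty lines end a paragraph, so a line
    containing just spaces no longer counts as a separator."""
    para_edge = re.compile(r"(\n{2,})")


wrapper = ConfigurableDocWrapper(width=20, initial_indent="> ")
quoted = wrapper.fill("one two\n\nthree four")
print(quoted)

sample = "one\n \ntwo\n\nthree"
loose = ConfigurableDocWrapper(width=20).fill(sample)
strict = BlankLineOnlyWrapper(width=20).fill(sample)
```

With the pattern on the class, a different separator rule needs only a one-line subclass rather than a copy of `wrap`.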