This module contains a function that formats paragraphs of text to a given line width, optionally stretching each line to that width by filling the gaps between words with extra spaces. In other words, it produces either left-justified (word-wrapped) or block-formatted paragraphs.

Python, 92 lines
#!/usr/bin/env python
"""Contains a function to do word-wrapping on text paragraphs."""

__version__ = '0.1'
__author__ = "Christopher Arndt <chris.arndt@web.de>"

import random
import re

def justify_line(line, width):
    """Stretch a line to width by filling in spaces at word gaps.

    The gaps are visited in random order, one after another. Once every
    gap has received a space, the cycle starts over with a new shuffle.

    """
    i = []
    # keep adding spaces while the line is still shorter than width
    while len(' '.join(line)) < width:
        if not i:
            # index list is exhausted:
            # get list of indices excluding the last word
            # (list() is needed so the result can be shuffled on Python 3)
            i = list(range(max(1, len(line) - 1)))
            # and shuffle it
            random.shuffle(i)
        # append a space to a randomly chosen word and remove its index
        line[i.pop(0)] += ' '
    # line has reached the specified width or is wider
    return ' '.join(line)


def fill_paragraphs(text, width=80, justify=0):
    """Split a text into paragraphs and wrap each to the given line width.

    Optionally justify the paragraphs (i.e. stretch lines to fill width).

    Inter-word space is reduced to one space character and paragraphs are
    always separated by two newlines. Indentation is currently also lost.

    """
    # split text into paragraphs at occurrences of two or more newlines
    paragraphs = re.split(r'\n\n+', text)
    for i in range(len(paragraphs)):
        # split paragraphs into a list of words
        words = paragraphs[i].strip().split()
        line = []
        new_par = []
        while words:
            if len(' '.join(line + [words[0]])) > width and line:
                # the next word would not fit -> add the line to the paragraph
                if justify:
                    # stretch line to fill width
                    new_par.append(justify_line(line, width))
                else:
                    new_par.append(' '.join(line))
                line = []
            else:
                # append next word
                line.append(words.pop(0))
        # last line in paragraph (never stretched)
        new_par.append(' '.join(line))
        # replace paragraph with formatted version
        paragraphs[i] = '\n'.join(new_par)
    # return paragraphs separated by two newlines
    return '\n\n'.join(paragraphs)


def _test(width=78, justify=1):
    """Module test case."""
        
    s ="""
This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. 
This is some text. This is some text. This is some text. 

This is some text. This is some text. This is some text. 
This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. 
This is some text. This is some text. 
This is some text.

This is some text. 
This is some text. 
This is some text. This is some text. 
This is some text. This is some text. This is some text. 
"""
    print(fill_paragraphs(s, width, justify))

if __name__ == '__main__':
    _test()

You may know this feature from text editors such as Emacs or NEdit. Here is a Python implementation that you could use, for example, as a plugin for the Glimmer editor.

This function works on a single paragraph as well as on several paragraphs. As mentioned in the docstrings, indentation is lost, but support for it could easily be added.
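One possible way to keep indentation is sketched below. It is not part of the recipe: the function name `fill_indented` is hypothetical, and it assumes that the leading spaces or tabs of a paragraph's first line should be repeated on every wrapped line. Each indentation character is counted as one column, which is only accurate for spaces.

```python
import re


def fill_indented(paragraph, width=80):
    """Wrap one paragraph, repeating the indentation of its first line.

    Hypothetical helper, not part of the original module.
    """
    # leading spaces/tabs of the first line (assumed to apply to all lines)
    indent = re.match(r'[ \t]*', paragraph).group(0)
    # room left for words once the indentation is accounted for
    avail = width - len(indent)
    lines = []
    line = []
    for word in paragraph.split():
        # flush the current line if the next word would not fit
        if line and len(' '.join(line + [word])) > avail:
            lines.append(indent + ' '.join(line))
            line = []
        line.append(word)
    if line:
        # last (possibly short) line of the paragraph
        lines.append(indent + ' '.join(line))
    return '\n'.join(lines)


if __name__ == '__main__':
    print(fill_indented("    This is some text. " * 5, width=40))
```

The same idea could be folded into `fill_paragraphs` by detecting the indentation before the paragraph is split into words.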

Because the words of a paragraph are stored in a list and then joined via the join() method, the original inter-word spacing is not retained (but normalizing it is what paragraph filling is about anyway).
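The effect of splitting and re-joining can be seen in isolation in the short illustration below. The helper name `normalize_spaces` is made up for this example and does not appear in the module.

```python
def normalize_spaces(text):
    """Collapse every run of whitespace into a single space."""
    # split() without arguments splits on any whitespace run and drops
    # leading/trailing whitespace; join() then inserts exactly one space
    return ' '.join(text.split())


if __name__ == '__main__':
    print(normalize_spaces("two  spaces,\ta tab\nand a newline"))
    # prints: two spaces, a tab and a newline
```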

The algorithm for adding spaces between words picks gaps at random, which gives a fairly even distribution and appearance. It could be extended to handle special cases, such as adding space after punctuation first.
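A punctuation-first variant might look like the sketch below. This is an assumption about how such an extension could work, not the author's code: the function name `justify_punct_first` and the `punctuation` parameter are hypothetical. Within each pass, gaps that follow a word ending in punctuation are filled before the remaining gaps, and the order inside each group stays random.

```python
import random


def justify_punct_first(words, width, punctuation='.,;:!?'):
    """Stretch a list of words to width, padding gaps after punctuation first.

    Hypothetical variant of justify_line; works on a copy of the list.
    """
    words = list(words)
    if len(words) < 2:
        # a single word has no gap that could be stretched
        return ' '.join(words)
    marks = tuple(punctuation)
    pending = []
    while len(' '.join(words)) < width:
        if not pending:
            # one index per gap, i.e. every word except the last
            gaps = list(range(len(words) - 1))
            random.shuffle(gaps)
            # stable sort: punctuation gaps move to the front, while the
            # shuffled order is kept inside each of the two groups
            gaps.sort(key=lambda k: not words[k].rstrip().endswith(marks))
            pending = gaps
        # widen the next gap by one space
        words[pending.pop(0)] += ' '
    return ' '.join(words)


if __name__ == '__main__':
    print(justify_punct_first("Hello, big wide world. Bye now".split(), 40))
```

Unlike `justify_line`, this sketch leaves a single-word line untouched instead of appending trailing spaces to it.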

Execute the module file itself to see an example of the formatting.

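Line widths that are too small, or even negative, are handled gracefully: a word longer than the width is placed on a line of its own, so a zero or negative width yields one word per line.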