This module contains a function that formats paragraphs of text to a given line width, optionally stretching lines to that width by widening the gaps between words. In other words, it produces either left-aligned, word-wrapped paragraphs or fully justified (block-formatted) ones.
#!/usr/bin/env python
"""Contains a function to do word-wrapping on text paragraphs."""

__version__ = '0.1'
__author__ = "Christopher Arndt <chris.arndt@web.de>"

import re, random


def justify_line(line, width):
    """Stretch a line to width by filling in spaces at word gaps.

    The gaps are visited in random order, each receiving one extra space;
    once every gap has been widened, a new random order is drawn and the
    process starts over.
    """
    i = []
    while 1:
        # line not long enough already?
        if len(' '.join(line)) < width:
            if not i:
                # index list is exhausted:
                # get list of indices excluding last word
                # (list() lets shuffle work on Python 3 as well as Python 2)
                i = list(range(max(1, len(line)-1)))
                # and shuffle it
                random.shuffle(i)
            # append space to a random word and remove its index
            line[i.pop(0)] += ' '
        else:
            # line has reached specified width or wider
            return ' '.join(line)


def fill_paragraphs(text, width=80, justify=0):
    """Split a text into paragraphs and wrap them to a line length of width.

    Optionally justify the paragraphs (i.e. stretch lines to fill width).
    Inter-word space is reduced to one space character and paragraphs are
    always separated by two newlines. Indentation is currently also lost.
    """
    # split text into paragraphs at occurrences of two or more newlines
    paragraphs = re.split(r'\n\n+', text)
    for i in range(len(paragraphs)):
        # split paragraphs into a list of words
        words = paragraphs[i].strip().split()
        line = []
        new_par = []
        while 1:
            if words:
                if len(' '.join(line + [words[0]])) > width and line:
                    # the line is already long enough -> add it to paragraph
                    if justify:
                        # stretch line to fill width
                        new_par.append(justify_line(line, width))
                    else:
                        new_par.append(' '.join(line))
                    line = []
                else:
                    # append next word
                    line.append(words.pop(0))
            else:
                # last line in paragraph
                new_par.append(' '.join(line))
                line = []
                break
        # replace paragraph with formatted version
        paragraphs[i] = '\n'.join(new_par)
    # return paragraphs separated by two newlines
    return '\n\n'.join(paragraphs)


def _test(width=78, justify=1):
    """Module test case."""
    s = """
This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text.
This is some text. This is some text. This is some text.
This is some text. This is some text. This is some text.
This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text. This is some text.
This is some text. This is some text.
This is some text.
This is some text.
This is some text.
This is some text. This is some text.
This is some text. This is some text. This is some text.
"""
    print(fill_paragraphs(s, width, justify))


if __name__ == '__main__':
    _test()
You probably know this feature from text editors such as Emacs or NEdit. Here is a Python solution that you could use, for example, as a plugin for the Glimmer editor.
This function works on a single paragraph as well as on several. As mentioned in the docstrings, indentation is lost, but support for it could easily be added.
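As one possible way to add indentation support, here is a sketch that keeps each paragraph's indentation. It assumes the leading whitespace of a paragraph's first line is the indentation to reuse for all of its lines. `fill_indented` is a hypothetical name, and unlike the recipe it delegates the wrapping to the standard library's `textwrap.fill` rather than using its own loop; justification is not supported here.

```python
import re
import textwrap


def fill_indented(text, width=80):
    """Hypothetical variant of fill_paragraphs that keeps indentation.

    Assumption: the leading whitespace of each paragraph's first line is
    the indentation wanted for every wrapped line of that paragraph.
    """
    result = []
    # split at runs of two or more newlines, as fill_paragraphs does;
    # only newlines are stripped so the first paragraph keeps its indent
    for par in re.split(r'\n\n+', text.strip('\n')):
        indent = re.match(r'[ \t]*', par).group()
        # collapse inter-word whitespace to single spaces
        body = ' '.join(par.split())
        result.append(textwrap.fill(body, width,
                                    initial_indent=indent,
                                    subsequent_indent=indent,
                                    # like the recipe: never split words
                                    break_long_words=False,
                                    break_on_hyphens=False))
    return '\n\n'.join(result)


print(fill_indented("    one two three four", width=14))
```

Note that `textwrap` counts the indentation towards `width`, so every output line, indent included, stays within the requested width unless a single word is longer than that.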
Because the words of a paragraph are stored in a list and then joined with the join() method, inter-word spacing is not preserved in its original form: runs of spaces, tabs and single newlines all become one space (but normalizing that is what paragraph filling is about anyway).
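The normalization follows directly from the two standard calls the recipe combines, `re.split` for paragraphs and argument-less `str.split` for words, as this small demonstration shows:

```python
import re

text = "First   paragraph,\n  with\tragged   spacing.\n\n\n\nSecond paragraph."

# split into paragraphs at runs of two or more newlines (as fill_paragraphs does)
paragraphs = re.split(r'\n\n+', text)

# str.split() without an argument splits on any run of whitespace, so tabs,
# single newlines and repeated spaces vanish; join puts back single spaces
normalized = [' '.join(p.split()) for p in paragraphs]
print(normalized)  # ['First paragraph, with ragged spacing.', 'Second paragraph.']
```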
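The algorithm for adding spaces between words uses a random approach, which gives a fairly even distribution and appearance: gaps are widened one at a time in random order, and no gap gets a second extra space until every gap has received one. This could be extended to handle special cases, such as adding space after punctuation first.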
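The punctuation extension could look roughly like the following sketch. `justify_line_punct` and its `punct` parameter are hypothetical; the only change from `justify_line` is that each round of randomly ordered gaps puts gaps following a punctuation mark ahead of the others, so when a line needs fewer extra spaces than it has gaps, those gaps are widened first.

```python
import random


def justify_line_punct(line, width, punct='.,;:!?'):
    """Hypothetical variant of justify_line: widen gaps after punctuation first.

    Like justify_line, every gap receives one extra space per round, but
    within a round the gaps after punctuation come before the rest.
    """
    line = list(line)  # work on a copy instead of mutating the caller's list
    gaps = list(range(max(1, len(line) - 1)))  # indices excluding last word
    order = []
    while len(' '.join(line)) < width:
        if not order:
            # new round: gaps after punctuation first, then the others,
            # each group shuffled so the spacing still looks even
            first = [g for g in gaps
                     if line[g].rstrip().endswith(tuple(punct))]
            rest = [g for g in gaps if g not in first]
            random.shuffle(first)
            random.shuffle(rest)
            order = first + rest
        # append one space to the chosen word (i.e. widen the gap after it)
        line[order.pop(0)] += ' '
    return ' '.join(line)


print(justify_line_punct(['Hello,', 'big', 'world.', 'Bye'], 24))
# Hello,  big  world.  Bye
```

Here the line needs three extra spaces; the two gaps after "Hello," and "world." get theirs first, so the result does not depend on the random order. Copying the list is a small design change: the original `justify_line` pads the caller's list in place, which is harmless there only because `fill_paragraphs` discards the list right afterwards.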
Execute the module file itself to see an example of the formatting.
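Line widths that are too small, or even negative, are handled gracefully: a word that is wider than the line width is never split but simply gets a line of its own.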
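The reason is the `and line` guard in the wrapping condition: a line is only flushed when it already holds at least one word, so every word gets placed even if it alone exceeds the width. The minimal restatement of that loop below (justification left out; `wrap_words` is a hypothetical helper name) makes the behavior easy to check:

```python
def wrap_words(words, width):
    """Greedy wrap mirroring the loop in fill_paragraphs, without justification."""
    lines, line = [], []
    for word in words:
        # flush only a non-empty line, so an oversized word is still
        # placed on a line of its own instead of looping forever
        if line and len(' '.join(line + [word])) > width:
            lines.append(' '.join(line))
            line = []
        line.append(word)
    lines.append(' '.join(line))
    return lines


words = "This is some text.".split()
print(wrap_words(words, 7))   # ['This is', 'some', 'text.']
print(wrap_words(words, 0))   # ['This', 'is', 'some', 'text.']
print(wrap_words(words, -5))  # ['This', 'is', 'some', 'text.']
```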