This recipe shows how to run a Unix pipeline under the control of a Python program and write the pipeline's output to a PDF file. It was tested on Linux. The pipeline is nl | selpg: nl is a standard Linux command that adds line numbers to its input, and selpg is a custom command-line utility that selects only the specified pages from its input. The Python program sets up and starts the pipeline, reads the pipeline's output, and generates the PDF from it.

# PopenToPDF.py
# Demo program to read text from a shell pipeline using 
# subprocess.Popen, and write the text to PDF using xtopdf.
# Author: Vasudev Ram
# Copyright (C) 2016 Vasudev Ram - http://jugad2.blogspot.com

import sys
import subprocess
from PDFWriter import PDFWriter

def error_exit(message):
    sys.stderr.write(message + '\n')
    sys.stderr.write("Terminating.\n")
    sys.exit(1)

def main():
    # Bind pw before the try block so the finally clause can test it
    # even if the PDFWriter constructor raises.
    pw = None
    try:
        # Create and set up a PDFWriter instance.
        pw = PDFWriter("PopenTo.pdf")
        pw.setFont("Courier", 12)
        pw.setHeader("Use subprocess.Popen to read pipe and write to PDF.")
        pw.setFooter("Done using selpg, xtopdf, Python and ReportLab, on Linux.")

        # Set up a pipeline with nl and selpg such that we can read from its stdout.
        # nl numbers the lines of the input.
        # selpg extracts pages 3 to 5 from the input.
        # universal_newlines=True makes the pipe yield text lines, not bytes.
        proc = subprocess.Popen("nl -ba 1000-lines.txt | selpg -s3 -e5",
            shell=True, bufsize=-1, stdout=subprocess.PIPE,
            stderr=sys.stderr, universal_newlines=True)

        # Read from the pipeline and write the data to PDF, using the PDFWriter instance.
        for idx, line in enumerate(proc.stdout):
            pw.writeLine(str(idx).zfill(8) + ": " + line)
        # Wait for the shell to exit so it does not linger as a zombie process.
        proc.wait()
    except IOError as ioe:
        error_exit("Caught IOError: {}".format(str(ioe)))
    except Exception as e:
        error_exit("Caught Exception: {}".format(str(e)))
    finally:
        if pw is not None:
            pw.close()

main()

The PopenToPDF program uses Python's subprocess module to set up and start the pipeline.

After that, it reads the standard output of the pipeline (nl piped to selpg) through the file object that subprocess.Popen exposes for the child's stdout. The Python program's own standard input is not involved.
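The recipe lets the shell build the pipeline (shell=True). As an alternative sketch, not the recipe's own method, the same two-stage pipeline can be wired up without a shell by connecting two Popen objects directly. The function name read_pipeline and its parameters are hypothetical, chosen only for this illustration.

```python
import subprocess


def read_pipeline(first_cmd, second_cmd):
    """Run `first_cmd | second_cmd` without a shell and yield output lines.

    Both commands are argument lists, e.g. ["nl", "-ba", "1000-lines.txt"].
    This helper is a hypothetical illustration, not part of the recipe.
    """
    # Stage 1 writes to a pipe instead of the terminal.
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    # Stage 2 reads stage 1's stdout as its stdin; its own stdout is
    # opened in text mode so the caller receives str lines.
    second = subprocess.Popen(second_cmd, stdin=first.stdout,
                              stdout=subprocess.PIPE,
                              universal_newlines=True)
    # Close the parent's copy so stage 1 gets SIGPIPE if stage 2 exits early.
    first.stdout.close()
    try:
        for line in second.stdout:
            yield line
    finally:
        second.stdout.close()
        second.wait()
        first.wait()
```

With nl and selpg installed, iterating over read_pipeline(["nl", "-ba", "1000-lines.txt"], ["selpg", "-s3", "-e5"]) should yield the same lines as the shell version. Avoiding the shell means the file name never needs quoting, at the cost of a few more lines of setup.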

It then adds a second set of line number prefixes and writes the output to PDF.
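To make the two layers of numbering concrete, here is a pure-Python stand-in for the pipeline that runs without nl or selpg. It is only a sketch: number_lines imitates the default output format of nl -ba, select_pages imitates fixed-length page selection, and the 72-line page length is an assumption about selpg's default that should be checked against the build in use. All three function names are hypothetical.

```python
def number_lines(lines):
    """Imitate `nl -ba`: 6-wide right-justified number, a tab, the line."""
    for n, line in enumerate(lines, start=1):
        yield "{:6d}\t{}".format(n, line)


def select_pages(lines, start_page, end_page, page_len=72):
    """Keep pages start_page..end_page (1-based) of page_len lines each.

    page_len=72 is an assumed default, not confirmed by the recipe.
    """
    first = (start_page - 1) * page_len   # index of first line to keep
    last = end_page * page_len            # index just past the last line
    for i, line in enumerate(lines):
        if i >= last:
            break
        if i >= first:
            yield line


def add_prefix(lines):
    """Add the recipe's prefix: zero-padded 8-digit index starting at 0."""
    for idx, line in enumerate(lines):
        yield str(idx).zfill(8) + ": " + line
```

Under those assumptions, pages 3 to 5 of a 1000-line input cover input lines 145 through 360, so 216 lines would reach the PDF. Each carries the recipe's index (starting again from 0) in front of the nl number.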

The following post has more details, sample output, and information on where to get and how to build selpg, the C command-line utility used in the pipeline:

http://jugad2.blogspot.in/2016/01/generate-pdf-from-python-controlled.html

The code for the main PopenToPDF.py program and also for a one-off script gen-file.py that automates generation of the input file used by the pipeline:

http://jugad2.blogspot.in/2016/01/code-for-recent-post-about-pdf-from.html
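For readers who want to try the recipe before fetching gen-file.py, the input file can be approximated with a few lines of Python. This is a guess at what such a generator might look like, not the author's script: the function name and the text of each line are made up for illustration.

```python
def generate_lines_file(path, count=1000):
    """Write `count` lines of placeholder text to `path`.

    Hypothetical stand-in for gen-file.py; the real script's output
    may differ in wording and length.
    """
    with open(path, "w") as f:
        for n in range(1, count + 1):
            f.write("This is line number {} of the test file.\n".format(n))
    return count
```

Calling generate_lines_file("1000-lines.txt") creates a file with the name the pipeline command expects.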

The posts also link to helpful references that explain the concept of Unix pipelines.