
Because PDF is a "native" data format on Mac OS X, it is very easy to read some properties of PDF documents, such as the number of pages. In Python the code needed for this is only about four lines, plus imports and some command-line plumbing.

Python, 26 lines
#!/usr/bin/env python

"""pdfpagecount.py - print number of pages in a PDF file.

This needs PyObjC 1.0 or higher. A more extended version might be
available one day here: http://python.net/~gherman
"""

__author__ = "Dinu C. Gherman"

import sys
from Foundation import NSData
from AppKit import NSPDFImageRep


def pageCount(pdfPath):
    "Return the number of pages for some PDF file."

    data = NSData.dataWithContentsOfFile_(pdfPath)
    img = NSPDFImageRep.imageRepWithData_(data)
    return img.pageCount()


if __name__ == '__main__':
    if len(sys.argv) == 2:
        print(pageCount(sys.argv[1]))

This is probably the shortest possible code for programmatically counting the pages of a PDF file on Mac OS X. It is also a nice demo of PyObjC, which lets you reuse all of the Foundation and AppKit classes from Python.
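
The recipe does no error checking, so a missing file or a non-PDF file may end in a confusing error on None. The following is only a sketch of a more defensive variant, not part of the original recipe. It assumes that both Cocoa calls return None on failure (the usual nil convention), and the names page_count and report, as well as the injectable counter parameter, are made up for this illustration. The PyObjC imports are done inside the function so that the rest of the module can be loaded and tried out on machines without PyObjC.

```python
#!/usr/bin/env python

"""Sketch: print page counts for several PDF files, with basic checks."""

import sys


def page_count(pdf_path):
    "Return the number of pages of a PDF file, or None if unreadable."

    # Imported lazily so the module still loads where PyObjC is missing.
    from Foundation import NSData
    from AppKit import NSPDFImageRep

    data = NSData.dataWithContentsOfFile_(pdf_path)
    if data is None:
        # Assumption: a missing or unreadable file yields None here.
        return None
    img = NSPDFImageRep.imageRepWithData_(data)
    if img is None:
        # Assumption: data that is not valid PDF yields None here.
        return None
    return img.pageCount()


def report(paths, counter=page_count):
    "Return one 'path: count' line per path (counter is replaceable)."

    lines = []
    for path in paths:
        count = counter(path)
        if count is None:
            lines.append("%s: not a readable PDF" % path)
        else:
            lines.append("%s: %d" % (path, count))
    return lines


if __name__ == '__main__':
    for line in report(sys.argv[1:]):
        print(line)
```

Passing the counter as a parameter keeps the command-line plumbing separate from the Cocoa calls, so the reporting part can be exercised with a stand-in function when no PDF files or PyObjC installation are at hand.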

1 comment

dan wolfe 17 years, 6 months ago

Two lines fewer, PyObjC not required. Works with 10.3 or later... :-).

#!/usr/bin/python

import sys
from CoreGraphics import *

def pageCount(pdfPath):
    "Return the number of pages for some PDF file."

    pdf = CGPDFDocumentCreateWithProvider(CGDataProviderCreateWithFilename(pdfPath))
    return pdf.getNumberOfPages()

if __name__ == '__main__':
    if len(sys.argv) == 2:
        print(pageCount(sys.argv[1]))