
A simple way to count the pages of a PDF in pure Python.

Python, 10 lines
import re

rxcountpages = re.compile(rb"$\s*/Type\s*/Page[/\s]", re.MULTILINE|re.DOTALL)

def countPages(filename):
    with open(filename, "rb") as f:  # PDFs are binary, so read bytes
        return len(rxcountpages.findall(f.read()))

if __name__=="__main__":
    print("Number of pages in PDF File:", countPages("test.pdf"))

A very straightforward approach. For more counting and manipulation of PDFs, have a look at pyPdf: http://pybrary.net/pyPdf

4 comments

Gerald Lester 14 years, 9 months ago

The link given in the discussion does not work when clicked: there is an extra %29. in the URL part of the href.

Also, the code always returns 0 for a document that Preview on Mac OS X reports (and shows) as having 17 pages.

PM 12 years, 6 months ago

The regex doesn't work across PDF versions in general.

This modified regex seems to work with PDF v. 1.3, 1.4, 1.5, and 1.6:

rxcountpages = re.compile(r"/Type\s*/Page([^s]|$)", re.MULTILINE|re.DOTALL)
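
To see why the two patterns behave differently, here is a small sketch that runs both on hand-made byte strings rather than real PDFs. The sample data and the names `ORIGINAL`, `REVISED`, `multiline_sample`, `compact_sample`, and `count` are made up for illustration. The recipe's pattern starts with `$\s*`, so it only matches when `/Type` directly follows a line break; the modified pattern drops that requirement and uses `([^s]|$)` to skip the `/Pages` tree node.

```python
import re

# Pattern from the recipe: "/Type" must come right after a line end.
ORIGINAL = re.compile(rb"$\s*/Type\s*/Page[/\s]", re.MULTILINE | re.DOTALL)
# Modified pattern: no line-end requirement, and "/Pages" is excluded
# because the character after "/Page" must not be "s".
REVISED = re.compile(rb"/Type\s*/Page([^s]|$)", re.MULTILINE | re.DOTALL)

# Hypothetical fragments, not real PDFs: one "/Pages" node and two pages.
multiline_sample = (
    b"1 0 obj\n<<\n/Type /Pages\n/Count 2\n>>\nendobj\n"
    b"2 0 obj\n<<\n/Type /Page\n/Parent 1 0 R\n>>\nendobj\n"
    b"3 0 obj\n<<\n/Type /Page\n/Parent 1 0 R\n>>\nendobj\n"
)
compact_sample = (
    b"1 0 obj<</Type/Pages/Count 2>>endobj\n"
    b"2 0 obj<</Type/Page/Parent 1 0 R>>endobj\n"
    b"3 0 obj<</Type/Page/Parent 1 0 R>>endobj\n"
)

def count(pattern, data):
    # Count non-overlapping matches without building an intermediate list.
    return sum(1 for _ in pattern.finditer(data))

if __name__ == "__main__":
    for name, sample in (("multiline", multiline_sample),
                         ("compact", compact_sample)):
        print(name, count(ORIGINAL, sample), count(REVISED, sample))
```

With these samples, the recipe's pattern finds 2 pages in the multi-line fragment and 0 in the compact one, while the modified pattern finds 2 in both. A PDF writer that puts `<</Type/Page` on a single line would therefore explain a result of 0 from the original recipe.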

Josh Morris 10 years, 9 months ago

PM's code works great for PDF v. 1.3, 1.4, 1.5, and 1.6. It does not seem to work for PDF v. 1.7.

Anyone know how to implement that?
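
One possible reason a regex over the raw bytes misses pages in newer files, offered as an assumption rather than a confirmed diagnosis: PDF 1.5 and later allow page objects to be stored inside compressed object streams, in which case the text `/Type /Page` never appears in the file as plain bytes. The sketch below extends the regex approach by also searching every stream that inflates with zlib. The names `PAGE_RX`, `STREAM_RX`, and `count_pages_with_streams` are illustrative, and this remains a heuristic: encrypted files and streams using other filters are not handled.

```python
import re
import zlib

PAGE_RX = re.compile(rb"/Type\s*/Page([^s]|$)", re.MULTILINE | re.DOTALL)
# Heuristic: capture the bytes between the "stream" and "endstream" keywords.
# The lookbehind keeps the "stream" inside "endstream" from starting a match.
STREAM_RX = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.DOTALL)

def count_pages_with_streams(data):
    """Count page objects in the raw bytes and in Flate-compressed streams."""
    total = sum(1 for _ in PAGE_RX.finditer(data))
    for match in STREAM_RX.finditer(data):
        try:
            # decompressobj tolerates the end-of-line bytes that sit
            # between the compressed data and "endstream".
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not zlib/Flate data, so skip this stream
        total += sum(1 for _ in PAGE_RX.finditer(inflated))
    return total

if __name__ == "__main__":
    with open("test.pdf", "rb") as f:
        print("Number of pages in PDF File:", count_pages_with_streams(f.read()))
```

For anything beyond a quick estimate, a real PDF parser such as the pyPdf library mentioned in the recipe is the safer choice.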

Jean Kosossey 8 years, 4 months ago

The following badly written Python code worked well for me (Python 3.5):

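import sys

infile = sys.argv[1]  # assumption: the PDF path is given on the command line

with open(infile, mode='rb') as f:
    while True:
        x = f.readline()
        if not x:  # end of file reached without finding the page tree
            break
        # Expects '<</Type/Pages' on its own line, followed by '/Count N'
        if x == b'<</Type/Pages\n':
            x = f.readline()
            x = x.replace(b'/Count ', b'')
            x = x.replace(b'\n', b'')
            print(int(x))
            break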
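
The same idea can be made less dependent on exact line layout. The sketch below is a variant under stated assumptions, not the commenter's code: it finds each `/Type /Pages` entry, looks inside the surrounding `<< ... >>` dictionary for a `/Count` value, and returns the largest one, assuming the root of the page tree carries the total. The names `TYPE_PAGES_RX`, `COUNT_RX`, and `count_from_page_tree` are hypothetical. It only sees page tree nodes stored uncompressed, and a nested dictionary inside the `/Pages` node would confuse the simple `<<`/`>>` search.

```python
import re

TYPE_PAGES_RX = re.compile(rb"/Type\s*/Pages\b")
COUNT_RX = re.compile(rb"/Count\s+(\d+)")

def count_from_page_tree(data):
    """Return the largest /Count found in a /Pages dictionary, or None."""
    counts = []
    for match in TYPE_PAGES_RX.finditer(data):
        # Approximate the enclosing dictionary by the nearest delimiters.
        start = data.rfind(b"<<", 0, match.start())
        end = data.find(b">>", match.end())
        if start == -1 or end == -1:
            continue
        found = COUNT_RX.search(data, start, end)
        if found:
            counts.append(int(found.group(1)))
    return max(counts) if counts else None

if __name__ == "__main__":
    with open("test.pdf", "rb") as f:
        print(count_from_page_tree(f.read()))
```

Taking the maximum is a design shortcut: intermediate `/Pages` nodes report only the pages beneath them, so the root node should hold the largest value in a well-formed file.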