Count PDF pages (Python recipe) by Dirk Holtwick
ActiveState Code (http://code.activestate.com/recipes/496837/)

A simple way to count the pages of a PDF file in pure Python.

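import re

# Matches "/Type /Page" when it follows a line break (optionally with
# whitespace in between), but not "/Type /Pages".
rxcountpages = re.compile(rb"$\s*/Type\s*/Page[/\s]", re.MULTILINE | re.DOTALL)

def countPages(filename):
    # Read the file as raw bytes; the pattern above is a bytes pattern.
    with open(filename, "rb") as f:
        data = f.read()
    return len(rxcountpages.findall(data))

if __name__ == "__main__":
    print("Number of pages in PDF File:", countPages("test.pdf"))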

A very straightforward approach. For more advanced counting and manipulation of PDF files, have a look at pyPdf: http://pybrary.net/pyPdf

Tags: files

4 comments

Gerald Lester 14 years, 9 months ago

The link given in the discussion does not work when clicked: there is an extra "%29." in the URL part of the href.

Also, the code always returns 0 for a document that Preview on Mac OS X reports (and shows) as having 17 pages.

PM 12 years, 6 months ago

The regex doesn't work for every PDF version.

This modified regex seems to work with PDF versions 1.3, 1.4, 1.5, and 1.6:

rxcountpages = re.compile(r"/Type\s*/Page([^s]|$)", re.MULTILINE|re.DOTALL)
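To make the difference between the two patterns concrete, here is a small sketch that runs both against hand-written byte strings. The sample data and the helper name `count_matches` are made up for illustration and are not taken from a real PDF; the patterns are written as bytes patterns so the sketch runs on Python 3.

```python
import re

FLAGS = re.MULTILINE | re.DOTALL

# The recipe's original pattern: "/Type" must follow an end-of-line.
rx_original = re.compile(rb"$\s*/Type\s*/Page[/\s]", FLAGS)
# PM's pattern: "/Type /Page" anywhere, as long as it is not "/Pages".
rx_pm = re.compile(rb"/Type\s*/Page([^s]|$)", FLAGS)


def count_matches(rx, data):
    """Count non-overlapping matches of rx in data (hypothetical helper)."""
    return sum(1 for _ in rx.finditer(data))


# Hypothetical fragments: one page tree node and two page objects each.
multi_line = b"<<\n/Type /Pages\n>>\n<<\n/Type /Page\n>>\n<<\n/Type /Page\n>>\n"
single_line = (
    b"<</Type/Pages>>\n"
    b"<</Type/Page/Parent 1 0 R>>\n"
    b"<</Type/Page/Parent 1 0 R>>\n"
)

for name, data in (("multi_line", multi_line), ("single_line", single_line)):
    print(name, count_matches(rx_original, data), count_matches(rx_pm, data))
# prints:
# multi_line 2 2
# single_line 0 2
```

When a dictionary is written on one line as `<</Type/Page...`, nothing separates the line break from `<<`, so the original pattern never reaches `/Type`. That would be consistent with the "always returning 0" report above, although the thread does not say how that particular file was laid out.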

Josh Morris 10 years, 9 months ago

PM's code works great for PDF versions 1.3, 1.4, 1.5, and 1.6. It does not seem to work for PDF version 1.7.

Does anyone know how to handle that?
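One possible cause, offered as an assumption since the thread does not confirm it: from PDF 1.5 onward a file may keep objects, including page dictionaries, inside Flate-compressed object streams, so a regex over the raw bytes never sees them. The sketch below applies PM's pattern to the raw data and to every stream that inflates cleanly. `count_pages_with_streams` is a hypothetical name, and the stream detection is deliberately naive: it ignores other filters, predictors, and encryption.

```python
import re
import zlib

rx_page = re.compile(rb"/Type\s*/Page([^s]|$)", re.MULTILINE | re.DOTALL)
# Naive stream matcher: everything between "stream<EOL>" and "endstream".
rx_stream = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def count_pages_with_streams(data):
    """Count /Type /Page matches in raw bytes and in inflated streams."""
    total = len(rx_page.findall(data))
    for match in rx_stream.finditer(data):
        try:
            # decompressobj ignores the end-of-line bytes left before "endstream".
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (or encrypted); skip this stream
        total += len(rx_page.findall(inflated))
    return total


# Hypothetical file fragment: two page dictionaries hidden in a compressed stream.
hidden = zlib.compress(b"<</Type/Page/Parent 3 0 R>>\n<</Type/Page/Parent 3 0 R>>\n")
sample = (
    b"5 0 obj\n<</Type/ObjStm/Filter/FlateDecode>>\nstream\n"
    + hidden
    + b"\nendstream\nendobj\n"
    + b"3 0 obj\n<</Type/Pages/Count 2>>\nendobj\n"
)

print(len(rx_page.findall(sample)), count_pages_with_streams(sample))
```

For real documents a PDF library is the safer choice; this only illustrates why a raw-byte regex can come up short on newer files.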

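Jean Kosossey 8 years, 4 months ago

The following badly written Python code worked well for me (Python 3.5):

import sys

infile = sys.argv[1]  # assumption: the PDF path is passed on the command line

with open(infile, mode='rb') as f:
    for x in f:
        # Look for a page tree node written on a line of its own.
        if x == b'<</Type/Pages\n':
            # The next line is expected to look like b'/Count 17\n'.
            x = next(f, b'')
            x = x.replace(b'/Count ', b'')
            x = x.replace(b'\n', b'')
            print(int(x))
            break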
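The same idea can be made less dependent on exact line layout. The sketch below assumes the page tree is stored uncompressed. It scans each `obj ... endobj` body, keeps those that contain `/Type /Pages`, and returns the largest `/Count`, since the root node's count covers the whole document. `count_from_page_tree` is a hypothetical name, and the regexes are a simplification that does not handle indirect `/Count` values or object streams.

```python
import re

rx_obj = re.compile(rb"\bobj\b(.*?)\bendobj\b", re.DOTALL)
rx_pages_type = re.compile(rb"/Type\s*/Pages(?![A-Za-z])")
rx_count = re.compile(rb"/Count\s+(\d+)")


def count_from_page_tree(data):
    """Return the largest /Count among /Type /Pages objects, or None."""
    counts = []
    for match in rx_obj.finditer(data):
        body = match.group(1)
        if rx_pages_type.search(body):
            found = rx_count.search(body)
            if found:
                counts.append(int(found.group(1)))
    return max(counts) if counts else None


# Hypothetical fragment: a root page tree node (3 pages) with a nested node (2 pages).
sample = (
    b"1 0 obj\n<</Type/Catalog/Pages 2 0 R>>\nendobj\n"
    b"2 0 obj\n<</Type/Pages/Count 3/Kids[3 0 R 4 0 R]>>\nendobj\n"
    b"3 0 obj\n<</Type/Pages/Count 2/Parent 2 0 R/Kids[5 0 R 6 0 R]>>\nendobj\n"
)
print(count_from_page_tree(sample))  # prints 3
```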

Created by Dirk Holtwick on Tue, 27 Jun 2006 (MIT)


Licensed under the MIT License
Viewed 28689 times
Revision 1
