A simple way to count the pages of a PDF the pure Python way.
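import re

# Match a "/Type /Page" entry that follows a line break and is followed by
# "/" or whitespace (which excludes "/Type /Pages").
rxcountpages = re.compile(br"$\s*/Type\s*/Page[/\s]", re.MULTILINE | re.DOTALL)

def countPages(filename):
    # Read the raw bytes; a PDF is a binary file
    with open(filename, "rb") as f:
        data = f.read()
    return len(rxcountpages.findall(data))

if __name__ == "__main__":
    print("Number of pages in PDF File:", countPages("test.pdf"))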
A very straightforward approach. For more counting and manipulation of PDFs, have a look at pyPdf: http://pybrary.net/pyPdf
Tags: files
The link given in the discussion does not work when clicked -- there is an extra %29. in the URL part of the href.
Also, the code always returns 0 for a document that Preview under Mac OS X reports (and shows) as having 17 pages.
The regex doesn't work for every PDF version.
This changed regex seems to work with PDF v. 1.3, 1.4, 1.5 and 1.6:
rxcountpages = re.compile(r"/Type\s*/Page([^s]|$)", re.MULTILINE|re.DOTALL)
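To see why the two patterns behave differently, here is a small sketch that runs both on a made-up PDF fragment. The fragment, the names rx_original, rx_changed and count_matches are assumptions for illustration only; real files vary a lot in layout. The original pattern needs "/Type" to come right after a line break, so it misses dictionaries written as "<< /Type /Page ...>>" on one line, while the changed pattern accepts "/Type /Page" anywhere as long as no "s" follows.

```python
import re

# Original pattern from the recipe: "/Type /Page" must follow a line break.
rx_original = re.compile(br"$\s*/Type\s*/Page[/\s]", re.MULTILINE | re.DOTALL)
# Changed pattern from the comment: "/Type /Page" anywhere, not followed by "s".
rx_changed = re.compile(br"/Type\s*/Page([^s]|$)", re.MULTILINE | re.DOTALL)

def count_matches(rx, data):
    """Count non-overlapping matches of rx in data."""
    return sum(1 for _ in rx.finditer(data))

# Hypothetical fragment: dictionary keys sit on the same line as "<<".
sample = (
    b"1 0 obj\n<< /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >>\nendobj\n"
    b"2 0 obj\n<< /Type /Page /Parent 1 0 R >>\nendobj\n"
    b"3 0 obj\n<</Type/Page/Parent 1 0 R>>\nendobj\n"
)

original_count = count_matches(rx_original, sample)
changed_count = count_matches(rx_changed, sample)
print("original:", original_count, "changed:", changed_count)
```

On this fragment the original pattern finds no pages and the changed one finds both, which is in line with the "always returning 0" report above. Note that the patterns are bytes patterns (br"...") because the file is read in binary mode.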
PM's code works great for PDF v. 1.3,1.4,1.5,1.6. It does not seem to work for PDF v. 1.7.
Does anyone know how to make it work for v. 1.7 as well?
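One possible reason, and this is only a guess: since PDF 1.5 a file may keep its page dictionaries inside compressed object streams, and a regex over the raw bytes cannot see them. A rough sketch under that assumption is to also inflate every stream that happens to be zlib (Flate) data and search inside it. The names count_pages_with_streams, rx_page and rx_stream are made up here; the sketch ignores encryption, other filters and predictors, so treat it as a heuristic, not a parser.

```python
import re
import zlib

# Changed pattern from the comment above, as a bytes pattern.
rx_page = re.compile(br"/Type\s*/Page([^s]|$)", re.MULTILINE | re.DOTALL)
# Raw bytes between the "stream" and "endstream" keywords.
rx_stream = re.compile(br"stream\r?\n(.*?)endstream", re.DOTALL)

def count_pages_with_streams(data):
    """Count page dictionaries in the raw bytes and in inflatable streams."""
    count = sum(1 for _ in rx_page.finditer(data))
    for match in rx_stream.finditer(data):
        try:
            # decompressobj tolerates the end-of-line bytes left after the data
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (or not compressed at all): skip it
        count += sum(1 for _ in rx_page.finditer(inflated))
    return count

def count_pages_in_file(filename):
    """Convenience wrapper: read the file in binary mode and count pages."""
    with open(filename, "rb") as f:
        return count_pages_with_streams(f.read())
```

Usage would be count_pages_in_file("test.pdf"). If the result still looks wrong, a real PDF library such as the pyPdf mentioned in the recipe is the safer route.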
The following badly written Python code worked well for me (Python 3.5)