The new method select() in PyMuPDF 1.9.0 keeps only the chosen pages of an open PDF document, so that saving it creates a new PDF. It accepts any Python list of integers n with 0 <= n < page count.
The resulting PDF retains all links, annotations and bookmarks, provided they still point to valid targets.
import fitz # this is PyMuPDF 1.9.0
doc = fitz.open("some.pdf")
# An easy start: create new PDFs of the first and last 10 pages ...
l = list(range(10)) # first 10 pages
doc.select(l) # delete all others
doc.save("some-first-10.pdf", garbage=3) # save and clean new PDF
doc.close()
doc = fitz.open("some.pdf") # recycle PDF
l = list(range(doc.pageCount-10, doc.pageCount)) # last 10 pages
doc.select(l) # delete all others
doc.save("some-last-10.pdf", garbage=3) # save and clean new PDF
doc.close()
# page numbers may occur multiple times and in any order ...
doc = fitz.open("some.pdf") # recycle PDF
doc.select([1,1,1,3,3,3,5,5,5,0,0,0]) # create crazily tripled pages
doc.save("some-crazy-triples.pdf", garbage=3) # save that & clean new PDF
doc.close()
# new PDF containing the original 2 times
doc = fitz.open("some.pdf") # recycle PDF
l = list(range(doc.pageCount)) # list of all pages
l += l # two times that [0,...,n,0,...,n]
doc.select(l) # PDF will now contain itself twice ...
doc.save("some-times-2.pdf") # will hardly be bigger than original!
doc.close()
# delete pages without text (or whatever ...)
doc = fitz.open("some.pdf") # recycle PDF
l = [] # pages to keep (removing items from a list while looping over it would skip pages)
for i in range(doc.pageCount):
    if doc.getPageText(i): # if there is text on page number i ...
        l.append(i) # ... keep that page
doc.select(l) # select remaining pages from the PDF
doc.save("some-non-empty.pdf", garbage=3) # save PDF, every page has some text now ...
doc.close()
PyMuPDF currently supports Python versions 2.7 to 3.5 (x86 and x64).
Other possibilities of this technique include selecting only the odd (or even) pages, or reversing the page sequence.
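The page lists for these variants could be built as in the following minimal sketch. The helper names odd_pages, even_pages and reversed_pages are hypothetical and not part of PyMuPDF; "odd" and "even" are assumed to refer to 1-based page numbers as a reader would count them, while the returned indices are 0-based as select() expects.

```python
# Hypothetical helpers (not part of PyMuPDF): each one only builds a list of
# 0-based page indices that could then be handed to doc.select().

def odd_pages(page_count):
    """Indices of the 1st, 3rd, 5th, ... page (odd in 1-based counting)."""
    return list(range(0, page_count, 2))

def even_pages(page_count):
    """Indices of the 2nd, 4th, 6th, ... page (even in 1-based counting)."""
    return list(range(1, page_count, 2))

def reversed_pages(page_count):
    """All page indices, last page first."""
    return list(range(page_count - 1, -1, -1))

# Assumed usage, following the pattern of the examples above:
# doc = fitz.open("some.pdf")
# doc.select(reversed_pages(doc.pageCount)) # reverse the page sequence
# doc.save("some-reversed.pdf", garbage=3)
# doc.close()
```

Keeping the list construction separate from the select() call makes it easy to check a list before applying it to a document.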