This is more of a code snippet. It shows how to dynamically check whether a PDF is password protected and, if it is, decrypt it and save it back to disk unencrypted.
    #!/usr/bin/python
    # This demo opens an encrypted PDF document,
    # decrypts it with the provided password,
    # and saves it as a new PDF document.
    # usage: removePass.py <input file> <password> <output file>
    from __future__ import print_function  # must come before any other import
    import sys
    import fitz  # this is PyMuPDF

    if len(sys.argv) != 4:
        print('Usage: %s <input file> <password> <output file>' % sys.argv[0])
        sys.exit(0)

    doc = fitz.Document(sys.argv[1])
    # the document should be password protected
    # (newer PyMuPDF releases spell this property needs_pass)
    assert doc.needsPass
    # decrypt the document;
    # exit with a non-zero status if the password is rejected
    if not doc.authenticate(sys.argv[2]):
        print('cannot decrypt %s with password %s' % (sys.argv[1], sys.argv[2]))
        sys.exit(1)
    # save as a new, non-encrypted PDF
    doc.save(sys.argv[3])
    # Note that the save() method also automatically repairs the PDF for many types of corruption.
    # Additional options may be used for garbage collection, compression, etc.
Pure Python PDF modules like pdfrw or PyPDF2 often do not cover all types of PDF files, struggle with certain decryption and/or compression types, etc.
While it is certainly possible to "pre-process" such files with separate standalone programs like pdftk, the above solution is Python-based and can thus be invoked dynamically, only when it is actually needed.
Besides, MuPDF is very (!!) fast, faster than anything else I have seen so far (including XPDF, pdftk, Poppler).
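To illustrate the "only when necessary" idea, here is a minimal sketch that checks for password protection instead of asserting it. The function names `save_unencrypted` and `remove_password` are hypothetical helpers, not part of PyMuPDF. The sketch also assumes a PyMuPDF release that uses the `needs_pass` spelling (older releases call it `needsPass`).

```python
def save_unencrypted(doc, password, out_path, **save_options):
    """Save an opened PyMuPDF document to out_path without encryption.

    Hypothetical helper: `doc` is expected to behave like a fitz.Document.
    Returns False if the document is protected and the password is rejected,
    True once the copy has been written.
    """
    # Only authenticate when the document actually asks for a password.
    if doc.needs_pass and not doc.authenticate(password):
        return False
    # Extra keyword arguments are passed straight through to save().
    doc.save(out_path, **save_options)
    return True


def remove_password(in_path, password, out_path, **save_options):
    """Open in_path with PyMuPDF and write an unencrypted copy to out_path."""
    import fitz  # PyMuPDF; imported here so the helper above has no hard dependency

    doc = fitz.open(in_path)
    try:
        return save_unencrypted(doc, password, out_path, **save_options)
    finally:
        doc.close()
```

With PyMuPDF installed, a call such as `remove_password('in.pdf', 'secret', 'out.pdf', garbage=4, deflate=True)` would also request garbage collection and stream compression when saving. An unprotected file is simply written out again, so the caller does not need to know in advance whether a password is required.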