
This is more of a code snippet than a full recipe. It shows how to check whether a PDF is password protected and, if it is, how to decrypt it and save it back to disk unencrypted.

Python, 30 lines
#!/usr/bin/python
# this demo will open an encrypted PDF document
# decrypt it with the provided password
# and save as a new PDF document
# usage: removePass.py <input file> <password> <output file>

from __future__ import print_function  # must come before any other import
import sys
import fitz              # this is PyMuPDF

if len(sys.argv) != 4:
    print('Usage: %s <input file> <password> <output file>' % sys.argv[0])
    sys.exit(1)

doc = fitz.Document(sys.argv[1])
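# the document should be password protected
# (recent PyMuPDF releases name this attribute needs_pass)
if not doc.needsPass:
    sys.exit('%s is not password protected' % sys.argv[1])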

# decrypt the document
# authenticate() returns zero if the password is wrong
if not doc.authenticate(sys.argv[2]):
    print('cannot decrypt %s with password %s' % (sys.argv[1], sys.argv[2]))
    sys.exit(1)

# save as a new, non-encrypted PDF
doc.save(sys.argv[3])

# Note that save() also automatically repairs many types of PDF corruption.
# Additional options may be used for garbage collection, compression, etc.
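
The save options mentioned in the comments above could be used roughly as follows. This is a minimal sketch, not part of the original recipe: `save_compacted` is a hypothetical helper name, and it assumes `doc` is an already opened (and, if necessary, authenticated) PyMuPDF document. `garbage` and `deflate` are keyword arguments of PyMuPDF's `save()`.

```python
def save_compacted(doc, out_path):
    """Save `doc` to `out_path` with garbage collection and compression.

    Hypothetical helper; `doc` is assumed to be a PyMuPDF document.
    """
    # garbage=4 is the most thorough garbage collection level
    # (removes unused objects and merges duplicates);
    # deflate=True compresses streams that are not yet compressed.
    doc.save(out_path, garbage=4, deflate=True)
```

With PyMuPDF installed, this would be called as `save_compacted(fitz.open("in.pdf"), "out.pdf")`, where both file names are placeholders.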

Pure Python PDF modules like pdfrw or PyPDF2 often do not cover all types of PDF files and struggle with certain decryption and/or compression types.

While it is certainly possible to "pre-process" such files with separate standalone programs like pdftk, the above solution is Python-based and can thus be invoked dynamically, only when it is actually needed.

Besides, MuPDF is very (!!) fast: faster than anything else I have seen so far (including XPDF, pdftk, Poppler).
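
The "only when necessary" idea can be sketched as a small helper that authenticates just for documents that actually ask for a password. This is an illustration under stated assumptions, not the recipe's own code: the names `unlock_and_save` and `unlock_file` are hypothetical, and the attribute lookup tries `needs_pass` (the spelling in recent PyMuPDF releases) before falling back to `needsPass` as used in the script above.

```python
def unlock_and_save(doc, password, out_path):
    """Save `doc` to `out_path`, authenticating first only if required.

    Returns True if the document asked for a password, False otherwise.
    Raises ValueError if the password is rejected.
    """
    # Recent PyMuPDF releases use `needs_pass`; older ones use `needsPass`.
    needs_pass = getattr(doc, "needs_pass", None)
    if needs_pass is None:
        needs_pass = getattr(doc, "needsPass", False)

    # authenticate() returns zero when the password is wrong.
    if needs_pass and not doc.authenticate(password):
        raise ValueError("wrong password")

    doc.save(out_path)
    return bool(needs_pass)


def unlock_file(in_path, password, out_path):
    """Open `in_path` with PyMuPDF and pass it to unlock_and_save()."""
    import fitz  # PyMuPDF, imported here so the helper above has no hard dependency

    doc = fitz.open(in_path)
    try:
        return unlock_and_save(doc, password, out_path)
    finally:
        doc.close()
```

A batch job could call `unlock_file()` for every input file and use the return value to log which files were actually encrypted.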