Version 1.11.0 (based on MuPDF v1.11) allows exporting, importing and interrogating files embedded in a PDF.
PDF "/EmbeddedFiles" are similar to ZIP archives (or the Microsoft OLE technique): they allow arbitrary data to be incorporated in a PDF, where it benefits from the unique features of that format.
import fitz  # = PyMuPDF
doc = fitz.open("test.pdf")  # open the PDF
count = doc.embeddedFileCount  # number of embedded files
print("number of embedded files:", count)
# get the decompressed content of the data stored under the name "my data";
# an integer between 0 and count - 1 can be used instead of the name
buff = doc.embeddedFileGet("my data")
with open("test.file", "wb") as fout:  # open output file, closed automatically
    fout.write(buff)
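Because entries can also be addressed by an integer index, exporting every embedded file comes down to a short loop. The following is only a minimal sketch: the helper name export_embedded_files and the output naming scheme "embedded_<n>.bin" are assumptions made for illustration and are not part of PyMuPDF. It relies solely on embeddedFileCount and embeddedFileGet as shown in the example above.

```python
import os


def export_embedded_files(doc, out_dir):
    """Write every embedded file of `doc` into `out_dir`.

    `doc` is expected to be an opened PyMuPDF document, i.e. an object
    offering `embeddedFileCount` and `embeddedFileGet(index)`.
    The helper name and the file naming scheme are hypothetical.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i in range(doc.embeddedFileCount):
        data = doc.embeddedFileGet(i)  # decompressed content of entry i
        path = os.path.join(out_dir, "embedded_%d.bin" % i)
        with open(path, "wb") as fout:
            fout.write(data)
        paths.append(path)
    return paths
```

With PyMuPDF installed, it could be called as export_embedded_files(fitz.open("test.pdf"), "exported"). The files are named by index here because the sketch does not assume anything about how the original file names are stored.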
Deleting, reporting, importing and copying embedded files between PDFs are just as simple.
See here for more examples and lightweight utilities:
https://github.com/rk700/PyMuPDF/tree/master/examples
Python 3 is fully supported in any bitness and tested up to and including version 3.6. Platforms include at least Windows, Mac and Linux. Other platforms supported by both Python and MuPDF should also work.