
Among dozens of other file types, FileOptimizer also compresses PDFs, often significantly. The catch is that smpdf, the plugin it uses for this, is free for non-commercial use only, and it annoyingly overwrites the PDF's metadata to say so.

The following tool remedies these metadata changes (but not the license situation!).

#! python
'''
Optimizes a PDF with FileOptimizer. But as "/Producer" and "/Creator" get
spoiled by this, we first save metadata and restore it after optimization.
This means we also accept non-compressed object definitions (as created by
FileOptimizer).
'''
from __future__ import print_function
import fitz
import sys, os, subprocess, tempfile, time

# time.clock() was removed in Python 3.8; prefer perf_counter() where it exists
timer = getattr(time, "perf_counter", None) or time.clock

assert len(sys.argv) == 2, "need filename parameter"
fn = sys.argv[1]
assert fn.lower().endswith(".pdf"), "must be a PDF file"

fullname = os.path.abspath(fn)         # get the full path & name
t0 = timer()                           # save current time
doc = fitz.open(fullname)              # open PDF to save metadata
meta = doc.metadata
doc.close()

t1 = timer()                           # save current time again
subprocess.call(["fileoptimizer64", fullname])   # now invoke FileOptimizer
t2 = timer()                           # save current time again

cdir = os.path.split(fullname)[0]      # split dir from filename
fd, fnout = tempfile.mkstemp(suffix = ".pdf", dir = cdir) # create temp pdf name
os.close(fd)                           # only the name is needed, close the handle
doc = fitz.open(fullname)              # open now optimized PDF
# newer PyMuPDF versions call this set_metadata(), older ones setMetadata()
set_meta = getattr(doc, "set_metadata", None) or doc.setMetadata
set_meta(meta)                         # restore old metadata
doc.save(fnout, garbage = 4)           # save temp PDF with it, a little sub opt
doc.close()                            # close it

os.remove(fullname)                    # remove super optimized file
os.rename(fnout, fullname)             # and rename temp file to original filename
t3 = timer()                           # save current time again

# put out runtime statistics
print("Timings:")
print(str(round(t1-t0, 4)).rjust(10), "save old metadata")
print(str(round(t2-t1, 4)).rjust(10), "execute FileOptimizer")
print(str(round(t3-t2, 4)).rjust(10), "restore old metadata")
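
The closing steps of the script (write to a temp file in the same directory, then put it in place of the original) could be factored into a small helper. What follows is only a sketch under assumptions: `replace_file_safely` and `write_func` are hypothetical names that are not part of the recipe, and `os.replace` requires Python 3.3 or later, so this variant would not run on 2.7.

```python
import os
import tempfile


def replace_file_safely(target, write_func):
    """Write new content to a temp file next to `target`, then swap it in.

    `write_func` is a callable that receives the temp file name and writes
    the new content there (hypothetical interface, for illustration only).
    """
    target = os.path.abspath(target)
    # Same directory as the target, so the final rename stays on one volume.
    fd, tmpname = tempfile.mkstemp(suffix=".pdf", dir=os.path.dirname(target))
    os.close(fd)  # only the name is needed; the writer reopens the file
    try:
        write_func(tmpname)
        os.replace(tmpname, target)  # overwrites `target` in a single call
    except Exception:
        if os.path.exists(tmpname):
            os.remove(tmpname)  # do not leave a stray temp file behind
        raise
```

If the writer fails, the original file is left untouched and the temp file is cleaned up. On Windows the target presumably must not be open in any other handle when the swap happens, so a callback that saves a PDF would also have to close the document first.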

Runs under all Python versions 2.7 and up.

FileOptimizer is a Windows tool, but it is stated to run on UNIX-like systems under Wine as well.

It must be installed for this script to run.
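
Since `subprocess.call` raises an `OSError` when the executable cannot be found, it may be worth checking for it up front. A minimal sketch, assuming Python 3.3 or later (for `shutil.which`) and that the tool is reachable on `PATH` under the name used in the script; `tool_on_path` is a hypothetical helper name:

```python
import shutil


def tool_on_path(name="fileoptimizer64"):
    """Return the full path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = tool_on_path()
    if path is None:
        print("fileoptimizer64 not found on PATH; install FileOptimizer first")
    else:
        print("using", path)
```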

2 comments

Harald Lieder (author) 5 years, 1 month ago

The following example output shows a file size reduction of about 50%:

$ dir sdw_2010_11.pdf

17.07.2016 05:59 19.174.030 sdw_2010_11.pdf 1 Datei(en), 19.174.030 Bytes 0 Verzeichnis(se), 1.587.349.512.192 Bytes frei

$ python pdf-opt.py sdw_2010_11.pdf Timings: 0.0031 save old metata 65.4435 execute FileOptimizer 0.3112 restore old metadata

$ dir sdw_2010_11.pdf

29.10.2016 08:03 9.769.342 sdw_2010_11.pdf 1 Datei(en), 9.769.342 Bytes 0 Verzeichnis(se), 1.587.380.068.352 Bytes frei

$

Harald Lieder (author) 5 years, 1 month ago

Now with proper line breaks:

$ dir sdw_2010_11.pdf
17.07.2016  05:59        19.174.030 sdw_2010_11.pdf

$ python pdf-opt.py sdw_2010_11.pdf

Timings:
 0.0031 save old metata
65.4435 execute FileOptimizer
 0.3112 restore old metadata

$ dir sdw_2010_11.pdf
29.10.2016  08:03         9.769.342 sdw_2010_11.pdf
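
The reduction can be worked out from the two sizes in the `dir` listings. A small sketch; the helper names are made up for illustration, and it assumes the dots in the listing are thousands separators, as in the German-locale output shown here:

```python
def parse_dir_size(text):
    """Convert a size such as '19.174.030' (dot as thousands separator) to int."""
    return int(text.replace(".", ""))


def reduction_percent(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


before = parse_dir_size("19.174.030")
after = parse_dir_size("9.769.342")
print(round(reduction_percent(before, after), 2), "% smaller")
```

For these two sizes the result comes out at roughly 49 percent.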