Create or modify the bookmark list (table of contents, ToC) of a PDF. Arbitrary hierarchy levels are supported. The PDF pages are displayed while editing, so that bookmark targets can be checked. PDF metadata (author, title, subject) can be edited as well. The PDF can be saved under the same or a different name.
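The program only enables saving after the table of contents has passed a consistency check. As a rough illustration of the rules applied to the hierarchy levels, here is a minimal, GUI-free sketch. It assumes the ToC is a plain list of [level, title, page] rows with 1-based page numbers; the helper name check_toc is hypothetical and is not part of the recipe, PyMuPDF or wxPython.

```python
def check_toc(toc, page_count):
    """Return None if the ToC is consistent, else a message for the first problem.

    toc        -- list of [level, title, page] rows (assumed layout)
    page_count -- number of pages in the PDF
    """
    for i, (level, title, page) in enumerate(toc):
        row = i + 1                              # rows are reported 1-based
        if i == 0 and level != 1:
            return "row 1 must have level 1"
        if level < 1:
            return "row %d: level < 1" % row
        if not 1 <= page <= page_count:
            return "row %d: page# out of range" % row
        # a row may be at most one level deeper than the row before it
        if i > 0 and level - toc[i - 1][0] > 1:
            return "row %d: level stepping > 1" % row
        if not title.strip():
            return "row %d: missing title" % row
    return None


if __name__ == "__main__":
    sample = [[1, "Introduction", 1],
              [2, "Scope", 2],
              [1, "Usage", 5]]
    print(check_toc(sample, page_count=10))
```

Levels may drop by any amount from one row to the next, but may increase by at most one, so a level-3 entry cannot directly follow a level-1 entry.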

Python, 851 lines
#!/usr/bin/python
# -*- coding: utf-8 -*-

'''
Created on Sun May 03 16:15:08 2015

@author: Jorj McKie
Copyright (c) 2015 Jorj X. McKie

The license of this program is governed by the GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007. See the "COPYING" file of this repository.

Example program for the Python binding PyMuPDF of MuPDF.

Changes in version 1.9.1
-------------------------
- removed dependency on PyPDF2 by using PyMuPDF's new methods setMetadata() and
  setToC().

- using incremental saves if output file equals input.

Dependencies:
--------------
PyMuPDF 1.9.1 or later
wxPython 3.0 or later

This is a program for editing a PDF file's table of contents (ToC).
After choosing a file in a file selection dialog, its ToC is displayed
in a grid, together with an image of the currently displayed PDF page.
Entries in the grid can be edited, added, duplicated, deleted and moved.
Permanent changes to the underlying file are made only when the SAVE button is
pressed.

The overall screen layout is as follows:

        +--------------------+--------------------+
        |                    |                    |
        |      le_szr        |       ri_szr       |
        |                    |                    |
        +--------------------+--------------------+

Layout of left sizer "le_szr"

        +-----------------------------------------+
        | szr10: Button "New Row", expl. text     |
        +-----------------------------------------+
        | szr20: MyGrid (table of contents)       |
        +-----------------------------------------+
        | szr31: check data fields                |
        +-----------------------------------------+
        | szr30: PDF metadata                     |
        +-----------------------------------------+
        | szr40: OK / Cancel buttons              |
        +-----------------------------------------+

Layout of right sizer "ri_szr"

        +-----------------------------------------+
        | ri_szr20: forw / backw / pages          |
        +-----------------------------------------+
        | PDFbild: Bitmap image of pdf page       |
        +-----------------------------------------+

'''
import os, sys
import wx
import wx.grid as gridlib
import wx.lib.gridmovers as gridmovers
import fitz                  # = PyMuPDF
ENCODING = "latin-1"         # used for item title only

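def getint(v):
    # convert v to an integer; for strings that are not plain numbers,
    # build the integer from the digits they contain (0 if there are none)
    try:
        return int(v)
    except (ValueError, TypeError):     # TypeError: e.g. v is None
        pass
    try:
        string_types = basestring       # Python 2: covers str and unicode
    except NameError:
        string_types = str              # Python 3
    if not isinstance(v, string_types):
        return 0
    a = "0"
    for d in v:
        if d in "0123456789":
            a += d
    return int(a)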

#==============================================================================
# define scale factor for displaying page images (20% larger)
#==============================================================================
scaling = fitz.Matrix(1.2, 1.2)
#==============================================================================
# just abbreviations
#==============================================================================
defPos = wx.DefaultPosition
defSiz = wx.DefaultSize
khaki  = wx.Colour(240, 230, 140)

#==============================================================================
# convenience class for storing information across functions
#==============================================================================
class PDFconfig():
    def __init__(self):
        self.doc = None                  # fitz.Document
        self.meta = {}                   # PDF meta information
        self.seiten = 0                  # max pages
        self.inhalt = []                 # table of contents storage
        self.file = None                 # pdf filename
        self.oldpage = 0                 # stores displayed page number

#==============================================================================
# render a PDF page and return wx.Bitmap
#==============================================================================
def pdf_show(seite):
    page_idx = getint(seite) - 1
    pix = PDFcfg.doc.getPagePixmap(page_idx, matrix = scaling)
    # the following method returns just RGB data - no alpha bytes
    # this seems to be required in Windows versions of wx.
    # on other platforms try instead:
    #bmp = wx.BitmapFromBufferRGBA(pix.w, pix.h, pix.samples)
    a = pix.samplesRGB()                  # samples without alpha bytes
    bmp = wx.BitmapFromBuffer(pix.w, pix.h, a)
    pix = None
    a   = None
    return bmp

#==============================================================================
# PDFTable = a tabular grid class in wx
#==============================================================================
class PDFTable(gridlib.PyGridTableBase):
    def __init__(self):
        gridlib.PyGridTableBase.__init__(self)

        self.colLabels = ['Level','Title','Page']
        self.dataTypes = [gridlib.GRID_VALUE_NUMBER,
                          gridlib.GRID_VALUE_STRING,
                          gridlib.GRID_VALUE_NUMBER,
                          ]
        # initial load of table with outline data
        # each line consists of [lvl, title, page]
        # for display, we "indent" the title with spaces
        self.data = [[PDFcfg.inhalt[i][0],          # indentation level
                      " "*(PDFcfg.inhalt[i][0] -1) + \
                      PDFcfg.inhalt[i][1].decode("utf-8","ignore"),  # title
                      PDFcfg.inhalt[i][2]] \
                              for i in range(len(PDFcfg.inhalt))]
        if not PDFcfg.inhalt:
            self.data = [[0, "*** no outline ***", 0]]
        # used for correctly placing new lines. insert at end = -1
        self.cur_row = -1

#==============================================================================
# Methods required by wxPyGridTableBase interface.
# Will be called by the grid.
#==============================================================================
    def GetNumberRows(self):           # row count in my data table
        return len(self.data)

    def GetNumberCols(self):           # column count in my data table
        return len(self.colLabels)

    def IsEmptyCell(self, row, col):   # is-cell-empty checker
        try:
            return not self.data[row][col]
        except IndexError:
            return True

    def GetValue(self, row, col):      # get value (to be put into a cell)
        if col == 1:                   # simulate indentation if title column
            lvl = int(self.data[row][0]) - 1
            value = "  " * lvl + self.data[row][1].strip()
        else:
            value = self.data[row][col]
        return value

    def SetValue(self, row, col, value):    # put value from cell to data table
        if col == 1:
            x_val = value.strip()           # strip off simulated indentations
        else:
            x_val = value
        self.data[row][col] = x_val

#==============================================================================
# set col names
#==============================================================================
    def GetColLabelValue(self, col):
        return self.colLabels[col]

#==============================================================================
# set row names (just row counters in our case). Only needed, because we have
# row-based operations (dragging, duplicating), and these require some label.
#==============================================================================
    def GetRowLabelValue(self,row):
        return str(row +1)

#==============================================================================
# determine cell content type, controls the grid behaviour for the cells
#==============================================================================
    def GetTypeName(self, row, col):
        return self.dataTypes[col]

#==============================================================================
# move a row, called when user drags rows with the mouse.
# called with row numbers from -> to
#==============================================================================
    def MoveRow(self, frm, to):
        grid = self.GetView()

        if grid and frm != to:                  # actually moving something?
            self.cur_row = to
            # Move the data rows
            oldData = self.data[frm]            # list of row values
            del self.data[frm]                  # delete it from the data
            # determine place for the moving row, and insert it
            if to > frm:
                self.data.insert(to-1,oldData)
            else:
                self.data.insert(to,oldData)
#==============================================================================
#           inform the Grid about this by special "message batches"
#==============================================================================
            grid.BeginBatch()
            msg = gridlib.GridTableMessage(
                    self, gridlib.GRIDTABLE_NOTIFY_ROWS_DELETED, frm, 1)
            grid.ProcessTableMessage(msg)
            msg = gridlib.GridTableMessage(
                    self, gridlib.GRIDTABLE_NOTIFY_ROWS_INSERTED, to, 1)
            grid.ProcessTableMessage(msg)
            grid.EndBatch()

#==============================================================================
# insert a new row, called with the new cell value list (zeile).
# we use self.cur_row to determine where to put it.
#==============================================================================
    def NewRow(self, zeile):
        grid = self.GetView()
        if grid:
            if self.cur_row in range(len(self.data)): # insert in the middle?
                self.data.insert(self.cur_row, zeile)
                grid.BeginBatch()                     # inform the grid
                msg = gridlib.GridTableMessage(self,
                       gridlib.GRIDTABLE_NOTIFY_ROWS_INSERTED, self.cur_row, 1)
                grid.ProcessTableMessage(msg)
                grid.EndBatch()
            else:                                     # insert at end (append)
                self.data.append(zeile)
                grid.BeginBatch()                     # inform grid
                msg = gridlib.GridTableMessage(self,
                       gridlib.GRIDTABLE_NOTIFY_ROWS_APPENDED, 1)
                grid.ProcessTableMessage(msg)
                grid.EndBatch()

#==============================================================================
# Duplicate a row, called with row number
#==============================================================================
    def DuplicateRow(self, row):
        grid = self.GetView()
        if grid:
            zeile = [self.data[row][0], self.data[row][1],
                     self.data[row][2]]
            self.data.insert(row, zeile)
            grid.BeginBatch()
            msg = gridlib.GridTableMessage(self,
                        gridlib.GRIDTABLE_NOTIFY_ROWS_INSERTED, row, 1)
            grid.ProcessTableMessage(msg)
            grid.EndBatch()
            self.cur_row = row

#==============================================================================
# Delete a row. called with row number.
#==============================================================================
    def DeleteRow(self, row):
        grid = self.GetView()
        if grid:
            del self.data[row]
            grid.BeginBatch()                         # inform the grid
            msg = gridlib.GridTableMessage(self,
                   gridlib.GRIDTABLE_NOTIFY_ROWS_DELETED, row, 1)
            grid.ProcessTableMessage(msg)
            grid.EndBatch()
        if self.cur_row not in range(len(self.data)): # update indicator
            self.cur_row = -1

#==============================================================================
# define Grid
#==============================================================================
class MyGrid(gridlib.Grid):
    def __init__(self, parent):
        gridlib.Grid.__init__(self, parent, -1)

        table = PDFTable()             # initialize table

#==============================================================================
# announce table to Grid
# 'True' = enable Grid to manage the table (destroy, etc.)
#==============================================================================
        self.SetTable(table, True)
#==============================================================================
# set font, width, alignment in the grid
#==============================================================================
        self.SetDefaultCellFont(wx.Font(wx.NORMAL_FONT.GetPointSize(),
                 70, 90, 90, False, "DejaVu Sans Mono"))

        # center columns (indent level, delete check box)
        ct_al1 = gridlib.GridCellAttr()
        ct_al1.SetAlignment(wx.ALIGN_CENTER, wx.ALIGN_CENTER)
        self.SetColAttr(0, ct_al1)
        self.SetColAttr(3, ct_al1)
        # page number right aligned
        re_al1 = gridlib.GridCellAttr()
        re_al1.SetAlignment(wx.ALIGN_RIGHT, wx.ALIGN_CENTER)
        self.SetColAttr(2, re_al1)

#==============================================================================
# Enable Row moving
#==============================================================================
        gridmovers.GridRowMover(self)

#==============================================================================
# Bind: move row
#==============================================================================
        self.Bind(gridmovers.EVT_GRID_ROW_MOVE, self.OnRowMove, self)

#==============================================================================
# Bind: duplicate a row
#==============================================================================
        self.Bind(gridlib.EVT_GRID_LABEL_LEFT_DCLICK, self.OnRowDup, self)

#==============================================================================
# Bind: delete a row
#==============================================================================
        self.Bind(gridlib.EVT_GRID_LABEL_RIGHT_DCLICK, self.OnRowDel, self)

#==============================================================================
# Bind: (double) click a cell
#==============================================================================
        self.Bind(gridlib.EVT_GRID_CELL_LEFT_CLICK, self.OnCellClick, self)
        self.Bind(gridlib.EVT_GRID_CELL_LEFT_DCLICK, self.OnCellDClick, self)

#==============================================================================
# Bind: cell is changing
#==============================================================================
        self.Bind(gridlib.EVT_GRID_CELL_CHANGING, self.OnCellChanging, self)

#==============================================================================
# Event Method: cell is changing
#==============================================================================
    def OnCellChanging(self, evt):
        if evt.GetCol() == 2:          # page number is changing
            value = evt.GetString()    # new cell value
            PicRefresh(value)          # we show corresponding image
        self.AutoSizeColumn(1)         # as always: title width adjust
        DisableOK()                    # check data before save is possible

#==============================================================================
# Event Method: cell click
#==============================================================================
    def OnCellClick(self, evt):
        row = evt.GetRow()             # row
        col = evt.GetCol()             # col
        table = self.GetTable()
        grid = table.GetView()
        grid.GoToCell(row, col)        # force "select" for the cell
        table.cur_row = row            # memorize current row (read by NewRow)
        self.AutoSizeColumn(1)         # adjust title col width to content

#==============================================================================
# Event Method: cell double click
#==============================================================================
    def OnCellDClick(self, evt):
        row = evt.GetRow()             # row
        col = evt.GetCol()             # col
        table = self.GetTable()
        if col == 1 or col == 2:       # refresh picture if title or page col
            seite = table.GetValue(row, 2)
            PicRefresh(seite)
        grid = table.GetView()
        grid.GoToCell(row, col)        # force "select" of that cell
        table.cur_row = row            # memorize current row (read by NewRow)
        self.AutoSizeColumn(1)

#==============================================================================
# Event Method: move row
#==============================================================================
    def OnRowMove(self,evt):
        frm = evt.GetMoveRow()         # row being moved
        to = evt.GetBeforeRow()        # before which row to insert
        self.GetTable().MoveRow(frm,to)
        DisableOK()

#==============================================================================
# Event Method: delete row
#==============================================================================
    def OnRowDel(self, evt):
        row = evt.GetRow()
        self.GetTable().DeleteRow(row)
        DisableOK()

#==============================================================================
# Event Method: duplicate row
#==============================================================================
    def OnRowDup(self, evt):
        row = evt.GetRow()
        col = evt.GetCol()
        if col < 0 and row >= 0:       # else this is not a row duplication!
            self.GetTable().DuplicateRow(row)    # duplicate the row and ...
            self.GetParent().Layout()  # possibly enlarge the grid
        DisableOK()

#==============================================================================
#
# define dialog
#
#==============================================================================
class PDFDialog (wx.Dialog):
    def __init__(self, parent):
        wx.Dialog.__init__ (self, parent, id = wx.ID_ANY,
                             title = "Maintain the Table of Contents",
                             pos = defPos, size = defSiz,
                             style = wx.CAPTION|wx.CLOSE_BOX|
                                     wx.DEFAULT_DIALOG_STYLE|
                                     wx.MAXIMIZE_BOX|wx.MINIMIZE_BOX|
                                     wx.RESIZE_BORDER)

        self.SetBackgroundColour(khaki)
        # maximize the dialog
        #self.Maximize()
        # alternatively, set an explicit size derived from the display size:
        width = wx.GetDisplaySize()[0]-500       # dialog width
        height = wx.GetDisplaySize()[1]-35       # dialog height
        self.SetSize(wx.Size(width, height))
#==============================================================================
# Sizer 10: Button 'new row' and an explaining text
#==============================================================================
        self.szr10 = wx.BoxSizer(wx.HORIZONTAL)

        self.btn_neu = wx.Button(self, wx.ID_ANY, "New Row",
                        defPos, defSiz, 0)
        self.szr10.Add(self.btn_neu, 0, wx.ALIGN_CENTER|wx.ALL, 5)

        msg_txt = """NEW rows will be inserted at the end, or before the row with a right-clicked field.\nDUPLICATE row: double-click its number.  DELETE row: right-double-click its number.\nDouble-click titles or page numbers to display the page image."""
        explain = wx.StaticText(self, wx.ID_ANY, msg_txt,
                      defPos, wx.Size(-1, 50), 0)
        self.szr10.Add(explain, 0, wx.ALIGN_CENTER, 5)

#==============================================================================
# Sizer 20: define outline grid and do some layout adjustments
#==============================================================================
        self.szr20 = MyGrid(self)
        self.szr20.AutoSizeColumn(0)
        self.szr20.AutoSizeColumn(1)
        self.szr20.SetColSize(2, 45)
        self.szr20.SetRowLabelSize(30)

#==============================================================================
# Sizer 30: PDF meta information
#==============================================================================
        self.szr30 = wx.FlexGridSizer(6, 2, 0, 0)
        self.szr30.SetFlexibleDirection(wx.BOTH)
        self.szr30.SetNonFlexibleGrowMode(wx.FLEX_GROWMODE_SPECIFIED)

        self.tx_input = wx.StaticText(self, wx.ID_ANY, "Input:",
                            defPos, defSiz, 0)
        self.tx_input.Wrap(-1)
        self.szr30.Add(self.tx_input, 0, wx.ALIGN_CENTER, 5)

        self.tx_eindat = wx.StaticText(self, wx.ID_ANY,
                            "  %s  (%s pages)" % (PDFcfg.file, str(PDFcfg.seiten)),
                            defPos, defSiz, 0)
        self.tx_eindat.Wrap(-1)
        self.szr30.Add(self.tx_eindat, 0, wx.ALL, 5)

        self.tx_ausdat = wx.StaticText(self, wx.ID_ANY, "Output:",
                            defPos, defSiz, 0)
        self.tx_ausdat.Wrap(-1)
        self.szr30.Add(self.tx_ausdat, 0, wx.ALIGN_CENTER, 5)

        self.btn_aus = wx.FilePickerCtrl(self, wx.ID_ANY,
                        PDFcfg.file, "set output file", "*.pdf",
                        defPos, wx.Size(480,-1),
                        wx.FLP_OVERWRITE_PROMPT|wx.FLP_SAVE|
                        wx.FLP_USE_TEXTCTRL)
        self.szr30.Add(self.btn_aus, 0, wx.ALL, 5)
        self.tx_autor = wx.StaticText(self, wx.ID_ANY, "Author:",
                         defPos, defSiz, 0)
        self.tx_autor.Wrap(-1)
        self.szr30.Add(self.tx_autor, 0, wx.ALIGN_CENTER, 5)

        self.ausaut = wx.TextCtrl(self, wx.ID_ANY,
                       PDFcfg.meta["author"],
                       defPos, wx.Size(480, -1), 0)
        self.szr30.Add(self.ausaut, 0, wx.ALL, 5)

        self.pdf_titel = wx.StaticText(self, wx.ID_ANY, "Title:",
                          defPos, defSiz, 0)
        self.pdf_titel.Wrap(-1)
        self.szr30.Add(self.pdf_titel, 0, wx.ALIGN_CENTER, 5)

        self.austit = wx.TextCtrl(self, wx.ID_ANY,
                       PDFcfg.meta["title"],
                       defPos, wx.Size(480, -1), 0)
        self.szr30.Add(self.austit, 0, wx.ALL, 5)

        self.tx_subject = wx.StaticText(self, wx.ID_ANY, "Subject:",
                           defPos, defSiz, 0)
        self.tx_subject.Wrap(-1)
        self.szr30.Add(self.tx_subject, 0, wx.ALIGN_CENTER, 5)

        self.aussub = wx.TextCtrl(self, wx.ID_ANY,
                       PDFcfg.meta["subject"],
                       defPos, wx.Size(480, -1), 0)
        self.szr30.Add(self.aussub, 0, wx.ALL, 5)

#==============================================================================
# Sizer 31: check data
#==============================================================================
        self.szr31 = wx.FlexGridSizer(1, 2, 0, 0)
        self.btn_chk = wx.Button(self, wx.ID_ANY, "Check Data",
                        defPos, defSiz, 0)
        self.szr31.Add(self.btn_chk, 0, wx.ALIGN_TOP|wx.ALL, 5)
        self.msg = wx.StaticText(self, wx.ID_ANY, "Before data can be saved, "\
                    "they must be checked with this button.\n"\
                    "Warning: Any original 'Output' file will be overwritten, "\
                    "once you press SAVE!",
                    defPos, defSiz, 0)
        self.msg.Wrap(-1)
        self.szr31.Add(self.msg, 0, wx.ALL, 5)

#==============================================================================
# Sizer 40: OK / Cancel
#==============================================================================
        self.szr40 = wx.StdDialogButtonSizer()
        self.szr40OK = wx.Button(self, wx.ID_OK, label="SAVE")
        self.szr40OK.Disable()
        self.szr40.AddButton(self.szr40OK)
        self.szr40Cancel = wx.Button(self, wx.ID_CANCEL)
        self.szr40.AddButton(self.szr40Cancel)
        self.szr40.Realize()

#==============================================================================
# define lines (decoration only)
#==============================================================================
        linie1 = wx.StaticLine(self, wx.ID_ANY,
                       defPos, defSiz, wx.LI_HORIZONTAL)
        linie2 = wx.StaticLine(self, wx.ID_ANY,
                       defPos, defSiz, wx.LI_HORIZONTAL)
        linie3 = wx.StaticLine(self, wx.ID_ANY,
                       defPos, defSiz, wx.LI_HORIZONTAL)

#==============================================================================
# Left Sizer: Outline and other PDF information
#==============================================================================
        le_szr = wx.BoxSizer(wx.VERTICAL)
        le_szr.Add(self.szr10, 0, wx.EXPAND, 5)
        le_szr.Add(linie1, 0, wx.EXPAND|wx.ALL, 5)

        le_szr.Add(self.szr20, 1, wx.EXPAND, 5)
        le_szr.Add(self.szr31, 0, wx.EXPAND, 5)
        le_szr.Add(linie2, 0, wx.EXPAND|wx.ALL, 5)

        le_szr.Add(self.szr30, 0, wx.EXPAND, 5)
        le_szr.Add(linie3, 0, wx.EXPAND|wx.ALL, 5)

        le_szr.Add(self.szr40, 0, wx.ALIGN_TOP|wx.ALIGN_CENTER_HORIZONTAL, 5)

#==============================================================================
# Right Sizer: display a PDF page image
#==============================================================================
        ri_szr = wx.BoxSizer(wx.VERTICAL)     # a control line and the picture

        ri_szr20 = wx.BoxSizer(wx.HORIZONTAL) # defines the control line

        self.btn_vor = wx.Button(self, wx.ID_ANY, "forward",
                           defPos, defSiz, 0)
        ri_szr20.Add(self.btn_vor, 0, wx.ALL, 5)

        self.btn_zur = wx.Button(self, wx.ID_ANY, "backward",
                           defPos, defSiz, 0)
        ri_szr20.Add(self.btn_zur, 0, wx.ALL, 5)

        self.zuSeite = wx.TextCtrl(self, wx.ID_ANY, "1",
                             defPos, wx.Size(40, -1),
                             wx.TE_PROCESS_ENTER|wx.TE_RIGHT)
        ri_szr20.Add(self.zuSeite, 0, wx.ALL, 5)

        max_pages = wx.StaticText(self, wx.ID_ANY,
                            "of %s pages" % (str(PDFcfg.seiten),),
                            defPos, defSiz, 0)
        ri_szr20.Add(max_pages, 0, wx.ALIGN_CENTER, 5)

        # control line sizer composed, now add it to the vertical sizer
        ri_szr.Add(ri_szr20, 0, wx.EXPAND, 5)

        # define the bitmap for the pdf image ...
        bmp = pdf_show(1)
        self.PDFbild = wx.StaticBitmap(self, wx.ID_ANY, bmp,
                           defPos, defSiz, wx.BORDER_NONE)
        # ... and add it to the vertical sizer
        ri_szr.Add(self.PDFbild, 0, wx.ALL, 0)

#==============================================================================
# Main Sizer composition
#==============================================================================
        mainszr= wx.BoxSizer(wx.HORIZONTAL)
        mainszr.Add(le_szr, 1, wx.ALL, 5)
        mainszr.Add(ri_szr, 0, wx.ALL, 5)

        self.SetSizer(mainszr)
        self.Layout()
        self.Centre(wx.BOTH)

#==============================================================================
# bind buttons
#==============================================================================
        self.btn_neu.Bind(wx.EVT_BUTTON, self.insertRow)      # "new row"
        self.btn_chk.Bind(wx.EVT_BUTTON, self.DataOK)         # "check data"
        self.btn_vor.Bind(wx.EVT_BUTTON, self.forwPage)       # "forward"
        self.btn_zur.Bind(wx.EVT_BUTTON, self.backPage)       # "backward"
        self.zuSeite.Bind(wx.EVT_TEXT_ENTER, self.gotoPage)   # "page number"
        self.PDFbild.Bind(wx.EVT_MOUSEWHEEL, self.OnMouseWheel) # mouse scroll

    def __del__(self):
        pass

    def OnMouseWheel(self, event):
        # process wheel as paging operations
        d = event.GetWheelRotation()     # int indicating direction
        if d < 0:
            self.forwPage(event)
        elif d > 0:
            self.backPage(event)
        return

    def forwPage(self, event):
        seite = getint(self.zuSeite.Value) + 1
        PicRefresh(seite)
        event.Skip()

    def backPage(self, event):
        seite = getint(self.zuSeite.Value) - 1
        PicRefresh(seite)
        event.Skip()

    def gotoPage(self, event):
        seite = self.zuSeite.Value
        PicRefresh(seite)
        event.Skip()

#==============================================================================
# "insertRow" - Event Handler for new rows: insert a model row
#==============================================================================
    def insertRow(self, event):
        zeile = [1, "*** new row ***", 1, ""]
        self.szr20.Table.NewRow(zeile)
        DisableOK()
        self.Layout()

#==============================================================================
# Check Data: enable / disable OK button
#==============================================================================
    def DataOK(self, event):
        valide = True
        self.msg.Label = "Data OK!"
        d = self.szr20.GetTable()
        for i in range(self.szr20.Table.GetNumberRows()):
            if i == 0 and int(d.GetValue(0, 0)) != 1:
                valide = False
                self.msg.Label = "row 1 must have level 1"
                break
            if int(d.GetValue(i, 0)) < 1:
                valide = False
                self.msg.Label = "row %s: level < 1" % (str(i+1),)
                break
            if int(d.GetValue(i, 2)) > PDFcfg.seiten or \
               int(d.GetValue(i, 2)) < 1:
                valide = False
                self.msg.Label = "row %s: page# out of range" \
                                  % (str(i+1),)
                break
            if i > 0 and (int(d.GetValue(i, 0)) - int(d.GetValue(i-1, 0))) > 1:
                valide = False
                self.msg.Label = "row %s: level stepping > 1" % (str(i+1),)
                break
            if not d.GetValue(i, 1):
                valide = False
                self.msg.Label = "row %s: missing title" % (str(i+1),)
                break

        if valide and (self.btn_aus.GetPath() == PDFcfg.file):
            if PDFcfg.doc.openErrCode > 0 or PDFcfg.doc.needsPass == 1:
                valide = False
                self.msg.Label = "repaired or encrypted document - choose a different Output"

        if not valide:
            self.szr40OK.Disable()
        else:
            self.szr40OK.Enable()
        self.Layout()

#==============================================================================
# display a PDF page
#==============================================================================
def PicRefresh(seite):
    i_seite = getint(seite)
    i_seite = max(1, i_seite)           # ensure page# is within boundaries
    i_seite = min(PDFcfg.seiten, i_seite)

    dlg.zuSeite.Value = str(i_seite)    # set page number in dialog fields
    if PDFcfg.oldpage == i_seite:
        return
    PDFcfg.oldpage = i_seite

    bmp = pdf_show(i_seite)
    dlg.PDFbild.SetSize(bmp.Size)
    dlg.PDFbild.SetBitmap(bmp)
    dlg.PDFbild.Refresh(True)
    bmp = None
    dlg.Layout()

#==============================================================================
# Disable OK button
#==============================================================================
def DisableOK():
    dlg.szr40OK.Disable()
    dlg.msg.Label = "Data have changed.\nPress Check Data (again) " \
                    + "before saving."

#==============================================================================
# Read PDF document information
#==============================================================================
def getPDFinfo():
    PDFcfg.doc = fitz.open(PDFcfg.file)
    if PDFcfg.doc.needsPass:
        decrypt_doc()
    if PDFcfg.doc.isEncrypted:
        return True
    PDFcfg.inhalt = PDFcfg.doc.getToC()
    PDFcfg.seiten = PDFcfg.doc.pageCount
    PDFmeta = {"author":"", "title":"", "subject":""}
    for key, wert in PDFcfg.doc.metadata.items():
        if wert:
            PDFmeta[key] = wert.decode("utf-8", "ignore")
        else:
            PDFmeta[key] = ""
    PDFcfg.meta = PDFmeta
    return False

def decrypt_doc():
    # let user enter document password
    pw = None
    dlg = wx.TextEntryDialog(None, 'Please enter password below:',
             'Document is password protected', '',
             style = wx.TextEntryDialogStyle|wx.TE_PASSWORD)
    while pw is None:
        rc = dlg.ShowModal()
        if rc == wx.ID_OK:
            pw = str(dlg.GetValue().encode("utf-8"))
            PDFcfg.doc.authenticate(pw)
        else:
            return
        if PDFcfg.doc.isEncrypted:
            pw = None
            dlg.SetTitle("Wrong password. Enter correct password or cancel.")
    return

#==============================================================================
# Write the changed PDF file
#==============================================================================
def make_pdf(dlg):
    # NOTE: the UTC offset (-04'00') is hard-coded, not taken from the system
    cdate = wx.DateTime.Now().Format("D:%Y%m%d%H%M%S-04'00'")
    PDFmeta = {"creator":"PDFoutline.py",
               "producer":"PyMuPDF",
               "creationDate": cdate,
               "modDate": cdate,
               "title":dlg.austit.Value,
               "author":dlg.ausaut.Value,
               "subject":dlg.aussub.Value}

    PDFcfg.doc.setMetadata(PDFmeta)    # set new metadata
    newtoc = []
#==============================================================================
# store our outline entries as bookmarks
#==============================================================================
    for z in dlg.szr20.Table.data:
        lvl = int(z[0])
        pno = int(z[2])
        tit = z[1].strip()
        tit = tit.encode(ENCODING, "ignore")
        newtoc.append([lvl, tit, pno])

    PDFcfg.doc.setToC(newtoc)

    outfile = dlg.btn_aus.GetPath()         # get dir & name of file in screen

    if outfile == PDFcfg.file:              # same file: update the input in place
        PDFcfg.doc.save(outfile, incremental=True)
    else:                                   # different file: write a cleaned-up copy
        PDFcfg.doc.save(outfile, garbage=3)

    return

#==============================================================================
#
# Main Program
#
#==============================================================================
if wx.VERSION[0] < 3:
    print("need wxPython version 3.0 or higher")
    sys.exit(1)
app = wx.App()

#==============================================================================
# Check if we have been invoked with a PDF to edit
#==============================================================================
if len(sys.argv) == 2:
    infile = sys.argv[1]
    if not infile.lower().endswith(".pdf"):   # accept ".PDF" as well
        infile = None
else:
    infile = None

#==============================================================================
# let user select the file. Can only allow true PDFs.
#==============================================================================
if not infile:
    dlg = wx.FileDialog(None, message = "Choose a PDF file to edit",
                        defaultDir = os.path.expanduser('~'),
                        defaultFile = wx.EmptyString,
                        wildcard = "PDF files (*.pdf)|*.pdf",
                        style=wx.OPEN | wx.CHANGE_DIR)
    # We got a file only when one was selected and OK pressed
    if dlg.ShowModal() == wx.ID_OK:
        # This returns a Python list of selected files.
        infile = dlg.GetPaths()[0]
    else:
        infile = None
    # destroy this dialog
    dlg.Destroy()

if infile:                      # if we have a filename ...
    PDFcfg = PDFconfig()        # create our PDF scratchpad
    PDFcfg.file = infile
    if not getPDFinfo():               # False: input is readable (not encrypted)
        dlg = PDFDialog(None)          # create dialog
        rc = dlg.ShowModal()           # show dialog
        if rc == wx.ID_OK:           # output PDF if SAVE pressed
            make_pdf(dlg)
        dlg.Destroy()
        app = None

Now uses features introduced by PyMuPDF version 1.9.1:

  • requires wxPython and PyMuPDF; PyPDF2 is no longer needed
  • if the output file is the same as the input file, an incremental save is used (very fast)
  • program logic is much simpler because of the new method "setToC()"
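
The two rules above can be tried without a GUI or a PDF at hand. The following is a minimal sketch, not part of the recipe: `validate_toc` and `save_options` are hypothetical helper names, and the checks mirror those in `DataOK` (with the small assumption that a whitespace-only title counts as missing, since `make_pdf` strips titles). The list shape `[level, title, page]` is the one the listing builds for `setToC()`.

```python
def validate_toc(toc, page_count):
    """Return None if toc is acceptable, else a message naming the first bad row.

    toc is a list of [level, title, page] entries, the shape built for setToC().
    """
    prev_level = 0
    for row, (level, title, page) in enumerate(toc, start=1):
        if row == 1 and level != 1:
            return "row 1 must have level 1"
        if level < 1:
            return "row %d: level < 1" % row
        if not 1 <= page <= page_count:
            return "row %d: page# out of range" % row
        if level - prev_level > 1:          # may go deeper by one level only
            return "row %d: level stepping > 1" % row
        if not title.strip():               # assumption: blank counts as missing
            return "row %d: missing title" % row
        prev_level = level
    return None


def save_options(infile, outfile):
    """Keyword arguments for the save call, following the listing's rule."""
    if outfile == infile:
        return {"incremental": True}        # same file: incremental save
    return {"garbage": 3}                   # different file: save with garbage=3
```

With a checked list, the save step in the listing would then amount to `PDFcfg.doc.save(outfile, **save_options(PDFcfg.file, outfile))`.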

1 comment

Harald Lieder (author) 5 years, 1 month ago

This script is also contained in the GitHub repo of PyMuPDF. It was recently updated (2016-10-28) to support PyMuPDF 1.9.2 and wxPython 3.0.3 (Phoenix release) on Python 2.7 and later, on both x86 and x64 architectures.
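
The listing shown here still uses Python 2 idioms, for example calling `.decode` on every metadata value in `getPDFinfo`. As an illustration only, and not taken from the updated script, a version-neutral helper might look like the sketch below. The name `to_text` is hypothetical.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text on Python 2.7 and 3.x; None or empty becomes ""."""
    if not value:
        return ""
    if isinstance(value, bytes):            # Python 2 str or Python 3 bytes
        return value.decode(encoding, "ignore")
    return value                            # already text


# Hypothetical metadata dictionary with mixed value types
meta = {"author": b"Harald Lieder", "title": u"PDF outline", "subject": None}
clean = dict((key, to_text(val)) for key, val in meta.items())
```

Here `clean` holds text for every key, with missing entries turned into empty strings, as the original loop does.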