Insert a Text Box in a PDF page (fitz / PyMuPDF) (Python)

2017-06-29T22:54:25-07:00

Python recipe 580809 by Jorj X. McKie (fitz, mupdf, pdf, textbox).

This method inserts text into a predefined rectangular area of a (new or existing) PDF page. Words are distributed across the available space and wrapped onto new lines when required. Line breaks and tab characters are respected / resolved. Text can be aligned in the box (left, center, right) and fonts can be freely chosen. The method returns a float indicating how much vertical space is left over after filling the area.

Create Calendars on PDF with a few lines (Python)

2017-06-13T10:57:34-07:00

Python recipe 580805 by Jorj X. McKie (calendar, fitz, mupdf, pdf, pymupdf). Revision 2.

PyMuPDF (fitz) provides easy-to-use ways to create PDF documents out of simple texts.

An example is the text output of Python's calendar module. Here we take a starting year as a script parameter and output a 3-page (A4 landscape) document with calendars for this and the following two years, in fewer than 20 lines of code.
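
The text side of this idea can be sketched with the standard library alone. The following is a minimal illustration, not the recipe's code: the function name and the choice of four months per row are assumptions, and placing the resulting strings on PDF pages is left out.

```python
import calendar

def three_year_calendars(start_year, months_per_row=4):
    """Return one plain-text calendar per year for start_year and the next two years."""
    cal = calendar.TextCalendar(firstweekday=calendar.MONDAY)
    # formatyear() returns a multi-line string; m sets the number of months per row
    return [cal.formatyear(year, m=months_per_row)
            for year in range(start_year, start_year + 3)]

pages = three_year_calendars(2017)  # one string per page
```

Each string in `pages` is meant for a monospaced font, so the columns only line up if the PDF text is written with one.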

Inserting Images on PDF Pages (Python)

2017-05-17T21:10:26-07:00

Python recipe 580803 by Jorj X. McKie (fitz, mupdf, pdf, pymupdf).

Version 1.11.0 of PyMuPDF allows putting an image on an existing PDF page. The following example puts the same image on every page of a given PDF - like a thumbnail.

How to handle PDF embedded files with PyMuPDF (Python)

2017-07-11T18:57:54-07:00

Python recipe 580796 by Jorj X. McKie (embedded_files, fitz, mupdf, pdf, pymupdf). Revision 3.

Version 1.11.0 (based on MuPDF v1.11) allows exporting, importing and interrogating files embedded in a PDF.

PDF "/EmbeddedFiles" are similar to ZIP archives (or the Microsoft OLE technique), allowing arbitrary data to be incorporated in a PDF and benefit from its unique features.

How to Create a PDF with a Caustic Drawing (Python)

2017-06-18T17:43:47-07:00

Python recipe 580806 by Jorj X. McKie (fitz, mupdf, pdf, pymupdf).

Just a little demo on how to create simple drawings with PyMuPDF.

This script simulates what you see looking into your coffee mug, early in the morning after a long night of programming ...

Inserting pages into a PDF with PyMuPDF (Python)

2017-05-17T21:15:26-07:00

Python recipe 580802 by Jorj X. McKie (fitz, mupdf, pdf, text_conversion). Revision 2.

Version 1.11.0 of PyMuPDF allows creating new PDF pages, as well as inserting images into existing pages.

Here is a script that converts any textfile into a PDF.

Convert doc and docx files to pdf (Python)

2014-03-31T18:39:16-07:00

Python recipe 578858 by Fabian Mayer (doc, pdf, python, win32com). Revision 2.

The script converts all doc and docx files in a specified folder to pdf files. It checks whether the provided absolute path actually exists and whether the specified folder contains any doc or docx files. It does not traverse the directory recursively. The script is not portable and runs only on a Windows machine. In my experience, it is best to close MS Word before running the script.
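
The two checks described above (the path exists, and the folder holds doc or docx files, without recursion) might look roughly like this. It is a portable sketch with a hypothetical function name; the actual conversion through win32com is not shown.

```python
import os

def find_word_files(folder):
    """Return absolute paths of .doc/.docx files directly inside folder (no recursion)."""
    folder = os.path.abspath(folder)
    if not os.path.isdir(folder):
        raise ValueError("folder does not exist: %s" % folder)
    matches = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # only regular files in this folder; subfolders are ignored on purpose
        if os.path.isfile(path) and name.lower().endswith((".doc", ".docx")):
            matches.append(path)
    return matches
```

An empty result would correspond to the "no doc or docx files found" case that the recipe reports.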

PDF Text Extraction using fitz / MuPDF (PyMuPDF) (Python)

2016-03-17T12:00:06-07:00

Python recipe 580626 by Jorj X. McKie (cbz, epub, mupdf, openxps, pdf, pymupdf, text_extraction, xps).

Extract all the text of a PDF (or other supported container types) at very high speed. In general, text pieces of a PDF page are not arranged in natural reading order, but in the order they were entered during PDF creation. This script re-arranges text blocks according to their pixel coordinates to achieve a more readable output, i.e. top-down, left-right.
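
The re-ordering step can be illustrated without any PDF library. This sketch assumes each block is a tuple (x0, y0, x1, y1, text) with the origin at the top-left of the page; the tuple layout and the rounding to whole units are assumptions for illustration, not the recipe's exact logic.

```python
def reading_order(blocks):
    """Sort (x0, y0, x1, y1, text) blocks top-down, then left-right."""
    # rounding lets blocks whose tops differ by a fraction of a unit share a line
    return sorted(blocks, key=lambda b: (round(b[1]), round(b[0])))

blocks = [
    (300.0, 72.0, 500.0, 90.0, "right column, first line"),
    (72.0, 120.0, 280.0, 138.0, "second line"),
    (72.0, 72.2, 280.0, 90.0, "left column, first line"),
]
ordered = [b[4] for b in reading_order(blocks)]
# ordered == ["left column, first line", "right column, first line", "second line"]
```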

Crop PDF File with pyPdf (Python)

2011-11-03T17:42:10-07:00

Python recipe 576837 by ccpizza (pdf, pypdf). Revision 3.

This recipe was originally posted by sjvr767 on http://www.mobileread.com/forums/showthread.php?t=25565 and I decided to also make it available here.

It uses pypdf (http://pybrary.net/pyPdf/)

The script is supposed to be run like this:

pdf_crop.py -m "120 50 120 180" -i mypdf.pdf

where the margins are given in the order left, top, right, bottom.

To install pyPdf try easy_install pypdf.
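
How a "left top right bottom" margin string could translate into a crop rectangle is sketched below. This is an illustration only: it assumes the margins are in PDF points, that the page box is (llx, lly, urx, ury) with the origin at the bottom-left, and both function names are hypothetical.

```python
def parse_margins(spec):
    """Parse a 'left top right bottom' string into four floats."""
    left, top, right, bottom = (float(v) for v in spec.split())
    return left, top, right, bottom

def cropped_box(page_box, margins):
    """Shrink page_box = (llx, lly, urx, ury) by the given margins."""
    llx, lly, urx, ury = page_box
    left, top, right, bottom = margins
    new_box = (llx + left, lly + bottom, urx - right, ury - top)
    if new_box[0] >= new_box[2] or new_box[1] >= new_box[3]:
        raise ValueError("margins leave no visible area")
    return new_box

# example on a 612 x 792 point page (US Letter, chosen only for illustration)
box = cropped_box((0, 0, 612, 792), parse_margins("120 50 120 180"))
# box == (120.0, 180.0, 492.0, 742.0)
```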

Extract images of a PDF - optionally by page using PyMuPDF / fitz (Python)

2016-09-28T12:03:59-07:00

Python recipe 580703 by Jorj X. McKie (fitz, pdf, png).

Two small scripts to extract images contained in a PDF document as PNG files: (1) Script 1 extracts all images; (2) Script 2 extracts only images that are referenced by a page.

Improved ReportLab recipe for "page x of y" (Python)

2009-07-06T10:03:28-07:00

Python recipe 576832 by Vinay Sajip (pdf, reportlab). Revision 2.

This recipe is based on Recipe 546511 which does not work reliably if there are images in the content.

wxPython PDF Viewer using Poppler (Python)

2010-04-15T17:43:27-07:00

Python recipe 577195 by Marcelo Fernández (pdf, poppler, python_poppler, viewer, wxpython).

This example shows a PDF Viewer class, which handles things like Zoom and Scrolling. It requires python-poppler and wxPython >= 2.8.9.

How to parse a table in a PDF document (Python)

2016-04-10T22:43:57-07:00

Python recipe 580635 by Jorj X. McKie (cbz, epub, fitz, mupdf, openxps, parsing, pdf, pymupdf, table, xps). Revision 4.

A Python function that converts a table contained in a page of a PDF (or OpenXPS, EPUB, CBZ, XPS) document to a matrix-like Python object (list of lists of strings).
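
To make the "list of lists of strings" result concrete, here is a simplified sketch that is not the recipe's algorithm. It assumes the words have already been extracted as (x, y, text) tuples and that the left edge of every column is known; both the input layout and the function name are hypothetical.

```python
from bisect import bisect_right

def words_to_table(words, column_starts):
    """Group (x, y, text) words into rows by y and into cells by column start x."""
    rows = {}
    for x, y, text in words:
        # index of the last column whose left edge is <= x
        col = max(bisect_right(column_starts, x) - 1, 0)
        cells = rows.setdefault(round(y), [[] for _ in column_starts])
        cells[col].append((x, text))
    table = []
    for y in sorted(rows):
        # within a cell, join the words from left to right
        table.append([" ".join(t for _, t in sorted(cell)) for cell in rows[y]])
    return table

words = [
    (200, 100, "Price"), (72, 100, "Item"),
    (72, 115, "Coffee"), (110, 115, "mug"), (200, 115, "4.50"),
]
table = words_to_table(words, column_starts=[72, 200])
# table == [["Item", "Price"], ["Coffee mug", "4.50"]]
```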

PDF a Directory of Images using Reportlab (Python)

2009-04-12T08:35:10-07:00

Python recipe 576717 by andrew.canit (directory, images, pdf).

Walk through a directory PDFing Images

Convert PDF to plain text (Python)

2010-11-25T15:30:52-08:00

Python recipe 577095 by ccpizza (converter, pdf).

This is a very raw PDF converter which has absolutely no idea of the page layout or text positioning.

To install the required module try easy_install pypdf in a console.

How to use Python to convert a web page to PDF with a POST request to SelectPdf Online API and save it on the disk (Python)

2015-11-16T14:52:17-08:00

Python recipe 579126 by SelectPdf (api, converter, htmltopdf, pdf, selectpdf).

This code converts a URL to PDF in Python using the SelectPdf HTML To PDF REST API through a POST request. The parameters are JSON encoded. The content is saved into a file on the disk.

Rotate a PDF page in 3 lines (Python)

2016-11-06T11:33:59-08:00

Python recipe 580713 by Jorj X. McKie (fitz, mupdf, pdf, pymupdf). Revision 2.

PyMuPDF v1.9.3 now supports several new features for manipulating PDFs.

Here is an example to rotate a page with just a few lines of Python code.

How to delete pages in a PDF using fitz / MuPDF / PyMuPDF (Python)

2016-05-01T09:26:44-07:00

Python recipe 580657 by Jorj X. McKie (mupdf, pdf, pdf_generation).

A new method select() in PyMuPDF 1.9.0 allows selecting pages of a PDF document to create a new one. Any Python list of integers (0 <= n < page count) can be taken.

The resulting PDF contains all links, annotations and bookmarks (provided they still point to valid targets).
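
Since select() takes the pages to keep, deleting pages means building the complementary list first. A small sketch of that step, with a hypothetical helper name and no PyMuPDF dependency:

```python
def pages_to_keep(page_count, pages_to_delete):
    """Build the list of 0-based page numbers that remain after deletion."""
    unwanted = set(pages_to_delete)
    invalid = sorted(n for n in unwanted if not 0 <= n < page_count)
    if invalid:
        raise ValueError("page numbers out of range: %s" % invalid)
    return [n for n in range(page_count) if n not in unwanted]

keep = pages_to_keep(10, [0, 3, 9])
# keep == [1, 2, 4, 5, 6, 7, 8]
# this list is what would be handed to the select() method described above
```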

Roll your own Postscript code from scratch (Python)

2015-12-09T23:30:13-08:00

Python recipe 579136 by Jack Trainor (ghostscript, pdf, postscript, ps).

This recipe provides a mini-framework for creating custom Postscript PS and PDF files from scratch. It includes sample code for a personalized business index card.

The recipe does not use any Python PDF libraries. However, Ghostscript and a PDF viewer are useful for displaying and debugging the output.

It's easier than you might think to roll your own Postscript code!
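
As a taste of what generating PostScript from Python can look like, here is a minimal sketch. It is not the recipe's framework: the function names, the 5 x 3 inch card size (360 x 216 points), the font and the layout are all assumptions chosen for illustration.

```python
def ps_escape(text):
    """Escape characters that are special inside a PostScript string literal."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

def index_card_ps(lines, width=360, height=216, font="Helvetica", size=14):
    """Return PostScript source for a card with one text line per entry (points)."""
    out = [
        "%!PS",
        "<< /PageSize [{} {}] >> setpagedevice".format(width, height),
        "/{} findfont {} scalefont setfont".format(font, size),
    ]
    leading = int(size * 1.5)   # vertical distance between baselines
    y = height - 2 * size       # baseline of the first line
    for line in lines:
        out.append("36 {} moveto ({}) show".format(y, ps_escape(line)))
        y -= leading
    out.append("showpage")
    return "\n".join(out) + "\n"

source = index_card_ps(["Jane (Doe)", "Python recipes"])
```

Writing `source` to a `.ps` file and opening it with Ghostscript is one way to check the result.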

Decrypt a PDF using fitz / MuPDF (PyMuPDF) (Python)

2016-03-17T12:22:10-07:00

Python recipe 580627 by Harald Lieder (decompression, decryption, pdf, repair).

This is more of a code snippet. It shows how to dynamically check whether a PDF is password protected and, if it is, decrypt it and save it back to disk unencrypted.

Most viewed recipes tagged "pdf" but not "xtopdf"