Popular recipes tagged "pdf" but not "fitz" and "xtopdf"

PDF Text Extraction using fitz / MuPDF (PyMuPDF) (Python)

2016-03-17T12:00:06-07:00

Python recipe 580626 by Jorj X. McKie (cbz, epub, mupdf, openxps, pdf, pymupdf, text_extraction, xps).

Extract all the text of a PDF (or other supported container types) at very high speed. In general, text pieces of a PDF page are not arranged in natural reading order, but in the order they were entered during PDF creation. This script re-arranges text blocks according to their pixel coordinates to achieve a more readable output, i.e. top-down, left-right.
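The re-ordering step described above can be sketched without any PDF library. This is a minimal illustration, not the recipe's own code: the TextBlock type and the rounding of the vertical coordinate are assumptions made for the example.

```python
from typing import List, NamedTuple


class TextBlock(NamedTuple):
    """Hypothetical text block: top-left corner of its bounding box plus text."""
    x0: float  # left edge in pixels
    y0: float  # top edge in pixels (y grows downward on the page)
    text: str


def reading_order(blocks: List[TextBlock]) -> List[TextBlock]:
    """Sort blocks top-down first, then left-right within the same row."""
    # Rounding y0 lets blocks that differ by a fraction of a pixel share a row.
    return sorted(blocks, key=lambda b: (round(b.y0), b.x0))


blocks = [
    TextBlock(200, 100, "World"),
    TextBlock(50, 700, "Footer"),
    TextBlock(50, 100, "Hello"),
    TextBlock(50, 20, "Title"),
]
ordered_text = [b.text for b in reading_order(blocks)]
```

Sorting on a (row, column) tuple keeps the logic to one line; a real page with multiple columns would likely need a smarter grouping.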

How to delete pages in a PDF using fitz / MuPDF / PyMuPDF (Python)

2016-05-01T09:26:44-07:00

Python recipe 580657 by Jorj X. McKie (mupdf, pdf, pdf_generation).

A new method select() in PyMuPDF 1.9.0 allows selecting pages of a PDF document to create a new one. Any Python list of integers (0 <= n < page count) can be taken.

The resulting PDF contains all links, annotations and bookmarks (provided they still point to valid targets).
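Deleting pages with select() amounts to passing the list of pages to keep. The sketch below is an assumption-laden illustration: pages_to_keep and delete_pages are hypothetical helper names, and the PyMuPDF part is imported lazily so the list logic can be tried without the library installed.

```python
from typing import Iterable, List


def pages_to_keep(page_count: int, delete: Iterable[int]) -> List[int]:
    """Return the 0-based page numbers left after removing those in `delete`."""
    unwanted = set(delete)
    for n in unwanted:
        if not 0 <= n < page_count:
            raise ValueError(f"page {n} is outside 0..{page_count - 1}")
    return [n for n in range(page_count) if n not in unwanted]


def delete_pages(src_path: str, dst_path: str, delete: Iterable[int]) -> None:
    """Write a copy of src_path without the pages listed in `delete`."""
    import fitz  # PyMuPDF, imported here so the helper above has no dependency

    doc = fitz.open(src_path)
    doc.select(pages_to_keep(len(doc), delete))  # keep only the listed pages
    doc.save(dst_path)
    doc.close()
```

For example, pages_to_keep(5, [1, 3]) yields the list to hand to select() for a five-page document whose second and fourth pages should go.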

Convert from Html To Pdf in ASP.NET MVC C# with SelectPdf Free Community Edition (C++)

2016-11-17T15:01:12-08:00

C++ recipe 580719 by SelectPdf (aspnet, mvc, pdf, selectpdf).

It’s very easy to use SelectPdf SDK for .NET in ASP.NET MVC applications. Take a look at the simple code below.

Decrypt a PDF using fitz / MuPDF (PyMuPDF) (Python)

2016-03-17T12:22:10-07:00

Python recipe 580627 by Harald Lieder (decompression, decryption, pdf, repair).

This is more of a code snippet than a full recipe. It shows how to dynamically check whether a PDF is password protected and, if it is, decrypt it and save it back to disk unencrypted.

Publish a Windows Process List to PDF with xtopdf (Batch)

2015-12-27T20:45:32-08:00

Batch recipe 579142 by Vasudev Ram (pdf, pdfwriter, pdf_generation, processes, process_management, python, windows).

This recipe shows how you can generate a Windows process list or task list (basically, a list of running processes, with some information about each of them), to a PDF file, using the Windows TASKLIST command along with the xtopdf toolkit. The list is sorted in ascending order of memory usage of the processes, before writing it to PDF.

It differs somewhat from other xtopdf recipes, in that no additional code needs to be written, over and above what is already in the xtopdf package. We just have to use the needed commands there, in a series of commands or a pipeline.

However, one can still write additional code, by modifying the program used (StdinToPDF.py), if needed, to customize the PDF output.
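For readers who prefer to do the sorting step in Python rather than in a command pipeline, here is a rough sketch. It assumes the CSV form of the TASKLIST output (tasklist /fo csv /nh) with English-locale memory figures such as "123,456 K"; the sample rows are made up for illustration.

```python
import csv
import io
from typing import List, Tuple

# Made-up sample in the column order: image name, PID, session name,
# session number, memory usage.
SAMPLE = (
    '"chrome.exe","1234","Console","1","123,456 K"\n'
    '"System Idle Process","0","Services","0","8 K"\n'
    '"python.exe","4321","Console","1","10,240 K"\n'
)


def sort_by_memory(tasklist_csv: str) -> List[Tuple[str, int]]:
    """Return (image name, memory in KB) pairs, smallest memory first."""
    processes = []
    for row in csv.reader(io.StringIO(tasklist_csv)):
        if len(row) < 5:
            continue  # skip blank or malformed lines
        # "123,456 K" -> 123456 (assumes an English-locale number format)
        kilobytes = int(row[4].replace(",", "").replace("K", "").strip())
        processes.append((row[0], kilobytes))
    return sorted(processes, key=lambda p: p[1])
```

The sorted rows could then be written out as text lines and piped to StdinToPDF.py, as the recipe does with the command-line output.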

wxPython PDF / XPS Viewer using PyMuPDF (binding for fitz / MuPDF) (Python)

2016-09-28T12:21:03-07:00

Python recipe 580621 by Jorj X. McKie (cbz, epub, mupdf, pdf, pymupdf, wxpython, xps). Revision 2.

A simple program to display a PDF (or XPS, EPUB, CBZ) document with forward / backward buttons and a field for directly jumping to a specific page. It uses the Python binding PyMuPDF for fitz, the high-performance / high-quality graphics library of MuPDF. It obviously can also be used to display XPS documents on non-Windows platforms.

This new version also supports any links contained in a page.

Find all fonts used in a PDF document by page (Python)

2016-08-26T00:02:48-07:00

Python recipe 580651 by Jorj X. McKie (pdf). Revision 3.

Finds all fonts used in a PDF document by page. This new script is based on PyMuPDF v1.9.2 and works for PDF files only. However, it is a lot simpler, much faster, and no longer depends on other packages.

PDF Joiner / Splitter using wxPython, PyMuPDF (fitz / MuPDF) (Python)

2016-03-15T19:07:35-07:00

Python recipe 580622 by Harald Lieder (join, pdf, python, split).

Full-featured PDF joiner. Join several PDF files into one output PDF. Page ranges can be specified, as well as the page orientation for each output page range. Tables of contents are intelligently preserved for each page range (this can also be switched off). The output PDF metadata is editable.

Python-controlled Unix pipeline to generate PDF (Python)

2016-01-07T18:02:52-08:00

Python recipe 579146 by Vasudev Ram (linux, pdf, pdf_generation, pipe, pipelining, python, python2, unix).

This recipe shows how to create a Unix pipeline that generates PDF output, under the control of a Python program. It is tested on Linux. It uses nl, a standard Linux command that adds line numbers to its input, and selpg, a custom Linux command-line utility, that selects only specified pages from its input, together in a pipeline (nl | selpg). The Python program sets up and starts that pipeline running, and then reads input from it and generates PDF output.
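The general shape of a Python-controlled pipeline can be sketched with the subprocess module. This is not the recipe's code: run_pipeline is a hypothetical helper, and the two stages are small Python stand-ins for nl and selpg so that the sketch runs on any platform.

```python
import subprocess
import sys
from typing import List, Sequence

# Stand-in for `nl`: prefix every input line with its line number.
NUMBER_LINES = [
    sys.executable, "-c",
    "import sys\n"
    "for n, line in enumerate(sys.stdin, 1):\n"
    "    sys.stdout.write(str(n) + ' ' + line)\n",
]
# Stand-in for a page selector: keep only the second and third lines.
KEEP_LINES_2_AND_3 = [
    sys.executable, "-c",
    "import sys\nsys.stdout.writelines(list(sys.stdin)[1:3])\n",
]


def run_pipeline(commands: Sequence[List[str]], input_text: str) -> str:
    """Run commands as cmd1 | cmd2 | ... and return the last stage's output."""
    procs = []
    upstream = None
    for cmd in commands:
        proc = subprocess.Popen(
            cmd,
            stdin=subprocess.PIPE if upstream is None else upstream,
            stdout=subprocess.PIPE,
            text=True,
        )
        if upstream is not None:
            upstream.close()  # the child owns that end of the pipe now
        procs.append(proc)
        upstream = proc.stdout
    # Fine for small inputs; large inputs would need a writer thread.
    procs[0].stdin.write(input_text)
    procs[0].stdin.close()
    output = procs[-1].stdout.read()
    procs[-1].stdout.close()
    for proc in procs:
        proc.wait()
    return output
```

On Linux the stand-ins could be replaced by ["nl"] and a selpg command list; the returned text is what a PDF-writing step would then consume.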

How to use Python to convert a web page to PDF with a POST request to SelectPdf Online API and save it on the disk (Python)

2015-11-16T14:52:17-08:00

Python recipe 579126 by SelectPdf (api, converter, htmltopdf, pdf, selectpdf).

This code converts a URL to PDF in Python using the SelectPdf HTML To PDF REST API through a POST request. The parameters are JSON encoded. The resulting content is saved to a file on disk.
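A JSON-encoded POST request of this kind can be sketched with the standard library alone. The endpoint below is a hypothetical placeholder and the parameter names "key" and "url" are assumptions for illustration; the real values should come from the SelectPdf API documentation.

```python
import json
import urllib.request

# Hypothetical placeholder, not the real SelectPdf endpoint.
API_ENDPOINT = "https://api.example.com/convert"


def build_request(api_key: str, page_url: str) -> urllib.request.Request:
    """Build a POST request whose body is the JSON-encoded parameters."""
    # Parameter names are assumed for this sketch.
    payload = json.dumps({"key": api_key, "url": page_url}).encode("utf-8")
    return urllib.request.Request(
        API_ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def save_pdf(request: urllib.request.Request, out_path: str) -> None:
    """Send the request and write the response body to a file on disk."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out:
        out.write(response.read())
```

Keeping request construction separate from sending makes it possible to inspect the body and headers before any network call is made.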

Roll your own Postscript code from scratch (Python)

2015-12-09T23:30:13-08:00

Python recipe 579136 by Jack Trainor (ghostscript, pdf, postscript, ps).

This recipe provides a mini-framework for creating custom Postscript PS and PDF files from scratch. It includes sample code for a personalized business index card.

The recipe does not use any Python PDF libraries. However, Ghostscript and a PDF viewer are useful for displaying and debugging the output.

It's easier than you might think to roll your own Postscript code!
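To give a flavour of what hand-rolled Postscript looks like, here is a small sketch that emits the source for a 5 x 3 inch card. It is not the recipe's mini-framework; the function names, font, and layout values are assumptions chosen for the example.

```python
from typing import Iterable


def ps_escape(text: str) -> str:
    """Escape backslashes and parentheses, which are special in PS strings."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def index_card(lines: Iterable[str], width: int = 360, height: int = 216,
               font: str = "Helvetica", size: int = 14) -> str:
    """Return Postscript source for a card (sizes in points, 72 per inch)."""
    out = [
        "%!PS",
        f"<< /PageSize [{width} {height}] >> setpagedevice",
        f"/{font} findfont {size} scalefont setfont",
    ]
    y = height - 36  # first baseline half an inch below the top edge
    for line in lines:
        out.append(f"36 {y} moveto ({ps_escape(line)}) show")
        y -= size + 4  # simple fixed line spacing
    out.append("showpage")
    return "\n".join(out) + "\n"


card_source = index_card(["Jane Doe (CEO)", "jane@example.com"])
```

Writing card_source to a .ps file and opening it with Ghostscript, or converting it with Ghostscript's ps2pdf, is one way to check the result.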

Convert doc and docx files to pdf (Python)

2014-03-31T18:39:16-07:00

Python recipe 578858 by Fabian Mayer (doc, pdf, python, win32com). Revision 2.

The script converts all doc and docx files in a specified folder to PDF files. It checks whether the provided absolute path actually exists and whether the specified folder contains any doc or docx files. It does not traverse the directory recursively. The script is not portable and runs only on a Windows machine. Based on my experience, I recommend closing MS Word before running the script.
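The checks described above (path exists, folder contains Word files, no recursion) can be sketched as follows. find_word_files is a hypothetical helper name, and the Word automation through win32com is deliberately left out because it only runs on Windows.

```python
from pathlib import Path
from typing import List

WORD_SUFFIXES = {".doc", ".docx"}


def find_word_files(folder: str) -> List[Path]:
    """Return the doc/docx files directly inside an existing absolute folder."""
    path = Path(folder)
    if not path.is_absolute() or not path.is_dir():
        raise ValueError(f"not an existing absolute folder: {folder}")
    # iterdir() lists only the folder itself, so subfolders are not searched.
    return sorted(
        p for p in path.iterdir()
        if p.is_file() and p.suffix.lower() in WORD_SUFFIXES
    )
```

An empty result would be the signal to stop before starting Word at all.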

Convert HTML to PDF with the PDFcrowd API (Python)

2015-03-07T20:22:54-08:00

Python recipe 579032 by Vasudev Ram (api, html, pdf, pdfcrowd).

This recipe shows how to use Python and the PDFcrowd API to convert HTML content to PDF. The HTML input can come from a remote URL, a local HTML file, or a string containing HTML.

Print selected text pages to PDF with Python, selpg and xtopdf on Linux (Bash)

2014-10-29T17:38:10-07:00

Bash recipe 578954 by Vasudev Ram (bash, linux, pdf, python, reportlab, shell, text, text_files, text_processing, unix).

This recipe shows how to use selpg, a Linux command-line utility written in C, together with xtopdf, a Python toolkit for PDF creation, to print only a selected range of pages from a text file, to a PDF file, for display or print purposes. The way to do this is to run the selpg utility at the Linux command line, with options specifying the start and end pages of the range, and pipe its output to the StdinToPDF.py program, which is a part of the xtopdf toolkit.
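The idea of selecting a page range from plain text can be illustrated in a few lines of Python. This is a stand-in for what selpg does, not its implementation: it assumes fixed-length pages, and the default of 72 lines per page is an assumption.

```python
def select_pages(text: str, start: int, end: int, lines_per_page: int = 72) -> str:
    """Return pages start..end (1-based, inclusive) of fixed-length pages."""
    if start < 1 or end < start or lines_per_page < 1:
        raise ValueError("need 1 <= start <= end and lines_per_page >= 1")
    lines = text.splitlines(keepends=True)
    first = (start - 1) * lines_per_page  # index of the first wanted line
    last = end * lines_per_page           # one past the last wanted line
    return "".join(lines[first:last])


sample = "".join(f"line {i}\n" for i in range(1, 11))
pages_2_to_3 = select_pages(sample, 2, 3, lines_per_page=3)
```

The selected text is what would be piped on to StdinToPDF.py in the recipe's pipeline.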

Serve PDF with Netius, a pure-Python network library, and xtopdf (Python)

2014-12-03T21:27:54-08:00

Python recipe 578974 by Vasudev Ram (client, client_server, networking, pdf, python, server).

This recipe shows how to serve PDF from a server written using Netius, a pure-Python library, together with xtopdf, a Python toolkit for PDF creation. It is a proof-of-concept recipe, to show the essentials needed for the task, so it hard-codes the text content that is served as PDF, but the concepts shown can easily be extended to serve dynamically generated PDF content.

jpg2pdf (Python)

2011-07-17T19:49:58-07:00

Python recipe 577798 by Sundar Srinivasan (image, jpeg, pdf, reportlab).

Program to convert JPEG to PDF. Technically, it just embeds the JPEG in a landscape US letter size PDF page. When might you need it? When you have to scan a document and do not have a scanner handy, you can photograph the document with a webcam and embed the JPEG in a PDF, which effectively works as a scanner.
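Placing an image on a landscape US letter page mostly comes down to a little arithmetic. The sketch below is an assumption: the recipe does not say how it scales the image, so the aspect-preserving fit, the margin, and the function name are illustrative choices.

```python
from typing import Tuple

# 11 x 8.5 inches at 72 points per inch.
LANDSCAPE_LETTER = (792.0, 612.0)


def fit_on_page(img_w: float, img_h: float,
                page: Tuple[float, float] = LANDSCAPE_LETTER,
                margin: float = 36.0) -> Tuple[float, float, float, float]:
    """Return (x, y, width, height) that centres the image on the page."""
    avail_w = page[0] - 2 * margin
    avail_h = page[1] - 2 * margin
    # One scale factor for both axes keeps the aspect ratio intact.
    scale = min(avail_w / img_w, avail_h / img_h)
    width, height = img_w * scale, img_h * scale
    return ((page[0] - width) / 2, (page[1] - height) / 2, width, height)
```

The four values match the position and size arguments that a drawing call such as ReportLab's canvas.drawImage accepts.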

wxPython PDF Viewer using Poppler (Python)

2010-04-15T17:43:27-07:00

Python recipe 577195 by Marcelo Fernández (pdf, poppler, python_poppler, viewer, wxpython).

This example shows a PDF Viewer class, which handles things like Zoom and Scrolling. It requires python-poppler and wxPython >= 2.8.9.

Improved ReportLab recipe for "page x of y" (Python)

2009-07-06T10:03:28-07:00

Python recipe 576832 by Vinay Sajip (pdf, reportlab). Revision 2.

This recipe is based on Recipe 546511 which does not work reliably if there are images in the content.

Convert PDF to plain text (Python)

2010-11-25T15:30:52-08:00

Python recipe 577095 by ccpizza (converter, pdf).

This is a very raw PDF converter which has absolutely no idea of the page layout or text positioning.

To install the required module try easy_install pypdf in a console.

Crop PDF File with pyPdf (Python)

2011-11-03T17:42:10-07:00

Python recipe 576837 by ccpizza (pdf, pypdf). Revision 3.

This recipe was originally posted by sjvr767 on http://www.mobileread.com/forums/showthread.php?t=25565 and I decided to also make it available here.

It uses pypdf (http://pybrary.net/pyPdf/)

The script is supposed to be run like this:

pdf_crop.py -m "120 50 120 180" -i mypdf.pdf

where the margins are given in the order left, top, right, bottom.

To install pyPdf try easy_install pypdf.
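How a margin string in the order left, top, right, bottom maps onto a page box can be sketched in plain Python. The function names are hypothetical and the box layout (lower-left x, lower-left y, upper-right x, upper-right y, in points) is an assumption based on the usual PDF convention; the recipe's own code may differ.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def parse_margins(spec: str) -> Box:
    """Parse 'left top right bottom' into four floats."""
    left, top, right, bottom = (float(v) for v in spec.split())
    return left, top, right, bottom


def cropped_box(media_box: Box, margins: Box) -> Box:
    """Shrink a page box by the given margins."""
    llx, lly, urx, ury = media_box
    left, top, right, bottom = margins
    box = (llx + left, lly + bottom, urx - right, ury - top)
    if box[0] >= box[2] or box[1] >= box[3]:
        raise ValueError("margins leave no visible area")
    return box


# US letter page (612 x 792 points) with the margins from the example call.
letter_crop = cropped_box((0, 0, 612, 792), parse_margins("120 50 120 180"))
```

Note that the bottom margin raises the lower edge and the top margin lowers the upper edge, since PDF coordinates start at the lower-left corner.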