
This recipe shows the basics of how to convert the text data in a Microsoft Excel file (XLSX format) to PDF (Portable Document Format). It uses openpyxl to read the XLSX file and xtopdf to generate the PDF file.

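# XLSXtoPDF.py

# Program to convert the data from an XLSX file to PDF.
# Uses the openpyxl library and xtopdf.

# Author: Vasudev Ram - http://jugad2.blogspot.com
# Copyright 2015 Vasudev Ram.

from openpyxl import load_workbook
from PDFWriter import PDFWriter  # PDFWriter comes from xtopdf

# data_only=True returns the last computed value of each formula cell
# instead of the formula itself. (Current openpyxl versions no longer
# accept the guess_types argument.)
workbook = load_workbook('fruits2.xlsx', data_only=True)
worksheet = workbook.active

pw = PDFWriter('fruits2.pdf')
pw.setFont('Courier', 12)  # a monospaced font keeps the columns aligned
pw.setHeader('XLSXtoPDF.py - convert XLSX data to PDF')
pw.setFooter('Generated using openpyxl and xtopdf')

# worksheet['A1:H13'] yields the rows of the range as tuples of cells;
# passing a range string to iter_rows() is deprecated in openpyxl.
for row in worksheet['A1:H13']:
    s = ''
    for cell in row:
        if cell.value is None:
            s += ' ' * 11  # empty cell: same width as a filled one
        else:
            s += str(cell.value).rjust(10) + ' '
    pw.writeLine(s)
pw.savePage()
pw.close()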
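The inner loop gives every cell a fixed 11-character slot (10 characters right-aligned plus a space), so the columns line up in the monospaced Courier font. Here is a minimal sketch of that same logic pulled out into a standalone function, which can be checked without a spreadsheet or xtopdf installed. The name `format_row` and its `width` parameter are hypothetical, not part of openpyxl or xtopdf.

```python
def format_row(values, width=10):
    """Render one row of cell values as fixed-width, right-aligned text.

    Hypothetical helper mirroring the recipe's inner loop: each value
    occupies `width` characters plus one separating space, and None
    (an empty cell) becomes an all-blank slot of the same size.
    """
    parts = []
    for value in values:
        if value is None:
            parts.append(' ' * (width + 1))
        else:
            # rjust pads but never truncates, so long values widen the row
            parts.append(str(value).rjust(width) + ' ')
    return ''.join(parts)


if __name__ == '__main__':
    print(repr(format_row(['Apple', 12, None, 3.5])))
```

In the recipe, the loop body could then become `pw.writeLine(format_row(cell.value for cell in row))`. As in the original loop, a value longer than 10 characters shifts the rest of its row to the right.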

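The conversion of Excel to PDF, and of data to and from PDF in general, can have some issues, for technical reasons: for example, a PDF stores positioned text on pages rather than rows and cells, and this recipe writes out only the cell values as text, so spreadsheet formatting such as fonts, colours and column widths is not carried over.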

More information and sample input and output here:

http://jugad2.blogspot.in/2015/11/convert-xlsx-to-pdf-with-python-and.html

8 comments

Vasudev Ram (author) 8 years, 3 months ago

That should be "Microsoft", not "Microsot", in the title of the recipe :)

Can't delete it now.

Error unintended.

Vineeth Shetty 8 years, 1 month ago

I have a similar task, and finally came across this page after a lot of searching online. I tried doing the same thing, but when I try "from PDFWriter import PDFWriter", it throws an ImportError: cannot import name PDFWriter.

I have installed epubmaker-0.3.20 in order to get the PDFWriter module.

Is there something that I'm missing? Please help.

Vasudev Ram (author) 8 years, 1 month ago

Looks like you did not read the recipe completely.

> I have installed epubmaker-0.3.20 in order to get the PDFWriter module.

I have not used the epubmaker module at all.

See the blog post linked from the recipe, and follow and use relevant links it it, to get xtopdf (which has the right PDFWriter). Install it using the Guide to installing and using xtopdf, which can also be found from the links.
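When `from PDFWriter import PDFWriter` fails with "cannot import name PDFWriter" (rather than "No module named PDFWriter"), Python usually did find a module called `PDFWriter`, just not the one from xtopdf. A small sketch using the standard library's `importlib.util.find_spec` reports which file would be loaded, without importing it; the helper name `module_origin` is hypothetical.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None.

    importlib.util.find_spec locates the module without executing it,
    so this is safe to run even when the import itself fails.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # no module by that name on sys.path
    return spec.origin  # e.g. a path ending in PDFWriter.py


if __name__ == '__main__':
    print(module_origin('PDFWriter'))
```

If the printed path is not inside your xtopdf installation, another package's `PDFWriter` is probably shadowing it on `sys.path`.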

Vasudev Ram (author) 8 years, 1 month ago

> links it it

should be "links in it".

B 6 years, 9 months ago

Hello. Thank you for this tutorial; it works fine, but there is a problem with text conversion. Namely, when text is a subscript in the XLSX file, it becomes normal size after conversion. How can I fix it?

B 6 years, 9 months ago

Moreover, when I convert the file I get the error:

/usr/local/lib/python2.7/dist-packages/openpyxl/worksheet/worksheet.py:495: UserWarning: Using a range string is deprecated. Use ws[range_string]
  warn("Using a range string is deprecated. Use ws[range_string]")

B 6 years, 9 months ago

This is not an error, sorry. Program finishes correctly.
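The warning is triggered by passing a range string such as 'A1:H13' to `iter_rows()`. Besides the `ws[range_string]` form the warning suggests, `iter_rows()` in newer openpyxl versions accepts numeric `min_row`, `max_row`, `min_col` and `max_col` bounds. openpyxl has its own `range_boundaries` helper in `openpyxl.utils.cell` for converting range strings; the pure-Python sketch below, with hypothetical names `column_index` and `range_to_bounds`, just shows how the column letters map to numbers.

```python
import re


def column_index(letters):
    """Convert Excel column letters to a 1-based index: A=1, Z=26, AA=27."""
    index = 0
    for ch in letters.upper():
        # Column letters form a base-26 numeral with digits A..Z = 1..26
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index


def range_to_bounds(range_string):
    """Turn a range such as 'A1:H13' into keyword arguments for iter_rows()."""
    match = re.fullmatch(r'([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)', range_string)
    if match is None:
        raise ValueError('expected a range like A1:H13, got %r' % range_string)
    col1, row1, col2, row2 = match.groups()
    return {
        'min_row': int(row1),
        'max_row': int(row2),
        'min_col': column_index(col1),
        'max_col': column_index(col2),
    }


if __name__ == '__main__':
    print(range_to_bounds('A1:H13'))
```

For example: `for row in worksheet.iter_rows(**range_to_bounds('A1:H13')):`. The sketch handles only simple two-corner ranges, not absolute references such as '$A$1'.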

Vasudev Ram (author) 6 years, 9 months ago

You're welcome. To answer your first question, about text size: xtopdf does not support subscripts, superscripts, etc., when converting text from other formats. Only one font size is supported per generated PDF. Fonts and the like are not the focus of xtopdf.

The focus of xtopdf is to provide a higher level abstraction than ReportLab (which xtopdf uses under the hood), for a subset of ReportLab's functionality, namely, convenient and easy generation of text-only reports with headers and footers, automatic pagination and page numbering, from a variety of other data formats / file formats / data sources from which text can be extracted by any means whatsoever.
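To make "automatic pagination and page numbering" concrete: a report writer has to split a stream of lines into fixed-size pages and frame each one with the header, the footer and a page number. The sketch below illustrates only that idea in plain Python; it is not xtopdf's actual implementation, and `paginate` and `lines_per_page` are hypothetical names.

```python
def paginate(lines, lines_per_page, header, footer):
    """Split a list of report lines into pages framed by header and footer.

    Returns a list of pages; each page is a list of strings whose first
    line is the header and whose last line is the footer plus page number.
    """
    if lines_per_page < 1:
        raise ValueError('lines_per_page must be at least 1')
    pages = []
    for start in range(0, len(lines), lines_per_page):
        body = lines[start:start + lines_per_page]
        page_number = len(pages) + 1
        pages.append([header] + body + ['%s - Page %d' % (footer, page_number)])
    return pages


if __name__ == '__main__':
    report = ['line %d' % i for i in range(1, 8)]
    for page in paginate(report, 3, 'My Report', 'Generated by Python'):
        print('\n'.join(page))
        print('-' * 20)
```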

Also, xtopdf is a library, so it is programmable - which means you can use it in your Python programs (not just use the existing Python apps I've written that use xtopdf), and you can mix and match text data from many input sources, munge / manipulate / adorn / format the text in any way you want via Python string-processing code, and then send the final formatted version to PDF very easily, without any boilerplate code, using xtopdf's PDFWriter class.
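As an illustration of mixing text from several sources and formatting it with ordinary Python string processing before handing it to PDFWriter, here is a minimal sketch. The function `build_report` and its CSV and JSON inputs are made up for the example; the result is just a list of strings.

```python
import csv
import io
import json


def build_report(csv_text, json_text):
    """Merge rows from two text sources into aligned report lines.

    Hypothetical example: csv_text holds 'fruit,qty' rows and json_text
    maps fruit names to unit prices. Unknown fruits get a price of 0.
    """
    prices = json.loads(json_text)
    lines = ['%-10s %5s %8s' % ('Fruit', 'Qty', 'Value')]
    lines.append('-' * len(lines[0]))
    for fruit, qty in csv.reader(io.StringIO(csv_text)):
        value = int(qty) * prices.get(fruit, 0.0)
        lines.append('%-10s %5d %8.2f' % (fruit, int(qty), value))
    return lines


if __name__ == '__main__':
    for line in build_report('apple,3\nbanana,12\n',
                             '{"apple": 0.5, "banana": 0.25}'):
        print(line)
```

Each line could then be sent to the PDF with the calls the recipe uses, e.g. `pw.writeLine(line)` for each line, followed by `pw.savePage()` and `pw.close()`.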

In other words, the focus of xtopdf is the generation of formatted reports (for business or any other purpose), where "formatted" means string formatting, without frills like different-sized fonts, lines, boxes, shadows, etc. If you want that fancier or different stuff (nothing wrong with it; it all has its uses), please use ReportLab's lower-level APIs directly, or generate HTML with CSS styling, etc., and then use a tool like wkhtmltopdf or xhtmltopdf to convert your nicely styled HTML to PDF.
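For formatting that xtopdf deliberately leaves out, such as the subscripts asked about above, the HTML route can preserve it. A minimal sketch, assuming you have already decided which text should be subscripted (detecting that in the workbook is not covered here): it renders rows as a CSS-styled HTML table with `<sub>` tags, escaping all text with `html.escape`. The function `rows_to_html` and the `(text, subscript)` cell convention are hypothetical.

```python
import html


def rows_to_html(rows, title='Report'):
    """Render rows of cells as a styled HTML table.

    Each cell is a plain value, None (rendered as an empty cell), or a
    hypothetical (text, subscript) pair, e.g. ('H', 2) for H<sub>2</sub>.
    All text is HTML-escaped before being placed in the markup.
    """
    out = ['<!DOCTYPE html>',
           '<html><head><meta charset="utf-8">',
           '<title>%s</title>' % html.escape(title),
           '<style>td { border: 1px solid #999; padding: 2px 6px; '
           'text-align: right; font-family: sans-serif; }</style>',
           '</head><body><table>']
    for row in rows:
        cells = []
        for cell in row:
            if cell is None:
                text = ''
            elif isinstance(cell, tuple):
                base, sub = cell
                text = '%s<sub>%s</sub>' % (html.escape(str(base)),
                                            html.escape(str(sub)))
            else:
                text = html.escape(str(cell))
            cells.append('<td>%s</td>' % text)
        out.append('<tr>%s</tr>' % ''.join(cells))
    out.append('</table></body></html>')
    return '\n'.join(out)


if __name__ == '__main__':
    print(rows_to_html([['Gas', 'Volume'], [('H', 2), 4.5], [('CO', 2), None]]))
```

Saved to a file such as report.html, the output could then be converted with, for example, `wkhtmltopdf report.html report.pdf`.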

HTH - Vasudev.