Latest recipes tagged "text_processing"
http://code.activestate.com/recipes/tags/text_processing/new/
2016-12-06T20:37:30-08:00
ActiveState Code Recipes
Convert wildcard text files to PDF with xtopdf (e.g. report*.txt) (Python)
2016-12-06T20:37:30-08:00
Vasudev Ram
http://code.activestate.com/recipes/users/4173351/
http://code.activestate.com/recipes/580727-convert-wildcard-text-files-to-pdf-with-xtopdf-eg-/
<p style="color: grey">
Python
recipe 580727
by <a href="/recipes/users/4173351/">Vasudev Ram</a>
(<a href="/recipes/tags/conversion/">conversion</a>, <a href="/recipes/tags/files/">files</a>, <a href="/recipes/tags/globbing/">globbing</a>, <a href="/recipes/tags/patterns/">patterns</a>, <a href="/recipes/tags/pdf/">pdf</a>, <a href="/recipes/tags/pdfwriter/">pdfwriter</a>, <a href="/recipes/tags/pdf_generation/">pdf_generation</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/wildcard/">wildcard</a>, <a href="/recipes/tags/xtopdf/">xtopdf</a>).
</p>
<p>This recipe shows how to convert all text files matching a filename wildcard to PDF, using the xtopdf PDF creation toolkit. For example, if you specify report*.txt as the wildcard, all files in the current directory that match report*.txt will be converted to PDF, each in a separate PDF file. The original text files are not changed.</p>
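<p>The file-matching step can be sketched with the standard library alone. This is a minimal sketch, not the recipe's code: <code>plan_conversions</code> is a hypothetical helper that expands the wildcard with <code>glob.glob</code> and derives each output name by swapping the extension for <code>.pdf</code>. The actual PDF writing, which the recipe does with xtopdf, is deliberately left out.</p>

```python
import glob
import os


def plan_conversions(pattern):
    """Return (text_file, pdf_file) pairs for every file matching pattern.

    Hypothetical helper: it only decides what to convert; writing each PDF
    (done with xtopdf in the recipe) is not shown here.
    """
    pairs = []
    for txt_name in sorted(glob.glob(pattern)):
        if not os.path.isfile(txt_name):
            continue  # skip directories whose names happen to match
        base, _ext = os.path.splitext(txt_name)
        pairs.append((txt_name, base + ".pdf"))  # one PDF per input file
    return pairs
```

<p>On Unix shells, quote the pattern (for example "report*.txt") when passing it on the command line, so that the shell does not expand it before the program sees it.</p>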
<p>Here is a guide to installing and using xtopdf:</p>
<p><a href="http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html" rel="nofollow">http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html</a></p>
<p>More details on running the program, and sample output, are available here:</p>
<p><a href="http://jugad2.blogspot.in/2016/12/xtopdf-wildcard-text-files-to-pdf-with.html" rel="nofollow">http://jugad2.blogspot.in/2016/12/xtopdf-wildcard-text-files-to-pdf-with.html</a></p>
Batch conversion of text files to PDF with fileinput and xtopdf (Python)
2016-11-07T20:28:01-08:00
Vasudev Ram
http://code.activestate.com/recipes/users/4173351/
http://code.activestate.com/recipes/580715-batch-conversion-of-text-files-to-pdf-with-fileinp/
<p style="color: grey">
Python
recipe 580715
by <a href="/recipes/users/4173351/">Vasudev Ram</a>
(<a href="/recipes/tags/batch/">batch</a>, <a href="/recipes/tags/batchmode/">batchmode</a>, <a href="/recipes/tags/conversion/">conversion</a>, <a href="/recipes/tags/files/">files</a>, <a href="/recipes/tags/pdf/">pdf</a>, <a href="/recipes/tags/pdfwriter/">pdfwriter</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/utilities/">utilities</a>, <a href="/recipes/tags/xtopdf/">xtopdf</a>).
</p>
<p>This recipe shows how to do a batch conversion of the content of multiple text files into a single PDF file, with a) an automatic page break after the content of each text file (in the PDF output), b) page numbering, and c) a header and footer on each page.</p>
<p>It uses the fileinput module (part of the Python standard library), and xtopdf, a Python library for conversion of other formats to PDF.</p>
<p>xtopdf is available here: <a href="https://bitbucket.org/vasudevram/xtopdf" rel="nofollow">https://bitbucket.org/vasudevram/xtopdf</a></p>
<p>and a guide to installing and using xtopdf is here:</p>
<p><a href="http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html" rel="nofollow">http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html</a></p>
<p>Here is a sample run of the program:</p>
<p>python BTTP123.pdf text1.txt text2.txt text3.txt</p>
<p>This will read the content from the three text files specified and write it into the PDF file specified, neatly formatted.</p>
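<p>The file-boundary part of this can be sketched with <code>fileinput</code> alone. The sketch below is an assumption about the approach, not the recipe's code: the hypothetical generator <code>group_lines_by_file</code> uses <code>isfirstline()</code> to notice when a new file starts, which is where a PDF page break would go; the xtopdf output side is not shown. Note that <code>fileinput</code> yields no lines for an empty file, so empty files produce no group.</p>

```python
import fileinput


def group_lines_by_file(paths):
    """Yield (filename, lines) for each input file, in command-line order.

    A new group starts whenever isfirstline() is true, i.e. at the point
    where the recipe would insert a page break in the PDF.
    """
    current_name = None
    current_lines = []
    # FileInput (rather than fileinput.input) avoids the module-level global state.
    with fileinput.FileInput(files=paths) as stream:
        for line in stream:
            if stream.isfirstline() and current_name is not None:
                yield current_name, current_lines
                current_lines = []
            current_name = stream.filename()
            current_lines.append(line.rstrip("\n"))
    if current_name is not None:
        yield current_name, current_lines
```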
Routine to i18nify any word (Python)
2016-05-19T18:41:26-07:00
Vasudev Ram
http://code.activestate.com/recipes/users/4173351/
http://code.activestate.com/recipes/580662-routine-to-i18nify-any-word/
<p style="color: grey">
Python
recipe 580662
by <a href="/recipes/users/4173351/">Vasudev Ram</a>
(<a href="/recipes/tags/i18n/">i18n</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/strings/">strings</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>This recipe shows a routine, plus a driver program for it, that lets you "i18nify" any word, similar to how the word "internationalization" is shortened to "i18n", and "localization" to "l10n".</p>
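<p>The shortening rule can be sketched as: keep the first and last letters and replace everything between them with the number of letters removed. <code>i18nify</code> here is a hypothetical stand-alone function, not the recipe's routine; leaving words of three or fewer characters unchanged is an assumption, since abbreviating them would not make them shorter.</p>

```python
def i18nify(word):
    """Abbreviate word as: first letter + count of inner letters + last letter.

    Words of three characters or fewer are returned unchanged (assumed rule).
    """
    if len(word) <= 3:
        return word
    return f"{word[0]}{len(word) - 2}{word[-1]}"
```

<p>For example, <code>i18nify("internationalization")</code> gives "i18n" and <code>i18nify("localization")</code> gives "l10n".</p>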
[python3-tk/ttk] Onager Scratchpad (Python)
2016-04-24T02:34:03-07:00
Mickey Kocic
http://code.activestate.com/recipes/users/4193984/
http://code.activestate.com/recipes/580650-python3-tkttk-onager-scratchpad/
<p style="color: grey">
Python
recipe 580650
by <a href="/recipes/users/4193984/">Mickey Kocic</a>
(<a href="/recipes/tags/python3/">python3</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/tkinter/">tkinter</a>, <a href="/recipes/tags/ttk/">ttk</a>, <a href="/recipes/tags/windows/">windows</a>).
Revision 2.
</p>
<p>I wrote this simple text editor to use for my diary. It's customized the way I like it, but the code is set up so it's easy for other people to change bg, fg, font, etc. I've also compiled a standalone Windows executable (thank you very much, ActiveState! Without ActivePython the compilation would have been impossible). Anyone who wants a copy of the executable is free to message or email me.</p>
<p>NOTE: If you get an error that the theme is not recognized, just comment out line 18 or run the following code in your Python 3 interpreter:</p>
<pre class="prettyprint"><code>>>>from tkinter.ttk import Style
>>>s = Style()
>>>s.theme_use()
</code></pre>
<p>You'll get a list of the available themes and can replace the 'alt' in line 18 with any one of them you want.</p>
Print selected text pages to PDF with Python, selpg and xtopdf on Linux (Bash)
2014-10-29T17:38:10-07:00
Vasudev Ram
http://code.activestate.com/recipes/users/4173351/
http://code.activestate.com/recipes/578954-print-selected-text-pages-to-pdf-with-python-selpg/
<p style="color: grey">
Bash
recipe 578954
by <a href="/recipes/users/4173351/">Vasudev Ram</a>
(<a href="/recipes/tags/bash/">bash</a>, <a href="/recipes/tags/linux/">linux</a>, <a href="/recipes/tags/pdf/">pdf</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/reportlab/">reportlab</a>, <a href="/recipes/tags/shell/">shell</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_files/">text_files</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/unix/">unix</a>).
</p>
<p>This recipe shows how to use selpg, a Linux command-line utility written in C, together with xtopdf, a Python toolkit for PDF creation, to print only a selected range of pages from a text file to a PDF file, for display or printing. To do this, run selpg at the Linux command line with options specifying the start and end pages of the range, and pipe its output to the StdinToPDF.py program, which is part of the xtopdf toolkit.</p>
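<p>For readers without selpg, the page-selection step can be approximated in Python. This is a rough stand-in that assumes fixed-length pages: <code>select_pages</code> and its <code>lines_per_page</code> parameter are hypothetical, and the 72-line default is an assumption here, not taken from selpg's documentation (check selpg's own options for how it actually splits pages).</p>

```python
def select_pages(lines, start, end, lines_per_page=72):
    """Return the lines on pages start..end (1-based, inclusive).

    Assumes every page is exactly lines_per_page lines long; the last page
    may be shorter if the input runs out.
    """
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    first = (start - 1) * lines_per_page
    last = end * lines_per_page
    return lines[first:last]
```

<p>Used as a filter, <code>sys.stdout.writelines(select_pages(sys.stdin.readlines(), 3, 5))</code> would print pages 3 to 5, and its output could be piped to StdinToPDF.py in the same way as selpg's.</p>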
The Bentley-Knuth problem and solutions (Python)
2014-03-15T23:46:59-07:00
Vasudev Ram
http://code.activestate.com/recipes/users/4173351/
http://code.activestate.com/recipes/578851-the-bentley-knuth-problem-and-solutions/
<p style="color: grey">
Python
recipe 578851
by <a href="/recipes/users/4173351/">Vasudev Ram</a>
(<a href="/recipes/tags/algorithms/">algorithms</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>This is a program Jon Bentley asked Donald Knuth to write, and one that has become familiar to people who use languages with serious text-handling capabilities: read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies. I wrote two solutions for it earlier, one in Python and one in Unix shell. Also see the comment by a user on the post, giving another solution.</p>
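<p>A compact sketch of the problem statement using <code>collections.Counter</code>. It is not one of the author's two solutions, and the tokenization (lowercased runs of letters and apostrophes) is an assumption; <code>top_words</code> is a hypothetical name.</p>

```python
import re
from collections import Counter


def top_words(text, n):
    """Return the n most frequent words in text as (word, count) pairs.

    Words are lowercased runs of ASCII letters and apostrophes (assumed
    tokenization). most_common() sorts by count, highest first.
    """
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)
```

<p>To run it on a file, pass the file's contents, e.g. <code>top_words(open(path).read(), 10)</code>, and print each pair. Words with equal counts are listed in the order they were first seen.</p>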
Text Editor in Python 3.3 (Python)
2013-06-19T15:58:17-07:00
Stephen Chappell
http://code.activestate.com/recipes/users/2608421/
http://code.activestate.com/recipes/578569-text-editor-in-python-33/
<p style="color: grey">
Python
recipe 578569
by <a href="/recipes/users/2608421/">Stephen Chappell</a>
(<a href="/recipes/tags/editor/">editor</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>This is a simple text editor written in Python using <code>tkinter</code> for graphics.</p>
<p>Check out Captain DeadBones' <a href="http://thelivingpearl.com/">Chronicles</a> blog.</p>
Plain Text Editor in Python (Python)
2013-06-18T15:33:01-07:00
Captain DeadBones
http://code.activestate.com/recipes/users/4184772/
http://code.activestate.com/recipes/578568-plain-text-editor-in-python/
<p style="color: grey">
Python
recipe 578568
by <a href="/recipes/users/4184772/">Captain DeadBones</a>
(<a href="/recipes/tags/editor/">editor</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>Just a simple text editor written in Python with Tk for graphics. </p>
<p>Check out my blog <a href="http://thelivingpearl.com/">Captain DeadBones Chronicles</a></p>
slurp.py (Regex based simple parsing engine) (Python)
2013-05-26T18:00:58-07:00
Mike 'Fuzzy' Partin
http://code.activestate.com/recipes/users/4179778/
http://code.activestate.com/recipes/578532-slurppy-regex-based-simple-parsing-engine/
<p style="color: grey">
Python
recipe 578532
by <a href="/recipes/users/4179778/">Mike 'Fuzzy' Partin</a>
(<a href="/recipes/tags/parser/">parser</a>, <a href="/recipes/tags/regex/">regex</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>A parsing engine that allows you to define sets of patterns and callbacks, and process any I/O object in Python that has a readline() method.</p>
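<p>A minimal sketch of the patterns-and-callbacks idea, assuming nothing about slurp.py's real API: <code>PatternDispatcher</code>, <code>register</code> and <code>process</code> are hypothetical names. It relies only on the object's <code>readline()</code> method, so open files and <code>io.StringIO</code> objects both work.</p>

```python
import io
import re


class PatternDispatcher:
    """Hypothetical pattern/callback engine in the spirit of slurp.py.

    Registered regexes are tried in order against each line; the first one
    that matches has its callback called with the match object.
    """

    def __init__(self):
        self.rules = []  # list of (compiled regex, callback) pairs

    def register(self, pattern, callback):
        self.rules.append((re.compile(pattern), callback))

    def process(self, stream):
        while True:
            line = stream.readline()
            if not line:  # readline() returns '' at end of input
                break
            for regex, callback in self.rules:
                match = regex.search(line)
                if match:
                    callback(match)
                    break  # first matching rule wins


if __name__ == "__main__":
    seen = []
    d = PatternDispatcher()
    d.register(r"^(\w+)=(\d+)$", lambda m: seen.append((m.group(1), int(m.group(2)))))
    d.process(io.StringIO("a=1\nnoise\nb=22\n"))
    print(seen)  # [('a', 1), ('b', 22)]
```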
Text Model (Python)
2015-01-13T22:56:53-08:00
Chris Ecker
http://code.activestate.com/recipes/users/4180203/
http://code.activestate.com/recipes/577978-text-model/
<p style="color: grey">
Python
recipe 577978
by <a href="/recipes/users/4180203/">Chris Ecker</a>
(<a href="/recipes/tags/datastuctures/">datastuctures</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/tree/">tree</a>).
Revision 3.
</p>
<p>A tree data type holding text data together with styling information. </p>
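<p>One way such a tree might look, as a hypothetical sketch rather than the recipe's data type: each <code>TextNode</code> holds text, a style dict and child nodes, and child styles are merged over the parent's when the tree is flattened into styled runs. Placing a node's own text before its children is a design assumption made here.</p>

```python
from dataclasses import dataclass, field


@dataclass
class TextNode:
    """Hypothetical styled-text tree node (not the recipe's own class)."""

    text: str = ""
    style: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def runs(self, inherited=None):
        """Yield (text, effective_style) pairs in document order.

        A child's style keys override the same keys inherited from its parent.
        """
        effective = {**(inherited or {}), **self.style}
        if self.text:
            yield self.text, effective
        for child in self.children:
            yield from child.runs(effective)

    def plain_text(self):
        """Return all text in the tree with the styling dropped."""
        return "".join(text for text, _ in self.runs())
```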
Extracting structured text or code (Python)
2011-05-18T13:04:01-07:00
Mike Sweeney
http://code.activestate.com/recipes/users/4177990/
http://code.activestate.com/recipes/577700-extracting-structured-text-or-code/
<p style="color: grey">
Python
recipe 577700
by <a href="/recipes/users/4177990/">Mike Sweeney</a>
(<a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/structured/">structured</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/token/">token</a>).
Revision 2.
</p>
<p>This function uses the power of regular expressions to extract parts of a structured text string. It can build a token list from many types of code and data formats. It finds string types (with quotes) and nested structures that use parentheses, brackets, and braces. If you need to extract a different syntax, you can provide a custom token pattern in the function arguments.</p>
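<p>A simplified sketch of the same idea, not the recipe's function: <code>extract</code>, <code>TOKEN_RE</code> and the <code>token_re</code> parameter are hypothetical. The default pattern recognizes quoted strings (with backslash escapes), single bracket characters, and runs of other text; bracketed groups are nested into sub-lists with a stack, and mismatched brackets raise <code>ValueError</code>. A custom pattern passed as <code>token_re</code> should use only non-capturing groups, because <code>findall</code> returns group contents when a pattern has capturing groups. A stray, unterminated quote character matches none of the alternatives and is silently skipped.</p>

```python
import re

# Assumed default token pattern (hypothetical, not the recipe's).
TOKEN_RE = re.compile(
    r'"(?:\\.|[^"\\])*"'    # double-quoted string with backslash escapes
    r"|'(?:\\.|[^'\\])*'"   # single-quoted string with backslash escapes
    r"|[()\[\]{}]"          # one bracket character
    r"|[^()\[\]{}\"']+"     # a run of any other text
)
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def extract(text, token_re=TOKEN_RE):
    """Tokenize text and nest bracketed groups into sub-lists.

    Each sub-list starts with its opening bracket and ends with the matching
    closing bracket; tokens are stripped and whitespace-only ones dropped.
    """
    root = []
    stack = [root]      # innermost open group is stack[-1]
    expected = []       # closing brackets still owed, innermost last
    for tok in token_re.findall(text):
        tok = tok.strip()
        if not tok:
            continue
        if tok in PAIRS:
            group = [tok]
            stack[-1].append(group)
            stack.append(group)
            expected.append(PAIRS[tok])
        elif tok in CLOSERS:
            if not expected or expected.pop() != tok:
                raise ValueError(f"unbalanced {tok!r}")
            stack.pop().append(tok)
        else:
            stack[-1].append(tok)
    if expected:
        raise ValueError("unclosed bracket")
    return root
```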
Simple tabulator (Python)
2010-11-09T12:50:06-08:00
Noufal Ibrahim
http://code.activestate.com/recipes/users/4173873/
http://code.activestate.com/recipes/577458-simple-tabulator/
<p style="color: grey">
Python
recipe 577458
by <a href="/recipes/users/4173873/">Noufal Ibrahim</a>
(<a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>This is a simple script to convert a top-to-bottom list of items into a left-to-right list.</p>
<pre class="prettyprint"><code> a
b
c
d
e
f
g
h
i
j
k
l
m
</code></pre>
<p>into</p>
<pre class="prettyprint"><code> a b c d e f g
h i j k l m
</code></pre>
<p>A few command line options allow some amount of customisation. </p>
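<p>The transformation itself can be sketched with a hypothetical <code>tabulate(items, columns)</code> function. Padding every cell to the width of the longest item is an assumption, and the recipe's actual command-line options are not reproduced here.</p>

```python
def tabulate(items, columns):
    """Lay out a top-to-bottom list left to right, `columns` items per row.

    Cells are left-justified to the widest item so columns line up;
    trailing padding on each row is stripped.
    """
    width = max((len(item) for item in items), default=0)
    rows = []
    for start in range(0, len(items), columns):
        row = items[start:start + columns]
        rows.append(" ".join(item.ljust(width) for item in row).rstrip())
    return "\n".join(rows)
```

<p>For the example above, <code>tabulate(list("abcdefghijklm"), 7)</code> produces the two rows shown. Reading the items from standard input could look like <code>print(tabulate([line.strip() for line in sys.stdin if line.strip()], 7))</code>.</p>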
Split a string on capitalized / uppercase char using Python (Python)
2009-12-11T23:16:36-08:00
activestate
http://code.activestate.com/recipes/users/4172588/
http://code.activestate.com/recipes/576984-split-a-string-on-capitalized-uppercase-char-using/
<p style="color: grey">
Python
recipe 576984
by <a href="/recipes/users/4172588/">activestate</a>
(<a href="/recipes/tags/string/">string</a>, <a href="/recipes/tags/string_parsing/">string_parsing</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
Revision 6.
</p>
<p>By user <a href="http://code.activestate.com/recipes/users/2629617/" rel="nofollow">http://code.activestate.com/recipes/users/2629617/</a> in comment on <a href="http://code.activestate.com/recipes/440698/" rel="nofollow">http://code.activestate.com/recipes/440698/</a> but modified slightly.</p>
<p>Splits any string on upper case characters.</p>
<p>Ex.</p>
<pre class="prettyprint"><code>>>> print split_uppercase("thisIsIt and SoIsThis")
this Is It and So Is This
</code></pre>
<p>Note the two spaces after 'and': the space already in the input is kept, and another space is inserted before the capital 'S'.</p>
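<p>A possible implementation with a single regular expression. It is a sketch, not necessarily the code from the recipe or the comment it is based on: a space is inserted before every ASCII uppercase letter that is not the first character of the string, and existing spaces are kept, which reproduces the double space in the example.</p>

```python
import re


def split_uppercase(s):
    """Insert a space before each uppercase A-Z letter except at the start.

    (?<=.) requires some character before the position, so a leading
    capital gets no space; (?=[A-Z]) is the zero-width split point.
    """
    return re.sub(r"(?<=.)(?=[A-Z])", " ", s)
```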
uniform matcher( "re pattern" / re / func / dict / list / tuple / set ) (Python)
2009-05-06T06:17:16-07:00
denis
http://code.activestate.com/recipes/users/4168005/
http://code.activestate.com/recipes/576741-uniform-matcher-re-pattern-re-func-dict-list-tuple/
<p style="color: grey">
Python
recipe 576741
by <a href="/recipes/users/4168005/">denis</a>
(<a href="/recipes/tags/grep/">grep</a>, <a href="/recipes/tags/re/">re</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/uniform/">uniform</a>).
Revision 2.
</p>
<p>matcher() makes a string matcher function from any of:</p>
<ul>
<li>"RE pattern string"</li>
<li>re.compile()</li>
<li>a function, i.e. callable</li>
<li>a dict / list / tuple / set / container</li>
</ul>
<p>This uniformity is simple, useful, a Good Thing.</p>
<p>A few example functions using matchers are here too: grep, getfields, and kwgrep.</p>
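<p>A minimal sketch of such a <code>matcher()</code>, assuming a matcher is a one-argument predicate on strings; the dispatch order and the small <code>grep</code> helper are illustrative, not the recipe's code. Strings are checked first because they are containers too, and a membership test on a string would otherwise do a substring search.</p>

```python
import re

_PATTERN_TYPE = type(re.compile(""))  # class of compiled regexes


def matcher(spec):
    """Turn spec into a one-argument function returning True or False.

    str              -> regular expression, tested with re.search
    compiled pattern -> its .search method
    callable         -> used as-is (result coerced to bool)
    anything else    -> membership test (dict keys, list, tuple, set, ...)
    """
    if isinstance(spec, str):
        regex = re.compile(spec)
        return lambda s: regex.search(s) is not None
    if isinstance(spec, _PATTERN_TYPE):
        return lambda s: spec.search(s) is not None
    if callable(spec):
        return lambda s: bool(spec(s))
    return lambda s: s in spec


def grep(spec, lines):
    """Return the lines accepted by matcher(spec)."""
    match = matcher(spec)
    return [line for line in lines if match(line)]
```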
Remove diacritical marks (including accents) from strings using Latin alphabets (Python)
2009-02-11T11:40:55-08:00
Sylvain Fourmanoit
http://code.activestate.com/recipes/users/4150902/
http://code.activestate.com/recipes/576648-remove-diatrical-marks-including-accents-from-stri/
<p style="color: grey">
Python
recipe 576648
by <a href="/recipes/users/4150902/">Sylvain Fourmanoit</a>
(<a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
Revision 7.
</p>
<p>Many written languages using Latin alphabets employ <a href="http://en.wikipedia.org/wiki/Diacritic">diacritical marks</a>. Even today, it is still pretty common to encounter situations where it would be desirable to get rid of them: file naming, creation of easy-to-read URIs, indexing schemes, etc. </p>
<p>An easy way has always been to simply filter out any "decorated characters"; unfortunately, this does not preserve the base, undecorated glyphs. But thanks to Unicode support in Python, it is now straightforward to perform such a transliteration.</p>
<p>(This recipe was completely rewritten based on a comment by Mathieu Clabaut: many thanks to him!)</p>
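<p>One common way to do this with the standard library is Unicode canonical decomposition (NFD) followed by dropping the combining marks. This is a sketch, and the recipe's own code may differ. Letters such as 'ø' or 'ł' have no decomposition and pass through unchanged, so they would need separate handling, for example a small lookup table.</p>

```python
import unicodedata


def strip_diacritics(text):
    """Remove combining marks after canonical decomposition (NFD).

    'é' decomposes to 'e' + U+0301, and the combining acute is dropped;
    characters without a decomposition are kept as they are.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```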
State Machine for Text Processing (Python)
2009-01-21T14:01:23-08:00
Jack Trainor
http://code.activestate.com/recipes/users/4076953/
http://code.activestate.com/recipes/576624-state-machine-for-text-processing/
<p style="color: grey">
Python
recipe 576624
by <a href="/recipes/users/4076953/">Jack Trainor</a>
(<a href="/recipes/tags/state_machine/">state_machine</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>General state machine mechanism plus a specialized version, LineStateMachine, for processing text files, using regular expressions to determine state transitions.</p>
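<p>A small sketch of a regex-driven line state machine, using hypothetical names (<code>RegexLineMachine</code>, <code>feed</code>, <code>run</code>) rather than the recipe's <code>LineStateMachine</code> API, which is not shown here. Transitions are listed per state as (pattern, next state) pairs, the first matching pattern wins, and a state with no rules simply stays put.</p>

```python
import re


class RegexLineMachine:
    """Hypothetical line-driven state machine (not the recipe's class).

    transitions maps a state name to a list of (regex, next_state) pairs.
    Each line is labeled with the state the machine is in after that line
    has been checked against the current state's rules.
    """

    def __init__(self, start, transitions):
        self.state = start
        self.transitions = {
            state: [(re.compile(p), nxt) for p, nxt in rules]
            for state, rules in transitions.items()
        }

    def feed(self, line):
        for regex, next_state in self.transitions.get(self.state, []):
            if regex.search(line):
                self.state = next_state
                break  # first matching rule wins
        return self.state

    def run(self, lines):
        """Return (state, line) pairs for every line, in order."""
        return [(self.feed(line), line) for line in lines]
```

<p>For an e-mail-like text, <code>RegexLineMachine("header", {"header": [(r"^$", "body")], "body": [(r"^-- $", "signature")]})</code> labels lines as header until the first blank line, body after it, and signature from the "-- " line onward.</p>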
grade keeper (Python)
2009-01-12T09:38:11-08:00
Caleb Herbert
http://code.activestate.com/recipes/users/4118572/
http://code.activestate.com/recipes/543261-grade-keeper/
<p style="color: grey">
Python
recipe 543261
by <a href="/recipes/users/4118572/">Caleb Herbert</a>
(<a href="/recipes/tags/easy/">easy</a>, <a href="/recipes/tags/grades/">grades</a>, <a href="/recipes/tags/homework/">homework</a>, <a href="/recipes/tags/records/">records</a>, <a href="/recipes/tags/school/">school</a>, <a href="/recipes/tags/simple/">simple</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
Revision 3.
</p>
<p>This code was my first attempt at making a useful program. It stores grades in a text file after asking you a few questions, such as the subject, the number of questions answered correctly, et cetera.</p>
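<p>A minimal sketch of the storage step only, with a hypothetical <code>record_grade</code> function that appends one CSV row (subject, number right, total, percentage) to a text file. The recipe asks for these values interactively and its file layout may differ; using CSV here is an assumption.</p>

```python
import csv


def record_grade(path, subject, right, total):
    """Append one grade record to a CSV text file and return the percentage.

    The percentage is rounded to one decimal place; total must be positive.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    percent = round(100 * right / total, 1)
    with open(path, "a", newline="") as f:  # append, so earlier grades are kept
        csv.writer(f).writerow([subject, right, total, percent])
    return percent
```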