Popular Python recipes tagged "text_processing" http://code.activestate.com/recipes/langs/python/tags/text_processing/ 2016-12-06T20:37:30-08:00 ActiveState Code Recipes
Convert wildcard text files to PDF with xtopdf (e.g. report*.txt) (Python)
2016-12-06T20:37:30-08:00 Vasudev Ram http://code.activestate.com/recipes/users/4173351/ http://code.activestate.com/recipes/580727-convert-wildcard-text-files-to-pdf-with-xtopdf-eg-/
<p style="color: grey">
Python
recipe 580727
by <a href="/recipes/users/4173351/">Vasudev Ram</a>
(<a href="/recipes/tags/conversion/">conversion</a>, <a href="/recipes/tags/files/">files</a>, <a href="/recipes/tags/globbing/">globbing</a>, <a href="/recipes/tags/patterns/">patterns</a>, <a href="/recipes/tags/pdf/">pdf</a>, <a href="/recipes/tags/pdfwriter/">pdfwriter</a>, <a href="/recipes/tags/pdf_generation/">pdf_generation</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/wildcard/">wildcard</a>, <a href="/recipes/tags/xtopdf/">xtopdf</a>).
</p>
<p>This recipe shows how to convert all text files matching a filename wildcard to PDF, using the xtopdf PDF creation toolkit. For example, if you specify report*.txt as the wildcard, all files in the current directory that match report*.txt will be converted to PDF, each in a separate PDF file. The original text files are not changed.</p>
<p>Here is a guide to installing and using xtopdf:</p>
<p><a href="http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html" rel="nofollow">http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html</a></p>
<p>More details on running the program, and sample output, are available here:</p>
<p><a href="http://jugad2.blogspot.in/2016/12/xtopdf-wildcard-text-files-to-pdf-with.html" rel="nofollow">http://jugad2.blogspot.in/2016/12/xtopdf-wildcard-text-files-to-pdf-with.html</a></p>
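<p>As a rough illustration of the wildcard step only, the sketch below expands a pattern such as report*.txt with the standard glob module and pairs each match with an output PDF name. The function name pdf_targets and the choice to reuse the base filename are assumptions made for this example; the conversion itself, which the recipe performs with xtopdf, is deliberately left out.</p>

```python
import glob
import os


def pdf_targets(pattern):
    """Pair each text file matching the wildcard with a PDF output name.

    Hypothetical helper: the recipe's own naming scheme may differ.
    """
    targets = []
    for txt_path in sorted(glob.glob(pattern)):
        base, _ext = os.path.splitext(txt_path)
        targets.append((txt_path, base + ".pdf"))
    return targets


# Each (source, destination) pair would then be handed to the PDF writer.
for src, dst in pdf_targets("report*.txt"):
    print(src, "->", dst)
```

<p>Because only the output names are computed, the original text files are never touched, which matches the behaviour described above.</p>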
Batch conversion of text files to PDF with fileinput and xtopdf (Python)
2016-11-07T20:28:01-08:00 Vasudev Ram http://code.activestate.com/recipes/users/4173351/ http://code.activestate.com/recipes/580715-batch-conversion-of-text-files-to-pdf-with-fileinp/
<p style="color: grey">
Python
recipe 580715
by <a href="/recipes/users/4173351/">Vasudev Ram</a>
(<a href="/recipes/tags/batch/">batch</a>, <a href="/recipes/tags/batchmode/">batchmode</a>, <a href="/recipes/tags/conversion/">conversion</a>, <a href="/recipes/tags/files/">files</a>, <a href="/recipes/tags/pdf/">pdf</a>, <a href="/recipes/tags/pdfwriter/">pdfwriter</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/utilities/">utilities</a>, <a href="/recipes/tags/xtopdf/">xtopdf</a>).
</p>
<p>This recipe shows how to do a batch conversion of the content of multiple text files into a single PDF file, with a) an automatic page break after the content of each text file (in the PDF output), b) page numbering, and c) a header and footer on each page.</p>
<p>It uses the fileinput module (part of the Python standard library), and xtopdf, a Python library for conversion of other formats to PDF.</p>
<p>xtopdf is available here: <a href="https://bitbucket.org/vasudevram/xtopdf" rel="nofollow">https://bitbucket.org/vasudevram/xtopdf</a></p>
<p>and a guide to installing and using xtopdf is here:</p>
<p><a href="http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html" rel="nofollow">http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html</a></p>
<p>Here is a sample run of the program:</p>
<p>python BTTP123.pdf text1.txt text2.txt text3.txt</p>
<p>This will read the content from the three text files specified and write it into the PDF file specified, neatly formatted.</p>
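<p>The file-boundary detection that makes the per-file page break possible can be sketched with fileinput alone. In this minimal example a form feed character stands in for the PDF page break, and the function name merge_with_page_breaks is an assumption for illustration; the xtopdf calls that write the PDF, the header and the footer are not shown.</p>

```python
import fileinput


def merge_with_page_breaks(paths):
    """Return the lines of all files, with "\f" marking each file boundary.

    Hypothetical stand-in for the recipe's PDF output step.
    """
    out = []
    with fileinput.input(files=paths) as stream:
        for line in stream:
            # isfirstline() is true on the first line of every file;
            # lineno() is cumulative, so it is 1 only for the very first line.
            if stream.isfirstline() and stream.lineno() > 1:
                out.append("\f")
            out.append(line.rstrip("\n"))
    return out
```

<p>fileinput treats the listed files as one continuous stream, so a single loop handles any number of inputs.</p>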
Routine to i18nify any word (Python)
2016-05-19T18:41:26-07:00 Vasudev Ram http://code.activestate.com/recipes/users/4173351/ http://code.activestate.com/recipes/580662-routine-to-i18nify-any-word/
<p style="color: grey">
Python
recipe 580662
by <a href="/recipes/users/4173351/">Vasudev Ram</a>
(<a href="/recipes/tags/i18n/">i18n</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/strings/">strings</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>This recipe shows a routine and a driver program that lets you "i18nify" any word, similar to how the word "internationalization" is shortened to "i18n", and "localization" to "l10n".</p>
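<p>The abbreviation rule itself is short: keep the first and last letters and replace everything between them with a count. The following is a minimal sketch of that rule, not the recipe's code; returning words shorter than three letters unchanged is an assumption made here.</p>

```python
def i18nify(word):
    """Abbreviate a word as first letter + count of inner letters + last letter."""
    # Assumption: words with no inner letters are returned unchanged.
    if len(word) < 3:
        return word
    return word[0] + str(len(word) - 2) + word[-1]


print(i18nify("internationalization"))  # i18n
print(i18nify("localization"))          # l10n
```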
[python3-tk/ttk] Onager Scratchpad (Python)
2016-04-24T02:34:03-07:00 Mickey Kocic http://code.activestate.com/recipes/users/4193984/ http://code.activestate.com/recipes/580650-python3-tkttk-onager-scratchpad/
<p style="color: grey">
Python
recipe 580650
by <a href="/recipes/users/4193984/">Mickey Kocic</a>
(<a href="/recipes/tags/python3/">python3</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/tkinter/">tkinter</a>, <a href="/recipes/tags/ttk/">ttk</a>, <a href="/recipes/tags/windows/">windows</a>).
Revision 2.
</p>
<p>I wrote this simple text editor to use for my diary. It's customized the way I like it, but the code is set up so it's easy for other people to change bg, fg, font, etc. I've also compiled a standalone Windows executable (thank you very much, ActiveState! Without ActivePython the compilation would have been impossible). Anyone who wants a copy of the executable is free to message or email me.</p>
<p>NOTE: If you get an error that the theme is not recognized, just comment out line 18 or run the following code in your python3 interpreter:</p>
<pre class="prettyprint"><code>>>> from tkinter.ttk import Style
>>> s = Style()
>>> s.theme_names()
</code></pre>
<p>You'll get a list of the available themes and can replace the 'alt' in line 18 with any one of them you want.</p>
The Bentley-Knuth problem and solutions (Python)
2014-03-15T23:46:59-07:00 Vasudev Ram http://code.activestate.com/recipes/users/4173351/ http://code.activestate.com/recipes/578851-the-bentley-knuth-problem-and-solutions/
<p style="color: grey">
Python
recipe 578851
by <a href="/recipes/users/4173351/">Vasudev Ram</a>
(<a href="/recipes/tags/algorithms/">algorithms</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>This is a program Jon Bentley asked Donald Knuth to write, and is one that’s become familiar to people who use languages with serious text-handling capabilities: Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies. I wrote 2 solutions for it earlier, in Python and in Unix shell. Also see the comment by a user on the post, giving another solution.</p>
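<p>One compact way to approach the problem in Python is with collections.Counter. This is a sketch under simple assumptions (words are runs of ASCII letters and apostrophes, compared case-insensitively) and is not necessarily either of the solutions mentioned above.</p>

```python
import re
from collections import Counter


def top_words(text, n):
    """Return the n most frequent words as (word, count) pairs, most frequent first."""
    # Assumption: a "word" is a run of lowercase ASCII letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


for word, count in top_words("the cat and the hat and the bat", 2):
    print(word, count)
```

<p>To process a file, read its contents into a string first and pass that string as the text argument.</p>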
Plain Text Editor in Python (Python)
2013-06-18T15:33:01-07:00 Captain DeadBones http://code.activestate.com/recipes/users/4184772/ http://code.activestate.com/recipes/578568-plain-text-editor-in-python/
<p style="color: grey">
Python
recipe 578568
by <a href="/recipes/users/4184772/">Captain DeadBones</a>
(<a href="/recipes/tags/editor/">editor</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>Just a simple text editor written in Python with Tk for graphics. </p>
<p>Check out my blog <a href="http://thelivingpearl.com/">Captain DeadBones Chronicles</a></p>
Text Editor in Python 3.3 (Python)
2013-06-19T15:58:17-07:00 Stephen Chappell http://code.activestate.com/recipes/users/2608421/ http://code.activestate.com/recipes/578569-text-editor-in-python-33/
<p style="color: grey">
Python
recipe 578569
by <a href="/recipes/users/2608421/">Stephen Chappell</a>
(<a href="/recipes/tags/editor/">editor</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>This is a simple text editor written in Python using <code>tkinter</code> for graphics.</p>
<p>Check out Captain DeadBones' <a href="http://thelivingpearl.com/">Chronicles</a> blog.</p>
slurp.py (Regex based simple parsing engine) (Python)
2013-05-26T18:00:58-07:00 Mike 'Fuzzy' Partin http://code.activestate.com/recipes/users/4179778/ http://code.activestate.com/recipes/578532-slurppy-regex-based-simple-parsing-engine/
<p style="color: grey">
Python
recipe 578532
by <a href="/recipes/users/4179778/">Mike 'Fuzzy' Partin</a>
(<a href="/recipes/tags/parser/">parser</a>, <a href="/recipes/tags/regex/">regex</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>A parsing engine that allows you to define sets of patterns and callbacks, and process any I/O object in Python that has a readline() method.</p>
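<p>The general idea of pairing patterns with callbacks can be sketched as follows. This is not slurp.py's actual interface: the function name run_rules and the shape of the rules list are assumptions for illustration, and the only thing required of the input object is a readline() method.</p>

```python
import io
import re


def run_rules(stream, rules):
    """Call each rule's callback with the match object for every matching line.

    rules is a list of (pattern_string, callback) pairs (hypothetical layout).
    """
    compiled = [(re.compile(pattern), callback) for pattern, callback in rules]
    while True:
        line = stream.readline()
        if not line:  # readline() returns an empty string at end of input
            break
        for regex, callback in compiled:
            match = regex.search(line)
            if match:
                callback(match)


settings = {}
comments = []
run_rules(
    io.StringIO("name=alice\n# a comment\nage=30\n"),
    [
        (r"^(\w+)=(.*)$", lambda m: settings.update({m.group(1): m.group(2)})),
        (r"^#\s*(.*)$", lambda m: comments.append(m.group(1))),
    ],
)
print(settings, comments)
```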
Text Model (Python)
2015-01-13T22:56:53-08:00 Chris Ecker http://code.activestate.com/recipes/users/4180203/ http://code.activestate.com/recipes/577978-text-model/
<p style="color: grey">
Python
recipe 577978
by <a href="/recipes/users/4180203/">Chris Ecker</a>
(<a href="/recipes/tags/datastuctures/">datastuctures</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/tree/">tree</a>).
Revision 3.
</p>
<p>A tree data type holding text data together with styling information. </p>
Extracting structured text or code (Python)
2011-05-18T13:04:01-07:00 Mike Sweeney http://code.activestate.com/recipes/users/4177990/ http://code.activestate.com/recipes/577700-extracting-structured-text-or-code/
<p style="color: grey">
Python
recipe 577700
by <a href="/recipes/users/4177990/">Mike Sweeney</a>
(<a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/structured/">structured</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/token/">token</a>).
Revision 2.
</p>
<p>This function uses the power of regular expressions to extract parts of a structured text string. It can build a token list from many types of code and data formats. It finds string types (with quotes) and nested structures that use parentheses, brackets, and braces. If you need to extract a different syntax, you can provide a custom token pattern in the function arguments.</p>
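<p>To give a feel for regex-driven token extraction with nesting, here is a small sketch. It is not the recipe's function: the token pattern, the names TOKEN and parse, and the decision to represent every bracketed group as a plain nested list (without recording which bracket type was used) are all assumptions for this example.</p>

```python
import re

# Quoted strings, single bracket characters, or runs of other non-space text.
TOKEN = re.compile(r'''"[^"]*"|'[^']*'|[()\[\]{}]|[^\s()\[\]{}"']+''')
PAIRS = {"(": ")", "[": "]", "{": "}"}


def parse(text):
    """Build a nested token list; each bracketed group becomes a sub-list."""
    root = []
    stack = [(root, None)]  # (current list, closing bracket expected)
    for tok in TOKEN.findall(text):
        if tok in PAIRS:
            child = []
            stack[-1][0].append(child)
            stack.append((child, PAIRS[tok]))
        elif tok in PAIRS.values():
            if stack[-1][1] != tok:
                raise ValueError("unbalanced " + tok)
            stack.pop()
        else:
            stack[-1][0].append(tok)
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return root


print(parse('f(a "b c") [1 2]'))
```

<p>Quoted strings are kept as single tokens, so brackets inside quotes do not affect nesting.</p>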
Simple tabulator (Python)
2010-11-09T12:50:06-08:00 Noufal Ibrahim http://code.activestate.com/recipes/users/4173873/ http://code.activestate.com/recipes/577458-simple-tabulator/
<p style="color: grey">
Python
recipe 577458
by <a href="/recipes/users/4173873/">Noufal Ibrahim</a>
(<a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>This is a simple script to convert a top-to-bottom list of items into a left-to-right list. For example, it turns</p>
<pre class="prettyprint"><code> a
b
c
d
e
f
g
h
i
j
k
l
m
</code></pre>
<p>into</p>
<pre class="prettyprint"><code> a b c d e f g
h i j k l m
</code></pre>
<p>A few command line options allow some amount of customisation. </p>
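<p>The core transformation can be sketched in a few lines. The function name tabulate and the fixed columns parameter are assumptions for this example; the script's real command line options are not reproduced here.</p>

```python
def tabulate(items, columns):
    """Lay out a list of strings left to right, `columns` items per row."""
    rows = [items[i:i + columns] for i in range(0, len(items), columns)]
    return "\n".join(" ".join(row) for row in rows)


print(tabulate(list("abcdefghijklm"), 7))
```

<p>With seven columns this reproduces the two-row layout shown above.</p>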
Split a string on capitalized / uppercase char using Python (Python)
2009-12-11T23:16:36-08:00 activestate http://code.activestate.com/recipes/users/4172588/ http://code.activestate.com/recipes/576984-split-a-string-on-capitalized-uppercase-char-using/
<p style="color: grey">
Python
recipe 576984
by <a href="/recipes/users/4172588/">activestate</a>
(<a href="/recipes/tags/string/">string</a>, <a href="/recipes/tags/string_parsing/">string_parsing</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
Revision 6.
</p>
<p>By user <a href="http://code.activestate.com/recipes/users/2629617/" rel="nofollow">http://code.activestate.com/recipes/users/2629617/</a> in comment on <a href="http://code.activestate.com/recipes/440698/" rel="nofollow">http://code.activestate.com/recipes/440698/</a> but modified slightly.</p>
<p>Splits any string on upper case characters.</p>
<p>Ex.</p>
<pre class="prettyprint"><code>>>> print split_uppercase("thisIsIt and SoIsThis")
this Is It and  So Is This
</code></pre>
<p>Note the two spaces after 'and'.</p>
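<p>One way to get this behaviour is to insert a space before every uppercase ASCII letter with re.sub. This is a sketch and may differ from the recipe's implementation. It shows where the doubled space comes from: the original space after 'and' is kept, and another is inserted before the 'S'.</p>

```python
import re


def split_uppercase(text):
    """Insert a space before each uppercase ASCII letter."""
    return re.sub(r"([A-Z])", r" \1", text)


print(repr(split_uppercase("thisIsIt and SoIsThis")))
```

<p>A string that starts with an uppercase letter gains a leading space with this approach; call strip() on the result if that is unwanted.</p>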
uniform matcher( "re pattern" / re / func / dict / list / tuple / set ) (Python)
2009-05-06T06:17:16-07:00 denis http://code.activestate.com/recipes/users/4168005/ http://code.activestate.com/recipes/576741-uniform-matcher-re-pattern-re-func-dict-list-tuple/
<p style="color: grey">
Python
recipe 576741
by <a href="/recipes/users/4168005/">denis</a>
(<a href="/recipes/tags/grep/">grep</a>, <a href="/recipes/tags/re/">re</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/uniform/">uniform</a>).
Revision 2.
</p>
<p>matcher() makes a string matcher function from any of:</p>
<ul>
<li>"RE pattern string"</li>
<li>re.compile()</li>
<li>a function, i.e. callable</li>
<li>a dict / list / tuple / set / container</li>
</ul>
<p>This uniformity is simple, useful, a Good Thing.</p>
<p>A few example functions using matchers are here too: grep getfields kwgrep.</p>
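<p>A minimal sketch of such a uniform matcher is shown below. The dispatch order and the use of search() for patterns are assumptions made for this example and may not match the recipe; the small grep helper is likewise only illustrative.</p>

```python
import re


def matcher(spec):
    """Turn a pattern string, compiled regex, callable or container into a predicate."""
    if callable(spec):
        return spec
    if isinstance(spec, str):  # checked before containers: str also has __contains__
        return re.compile(spec).search
    if hasattr(spec, "search"):  # compiled regular expression
        return spec.search
    if hasattr(spec, "__contains__"):  # dict / list / tuple / set
        return spec.__contains__
    raise TypeError("cannot build a matcher from %r" % (spec,))


def grep(spec, lines):
    """Return the lines accepted by the matcher built from spec."""
    match = matcher(spec)
    return [line for line in lines if match(line)]


print(grep(r"^a", ["apple", "banana", "avocado"]))
print(grep({"banana"}, ["apple", "banana", "avocado"]))
```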
Remove diatrical marks (including accents) from strings using latin alphabets (Python)
2009-02-11T11:40:55-08:00 Sylvain Fourmanoit http://code.activestate.com/recipes/users/4150902/ http://code.activestate.com/recipes/576648-remove-diatrical-marks-including-accents-from-stri/
<p style="color: grey">
Python
recipe 576648
by <a href="/recipes/users/4150902/">Sylvain Fourmanoit</a>
(<a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
Revision 7.
</p>
<p>Many written languages using Latin alphabets employ <a href="http://en.wikipedia.org/wiki/Diacritic">diacritical marks</a>. Even today, it is still pretty common to encounter situations where it would be desirable to get rid of them: file naming, creation of easy-to-read URIs, indexing schemes, etc. </p>
<p>An easy way has always been to simply filter out any "decorated characters"; unfortunately, this does not preserve the base, undecorated glyphs. But thanks to Unicode support in Python, it is now straightforward to perform such a transliteration.</p>
<p>(This recipe was completely rewritten based on a comment by Mathieu Clabaut: many thanks to him!)</p>
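<p>A common way to do this with the standard unicodedata module is to decompose each character and drop the combining marks. The sketch below assumes that approach and Python 3 strings; the recipe's own code may differ in detail.</p>

```python
import unicodedata


def remove_diacritics(text):
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(remove_diacritics("Cr\u00e8me br\u00fbl\u00e9e"))  # Creme brulee
```

<p>Letters that have no decomposition, such as "ø" or "ß", pass through unchanged with this method.</p>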
State Machine for Text Processing (Python)
2009-01-21T14:01:23-08:00 Jack Trainor http://code.activestate.com/recipes/users/4076953/ http://code.activestate.com/recipes/576624-state-machine-for-text-processing/
<p style="color: grey">
Python
recipe 576624
by <a href="/recipes/users/4076953/">Jack Trainor</a>
(<a href="/recipes/tags/state_machine/">state_machine</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>General state machine mechanism plus a specialized version, LineStateMachine, for processing text files by using regular expressions to determine state transitions.</p>
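<p>The idea of regex-driven transitions over lines can be sketched as below. This is not the recipe's LineStateMachine: the class name SimpleLineMachine, the transition table layout and the BEGIN/END markers are all hypothetical choices for this example.</p>

```python
import re


class SimpleLineMachine:
    """Hypothetical line-driven state machine with regex transitions."""

    def __init__(self, start, transitions):
        # transitions: {state: [(pattern_string, next_state), ...]}
        self.state = start
        self.transitions = {
            state: [(re.compile(pattern), nxt) for pattern, nxt in rules]
            for state, rules in transitions.items()
        }

    def feed(self, line):
        """Apply the first matching transition for the current state."""
        for regex, nxt in self.transitions.get(self.state, []):
            if regex.search(line):
                self.state = nxt
                break
        return self.state


def collect_block(lines):
    """Collect the lines found between BEGIN and END markers."""
    machine = SimpleLineMachine("outside", {
        "outside": [(r"^BEGIN$", "inside")],
        "inside": [(r"^END$", "outside")],
    })
    body = []
    for line in lines:
        before = machine.state
        after = machine.feed(line)
        if before == "inside" and after == "inside":
            body.append(line)
    return body


print(collect_block(["x", "BEGIN", "a", "b", "END", "y"]))
```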
grade keeper (Python)
2009-01-12T09:38:11-08:00 Caleb Herbert http://code.activestate.com/recipes/users/4118572/ http://code.activestate.com/recipes/543261-grade-keeper/
<p style="color: grey">
Python
recipe 543261
by <a href="/recipes/users/4118572/">Caleb Herbert</a>
(<a href="/recipes/tags/easy/">easy</a>, <a href="/recipes/tags/grades/">grades</a>, <a href="/recipes/tags/homework/">homework</a>, <a href="/recipes/tags/records/">records</a>, <a href="/recipes/tags/school/">school</a>, <a href="/recipes/tags/simple/">simple</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
Revision 3.
</p>
<p>This code was my first attempt at making a useful program. It stores grades in a text file after asking you a few questions, such as the subject and the number of questions answered correctly.</p>