Popular Python recipes tagged "text_processing"http://code.activestate.com/recipes/langs/python/tags/text_processing/2016-12-06T20:37:30-08:00ActiveState Code RecipesConvert wildcard text files to PDF with xtopdf (e.g. report*.txt) (Python) 2016-12-06T20:37:30-08:00Vasudev Ramhttp://code.activestate.com/recipes/users/4173351/http://code.activestate.com/recipes/580727-convert-wildcard-text-files-to-pdf-with-xtopdf-eg-/ <p style="color: grey"> Python recipe 580727 by <a href="/recipes/users/4173351/">Vasudev Ram</a> (<a href="/recipes/tags/conversion/">conversion</a>, <a href="/recipes/tags/files/">files</a>, <a href="/recipes/tags/globbing/">globbing</a>, <a href="/recipes/tags/patterns/">patterns</a>, <a href="/recipes/tags/pdf/">pdf</a>, <a href="/recipes/tags/pdfwriter/">pdfwriter</a>, <a href="/recipes/tags/pdf_generation/">pdf_generation</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/wildcard/">wildcard</a>, <a href="/recipes/tags/xtopdf/">xtopdf</a>). </p> <p>This recipe shows how to convert all text files matching a filename wildcard to PDF, using the xtopdf PDF creation toolkit. For example, if you specify report*.txt as the wildcard, all files in the current directory that match report*.txt will be converted to PDF, each in a separate PDF file. 
The original text files are not changed.</p> <p>Here is a guide to installing and using xtopdf:</p> <p><a href="http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html" rel="nofollow">http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html</a></p> <p>More details on running the program, and sample output, are available here:</p> <p><a href="http://jugad2.blogspot.in/2016/12/xtopdf-wildcard-text-files-to-pdf-with.html" rel="nofollow">http://jugad2.blogspot.in/2016/12/xtopdf-wildcard-text-files-to-pdf-with.html</a></p> Batch conversion of text files to PDF with fileinput and xtopdf (Python) 2016-11-07T20:28:01-08:00Vasudev Ramhttp://code.activestate.com/recipes/users/4173351/http://code.activestate.com/recipes/580715-batch-conversion-of-text-files-to-pdf-with-fileinp/ <p style="color: grey"> Python recipe 580715 by <a href="/recipes/users/4173351/">Vasudev Ram</a> (<a href="/recipes/tags/batch/">batch</a>, <a href="/recipes/tags/batchmode/">batchmode</a>, <a href="/recipes/tags/conversion/">conversion</a>, <a href="/recipes/tags/files/">files</a>, <a href="/recipes/tags/pdf/">pdf</a>, <a href="/recipes/tags/pdfwriter/">pdfwriter</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/utilities/">utilities</a>, <a href="/recipes/tags/xtopdf/">xtopdf</a>). 
</p> <p>This recipe shows how to do a batch conversion of the content of multiple text files into a single PDF file, with a) an automatic page break after the content of each text file (in the PDF output), b) page numbering, and c) a header and footer on each page.</p> <p>It uses the fileinput module (part of the Python standard library), and xtopdf, a Python library for conversion of other formats to PDF.</p> <p>xtopdf is available here: <a href="https://bitbucket.org/vasudevram/xtopdf" rel="nofollow">https://bitbucket.org/vasudevram/xtopdf</a></p> <p>and a guide to installing and using xtopdf is here:</p> <p><a href="http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html" rel="nofollow">http://jugad2.blogspot.in/2012/07/guide-to-installing-and-using-xtopdf.html</a></p> <p>Here is a sample run of the program:</p> <p>python BTTP123.pdf text1.txt text2.txt text3.txt</p> <p>This will read the content from the three text files specified and write it into the PDF file specified, neatly formatted.</p> Routine to i18nify any word (Python) 2016-05-19T18:41:26-07:00Vasudev Ramhttp://code.activestate.com/recipes/users/4173351/http://code.activestate.com/recipes/580662-routine-to-i18nify-any-word/ <p style="color: grey"> Python recipe 580662 by <a href="/recipes/users/4173351/">Vasudev Ram</a> (<a href="/recipes/tags/i18n/">i18n</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/strings/">strings</a>, <a href="/recipes/tags/text_processing/">text_processing</a>). 
</p> <p>This recipe shows a routine and a driver program that lets you "i18nify" any word, similar to how the word "internationalization" is shortened to "i18n", and "localization" to "l10n".</p> [python3-tk/ttk] Onager Scratchpad (Python) 2016-04-24T02:34:03-07:00Mickey Kocichttp://code.activestate.com/recipes/users/4193984/http://code.activestate.com/recipes/580650-python3-tkttk-onager-scratchpad/ <p style="color: grey"> Python recipe 580650 by <a href="/recipes/users/4193984/">Mickey Kocic</a> (<a href="/recipes/tags/python3/">python3</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/tkinter/">tkinter</a>, <a href="/recipes/tags/ttk/">ttk</a>, <a href="/recipes/tags/windows/">windows</a>). Revision 2. </p> <p>I wrote this simple text editor to use for my diary. It's customized the way I like it, but the code is set up so it's easy for other people to change bg, fg, font, etc. I've also compiled a standalone Windows executable (thank you very much ActiveState! without ActivePython the compilation would have been impossible). 
Anyone who wants a copy of the executable is free to message or email me.</p> <p>NOTE: If you get an error that the theme is not recognized, just comment out line 18 or run the following code in your python3 interpreter:</p> <pre class="prettyprint"><code>&gt;&gt;&gt; from tkinter.ttk import Style
&gt;&gt;&gt; s = Style()
&gt;&gt;&gt; s.theme_names()
</code></pre> <p>You'll get a list of the available themes and can replace the 'alt' in line 18 with any one of them you want.</p> The Bentley-Knuth problem and solutions (Python) 2014-03-15T23:46:59-07:00Vasudev Ramhttp://code.activestate.com/recipes/users/4173351/http://code.activestate.com/recipes/578851-the-bentley-knuth-problem-and-solutions/ <p style="color: grey"> Python recipe 578851 by <a href="/recipes/users/4173351/">Vasudev Ram</a> (<a href="/recipes/tags/algorithms/">algorithms</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/text_processing/">text_processing</a>). </p> <p>This is a program Jon Bentley asked Donald Knuth to write, and is one that’s become familiar to people who use languages with serious text-handling capabilities: Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies. I wrote 2 solutions for it earlier, in Python and in Unix shell. Also see the comment by a user on the post, giving another solution.</p> Plain Text Editor in Python (Python) 2013-06-18T15:33:01-07:00Captain DeadBoneshttp://code.activestate.com/recipes/users/4184772/http://code.activestate.com/recipes/578568-plain-text-editor-in-python/ <p style="color: grey"> Python recipe 578568 by <a href="/recipes/users/4184772/">Captain DeadBones</a> (<a href="/recipes/tags/editor/">editor</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>). </p> <p>Just a simple text editor written in Python with Tk for graphics. 
</p> <p>Check out my blog <a href="http://thelivingpearl.com/">Captain DeadBones Chronicles</a></p> Text Editor in Python 3.3 (Python) 2013-06-19T15:58:17-07:00Stephen Chappellhttp://code.activestate.com/recipes/users/2608421/http://code.activestate.com/recipes/578569-text-editor-in-python-33/ <p style="color: grey"> Python recipe 578569 by <a href="/recipes/users/2608421/">Stephen Chappell</a> (<a href="/recipes/tags/editor/">editor</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>). </p> <p>This is a simple text editor written in Python using <code>tkinter</code> for graphics.</p> <p>Check out Captain DeadBones' <a href="http://thelivingpearl.com/">Chronicles</a> blog.</p> slurp.py (Regex based simple parsing engine) (Python) 2013-05-26T18:00:58-07:00Mike 'Fuzzy' Partinhttp://code.activestate.com/recipes/users/4179778/http://code.activestate.com/recipes/578532-slurppy-regex-based-simple-parsing-engine/ <p style="color: grey"> Python recipe 578532 by <a href="/recipes/users/4179778/">Mike 'Fuzzy' Partin</a> (<a href="/recipes/tags/parser/">parser</a>, <a href="/recipes/tags/regex/">regex</a>, <a href="/recipes/tags/text_processing/">text_processing</a>). </p> <p>A parsing engine that allows you to define sets of patterns and callbacks, and process any I/O object in Python that has a readline() method.</p> Text Model (Python) 2015-01-13T22:56:53-08:00Chris Eckerhttp://code.activestate.com/recipes/users/4180203/http://code.activestate.com/recipes/577978-text-model/ <p style="color: grey"> Python recipe 577978 by <a href="/recipes/users/4180203/">Chris Ecker</a> (<a href="/recipes/tags/datastuctures/">datastuctures</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/tree/">tree</a>). Revision 3. </p> <p>A tree data type holding text data together with styling information. 
</p> Extracting structured text or code (Python) 2011-05-18T13:04:01-07:00Mike Sweeneyhttp://code.activestate.com/recipes/users/4177990/http://code.activestate.com/recipes/577700-extracting-structured-text-or-code/ <p style="color: grey"> Python recipe 577700 by <a href="/recipes/users/4177990/">Mike Sweeney</a> (<a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/structured/">structured</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/token/">token</a>). Revision 2. </p> <p>This function uses the power of regular expressions to extract parts of a structured text string. It can build a token list from many types of code and data formats. It finds string types (with quotes) and nested structures that use parentheses, brackets, and braces. If you need to extract a different syntax, you can provide a custom token pattern in the function arguments.</p> Simple tabulator (Python) 2010-11-09T12:50:06-08:00Noufal Ibrahimhttp://code.activestate.com/recipes/users/4173873/http://code.activestate.com/recipes/577458-simple-tabulator/ <p style="color: grey"> Python recipe 577458 by <a href="/recipes/users/4173873/">Noufal Ibrahim</a> (<a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/text_processing/">text_processing</a>). </p> <p>This is a simple script to convert a top-to-bottom list of items into a left-to-right list.</p> <pre class="prettyprint"><code>a
b
c
d
e
f
g
h
i
j
k
l
m
</code></pre> <p>into</p> <pre class="prettyprint"><code> a b c d e f g h i j k l m </code></pre> <p>A few command line options allow some amount of customisation. 
</p> Split a string on capitalized / uppercase char using Python (Python) 2009-12-11T23:16:36-08:00activestatehttp://code.activestate.com/recipes/users/4172588/http://code.activestate.com/recipes/576984-split-a-string-on-capitalized-uppercase-char-using/ <p style="color: grey"> Python recipe 576984 by <a href="/recipes/users/4172588/">activestate</a> (<a href="/recipes/tags/string/">string</a>, <a href="/recipes/tags/string_parsing/">string_parsing</a>, <a href="/recipes/tags/text_processing/">text_processing</a>). Revision 6. </p> <p>By user <a href="http://code.activestate.com/recipes/users/2629617/" rel="nofollow">http://code.activestate.com/recipes/users/2629617/</a> in comment on <a href="http://code.activestate.com/recipes/440698/" rel="nofollow">http://code.activestate.com/recipes/440698/</a> but modified slightly.</p> <p>Splits any string on upper case characters.</p> <p>Ex.</p> <pre class="prettyprint"><code>&gt;&gt;&gt; print split_uppercase("thisIsIt and SoIsThis")
this Is It and  So Is This
</code></pre> <p>Note the two spaces after 'and'.</p> uniform matcher( "re pattern" / re / func / dict / list / tuple / set ) (Python) 2009-05-06T06:17:16-07:00denishttp://code.activestate.com/recipes/users/4168005/http://code.activestate.com/recipes/576741-uniform-matcher-re-pattern-re-func-dict-list-tuple/ <p style="color: grey"> Python recipe 576741 by <a href="/recipes/users/4168005/">denis</a> (<a href="/recipes/tags/grep/">grep</a>, <a href="/recipes/tags/re/">re</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/uniform/">uniform</a>). Revision 2. </p> <p>matcher() makes a string matcher function from any of:</p> <ul> <li>"RE pattern string"</li> <li>re.compile()</li> <li>a function, i.e. 
callable</li> <li>a dict / list / tuple / set / container</li> </ul> <p>This uniformity is simple, useful, a Good Thing.</p> <p>A few example functions using matchers are here too: grep, getfields, kwgrep.</p> Remove diatrical marks (including accents) from strings using latin alphabets (Python) 2009-02-11T11:40:55-08:00Sylvain Fourmanoithttp://code.activestate.com/recipes/users/4150902/http://code.activestate.com/recipes/576648-remove-diatrical-marks-including-accents-from-stri/ <p style="color: grey"> Python recipe 576648 by <a href="/recipes/users/4150902/">Sylvain Fourmanoit</a> (<a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>). Revision 7. </p> <p>Many written languages using Latin alphabets employ <a href="http://en.wikipedia.org/wiki/Diacritic">diacritical marks</a>. Even today, it is still pretty common to encounter situations where it would be desirable to get rid of them: file naming, creation of easy-to-read URIs, indexing schemes, etc. </p> <p>An easy way has always been to simply filter out any "decorated characters"; unfortunately, this does not preserve the base, undecorated glyphs. But thanks to Unicode support in Python, it is now straightforward to perform such a transliteration.</p> <p>(This recipe was completely rewritten based on a comment by Mathieu Clabaut: many thanks to him!)</p> State Machine for Text Processing (Python) 2009-01-21T14:01:23-08:00Jack Trainorhttp://code.activestate.com/recipes/users/4076953/http://code.activestate.com/recipes/576624-state-machine-for-text-processing/ <p style="color: grey"> Python recipe 576624 by <a href="/recipes/users/4076953/">Jack Trainor</a> (<a href="/recipes/tags/state_machine/">state_machine</a>, <a href="/recipes/tags/text_processing/">text_processing</a>). 
</p> <p>A general state machine mechanism plus a specialized version, LineStateMachine, for processing text files, which uses regular expressions to determine state transitions.</p> grade keeper (Python) 2009-01-12T09:38:11-08:00Caleb Herberthttp://code.activestate.com/recipes/users/4118572/http://code.activestate.com/recipes/543261-grade-keeper/ <p style="color: grey"> Python recipe 543261 by <a href="/recipes/users/4118572/">Caleb Herbert</a> (<a href="/recipes/tags/easy/">easy</a>, <a href="/recipes/tags/grades/">grades</a>, <a href="/recipes/tags/homework/">homework</a>, <a href="/recipes/tags/records/">records</a>, <a href="/recipes/tags/school/">school</a>, <a href="/recipes/tags/simple/">simple</a>, <a href="/recipes/tags/text/">text</a>, <a href="/recipes/tags/text_processing/">text_processing</a>). Revision 3. </p> <p>This code was my first attempt at making a useful program. It stores grades in a text file after asking you a few questions, such as the subject, the number of questions right, et cetera.</p>
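<p>As a rough illustration of the idea behind the grade keeper recipe (ask a few questions, then append the answers to a text file), here is a minimal sketch. It is not the recipe's code: the function names, the tab-separated record layout, and the grades.txt filename are all assumptions made for this example.</p>

```python
def format_record(subject, right, total):
    """Build one line of text for a graded assignment.

    The tab-separated layout is an assumption for this sketch,
    not the format used by the original recipe.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    percent = 100.0 * right / total
    return "{}\t{}/{}\t{:.1f}%".format(subject, right, total, percent)


def append_record(path, subject, right, total):
    """Append one formatted record to the grade file, creating it if needed."""
    with open(path, "a") as f:
        f.write(format_record(subject, right, total) + "\n")


def read_records(path):
    """Return all stored records as a list of lines without trailing newlines."""
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


def main(path="grades.txt"):
    """Interactive driver: ask a few questions, then store the answers."""
    subject = input("Subject: ")
    right = int(input("Questions right: "))
    total = int(input("Questions in total: "))
    append_record(path, subject, right, total)
    print("Saved:", format_record(subject, right, total))
```

<p>Calling main() runs the question-and-answer loop once. Keeping the formatting and file handling in separate functions means they can be checked without typing answers at a prompt.</p>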