Most viewed recipes tagged "parsing"
http://code.activestate.com/recipes/tags/parsing/views/
2016-04-10T22:43:57-07:00
ActiveState Code Recipes
Simple Web Crawler (Python)
2011-01-31T21:57:58-08:00James Millshttp://code.activestate.com/recipes/users/4167757/http://code.activestate.com/recipes/576551-simple-web-crawler/
<p style="color: grey">
Python
recipe 576551
by <a href="/recipes/users/4167757/">James Mills</a>
(<a href="/recipes/tags/crawler/">crawler</a>, <a href="/recipes/tags/network/">network</a>, <a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/web/">web</a>).
Revision 2.
</p>
<p>NOTE: This recipe has been updated with suggested improvements since the last revision.</p>
<p>This is a simple web crawler I wrote to
test websites and links. It will traverse
all links found to any given depth.</p>
<p>See --help for usage.</p>
<p>I'm posting this recipe because questions about
this kind of problem have come up on the Python
Mailing List a number of times, so I
thought I'd share my simple little
implementation based on the standard
library and BeautifulSoup.</p>
<p>--JamesMills</p>
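<p>The depth-limited traversal described above can be sketched roughly as follows. This is not the recipe's code: it swaps BeautifulSoup for the standard library's <code>html.parser</code>, and the names <code>LinkParser</code>, <code>crawl</code>, <code>fetch</code> and the fake <code>SITE</code> dictionary are hypothetical, chosen so the example runs without network access.</p>

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(start_url, fetch, max_depth=2):
    """Follow links breadth-first, up to max_depth levels from start_url.

    fetch is a hypothetical callable: url -> HTML text, or None on failure.
    """
    seen = {start_url}
    frontier = [start_url]
    for _ in range(max_depth):
        next_frontier = []
        for url in frontier:
            html = fetch(url)
            if html is None:
                continue
            parser = LinkParser()
            parser.feed(html)
            for href in parser.links:
                link = urljoin(url, href)  # resolve relative links
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen


# A fake three-page site (an assumption for illustration only).
SITE = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/c">C</a>',
    "http://example.com/b": '<a href="/">home</a>',
}
print(sorted(crawl("http://example.com/", SITE.get, max_depth=1)))
```

<p>A real crawler would pass a <code>fetch</code> function that downloads the page; keeping it as a parameter makes the traversal logic easy to test in isolation.</p>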
Get columns of data from text files (Python)
2010-10-28T16:18:19-07:00aliniumhttp://code.activestate.com/recipes/users/4175605/http://code.activestate.com/recipes/577444-get-columns-of-data-from-text-files/
<p style="color: grey">
Python
recipe 577444
by <a href="/recipes/users/4175605/">alinium</a>
(<a href="/recipes/tags/columns/">columns</a>, <a href="/recipes/tags/file/">file</a>, <a href="/recipes/tags/parsing/">parsing</a>).
</p>
<p>Read in a tab-delimited (or any separator-delimited like CSV) file and store each column in a list that can be referenced from a dictionary. The keys for the dictionary are the headings for the columns (if any). All data is read in as strings.</p>
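<p>One possible way to build such a dictionary of columns is sketched below with the standard <code>csv</code> module. The function name <code>read_columns</code> and its parameters are assumptions for illustration, not the recipe's actual interface.</p>

```python
import csv
import io


def read_columns(lines, delimiter="\t", has_header=True):
    """Return a dict mapping each column heading to a list of string values."""
    rows = list(csv.reader(lines, delimiter=delimiter))
    if not rows:
        return {}
    if has_header:
        headings, rows = rows[0], rows[1:]
    else:
        # No header row: fall back to positional keys 0, 1, 2, ...
        headings = list(range(len(rows[0])))
    columns = {heading: [] for heading in headings}
    for row in rows:
        for heading, value in zip(headings, row):
            columns[heading].append(value)  # values stay strings
    return columns


sample = io.StringIO("name\tage\nAda\t36\nAlan\t41\n")
print(read_columns(sample))
```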
Copy directory tree recursively while ignoring cvs, git and svn directories (Python)
2008-12-18T21:36:12-08:00Senthil Kumaranhttp://code.activestate.com/recipes/users/4165833/http://code.activestate.com/recipes/576588-copy-directory-tree-recursively-while-ignoring-cvs/
<p style="color: grey">
Python
recipe 576588
by <a href="/recipes/users/4165833/">Senthil Kumaran</a>
(<a href="/recipes/tags/parsing/">parsing</a>).
</p>
<p>I wanted to do a conditional copy of a directory tree.
I noticed the ignore parameter that shutil.copytree gained in Python 2.6, which is very handy. This snippet gives an example of its usage.</p>
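<p>A minimal sketch of the idea, using <code>shutil.copytree</code> with <code>shutil.ignore_patterns</code>. The helper name <code>copy_without_vcs</code> and the throwaway directory layout are assumptions made for the demonstration.</p>

```python
import os
import shutil
import tempfile

# Directory names to skip; ignore_patterns accepts glob-style patterns.
IGNORED = shutil.ignore_patterns("CVS", ".git", ".svn")


def copy_without_vcs(src, dst):
    """Copy src to dst recursively, leaving out version-control directories."""
    return shutil.copytree(src, dst, ignore=IGNORED)


# Build a small throwaway tree to demonstrate the effect.
base = tempfile.mkdtemp()
src = os.path.join(base, "src")
os.makedirs(os.path.join(src, ".svn"))
os.makedirs(os.path.join(src, "pkg", ".git"))
with open(os.path.join(src, "pkg", "module.py"), "w") as handle:
    handle.write("print('hello')\n")

dst = os.path.join(base, "dst")
copy_without_vcs(src, dst)
copied = sorted(os.listdir(dst))
print(copied)
```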
How to parse a table in a PDF document (Python)
2016-04-10T22:43:57-07:00Jorj X. McKiehttp://code.activestate.com/recipes/users/4193772/http://code.activestate.com/recipes/580635-how-to-parse-a-table-in-a-pdf-document/
<p style="color: grey">
Python
recipe 580635
by <a href="/recipes/users/4193772/">Jorj X. McKie</a>
(<a href="/recipes/tags/cbz/">cbz</a>, <a href="/recipes/tags/epub/">epub</a>, <a href="/recipes/tags/fitz/">fitz</a>, <a href="/recipes/tags/mupdf/">mupdf</a>, <a href="/recipes/tags/openxps/">openxps</a>, <a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/pdf/">pdf</a>, <a href="/recipes/tags/pymupdf/">pymupdf</a>, <a href="/recipes/tags/table/">table</a>, <a href="/recipes/tags/xps/">xps</a>).
Revision 4.
</p>
<p>A Python function that converts a table contained in a page of a PDF (or OpenXPS, EPUB, CBZ, XPS) document to a matrix-like Python object (list of lists of strings).</p>
Flexible datetime parsing (Python)
2012-08-21T07:35:34-07:00Glenn Hutchingshttp://code.activestate.com/recipes/users/4175415/http://code.activestate.com/recipes/578245-flexible-datetime-parsing/
<p style="color: grey">
Python
recipe 578245
by <a href="/recipes/users/4175415/">Glenn Hutchings</a>
(<a href="/recipes/tags/datetime/">datetime</a>, <a href="/recipes/tags/parsing/">parsing</a>).
</p>
<p>The strptime() method of datetime accepts a format string that you have to specify in advance. What if you want to be more flexible in the kinds of dates your program accepts? Here's a recipe for a function that tries many different formats until it finds one that works.</p>
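<p>The try-each-format approach might look something like this. The format list and the function name <code>parse_datetime</code> are illustrative assumptions; the recipe's own list of formats is likely longer.</p>

```python
from datetime import datetime

# Candidate formats, tried in order; illustrative, not exhaustive.
FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%d/%m/%Y",
    "%d %B %Y",
    "%b %d, %Y",
]


def parse_datetime(text, formats=FORMATS):
    """Return a datetime for the first format that matches text."""
    for fmt in formats:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError("no known format matches %r" % text)


print(parse_datetime("21 August 2012"))
```

<p>Order matters when formats are ambiguous: a string such as 03/04/2012 is read as 3 April here only because the day-first format is the one listed.</p>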
Expression Evaluator (Python)
2009-06-02T23:57:44-07:00Stephen Chappellhttp://code.activestate.com/recipes/users/2608421/http://code.activestate.com/recipes/576790-expression-evaluator/
<p style="color: grey">
Python
recipe 576790
by <a href="/recipes/users/2608421/">Stephen Chappell</a>
(<a href="/recipes/tags/evaluation/">evaluation</a>, <a href="/recipes/tags/expressions/">expressions</a>, <a href="/recipes/tags/parsing/">parsing</a>).
Revision 4.
</p>
<p>After reading a portion of the book "The C# Programming Language: Third Edition," I found a section in the introduction that introduced abstract classes and methods with an example built around expression trees. The code was easy to implement since it just had to be copied out of the book. After playing around with the program a little and extending it, I thought that it would be fun to write a program in C# that could (interactively) evaluate expressions and display the results. Not knowing C# quite as well as Python led to the following program, written and tested in Python 3.0 (not sure about previous versions).</p>
<p>The first section of the code includes a port of the program from the aforementioned book along with extra code that allows for further features not originally included in the C# version. Those sections are clearly marked as being new code written by yours truly. The second area of the program has six functions that are profusely documented so as to explain how they go about parsing and processing expressions entered for evaluation. For those wishing to use the code, the "run" function should be all that you need. The final part of the module contains a test program that can be used to check how well the program works.</p>
<p>The parser is not very complicated and will accept expressions that are normal in Python as well as some that are completely illegal in Python. The main features are its ability to (1) identify simple assignment and mathematical operations, (2) identify constant floating point numbers, and (3) identify variables that would otherwise have no other meaning to the program. A limited number of error messages are given when appropriate but may leave one guessing what the problem really is. Mathematical operations are evaluated from left to right without regard to precedence, and assignment statements are evaluated from right to left.</p>
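<p>The evaluation order described above (operators applied strictly left to right, chained assignments applied right to left) can be illustrated with a much smaller sketch. This is not the recipe's expression-tree code; <code>tokenize</code>, <code>value_of</code> and <code>evaluate</code> are hypothetical names, and error handling is deliberately minimal.</p>

```python
import operator
import re

OPERATORS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": operator.truediv}
TOKEN = re.compile(r"\s*(\d+\.?\d*|[A-Za-z_]\w*|[-+*/=])")


def tokenize(text):
    """Split text into numbers, names and single-character operators."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected character at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def value_of(token, variables):
    """Numbers become floats; anything else is looked up as a variable."""
    if token[0].isdigit():
        return float(token)
    return variables[token]


def evaluate(text, variables):
    tokens = tokenize(text)
    # Peel off leading "name =" pairs: a = b = expr has targets [a, b].
    targets = []
    while len(tokens) >= 2 and tokens[1] == "=":
        targets.append(tokens[0])
        tokens = tokens[2:]
    # Apply operators strictly left to right, ignoring precedence.
    result = value_of(tokens[0], variables)
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPERATORS[op](result, value_of(operand, variables))
    # Assign from right to left.
    for name in reversed(targets):
        variables[name] = result
    return result


env = {}
print(evaluate("2 + 3 * 4", env))      # (2 + 3) * 4, not 2 + (3 * 4)
print(evaluate("a = b = 1 + 2", env))
```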
Parse HTTP date-time string (Python)
2010-01-20T13:47:50-08:00Sridhar Ratnakumarhttp://code.activestate.com/recipes/users/4169511/http://code.activestate.com/recipes/577015-parse-http-date-time-string/
<p style="color: grey">
Python
recipe 577015
by <a href="/recipes/users/4169511/">Sridhar Ratnakumar</a>
(<a href="/recipes/tags/datetime/">datetime</a>, <a href="/recipes/tags/http/">http</a>, <a href="/recipes/tags/parsing/">parsing</a>).
</p>
<p>This recipe will help you parse datetime strings returned by HTTP servers following the RFC 2616 standard (which <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.3">supports three datetime formats</a>). Credit for this recipe goes to <a href="http://stackoverflow.com/questions/1471987/how-do-i-parse-an-http-date-string-in-python/1472336#1472336">ΤΖΩΤΖΙΟΥ</a>.</p>
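<p>As a rough sketch of the idea (not necessarily the approach credited above), the three formats can be tried in turn with <code>datetime.strptime</code>. The name <code>parse_http_date</code> is an assumption, and the month and weekday names are matched in the current locale, so this assumes an English or C locale.</p>

```python
from datetime import datetime, timezone

# The three formats listed in RFC 2616 section 3.3.1.
HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",   # RFC 1123: Sun, 06 Nov 1994 08:49:37 GMT
    "%A, %d-%b-%y %H:%M:%S GMT",   # RFC 850:  Sunday, 06-Nov-94 08:49:37 GMT
    "%a %b %d %H:%M:%S %Y",        # asctime:  Sun Nov  6 08:49:37 1994
)


def parse_http_date(text):
    """Return a timezone-aware UTC datetime for an HTTP date string."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
        # HTTP dates are always expressed in GMT.
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError("not an HTTP date: %r" % text)


print(parse_http_date("Sunday, 06-Nov-94 08:49:37 GMT"))
```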
Remove the .pyc files from current directory tree and from svn (Python)
2009-02-03T23:38:43-08:00Senthil Kumaranhttp://code.activestate.com/recipes/users/4165833/http://code.activestate.com/recipes/576641-remove-the-pyc-files-from-current-directory-tree-a/
<p style="color: grey">
Python
recipe 576641
by <a href="/recipes/users/4165833/">Senthil Kumaran</a>
(<a href="/recipes/tags/parsing/">parsing</a>).
</p>
<p>I had mistakenly checked .pyc files into svn, so I took the approach of deleting all the .pyc files in the current working copy directory tree and then using svn remove to remove them from the repository. The following is the snippet I wrote for that purpose.</p>
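<p>A hedged sketch of the same workflow: walk the tree, delete each .pyc file, and collect the matching <code>svn remove</code> command lines. The helper names are hypothetical, and the commands are only returned rather than executed, so the example runs without Subversion installed.</p>

```python
import os
import tempfile


def find_pyc(root="."):
    """Yield the path of every .pyc file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Subversion's own metadata directories.
        if ".svn" in dirnames:
            dirnames.remove(".svn")
        for name in filenames:
            if name.endswith(".pyc"):
                yield os.path.join(dirpath, name)


def remove_pyc(root=".", dry_run=True):
    """Delete .pyc files (unless dry_run) and return 'svn remove' commands."""
    commands = []
    for path in find_pyc(root):
        if not dry_run:
            os.remove(path)
        commands.append(["svn", "remove", path])
    return commands


# Demonstration on a throwaway directory.
root = tempfile.mkdtemp()
for name in ("keep.py", "stale.pyc"):
    open(os.path.join(root, name), "w").close()
commands = remove_pyc(root, dry_run=False)
print(commands)
```

<p>Each returned list could be handed to <code>subprocess.call</code> inside a real working copy.</p>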
python xml parsing (Python)
2011-05-28T19:36:00-07:00abhijeet vaidyahttp://code.activestate.com/recipes/users/4178141/http://code.activestate.com/recipes/577727-python-xml-parsing/
<p style="color: grey">
Python
recipe 577727
by <a href="/recipes/users/4178141/">abhijeet vaidya</a>
(<a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/python/">python</a>, <a href="/recipes/tags/xml/">xml</a>).
</p>
<p>XML parsing: how to extract data.</p>
Parse call function for Py2.6 and Py2.7 (Python)
2009-02-28T20:13:15-08:00Jervis Whitleyhttp://code.activestate.com/recipes/users/4169341/http://code.activestate.com/recipes/576671-parse-call-function-for-py26-and-py27/
<p style="color: grey">
Python
recipe 576671
by <a href="/recipes/users/4169341/">Jervis Whitley</a>
(<a href="/recipes/tags/ast/">ast</a>, <a href="/recipes/tags/call/">call</a>, <a href="/recipes/tags/function/">function</a>, <a href="/recipes/tags/namedtuple/">namedtuple</a>, <a href="/recipes/tags/nodevisitor/">nodevisitor</a>, <a href="/recipes/tags/parsing/">parsing</a>).
Revision 14.
</p>
<p>In some cases it may be desirable to parse the string expression "f1(*args)"
and return some of the key features of the represented function-like call. </p>
<p>This recipe returns the key features in the form of a namedtuple. </p>
<p>e.g. (for the above)</p>
<pre class="prettyprint"><code>>>> explain("f1(*args)")
[ Call(func='f1', starargs='args') ]
</code></pre>
<p>The recipe will return a list of such namedtuples for <code>"f1(*args)\nf2(*args)"</code>.
Note that while the passed string expression must be valid Python syntax,
names needn't be declared in the current scope.</p>
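<p>For comparison, here is a sketch of the same idea for current Python 3. The recipe targets Python 2.6 and 2.7, where a call node carried a <code>starargs</code> attribute; since Python 3.5 a starred argument appears as an <code>ast.Starred</code> node inside <code>args</code>. The fields of this <code>Call</code> namedtuple and the visitor class are assumptions, not the recipe's exact output.</p>

```python
import ast
from collections import namedtuple

Call = namedtuple("Call", "func args starargs")


class CallVisitor(ast.NodeVisitor):
    """Record one Call namedtuple per function call found in the tree."""

    def __init__(self):
        self.calls = []

    def visit_Call(self, node):
        func = node.func.id if isinstance(node.func, ast.Name) else None
        args, starargs = [], None
        for arg in node.args:
            if isinstance(arg, ast.Starred) and isinstance(arg.value, ast.Name):
                starargs = arg.value.id      # the name in f(*name)
            elif isinstance(arg, ast.Name):
                args.append(arg.id)          # plain positional names only
        self.calls.append(Call(func, tuple(args), starargs))
        self.generic_visit(node)             # also visit nested calls


def explain(source):
    """Parse source (without executing it) and describe each call."""
    visitor = CallVisitor()
    visitor.visit(ast.parse(source))
    return visitor.calls


print(explain("f1(*args)\nf2(a, b)"))
```

<p>Because the source is only parsed, never evaluated, the names <code>f1</code>, <code>args</code> and so on do not need to exist.</p>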
Dragon Lexical Analyzer (Python)
2010-09-01T14:49:37-07:00Jack Trainorhttp://code.activestate.com/recipes/users/4076953/http://code.activestate.com/recipes/577380-dragon-lexical-analyzer/
<p style="color: grey">
Python
recipe 577380
by <a href="/recipes/users/4076953/">Jack Trainor</a>
(<a href="/recipes/tags/educational/">educational</a>, <a href="/recipes/tags/lexical_analyzer/">lexical_analyzer</a>, <a href="/recipes/tags/parsing/">parsing</a>).
Revision 2.
</p>
<p>The lexical analyzer from "Compilers: Principles, Techniques and Tools," Chapter 2, by Aho, Sethi, Ullman (1986) implemented in Python.</p>
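<p>A hand-written lexer in this style might look roughly like the sketch below. The token names and the <code>div</code>/<code>mod</code> keyword table are assumptions loosely modeled on that kind of textbook lexer, not a transcription of the recipe.</p>

```python
NUM, ID, DIV, MOD, DONE = "NUM", "ID", "DIV", "MOD", "DONE"
KEYWORDS = {"div": DIV, "mod": MOD}


def lexan(text):
    """Yield (token, value) pairs; other single characters are their own token."""
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1                       # skip whitespace
        elif ch.isdigit():
            start = pos
            while pos < len(text) and text[pos].isdigit():
                pos += 1
            yield NUM, int(text[start:pos])
        elif ch.isalpha():
            start = pos
            while pos < len(text) and text[pos].isalnum():
                pos += 1
            word = text[start:pos]
            yield KEYWORDS.get(word, ID), word
        else:
            pos += 1
            yield ch, None                 # operators and punctuation
    yield DONE, None


print(list(lexan("9 div 3 + x1;")))
```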
Simple regex engine, elementary Python (Python)
2010-07-10T10:43:30-07:00Joost Behrendshttp://code.activestate.com/recipes/users/4174081/http://code.activestate.com/recipes/577251-simple-regex-engine-elementary-python/
<p style="color: grey">
Python
recipe 577251
by <a href="/recipes/users/4174081/">Joost Behrends</a>
(<a href="/recipes/tags/cached/">cached</a>, <a href="/recipes/tags/parse/">parse</a>, <a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/recursion/">recursion</a>, <a href="/recipes/tags/regular_expressions/">regular_expressions</a>).
Revision 40.
</p>
<p>A short engine for testing text against a regex. It understands the 3 common quantifiers
?,+,* (non-greedy) applied to characters, ., [...], [^...], \s, \S, bracketed patterns and group designators \N. It accepts unicode objects and fixed-width encoded strings
(but there may be problems when comparing trailing bytes in multi-byte UTF letters).
It captures up to 10 groups ( (?:...) is implemented), which can be used for back referencing and in xreplace(). Captured groups are accessible after the search in the global list xGroups. | is supported, but only in groups and only with nested=True. With nested=False, '(' and ')' are treated as ordinary characters.</p>
<p>This is not about Python or for Python, where it has little use beside re. But considering that re needs about 6,000 lines, you might agree with the author that these 176 lines are powerful. This was the reason to publish it as a recipe: as a kind of (fairly complete) minimal example of a regex tester, and as an example of corresponding recursive structures in data (TokenListCache) and code.</p>
<p>Working on this improved the author's understanding of regular expressions, especially of their possible "greed". "Greedy" quantifiers are a concept that has to be explained separately and comes as a surprise: whoever scans a text for <code>'<.*>'</code> is searching for SGML tags, not the whole text. Even with the star's "greediness", the code has to take care that <code>'.*'</code> doesn't eat the whole text and then find no match for <code>'<.*>'</code> at all. Thus the standard syntax with greedy quantifiers cannot be simpler to implement than this one, with its mere 3 lines (101, 111 and 121) preventing any greed. Perhaps it is faster; otherwise it is difficult to understand why the concept of "greed" exists at all.</p>
<p>This engine might be useful now and then, in circumstances where nothing else is available. Its brevity eases translation to other languages, and it can work with arbitrary characters for STAR or PERHAPS (for example).</p>
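<p>To make the non-greedy behaviour concrete, here is a far smaller recursive matcher. It is a sketch in the spirit of the description, not the recipe's engine: it supports only literal characters, <code>.</code> and the three quantifiers, and the names <code>match_at</code> and <code>search</code> are hypothetical. The key point is that a quantified item first tries to match the rest of the pattern and consumes another character only if that fails.</p>

```python
def match_at(pattern, text, i=0):
    """Return the end index of the shortest match of pattern at text[i:], or None."""
    if not pattern:
        return i
    char, rest = pattern[0], pattern[1:]
    quantifier = rest[0] if rest and rest[0] in "?*+" else ""
    if quantifier:
        rest = rest[1:]

    def single(j):
        # Does the current pattern character match text[j]?
        return j < len(text) and (char == "." or text[j] == char)

    if not quantifier:
        return match_at(rest, text, i + 1) if single(i) else None
    if quantifier == "+":
        if not single(i):
            return None
        i += 1                     # one mandatory occurrence, then behave like *
    limit = 1 if quantifier == "?" else None
    count = 0
    while True:
        end = match_at(rest, text, i)   # non-greedy: try the rest first
        if end is not None:
            return end
        if not single(i) or (limit is not None and count >= limit):
            return None
        i += 1
        count += 1


def search(pattern, text):
    """Return the first (leftmost, shortest) matching substring, or None."""
    for start in range(len(text) + 1):
        end = match_at(pattern, text, start)
        if end is not None:
            return text[start:end]
    return None


print(search("<.*>", "x <a><b> y"))   # finds one tag, not "<a><b>"
```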
Extracting structured text or code (Python)
2011-05-18T13:04:01-07:00Mike Sweeneyhttp://code.activestate.com/recipes/users/4177990/http://code.activestate.com/recipes/577700-extracting-structured-text-or-code/
<p style="color: grey">
Python
recipe 577700
by <a href="/recipes/users/4177990/">Mike Sweeney</a>
(<a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/structured/">structured</a>, <a href="/recipes/tags/text_processing/">text_processing</a>, <a href="/recipes/tags/token/">token</a>).
Revision 2.
</p>
<p>This function uses the power of regular expressions to extract parts of a structured text string. It can build a token list from many types of code and data formats. It finds string types (with quotes) and nested structures that use parentheses, brackets, and braces. If you need to extract a different syntax, you can provide a custom token pattern in the function arguments.</p>
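<p>A rough sketch of how such an extraction could work: a regular expression splits the text into quoted strings, brackets and words, and a stack turns each bracketed group into a nested list. The pattern, the function name <code>extract</code> and the nested-list result shape are assumptions and may differ from the recipe.</p>

```python
import re

# Quoted strings, single brackets, or runs of other non-space characters.
# Commas are not matched by any alternative, so they are silently skipped.
TOKEN_PATTERN = r'''"[^"]*"|'[^']*'|[()\[\]{}]|[^\s()\[\]{}"',]+'''
CLOSERS = {"(": ")", "[": "]", "{": "}"}


def extract(text, pattern=TOKEN_PATTERN):
    """Return a nested list of tokens; each bracketed group becomes a sublist."""
    root = []
    stack = [(root, None)]          # (current list, expected closing bracket)
    for token in re.findall(pattern, text):
        current, expected = stack[-1]
        if token in CLOSERS:
            child = []
            current.append(child)
            stack.append((child, CLOSERS[token]))
        elif token in CLOSERS.values():
            if token != expected:
                raise ValueError("unbalanced %r" % token)
            stack.pop()
        else:
            current.append(token)
    if len(stack) != 1:
        raise ValueError("unclosed bracket")
    return root


print(extract('f(a, "b c", [1, 2])'))
```

<p>Passing a different <code>pattern</code> changes what counts as a token, which mirrors the custom token pattern mentioned above.</p>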
MicroXml: Stand-alone library for basic XML features (Python)
2015-12-04T22:36:56-08:00Jack Trainorhttp://code.activestate.com/recipes/users/4076953/http://code.activestate.com/recipes/579133-microxml-stand-alone-library-for-basic-xml-feature/
<p style="color: grey">
Python
recipe 579133
by <a href="/recipes/users/4076953/">Jack Trainor</a>
(<a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/xml/">xml</a>).
</p>
<p>MicroXml provides stand-alone support for the basic, most-used features of XML -- tags, attributes, and element values. It produces a DOM tree of XML nodes. It's compatible with Python 2.7 and Python 3. MicroXml does not support DTDs, CDATAs and other advanced XML features.</p>
<p>MicroXml is easy to use, and its nodes are easy to view and navigate in a debugger. It also includes a minimal XPath-like implementation.</p>
Nicer struct syntax thanks to Py3 metaclasses (Python)
2009-02-25T07:39:52-08:00Daniel Brodiehttp://code.activestate.com/recipes/users/1892511/http://code.activestate.com/recipes/576666-nicer-struct-syntax-thanks-to-py3-metaclasses/
<p style="color: grey">
Python
recipe 576666
by <a href="/recipes/users/1892511/">Daniel Brodie</a>
(<a href="/recipes/tags/binary/">binary</a>, <a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/py3/">py3</a>, <a href="/recipes/tags/struct/">struct</a>).
</p>
<p>This is a quick-hack module I wrote up in a couple of hours that allows for a nicer syntax to build up struct-like binary packing and unpacking. The point was to get it to be concise and as C-like as possible. This script requires Python 3 for its improved metaclass support.</p>
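<p>The general technique can be sketched as follows: a metaclass collects the declared fields in definition order and builds a <code>struct.Struct</code> from them. The names <code>Field</code>, <code>StructMeta</code>, <code>Struct</code> and <code>Point</code>, and the fixed little-endian layout, are assumptions for illustration rather than the recipe's syntax.</p>

```python
import struct
from collections import OrderedDict


class Field:
    """A single struct format code, e.g. 'h' (short) or 'B' (unsigned byte)."""

    def __init__(self, fmt):
        self.fmt = fmt


class StructMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        # Python 3 hook: use an ordered namespace so field order is kept.
        return OrderedDict()

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        fields = [(key, value.fmt) for key, value in namespace.items()
                  if isinstance(value, Field)]
        cls._names = [key for key, _ in fields]
        cls._struct = struct.Struct("<" + "".join(fmt for _, fmt in fields))
        return cls


class Struct(metaclass=StructMeta):
    def __init__(self, **values):
        for name in self._names:
            setattr(self, name, values[name])

    def pack(self):
        return self._struct.pack(*(getattr(self, n) for n in self._names))

    @classmethod
    def unpack(cls, data):
        return cls(**dict(zip(cls._names, cls._struct.unpack(data))))


class Point(Struct):
    x = Field("h")
    y = Field("h")
    flags = Field("B")


packed = Point(x=1, y=-2, flags=3).pack()
print(packed)
```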
A Basic USe flag EDitor for Gentoo Linux supporting on-the-fly editing (Python)
2015-02-28T07:04:31-08:00Mike 'Fuzzy' Partinhttp://code.activestate.com/recipes/users/4179778/http://code.activestate.com/recipes/579028-a-basic-use-flag-editor-for-gentoo-linux-supportin/
<p style="color: grey">
Python
recipe 579028
by <a href="/recipes/users/4179778/">Mike 'Fuzzy' Partin</a>
(<a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/popen/">popen</a>, <a href="/recipes/tags/subprocess/">subprocess</a>, <a href="/recipes/tags/user_input/">user_input</a>).
</p>
<p>This allows for on-the-fly editing. Simply drop abused.py into your path, and ensure that -a is not set in EMERGE_DEFAULT_OPTS in /etc/portage/make.conf. Then, whenever you are installing new packages, use abused in place of emerge (e.g. abused multitail). You will be presented with a list of the USE flags that are used in this action and a prompt for editing any of them; simply hit Enter with no changes to fire off the build.</p>
Simple tabulator (Python)
2010-11-09T12:50:06-08:00Noufal Ibrahimhttp://code.activestate.com/recipes/users/4173873/http://code.activestate.com/recipes/577458-simple-tabulator/
<p style="color: grey">
Python
recipe 577458
by <a href="/recipes/users/4173873/">Noufal Ibrahim</a>
(<a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/text_processing/">text_processing</a>).
</p>
<p>This is a simple script to convert a top to bottom list of items into a left to right list.</p>
<pre class="prettyprint"><code> a
b
c
d
e
f
g
h
i
j
k
l
m
</code></pre>
<p>into</p>
<pre class="prettyprint"><code> a b c d e f g
h i j k l m
</code></pre>
<p>A few command line options allow some amount of customisation. </p>
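<p>The core transformation can be sketched in a few lines. The function name <code>tabulate</code> and its <code>columns</code> and <code>separator</code> parameters are assumptions standing in for the script's command line options.</p>

```python
def tabulate(items, columns=7, separator=" "):
    """Join items into rows of at most `columns` entries each."""
    rows = [items[i:i + columns] for i in range(0, len(items), columns)]
    return "\n".join(separator.join(row) for row in rows)


print(tabulate(list("abcdefghijklm")))
```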
MicroXml: Stand-alone library for basic XML features (C++)
2016-02-18T19:33:07-08:00Jack Trainorhttp://code.activestate.com/recipes/users/4076953/http://code.activestate.com/recipes/580612-microxml-stand-alone-library-for-basic-xml-feature/
<p style="color: grey">
C++
recipe 580612
by <a href="/recipes/users/4076953/">Jack Trainor</a>
(<a href="/recipes/tags/parsing/">parsing</a>, <a href="/recipes/tags/xml/">xml</a>).
</p>
<p>MicroXml provides stand-alone support for the basic, most-used features of XML -- tags, attributes, and element values. It produces a DOM tree of XML nodes. MicroXml does not support DTDs, CDATAs and other advanced XML features.</p>
<p>MicroXml is easy to use and provides easy access to view/navigate its nodes in a debugger.</p>