Parse patch details from a darcs inventory file

Uses a multiline regular expression to retrieve patch details from a darcs (http://www.darcs.net) inventory file.

The regular expression is defined in verbose format for sanity's sake.

import re
import sha
from datetime import datetime
from time import strptime

PATCH_DATE_FORMAT = '%Y%m%d%H%M%S'

patch_pattern = r"""
   \[                                   # Patch start indicator
   (?P<name>[^\n]+)\n                   # Patch name (rest of same line)
   (?P<author>[^\*]+)                   # Patch author
   \*                                   # Author/date separator 
   (?P<inverted>[-\*])                  # Inverted patch indicator
   (?P<date>\d{14})                     # Patch date
   (?:\n(?P<comment>(?:^\ [^\n]*\n)+))? # Optional long comment
   \]                                   # Patch end indicator
   """
patch_re = re.compile(patch_pattern, re.VERBOSE | re.MULTILINE)
tidy_comment_re = re.compile(r'^ ', re.MULTILINE)

class Patch:
    """
    Patch details, as defined in a darcs inventory file.

    Attribute names match those generated by the
    ``darcs changes --xml-output`` command.
    """
    def __init__(self, name, author, date, inverted, comment=None):
        self.name = name
        self.author = author
        self.date = datetime(*strptime(date, PATCH_DATE_FORMAT)[:6])
        self.inverted = inverted
        self.comment = comment

    def __str__(self):
        return self.name

    @property
    def hash(self):
        """
        Calculates the filename of the gzipped file containing patch
        contents in the repository's ``patches`` directory.

        This consists of the patch date, a partial SHA-1 hash of the
        patch author and a full SHA-1 hash of the complete patch
        details.
        """
        date_str = self.date.strftime(PATCH_DATE_FORMAT)
        complete_patch_details = '%s%s%s%s%s' % (
            self.name, self.author, date_str,
            self.comment and ''.join([l.rstrip() for l in self.comment.split('\n')]) or '',
            self.inverted and 't' or 'f',
        )
        return '%s-%s-%s.gz' % (date_str,
                                sha.new(self.author).hexdigest()[:5],
                                sha.new(complete_patch_details).hexdigest())

def parse_inventory(inventory):
    """
    Given the contents of a darcs inventory file, generates ``Patch``
    objects representing contained patch details.
    """
    for match in patch_re.finditer(inventory):
        attrs = match.groupdict(None)
        attrs['inverted'] = (attrs['inverted'] == '-')
        if attrs['comment'] is not None:
            attrs['comment'] = tidy_comment_re.sub('', attrs['comment']).strip()
        yield Patch(**attrs)

if __name__ == '__main__':
    import urllib2
    inventory = urllib2.urlopen('http://darcs.net/_darcs/inventory').read()
    for patch in parse_inventory(inventory):
        print patch.__dict__

      

Parsing a darcs inventory file for patch details is useful when you don't have darcs, or a copy of a given repository, available locally. For example, I use this to display patch details in my tumblelog: every hour it pulls the inventories of a number of remote repositories and looks for any new patches committed since the last check.

I chose a multiline regular expression to do this as the format of a darcs patch entry is simple and consistent.
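To make the entry format concrete, here is a minimal sketch that runs the recipe's pattern over a small inventory fragment. It is written for Python 3, and the sample text (names, addresses, dates) is made up for illustration rather than taken from a real repository.

```python
import re

# The recipe's verbose pattern, reproduced so this snippet runs on its own.
PATCH_RE = re.compile(r"""
   \[                                   # Patch start indicator
   (?P<name>[^\n]+)\n                   # Patch name (rest of same line)
   (?P<author>[^\*]+)                   # Patch author
   \*                                   # Author/date separator
   (?P<inverted>[-\*])                  # Inverted patch indicator
   (?P<date>\d{14})                     # Patch date
   (?:\n(?P<comment>(?:^\ [^\n]*\n)+))? # Optional long comment
   \]                                   # Patch end indicator
   """, re.VERBOSE | re.MULTILINE)

# Hypothetical inventory text: one plain entry, one with a long comment.
SAMPLE_INVENTORY = (
    "[Fix typo in README\n"
    "alice@example.com**20070115093000] \n"
    "[Add inventory parser\n"
    "bob@example.com**20070116101500\n"
    " First line of the long comment.\n"
    " Second line of the long comment.\n"
    "] \n"
)

matches = [m.groupdict() for m in PATCH_RE.finditer(SAMPLE_INVENTORY)]
for m in matches:
    print(m["name"], m["author"], m["date"], repr(m["comment"]))
```

Each long-comment line keeps its leading space in the raw match, which is why the recipe strips it afterwards with tidy_comment_re.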

Note that this is not intended to retrieve the complete patch history for a repository, only details of the patches defined in the current inventory file. When darcs optimize is used, the inventory file may be reduced to contain only the patches recorded since the last tag applied to the repository.
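The filename scheme described in the hash property's docstring can also be sketched with hashlib, which replaces the old sha module on newer Pythons. This is only an illustration for Python 3: the function name patch_filename is hypothetical, the UTF-8 encoding step is an assumption, and the sample values are made up.

```python
import hashlib

def patch_filename(name, author, date_str, comment=None, inverted=False):
    """Mirror the recipe's ``hash`` property using hashlib."""
    # Comment lines are right-stripped and concatenated, as in the recipe.
    if comment:
        comment_part = "".join(line.rstrip() for line in comment.split("\n"))
    else:
        comment_part = ""
    details = "%s%s%s%s%s" % (name, author, date_str, comment_part,
                              "t" if inverted else "f")
    # Assumption: strings are encoded as UTF-8 before hashing.
    author_digest = hashlib.sha1(author.encode("utf-8")).hexdigest()[:5]
    details_digest = hashlib.sha1(details.encode("utf-8")).hexdigest()
    return "%s-%s-%s.gz" % (date_str, author_digest, details_digest)

# Hypothetical patch details.
filename = patch_filename("Fix typo in README", "alice@example.com",
                          "20070115093000")
print(filename)
```

The result has three dash-separated parts: the 14-digit date, five hex characters derived from the author, and a 40-character hex digest of the full patch details.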

Tags: text

2 comments

Drew Perttula 16 years, 10 months ago

but if you do have darcs available to run, use xml.

import subprocess
from elementtree import ElementTree   # or use the stdlib one in py2.5+

changelog = ElementTree.parse(subprocess.Popen(
              ["darcs", "changes", "--xml-output"],
              stdout=subprocess.PIPE).stdout)
for patch in changelog.findall("patch"):
  print patch.get('author'), patch.find('name').text

Han-Wen Nienhuys 16 years, 5 months ago

Unicode. This does not completely correspond to darcs changes --xml. For some stupid reason, darcs (1.0.9) outputs high-bit ASCII as [_\XX_] in the XML output.


Parse patch details from a darcs inventory file (Python recipe) by Jonathan Buchanan
ActiveState Code (http://code.activestate.com/recipes/521889/)
