Uses a multiline regular expression to retrieve patch details from a darcs (http://www.darcs.net) inventory file.
The regular expression is defined in verbose format for sanity's sake.

import re
import sha
from datetime import datetime
from time import strptime

PATCH_DATE_FORMAT = '%Y%m%d%H%M%S'

patch_pattern = r"""
    \[                                   # Patch start indicator
    (?P<name>[^\n]+)\n                   # Patch name (rest of same line)
    (?P<author>[^\*]+)                   # Patch author
    \*                                   # Author/date separator
    (?P<inverted>[-\*])                  # Inverted patch indicator
    (?P<date>\d{14})                     # Patch date
    (?:\n(?P<comment>(?:^\ [^\n]*\n)+))? # Optional long comment
    \]                                   # Patch end indicator
    """
patch_re = re.compile(patch_pattern, re.VERBOSE | re.MULTILINE)
tidy_comment_re = re.compile(r'^ ', re.MULTILINE)


class Patch:
    """
    Patch details, as defined in a darcs inventory file.

    Attribute names match those generated by the
    ``darcs changes --xml-output`` command.
    """
    def __init__(self, name, author, date, inverted, comment=None):
        self.name = name
        self.author = author
        self.date = datetime(*strptime(date, PATCH_DATE_FORMAT)[:6])
        self.inverted = inverted
        self.comment = comment

    def __str__(self):
        return self.name

    @property
    def hash(self):
        """
        Calculates the filename of the gzipped file containing patch
        contents in the repository's ``patches`` directory.

        This consists of the patch date, a partial SHA-1 hash of the
        patch author and a full SHA-1 hash of the complete patch
        details.
        """
        date_str = self.date.strftime(PATCH_DATE_FORMAT)
        complete_patch_details = '%s%s%s%s%s' % (
            self.name, self.author, date_str,
            self.comment and ''.join([l.rstrip() for l in self.comment.split('\n')]) or '',
            self.inverted and 't' or 'f',
        )
        return '%s-%s-%s.gz' % (date_str,
                                sha.new(self.author).hexdigest()[:5],
                                sha.new(complete_patch_details).hexdigest())


def parse_inventory(inventory):
    """
    Given the contents of a darcs inventory file, generates ``Patch``
    objects representing contained patch details.
    """
    for match in patch_re.finditer(inventory):
        attrs = match.groupdict(None)
        attrs['inverted'] = (attrs['inverted'] == '-')
        if attrs['comment'] is not None:
            attrs['comment'] = tidy_comment_re.sub('', attrs['comment']).strip()
        yield Patch(**attrs)


if __name__ == '__main__':
    import urllib2
    inventory = urllib2.urlopen('http://darcs.net/_darcs/inventory').read()
    for patch in parse_inventory(inventory):
        print patch.__dict__

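The snippet above targets Python 2, where the ``sha`` module hashes byte strings directly. As a rough sketch only, the filename calculation from the ``hash`` property could be ported to Python 3 with ``hashlib`` as follows. The function name ``patch_filename`` and the ``encoding`` parameter are hypothetical, and UTF-8 is an assumption: the original never states which encoding the inventory text is in.

```python
import hashlib
from datetime import datetime

PATCH_DATE_FORMAT = '%Y%m%d%H%M%S'


def patch_filename(name, author, date, inverted=False, comment=None,
                   encoding='utf-8'):
    """Mirror the snippet's ``hash`` property using hashlib (Python 3)."""
    date_str = date.strftime(PATCH_DATE_FORMAT)
    # As in the original, comment lines are right-stripped and concatenated.
    if comment:
        flat_comment = ''.join(line.rstrip() for line in comment.split('\n'))
    else:
        flat_comment = ''
    details = name + author + date_str + flat_comment + ('t' if inverted else 'f')
    # hashlib works on bytes, so the text has to be encoded first.
    # The encoding is an assumption, not something the original specifies.
    author_hash = hashlib.sha1(author.encode(encoding)).hexdigest()[:5]
    details_hash = hashlib.sha1(details.encode(encoding)).hexdigest()
    return '%s-%s-%s.gz' % (date_str, author_hash, details_hash)


example = patch_filename('Fix typo in README', 'alice@example.com',
                         datetime(2007, 1, 2, 3, 4, 5))
print(example)
```

The result has the same shape as before: the 14-digit date, the first five hex digits of the author hash, and the 40-digit hash of the complete details, joined by hyphens and ending in ``.gz``.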
Parsing a darcs inventory file for patch details is useful when you don't have darcs available locally or you don't have a copy of a given repository available locally. For example, I use this to display patch details in my tumblelog, pulling patch details from the inventories of a number of remote repositories on an hourly basis to look for any new patches committed since the last check.
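How the new patches are picked out on each check is not described here. One possible approach, sketched in Python 3 under that caveat, is to remember a key for every patch already seen and keep only the unseen ones. The ``new_patches`` helper, the use of plain dictionaries in place of ``Patch`` objects, and the choice of (name, author, date) as the key are all assumptions made for illustration.

```python
def new_patches(patches, seen):
    """Return the patches not yet recorded in ``seen`` and record them.

    ``patches`` is an iterable of dicts with 'name', 'author' and 'date'
    keys (a stand-in for parsed patch details); ``seen`` is a set that the
    caller keeps between checks.
    """
    fresh = []
    for patch in patches:
        key = (patch['name'], patch['author'], patch['date'])
        if key not in seen:
            seen.add(key)
            fresh.append(patch)
    return fresh


seen = set()
first_check = [
    {'name': 'Fix typo in README', 'author': 'alice@example.com',
     'date': '20070102030405'},
]
second_check = first_check + [
    {'name': 'Add inventory parser', 'author': 'bob@example.com',
     'date': '20070203040506'},
]
print(new_patches(first_check, seen))   # everything is new on the first check
print(new_patches(second_check, seen))  # only the patch added since then
```

Persisting ``seen`` between hourly runs (for example in a file or database) is left out of the sketch.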
I chose a multiline regular expression to do this as the format of a darcs patch entry is simple and consistent.
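To illustrate what the pattern captures, here is a small Python 3 sketch that runs it against an invented inventory fragment. The sample text is made up to fit the pattern (one entry without a long comment and one with), not copied from a real repository, and the ``describe`` helper is hypothetical.

```python
import re

# Same pattern as in the snippet, repeated so this sketch is self-contained.
PATCH_RE = re.compile(r"""
    \[                                    # Patch start indicator
    (?P<name>[^\n]+)\n                    # Patch name (rest of same line)
    (?P<author>[^\*]+)                    # Patch author
    \*                                    # Author/date separator
    (?P<inverted>[-\*])                   # Inverted patch indicator
    (?P<date>\d{14})                      # Patch date
    (?:\n(?P<comment>(?:^\ [^\n]*\n)+))?  # Optional long comment
    \]                                    # Patch end indicator
    """, re.VERBOSE | re.MULTILINE)

# Invented sample data: the second entry carries a two-line long comment,
# each line of which starts with a single space.
SAMPLE_INVENTORY = (
    "[Fix typo in README\n"
    "alice@example.com**20070102030405] \n"
    "[Add inventory parser\n"
    "bob@example.com**20070203040506\n"
    " First comment line.\n"
    " Second comment line.\n"
    "] \n"
)


def describe(inventory):
    """Return (name, author, date, comment) tuples for each entry found."""
    return [
        (m.group('name'), m.group('author'), m.group('date'), m.group('comment'))
        for m in PATCH_RE.finditer(inventory)
    ]


for entry in describe(SAMPLE_INVENTORY):
    print(entry)
```

The ``comment`` group is ``None`` for the first entry. For the second it still contains the leading space on each line, which is what ``tidy_comment_re`` removes in ``parse_inventory``.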
Note that this is not intended to retrieve the complete patch history for a repository, only details of patches defined in the current inventory file. When darcs optimize is used, the inventory file may be reduced to contain only the patches since the last tag applied to the repository.
If you do have darcs available to run, use its XML output (darcs changes --xml-output) instead.
Unicode: the values parsed here do not completely correspond to the output of darcs changes --xml-output. For some reason, darcs (1.0.9) outputs high-bit ASCII as [_\XX_] in the XML output.
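If the two sources need to be compared, the escapes in the XML output could be turned back into raw bytes first. The Python 3 sketch below assumes that XX is a pair of hexadecimal digits giving the byte value, which is my reading of the notation rather than something confirmed here. The ``unescape_darcs`` name is hypothetical, and decoding the result as UTF-8 in the example is likewise an assumption.

```python
import re

# Matches [_\XX_] where XX is assumed to be two hexadecimal digits.
ESCAPE_RE = re.compile(rb'\[_\\([0-9A-Fa-f]{2})_\]')


def unescape_darcs(data):
    r"""Replace each [_\XX_] escape in ``data`` (bytes) with the byte XX."""
    return ESCAPE_RE.sub(lambda m: bytes([int(m.group(1), 16)]), data)


escaped = b'Caf[_\\c3_][_\\a9_]'
raw = unescape_darcs(escaped)
print(raw.decode('utf-8'))
```

Working on bytes rather than text keeps the sketch independent of whatever encoding the patch author actually used.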