Generic filter logic strategy « Python recipes

By storing an object's data in a dictionary, you can easily create a Filter class which tests a given object against a set of nested inclusion rules. This works particularly well for database filtering.

      Pure object filter example:

cmpcasts = {'__lt__': lambda x, y: x < y,
            '__le__': lambda x, y: x <= y,
            '__eq__': lambda x, y: x == y,
            '__ne__': lambda x, y: x != y,
            '__gt__': lambda x, y: x > y,
            '__ge__': lambda x, y: x >= y
            }

def short_or(key, aList, anObject):
    """A helper function for parse_branch,
       since reduce(or, [value]) won't short-circuit."""
    for eachValue in aList:
        if parse_branch(key, eachValue, anObject): return True
    return False

def opval(key, value, anObject):
    """A helper function for parse_branch."""
    operation, value = value
    # value = parse_branch(key, value, anObject)
    try:
        anOp = getattr(anObject.data[key], operation)
        return apply(anOp, [value])
    except AttributeError:
        return cmpcasts[operation](anObject.data[key], value)

def parse_branch(key, value, anObject):
    """Parse a BizFilter branch according to the following grammar:
    
    {name: value} implies an atomic operation on a value of name 'name'.
    Multiple {names} implies an 'and' relationship between them.
    
    [value, value, value, ...] implies an 'or' relationship.
    
    (operation, value) implies anObject.operation(value).
    
    Any other value is taken as a primitive type.
    
    Example: ((not isInvalid) and ((delivery is None) or delivery >= 3/13/29))
           = {'isInvalid': False,
              'delivery': [None,
                          ('__gt__', aDate('3/13/1929'))
                          ]
              }
    """
    
    if type(value) == type({}):
        if len(value) == 0: return True
        return reduce(lambda x,y: x and y,
                      [parse_branch(eachKey, eachValue, anObject)
                       for eachKey, eachValue in value.items()])
    elif type(value) == type([]):
        if len(value) == 0: return False
        return short_or(key, value, anObject)
    elif type(value) == type(()):
        return opval(key, value, anObject)
    else:
        try:
            return value == anObject.data[key]
        except KeyError: raise PropertyNotFoundError(key)



#######################

ADO Database example:

cmpcasts_ADO = {'__lt__': "<", '__le__': "<=",
            '__eq__': "=", '__ne__': "<>",
            '__gt__': ">", '__ge__': ">="
            }

def short_or_ADO(key, aList):
    """A helper function for parse_branch."""
    return "(%s)" % (" or ".join([("%s" % parse_branch_ADO(key, eachValue))
                        for eachValue in aList]))

def opval_ADO(key, value):
    """A helper function for parse_branch."""
    operation, value = value
    return "[%s] %s %s" % (key, cmpcasts_ADO[operation], quoted_field(value))

def parse_branch_ADO(key, value):
    """Parse a Filter branch according to the following grammar:
    (snip)
    """
    
    if type(value) == type({}):
        if len(value) == 0: return "True"
        return "(%s)" % reduce(lambda x,y: "%s and %s" % (x, y),
                      [parse_branch_ADO(eachKey, eachValue)
                       for eachKey, eachValue in value.items()])
    elif type(value) == type([]):
        if len(value) == 0: return "False"
        return short_or_ADO(key, value)
    elif type(value) == type(()):
        return opval_ADO(key, value)
    else:
        if value is None:
            return "IsNull([%s])" % (key)
        else:
            return "[%s] = %s" % (key, quoted_field(value))

      

Implementation disclaimer:

Obviously, some parts of both scripts are incomplete. I have a custom date class, for example, which I felt you could infer on your own. I also do more complex casting than quoted_field(). And my Objects and Filters (and many other parts) are more complex and integrated into my implementations. Since those bits are extraneous to the core concept, I left them as exercises for the implementer.

Discussion by example:

I'm going to discuss the "Pure object filter" example only, since the ADO example is merely a similar implementation of the same idea. The basic idea is that you have two items in your possession:

An Object with some attributes. The Object in the first example has a .data attribute, which is a dictionary of the form {name: value}. Let's pretend that "value" is always a string, int, or other atomic type (no sequences). An example Object might be:

anObject = MyObject() anObject.data = {'ID': 30124, 'Name': 'GvR', 'Number of Contributions': 82739}

A Filter that also has some attributes. My Filter objects also have a .data attribute; however, these are quite different from the .data attributes of Objects. An example Filter might look like this:

aFilter = MyFilter() aFilter.data = [{'Name': 'GvR'}, {'Number of Contributions': ('__ge__', 1000)}]

This simple Filter will allow us to take our example Object, or a series of such Objects, and test them. In plain language, we might be evaluating a list of contributors, and want to produce a subset of that list: people with 1000 or more contributions. Just in case GvR is having a slow period, we include him in the list regardless of his number of contributions.

I could probably take several pages to describe all the possibilities, but I'll simply walk through this example Filter, and show you how parse_branch() does its thing. We begin by passing our Filter's .data attribute to parse_branch:

aResult = parse_branch('', aFilter.data, anObject)

At the topmost level, we are asking parse_branch to take anObject and analyze it against our Filter and its ruleset. It does this by recursively unpacking sequences in our Filter in particular ways. The basic rules are:

Items in a list [] should be or'ed together.
Items in a dictionary {} should be and'ed together.
Items in a 2-tuple (the only allowable kind) are of the form: (operation, value) and imply the expression: anObject.operation(value).
Any other item should be directly compared for equivalence against anObject.

We have all four in our example (all right, our dictionary is something of a degenerate case, but it's there nonetheless). Let's quickly walk through them.

The first container is a list. This gets separated into its two items, and each is evaluated in order. When each has been evaluated (which may involve a recursive call to parse_branch), their True/False values will be or'ed together.

The first item in that outermost list is a dictionary. If it had multiple items, they would be and'ed together; our example only has one item. Therefore, it merely checks to see if anObject.data['Name'] equals 'GvR'. If it does, then our first list item is True, and the second item will not be evaluated, and anObject "passes" the Filter. If the 'Name' is not 'GvR', the second item is evaluated.

The second item is also a dictionary; however, its 'value' half isn't a simple string: it's a 2-tuple. This signals our parser to evaluate the expression: anObject.data['Number of Contributions'].__ge__(1000). Just in case our .data value is an object that doesn't support __ge__ (in Python 2.2, this includes many basic types like int), we provide a lambda function to perform the operation using the built-in behavior for that type; mostly __cmp__.

That's all there is to it! Our example might be written in straight Python code as:

anObject.data['Name'] == 'GvR' or anObject.data['Number of Contributions'] >= 1000

But of course, this isn't nearly as much fun as abstracting the process. ;)

The same example, run through the ADO version, does not test for truth value on its own. Instead, it produces valid SQL for ADO database connections. Our example Filter would produce something like:

"([Name] = 'GvR' or [Number of Contributions] >= 1000)"

which could then easily be inserted into a WHERE clause. I currently use this very technique in an object server design.

Analysis:

The point of the entire exercise is this: the same Filter can be evaluated using different strategies in different contexts. In the first example, we compared an Object directly to the filter. In the second example, we applied the same filter to a database query. Abstracting the logic in this way allows us to write Filter logic once, and use it in a variety of situations. It also allows us the usual benefits of abstraction: dynamic instantiation, code isolation, consistent expression, easier state management (you could pickle or otherwise store the Filters), and maintainability. Python, with its built-in sequences, gives us a simple, powerful way to express complex logic without introducing (too much) unfamiliar syntax.

Tags: algorithms

1 comment

a 14 years ago # | flag

apply() is deprecated. apply(anOp, [value]) should be changed to anOp(value)

◄	Python recipes (4591)	►
◄	Robert Brewer's recipes (3)	►

Generic filter logic strategy (Python recipe) by Robert Brewer
ActiveState Code (http://code.activestate.com/recipes/223612/)

1 comment

Tags

Required Modules

Other Information and Tasks

Accounts

Code Recipes

Feedback & Information

ActiveState

Generic filter logic strategy (Python recipe) by Robert Brewer ActiveState Code (http://code.activestate.com/recipes/223612/)

1 comment

Tags

Required Modules

Other Information and Tasks

Accounts

Code Recipes

Feedback & Information

ActiveState

Generic filter logic strategy (Python recipe) by Robert Brewer
ActiveState Code (http://code.activestate.com/recipes/223612/)