Welcome, guest | Sign In | My Account | Store | Cart

This program uses a digital trie search (Knuth 1998) to find the network of a given IPv4 address. It is fast, and can find nested subnets.

Python, 166 lines
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
# David.Clark (a) CIBER-research.eu 2012-12-20

# CIBER Research Ltd.
# version 10 hybrid for both clarity and optimisation, + bug fix.
# this code is online at
#       http://code.activestate.com/recipes/576479/

""" Create a digital trie of ipranges(CIDR); for a given ip 
determine membership of smallest CIDR (if any) by digital search. 

    'Instead of basing a search method on comparisons between keys, 
we can make use of their representation as a sequence of digits or
alphabetic characters.'
        Knuth. Art of Computer Programming vol 3, 2e 1998 (6.3 p492)
"""
	
def bytes2bits():
    """Create a table of bit values for int(0) to int(256)."""
    # derived from  Python Cookbook 2e 2005.
    bytes = [None] * 256
    for n in xrange(256):
        bits = []
        q = n                       # q is modified by bitshift
        for b in xrange(0,8):
            if (q & 128):
                bits.append(1)
            else:
                bits.append(0)
            q <<= 1
        bytes[n] = tuple(bits)      # make immutable
    return tuple(bytes)
# One-time call to build table; rebind the funtion name to its result.
bytes2bits = bytes2bits()

class ProgramError(ValueError):
    """This is a bug"""

class CIDRnode(object):
    def __init__(self, cidr=None):
        self.cidr = cidr
        self.left = None
        self.right = None

    def get_net(self):
        if self.cidr:
            return self.cidr.split('/')[0]
        else:
            return None
    network = property(get_net, doc='IPv4 dotted quad')

    def get_bits(self):
        if self.cidr:
            return int(self.cidr.split('/')[1])
        else:
            return None
    netbits = property(get_bits, doc='IPv4 netmask bits')

class CIDRtrie(object):
    """
    take a shortcut by using a list for the first octet as any CIDR 
    will be no greater than class A. Thereafter work down a binary tree
    of ip network addresses. If a node has a CIDR record stack the result,
    but continue down the tree while there are links. (Not all Nodes 
    have CIDR records.). At leif node pop (the best) result.
    """
    # we need some definitions before we can __init__()
    @staticmethod
    def byte2bit(ip):
        for q in ip:
            q = int(q) 
            for b in xrange(0,8):
                yield 1 if (q & 128) else 0
                q <<= 1

    def add_cidr(self, cidr):
        """Build the trie. For clarity this is the non-optimised version."""
        c = cidr.split('/')
        network = c[0].split('.')
        nm_bit = int(c[1])
        classA = int(network[0])
        subtree = self.root[classA] 
        if not subtree:
            subtree = CIDRnode()
            self.root[classA] = subtree
        nm_bit -= 7             # leave a bit for test at top of loop
        network_bits = self.byte2bit(network[1:])
        for nextbit in network_bits: 
            if nm_bit == 1:
                overwrite = subtree.cidr
                subtree.cidr = cidr
                return overwrite                # expect to return None
            nm_bit -= 1        
            if nextbit is 0:
                if not subtree.left:
                    subtree.left = CIDRnode()
                subtree = subtree.left
                continue
            elif nextbit is 1:
                if not subtree.right:
                    subtree.right = CIDRnode()
                subtree = subtree.right
                continue
            else:
                raise ProgramError(
                    'Tried to bud %s, bitten by bug, fell out of Tree'% cidr)
        subtree.cidr = cidr

    def __init__(self, ipranges=None):
        self.root = [None] * 256        # A forest of (x.0.0.0/8) 'class A' addresses
        if ipranges:
            for cidr in ipranges:
                self.add_cidr(cidr)

    def get_cidr(self, ip):
        """This is very similar to add_cidr but inline code improves
        performance by around 10% and the cost of building the lookup table
        is amortised over serveral million lookups gaining another 10%."""
        ip = ip.split('.')
        subtree = self.root[int(ip[0])]         # subtree = CIDRnode
        if subtree is None:
            return None
        results = [None]
        for quad in ip[1:]:
            quad_bits = bytes2bits[int(quad)]
            for nextbit in quad_bits:
                if subtree.cidr:
                    results.append(subtree.cidr)
                if subtree.left and nextbit is 0:
                    subtree = subtree.left
                    continue
                elif subtree.right and nextbit is 1:
                    subtree = subtree.right
                    continue
                else:
                    return results.pop()
        return subtree.cidr

####    Command Line Processing    ####
def test():
    index = {}
    networks = (
        ('10.40.47.0/27', 'network 1'),
        ('10.40.0.0/16' , 'network 2'),
        ('10.44.0.0/16' , 'network 3'),
        ('192.168.47.0/27','network 4'),
        ('10.10.1.1/32' , 'single_ip'),)
    ipaddr = ('10.40.47.26', '10.40.47.34', '10.44.47.26',
            '192.168.47.0', '192.168.47.31', '192.168.47.32',  
            '192.168.0.47', '192.168.47.1', '10.10.1.1')
    trie = CIDRtrie()
    for cidr, name in networks:
        index[cidr] = name
        overwrite = trie.add_cidr(cidr)
        if overwrite:
            print('WARNING overwriting %s with %s'% (overwrite, cidr))

    for ip in ipaddr:
        cidr = trie.get_cidr(ip)
        if cidr:
            name = index[cidr]
            print('%s is in network %s (%s)'% (ip, name, cidr))
        else:
            print('network not known for %s'% ip)

if __name__ == '__main__':     #only when run from cmd line
    test()

Log file analysis frequently requires the identification of the user of an IP address. A reverse DNS lookup is slow, uncertain of response, and may provide more information than is really required. In many cases it is enough to know the organisation associated with a subnet.

This program uses a digital trie search (Knuth 1998) to find the network of a given IPv4 address. It is fast, and can find nested subnets.

The digital trie data structure and search is described by Knuth in Art of Computer Programming Vol 3 (2e 1998). The version of the program shown here demonstrates several variations. A network address is a 32 bit integer. In CIDR (Classless Inter-Domain Routing) notation the size of the network is denoted by an integer denoting the number of rightmost bits that form the network address, the remaining bits are the address within the network. We assume that no network will be larger than /8 (class A); therefore we can gain a performance advantage by looking up the first component of the address in dotted quad form in a list representing all 256 possible values. If we were dealing only with the historical class A, B, C (/8, /16, /24 bit netmask) addresses we could nest a further two lists to complete the lookup in at most three steps. However today's networks come in a variety of sizes so the remainder of the lookup proceeds bit by bit through a binary trie. The depth of the tree depends on the length of the bitmask.

The code to add a network to the tree and to retrieve a value is very similar; the version here uses an less optimised but possibly clearer implementation for the add_cidr method, the get_cidr code gains by in-lining the lookup function (better by 10%) and using a lookup table to determine the binary representation of int(0) to int(256) (another 10% improvement). Either version is in fact much faster than a reverse DNS lookup: 55 to 45 seconds to test 12 million addresses whereas a fast asynchronous rDNS of the same data took around 7 hours.

The class CIDRnode could be extended to hold more attributes of the network. This implementation keeps the node simple by storing the network name in a separate dictionary indexed by the cidr.

2011-12-20. My thanks to John Hopkins for fixing a bug in the original version of this code.

4 comments

versus okasava 15 years, 3 months ago  # | flag

Tnx for this recipe. It's rock!

versus okasava 15 years, 3 months ago  # | flag

But this do not work with network any/32.

networks = ( ('10.40.47.0/27', 'network 1'), ('10.10.1.1/32' , 'myip'),)

ipaddr = ('10.40.47.26', '10.10.1.1')

$python recipe.py 10.40.47.26 is in network network 1 (10.40.47.0/27) network not known for 10.10.1.1

John Hopkins 12 years, 4 months ago  # | flag

To fix the any/32 problem, add this at the end of add_cidr: subtree.cidr = cidr # after for loop

And add this at the end of get_cidr: return subtree.cidr # after for loop

DJC (author) 12 years, 4 months ago  # | flag

Thank you for the patch which has now been incorporated into the code.

Created by DJC on Tue, 2 Sep 2008 (MIT)
Python recipes (4591)
DJC's recipes (1)

Required Modules

  • (none specified)

Other Information and Tasks