ActiveState Code

Recipe 410672: Custom String Representations of Bases


For those times when '20050415115959' just takes up too much space. Useful for making your numbers shorter (like timestamps in URLs).

Python
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
NOTATION10 = '0123456789'
NOTATION70 = "!'()*-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"

class BaseConvert:
    def __init__(self):
        self.notation = NOTATION10

    def _convert(self, n=1, b=10):
        '''
        Private function for doing conversions; returns a list
        '''
        if True not in [ isinstance(n, x) for x in [long, int, float] ]:
            raise TypeError, 'parameters must be numbers'
        converted = []
        quotient, remainder = divmod(n, b)
        converted.append(remainder)
        if quotient != 0:
            converted.extend(self._convert(quotient, b))
        return converted

    def convert(self, n, b):
        '''
        General conversion function
        '''
        nums = self._convert(n, b)
        nums.reverse()
        return self.getNotation(nums)

    def getNotation(self, list_of_remainders):
        '''
        Get the notational representation of the converted number
        '''
        return ''.join([ self.notation[x] for x in list_of_remainders ])

class Base70(BaseConvert):
    '''
    >>> base = Base70()
    >>> base.convert(10)
    '4'
    >>> base.convert(510)
    '1E'
    >>> base.convert(1000)
    '8E'
    >>> base.convert(10000000)
    'N4o4'
    '''
    def __init__(self):
        self.notation = NOTATION70

    def convert(self, n):
        "Convert base 10 to base 70"
        return BaseConvert.convert(self, n, 70)

def _test():
    import doctest, base
    return doctest.testmod(base)

if __name__ == '__main__':
    _test()

Discussion

I have worked on several projects where number strings needed to be "compressed" without losing data. Two of them were separate Zope projects where the auto-generated unique IDs (datetime stamps with random numbers tacked on) used up too much URL space in the browser address bar.

As a result, we modified the unique ID generators to incorporate classes based on the above. The Base70 class was actually written for a Nevow project; some of the characters in NOTATION70 are not Zope-friendly; we thus came up with a base 65 notation that was Zope-friendly. In addition, we also had one that didn't go in URLS and was base 90. Very short number strings ;-)

Update: Fixed typo as pointed out by Anand Pillai in the comments.

Comments

  1. 1. At 1:48 a.m. on 26 apr 2005, Amos Newcombe said:

    You really want vowels in there? The problem with including vowels is that then your output can include natural-language words, and certain words can be offensive to certain people.

    Murphy's Law: when "fuk-u" shows up in one of your URLs, the wrong person is going to notice it.

  2. 2. At 8:14 p.m. on 26 apr 2005, Douglas Bagnall said:

    why not a two way process? The thing I notice is that you're only converting in one direction. Something like the code would be more useful if you ever want to use the base 70 number as a number. (for bases less than 37, int(n, base) is the simpler way to back-convert).

    INT_TO_DIGIT = [ x for x in "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                     "_abcdefghijklmnopqrstuvwxyz~!'()*+"]
    DIGIT_TO_INT = dict([ (y, x) for x, y in enumerate(INT_TO_DIGIT) ])
    
    
    def convert(n, newbase=70, oldbase=10):
        nums = [ DIGIT_TO_INT[x] for x in str(n) ]
        nums.reverse()
        r = 1
        total = 0
        for x in nums:
            total += r * x
            r *= oldbase
        if total == 0:
            return '0'
    
        converted = []
        while total:
            total, remainder = divmod(total, newbase)
            converted.append(INT_TO_DIGIT[remainder])
    
        converted.reverse()
        return ''.join(converted)
    
    
    if __name__ == '__main__':
    
        numbers10 = (1, 256, 3, 34534, 20050427123456, 0, 453532)
        bases = (70, 16, 2, 7, 39)
        for n in numbers10:
            print "\ndealing with %s" %n
            for base in bases:
                nbase = convert(n, base, 10)
                n10 = convert(nbase, 10, base)
                print "base %2s: %40s" % (base, nbase)
                assert (int(n10) == n)
            print "base 10: %40s" % n10
    
  3. 3. At 3:49 a.m. on 28 apr 2005, Anand Balachandran Pillai said:

    Spelling error. I think you mean "raise TypeError, 'parameters must be numbers' " not "raise TypeError, 'parameters bust be numbers'"

    -Anand

  4. 4. At 1:53 p.m. on 28 apr 2005, Ian Bicking said:

    struct and base64. You can accomplish something similar with the standard struct module and base64. Not quite as compact, and the numbers are padded, but it's fairly straight-forward. I have it at: http://svn.colorstudy.com/home/ianb/recipes/base64unpack.py ; but here's the raw code:

    def pack_int(i):
        return struct.pack('i', i).encode('base64').replace('\n', '').strip('=')
    
    def unpack_int(s):
        s += '=' * (8 % len(s))
        return struct.unpack('i', s.decode('base64'))[0]
    
  5. 5. At 12:58 p.m. on 10 may 2005, Duncan McGreggor (the author) said:

    packing and base64. I really like Ian's suggestion above. Very simple and elegant solution. We emailed briefly about this after I had done some testing. First, I made a minor change to his code above to allow for doubles (and thus much longer number sequences):

    def pack_int(i):
        return struct.pack('d', i).encode('base64').replace('\n', '').strip('=')
    

    But after I made that change, I ran into some other issues. Given, these will most likely be edge cases, but interesting to note and be aware of nonetheless:

    Precision is good here:

    >>> struct.pack('d', 2**52).encode('base64').strip().strip('=')
    'QzAAAAAAAAA'
    >>> struct.pack('d', 2**52+1).encode('base64').strip().strip('=')
    'QzAAAAAAAAE'
    

    But increasing the power by one at this point results in a loss of precision:

    >>> struct.pack('d', 2**53).encode('base64').strip().strip('=')
    'Q0AAAAAAAAA'
    >>> struct.pack('d', 2**53+1).encode('base64').strip().strip('=')
    'Q0AAAAAAAAA'
    

    Here are a few examples of how this changes with numbers of increasing size (the numbers are test timestamps + "random" numbers):

    >>> struct.pack('d', 20050817043010000).encode('base64').strip().strip('=')
    'Q1HPByjThHQ'
    >>> struct.pack('d', 20050817043010001).encode('base64').strip().strip('=')
    'Q1HPByjThHQ'
    >>> struct.pack('d', 20050817043010002).encode('base64').strip().strip('=')
    'Q1HPByjThHQ'
    >>> struct.pack('d', 20050817043010003).encode('base64').strip().strip('=')
    'Q1HPByjThHU'
    

    I asked Ian about this, and he briefly touched on C and struct internals which I won't get into ;-) Something to keep in mind, though.

  6. 6. At 4:13 p.m. on 10 may 2005, Duncan McGreggor (the author) said:

    Only part-way two-way... I really like your convert logic. Much cleaner. I'll update the recipe with it at some point. However, your two-way doesn't do a full two-way:

    Using the order of your INT_TO_DIGIT list, 100 base70 would be the string '1U'; you conversion doesn't let you go from '1U' and reobtain 100. When I get some time, I'll look into that, just for kicks (I've never needed that functionality).

    Some notes about INT_TO_DIGIT: 1) strings are already indexed iterables, so you don't need a list comprehension for it; and 2) it might be a good idea to list the strings in the notation in python sorting order, that way if someone sorted out a bunch of base70 strings, they would actually list in numerical order. Here they are in python sort order:

    !'()*-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~
    
  7. 7. At 2:53 p.m. on 27 apr 2007, Shannon -jj Behrens said:

    Recursion is limiting. I'm hitting recursion limits. I suggest replacing:

        quotient, remainder = divmod(n, b)
        converted.append(remainder)
        if quotient != 0:
            converted.extend(self._convert(quotient, b))
        return converted
    

    with:

        while True:
            quotient, remainder = divmod(n, b)
            converted.append(remainder)
            if quotient == 0:
                return converted
            n = quotient
    
  8. 8. At 8:47 p.m. on 21 jul 2008, j m said:

    Any reason not to use the more general function linked to below? http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/111286

    Tested various numbers with both and they give the same result. Anyone find a problem with the other one?

Sign in to comment