Custom String Representations of Bases « Python recipes

For those times when '20050415115959' just takes up too much space. Useful for making your numbers shorter (like timestamps in URLs).

      NOTATION10 = '0123456789'
NOTATION70 = "!'()*-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"

class BaseConvert:
    def __init__(self):
        self.notation = NOTATION10

    def _convert(self, n=1, b=10):
        '''
        Private function for doing conversions; returns a list
        '''
        if True not in [ isinstance(n, x) for x in [long, int, float] ]:
            raise TypeError, 'parameters must be numbers'
        converted = []
        quotient, remainder = divmod(n, b)
        converted.append(remainder)
        if quotient != 0:
            converted.extend(self._convert(quotient, b))
        return converted

    def convert(self, n, b):
        '''
        General conversion function
        '''
        nums = self._convert(n, b)
        nums.reverse()
        return self.getNotation(nums)

    def getNotation(self, list_of_remainders):
        '''
        Get the notational representation of the converted number
        '''
        return ''.join([ self.notation[x] for x in list_of_remainders ])

class Base70(BaseConvert):
    '''
    >>> base = Base70()
    >>> base.convert(10)
    '4'
    >>> base.convert(510)
    '1E'
    >>> base.convert(1000)
    '8E'
    >>> base.convert(10000000)
    'N4o4'
    '''
    def __init__(self):
        self.notation = NOTATION70

    def convert(self, n):
        "Convert base 10 to base 70"
        return BaseConvert.convert(self, n, 70)

def _test():
    import doctest, base
    return doctest.testmod(base)

if __name__ == '__main__':
    _test()

      

I have worked on several projects where number strings needed to be "compressed" without losing data. Two of them were separate Zope projects where the auto-generated unique IDs (datetime stamps with random numbers tacked on) used up too much URL space in the browser address bar.

As a result, we modified the unique ID generators to incorporate classes based on the above. The Base70 class was actually written for a Nevow project; some of the characters in NOTATION70 are not Zope-friendly; we thus came up with a base 65 notation that was Zope-friendly. In addition, we also had one that didn't go in URLS and was base 90. Very short number strings ;-)

Update: Fixed typo as pointed out by Anand Pillai in the comments.

Tags: text

8 comments

Amos Newcombe 18 years, 12 months ago # | flag

You really want vowels in there? The problem with including vowels is that then your output can include natural-language words, and certain words can be offensive to certain people.

Murphy's Law: when "fuk-u" shows up in one of your URLs, the wrong person is going to notice it.

Douglas Bagnall 18 years, 12 months ago # | flag

why not a two way process? The thing I notice is that you're only converting in one direction. Something like the code would be more useful if you ever want to use the base 70 number as a number. (for bases less than 37, int(n, base) is the simpler way to back-convert).

INT_TO_DIGIT = [ x for x in "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                 "_abcdefghijklmnopqrstuvwxyz~!'()*+"]
DIGIT_TO_INT = dict([ (y, x) for x, y in enumerate(INT_TO_DIGIT) ])


def convert(n, newbase=70, oldbase=10):
    nums = [ DIGIT_TO_INT[x] for x in str(n) ]
    nums.reverse()
    r = 1
    total = 0
    for x in nums:
        total += r * x
        r *= oldbase
    if total == 0:
        return '0'

    converted = []
    while total:
        total, remainder = divmod(total, newbase)
        converted.append(INT_TO_DIGIT[remainder])

    converted.reverse()
    return ''.join(converted)


if __name__ == '__main__':

    numbers10 = (1, 256, 3, 34534, 20050427123456, 0, 453532)
    bases = (70, 16, 2, 7, 39)
    for n in numbers10:
        print "\ndealing with %s" %n
        for base in bases:
            nbase = convert(n, base, 10)
            n10 = convert(nbase, 10, base)
            print "base %2s: %40s" % (base, nbase)
            assert (int(n10) == n)
        print "base 10: %40s" % n10

Anand 18 years, 12 months ago # | flag

Spelling error. I think you mean "raise TypeError, 'parameters must be numbers' " not "raise TypeError, 'parameters bust be numbers'"

-Anand

Ian Bicking 18 years, 12 months ago # | flag

struct and base64. You can accomplish something similar with the standard struct module and base64. Not quite as compact, and the numbers are padded, but it's fairly straight-forward. I have it at: http://svn.colorstudy.com/home/ianb/recipes/base64unpack.py ; but here's the raw code:

def pack_int(i):
    return struct.pack('i', i).encode('base64').replace('\n', '').strip('=')

def unpack_int(s):
    s += '=' * (8 % len(s))
    return struct.unpack('i', s.decode('base64'))[0]

Duncan McGreggor (author) 18 years, 11 months ago # | flag

packing and base64. I really like Ian's suggestion above. Very simple and elegant solution. We emailed briefly about this after I had done some testing. First, I made a minor change to his code above to allow for doubles (and thus much longer number sequences):

def pack_int(i):
    return struct.pack('d', i).encode('base64').replace('\n', '').strip('=')

But after I made that change, I ran into some other issues. Given, these will most likely be edge cases, but interesting to note and be aware of nonetheless:

Precision is good here:

>>> struct.pack('d', 2**52).encode('base64').strip().strip('=')
'QzAAAAAAAAA'
>>> struct.pack('d', 2**52+1).encode('base64').strip().strip('=')
'QzAAAAAAAAE'

But increasing the power by one at this point results in a loss of precision:

>>> struct.pack('d', 2**53).encode('base64').strip().strip('=')
'Q0AAAAAAAAA'
>>> struct.pack('d', 2**53+1).encode('base64').strip().strip('=')
'Q0AAAAAAAAA'

Here are a few examples of how this changes with numbers of increasing size (the numbers are test timestamps + "random" numbers):

>>> struct.pack('d', 20050817043010000).encode('base64').strip().strip('=')
'Q1HPByjThHQ'
>>> struct.pack('d', 20050817043010001).encode('base64').strip().strip('=')
'Q1HPByjThHQ'
>>> struct.pack('d', 20050817043010002).encode('base64').strip().strip('=')
'Q1HPByjThHQ'
>>> struct.pack('d', 20050817043010003).encode('base64').strip().strip('=')
'Q1HPByjThHU'

I asked Ian about this, and he briefly touched on C and struct internals which I won't get into ;-) Something to keep in mind, though.

Duncan McGreggor (author) 18 years, 11 months ago # | flag

Only part-way two-way... I really like your convert logic. Much cleaner. I'll update the recipe with it at some point. However, your two-way doesn't do a full two-way:

Using the order of your INT_TO_DIGIT list, 100 base70 would be the string '1U'; you conversion doesn't let you go from '1U' and reobtain 100. When I get some time, I'll look into that, just for kicks (I've never needed that functionality).

Some notes about INT_TO_DIGIT: 1) strings are already indexed iterables, so you don't need a list comprehension for it; and 2) it might be a good idea to list the strings in the notation in python sorting order, that way if someone sorted out a bunch of base70 strings, they would actually list in numerical order. Here they are in python sort order:

!'()*-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~

Shannon -jj Behrens 16 years, 12 months ago # | flag

Recursion is limiting. I'm hitting recursion limits. I suggest replacing:

    quotient, remainder = divmod(n, b)
    converted.append(remainder)
    if quotient != 0:
        converted.extend(self._convert(quotient, b))
    return converted

with:

    while True:
        quotient, remainder = divmod(n, b)
        converted.append(remainder)
        if quotient == 0:
            return converted
        n = quotient

j m 15 years, 9 months ago # | flag

Any reason not to use the more general function linked to below? http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/111286

Tested various numbers with both and they give the same result. Anyone find a problem with the other one?

◄	Python recipes (4591)	►
◄	Duncan McGreggor's recipes (5)	►

Custom String Representations of Bases (Python recipe) by Duncan McGreggor
ActiveState Code (http://code.activestate.com/recipes/410672/)

8 comments

Tags

Required Modules

Other Information and Tasks

Accounts

Code Recipes

Feedback & Information

ActiveState

Custom String Representations of Bases (Python recipe) by Duncan McGreggor ActiveState Code (http://code.activestate.com/recipes/410672/)

8 comments

Tags

Required Modules

Other Information and Tasks

Accounts

Code Recipes

Feedback & Information

ActiveState

Custom String Representations of Bases (Python recipe) by Duncan McGreggor
ActiveState Code (http://code.activestate.com/recipes/410672/)