Welcome, guest | Sign In | My Account | Store | Cart

For those times when '20050415115959' just takes up too much space. Useful for making your numbers shorter (like timestamps in URLs).

Python, 59 lines
 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 NOTATION10 = '0123456789' NOTATION70 = "!'()*-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~" class BaseConvert: def __init__(self): self.notation = NOTATION10 def _convert(self, n=1, b=10): ''' Private function for doing conversions; returns a list ''' if True not in [ isinstance(n, x) for x in [long, int, float] ]: raise TypeError, 'parameters must be numbers' converted = [] quotient, remainder = divmod(n, b) converted.append(remainder) if quotient != 0: converted.extend(self._convert(quotient, b)) return converted def convert(self, n, b): ''' General conversion function ''' nums = self._convert(n, b) nums.reverse() return self.getNotation(nums) def getNotation(self, list_of_remainders): ''' Get the notational representation of the converted number ''' return ''.join([ self.notation[x] for x in list_of_remainders ]) class Base70(BaseConvert): ''' >>> base = Base70() >>> base.convert(10) '4' >>> base.convert(510) '1E' >>> base.convert(1000) '8E' >>> base.convert(10000000) 'N4o4' ''' def __init__(self): self.notation = NOTATION70 def convert(self, n): "Convert base 10 to base 70" return BaseConvert.convert(self, n, 70) def _test(): import doctest, base return doctest.testmod(base) if __name__ == '__main__': _test()

I have worked on several projects where number strings needed to be "compressed" without losing data. Two of them were separate Zope projects where the auto-generated unique IDs (datetime stamps with random numbers tacked on) used up too much URL space in the browser address bar.

As a result, we modified the unique ID generators to incorporate classes based on the above. The Base70 class was actually written for a Nevow project; some of the characters in NOTATION70 are not Zope-friendly; we thus came up with a base 65 notation that was Zope-friendly. In addition, we also had one that didn't go in URLS and was base 90. Very short number strings ;-)

Update: Fixed typo as pointed out by Anand Pillai in the comments. Amos Newcombe 16 years, 9 months ago

You really want vowels in there? The problem with including vowels is that then your output can include natural-language words, and certain words can be offensive to certain people.

Murphy's Law: when "fuk-u" shows up in one of your URLs, the wrong person is going to notice it. Douglas Bagnall 16 years, 9 months ago

why not a two way process? The thing I notice is that you're only converting in one direction. Something like the code would be more useful if you ever want to use the base 70 number as a number. (for bases less than 37, int(n, base) is the simpler way to back-convert).

INT_TO_DIGIT = [ x for x in "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
"_abcdefghijklmnopqrstuvwxyz~!'()*+"]
DIGIT_TO_INT = dict([ (y, x) for x, y in enumerate(INT_TO_DIGIT) ])

def convert(n, newbase=70, oldbase=10):
nums = [ DIGIT_TO_INT[x] for x in str(n) ]
nums.reverse()
r = 1
total = 0
for x in nums:
total += r * x
r *= oldbase
if total == 0:
return '0'

converted = []
while total:
total, remainder = divmod(total, newbase)
converted.append(INT_TO_DIGIT[remainder])

converted.reverse()
return ''.join(converted)

if __name__ == '__main__':

numbers10 = (1, 256, 3, 34534, 20050427123456, 0, 453532)
bases = (70, 16, 2, 7, 39)
for n in numbers10:
print "\ndealing with %s" %n
for base in bases:
nbase = convert(n, base, 10)
n10 = convert(nbase, 10, base)
print "base %2s: %40s" % (base, nbase)
assert (int(n10) == n)
print "base 10: %40s" % n10 Anand 16 years, 9 months ago

Spelling error. I think you mean "raise TypeError, 'parameters must be numbers' " not "raise TypeError, 'parameters bust be numbers'"

-Anand Ian Bicking 16 years, 9 months ago

struct and base64. You can accomplish something similar with the standard struct module and base64. Not quite as compact, and the numbers are padded, but it's fairly straight-forward. I have it at: http://svn.colorstudy.com/home/ianb/recipes/base64unpack.py ; but here's the raw code:

def pack_int(i):
return struct.pack('i', i).encode('base64').replace('\n', '').strip('=')

def unpack_int(s):
s += '=' * (8 % len(s))
return struct.unpack('i', s.decode('base64')) Duncan McGreggor (author) 16 years, 8 months ago

packing and base64. I really like Ian's suggestion above. Very simple and elegant solution. We emailed briefly about this after I had done some testing. First, I made a minor change to his code above to allow for doubles (and thus much longer number sequences):

def pack_int(i):
return struct.pack('d', i).encode('base64').replace('\n', '').strip('=')

But after I made that change, I ran into some other issues. Given, these will most likely be edge cases, but interesting to note and be aware of nonetheless:

Precision is good here:

>>> struct.pack('d', 2**52).encode('base64').strip().strip('=')
'QzAAAAAAAAA'
>>> struct.pack('d', 2**52+1).encode('base64').strip().strip('=')
'QzAAAAAAAAE'

But increasing the power by one at this point results in a loss of precision:

>>> struct.pack('d', 2**53).encode('base64').strip().strip('=')
'Q0AAAAAAAAA'
>>> struct.pack('d', 2**53+1).encode('base64').strip().strip('=')
'Q0AAAAAAAAA'

Here are a few examples of how this changes with numbers of increasing size (the numbers are test timestamps + "random" numbers):

>>> struct.pack('d', 20050817043010000).encode('base64').strip().strip('=')
'Q1HPByjThHQ'
>>> struct.pack('d', 20050817043010001).encode('base64').strip().strip('=')
'Q1HPByjThHQ'
>>> struct.pack('d', 20050817043010002).encode('base64').strip().strip('=')
'Q1HPByjThHQ'
>>> struct.pack('d', 20050817043010003).encode('base64').strip().strip('=')
'Q1HPByjThHU'

I asked Ian about this, and he briefly touched on C and struct internals which I won't get into ;-) Something to keep in mind, though. Duncan McGreggor (author) 16 years, 8 months ago

Only part-way two-way... I really like your convert logic. Much cleaner. I'll update the recipe with it at some point. However, your two-way doesn't do a full two-way:

Using the order of your INT_TO_DIGIT list, 100 base70 would be the string '1U'; you conversion doesn't let you go from '1U' and reobtain 100. When I get some time, I'll look into that, just for kicks (I've never needed that functionality).

Some notes about INT_TO_DIGIT: 1) strings are already indexed iterables, so you don't need a list comprehension for it; and 2) it might be a good idea to list the strings in the notation in python sorting order, that way if someone sorted out a bunch of base70 strings, they would actually list in numerical order. Here they are in python sort order:

!'()*-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~ Shannon -jj Behrens 14 years, 9 months ago

Recursion is limiting. I'm hitting recursion limits. I suggest replacing:

quotient, remainder = divmod(n, b)
converted.append(remainder)
if quotient != 0:
converted.extend(self._convert(quotient, b))
return converted

with:

while True:
quotient, remainder = divmod(n, b)
converted.append(remainder)
if quotient == 0:
return converted
n = quotient j m 13 years, 6 months ago

Any reason not to use the more general function linked to below? http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/111286

Tested various numbers with both and they give the same result. Anyone find a problem with the other one? Created by Duncan McGreggor on Fri, 22 Apr 2005 (PSF)