For those times when '20050415115959' just takes up too much space. Useful for making your numbers shorter (like timestamps in URLs).
NOTATION10 = '0123456789'
NOTATION70 = "!'()*-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"

class BaseConvert:
    def __init__(self):
        self.notation = NOTATION10

    def _convert(self, n=1, b=10):
        '''
        Private function for doing conversions; returns a list
        '''
        if True not in [ isinstance(n, x) for x in [long, int, float] ]:
            raise TypeError, 'parameters must be numbers'
        converted = []
        quotient, remainder = divmod(n, b)
        converted.append(remainder)
        if quotient != 0:
            converted.extend(self._convert(quotient, b))
        return converted

    def convert(self, n, b):
        '''
        General conversion function
        '''
        nums = self._convert(n, b)
        nums.reverse()
        return self.getNotation(nums)

    def getNotation(self, list_of_remainders):
        '''
        Get the notational representation of the converted number
        '''
        return ''.join([ self.notation[x] for x in list_of_remainders ])

class Base70(BaseConvert):
    '''
    >>> base = Base70()
    >>> base.convert(10)
    '4'
    >>> base.convert(510)
    '1E'
    >>> base.convert(1000)
    '8E'
    >>> base.convert(10000000)
    'N4o4'
    '''
    def __init__(self):
        self.notation = NOTATION70

    def convert(self, n):
        "Convert base 10 to base 70"
        return BaseConvert.convert(self, n, 70)

def _test():
    import doctest, base
    return doctest.testmod(base)

if __name__ == '__main__':
    _test()
I have worked on several projects where number strings needed to be "compressed" without losing data. Two of them were separate Zope projects where the auto-generated unique IDs (datetime stamps with random numbers tacked on) used up too much URL space in the browser address bar.
As a result, we modified the unique ID generators to incorporate classes based on the above. The Base70 class was actually written for a Nevow project. Some of the characters in NOTATION70 are not Zope-friendly, so we came up with a base 65 notation that was. We also had a base 90 notation that didn't go in URLs. Very short number strings ;-)
Update: Fixed typo as pointed out by Anand Pillai in the comments.
You really want vowels in there? The problem with including vowels is that then your output can include natural-language words, and certain words can be offensive to certain people.
Murphy's Law: when "fuk-u" shows up in one of your URLs, the wrong person is going to notice it.
Why not a two-way process? The thing I notice is that you're only converting in one direction. A version that also converts back would be more useful if you ever want to use the base 70 number as a number. (For bases less than 37, int(n, base) is the simpler way to back-convert.)
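A reverse conversion can be sketched as follows. This is a minimal Python 3 illustration, not the commenter's original code: the function names `encode` and `decode` are assumptions made for the example, and the digit alphabet is the recipe's NOTATION70.

```python
NOTATION70 = "!'()*-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"


def encode(n, notation=NOTATION70):
    """Encode a non-negative integer using `notation` as the digit alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    base = len(notation)
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(notation[remainder])
        if n == 0:
            break
    # Remainders come out least-significant first, so reverse them.
    return "".join(reversed(digits))


def decode(s, notation=NOTATION70):
    """Decode a string produced by encode() back into an integer."""
    base = len(notation)
    value = 0
    for char in s:
        # str.index raises ValueError for characters outside the alphabet.
        value = value * base + notation.index(char)
    return value


stamp = 20050415115959
short = encode(stamp)
print(short, decode(short) == stamp)
```

For the 14-digit timestamp from the introduction, the encoded form is 8 characters long, and decoding it returns the original integer.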
Spelling error. I think you mean "raise TypeError, 'parameters must be numbers' " not "raise TypeError, 'parameters bust be numbers'"
-Anand
struct and base64. You can accomplish something similar with the standard struct and base64 modules. The result is not quite as compact, and the numbers are padded, but it's fairly straightforward. I have it at: http://svn.colorstudy.com/home/ianb/recipes/base64unpack.py ; but here's the raw code:
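A minimal sketch of the idea in Python 3 (not Ian's original code): it assumes the number fits in an unsigned 64-bit integer and uses the URL-safe base64 alphabet. The names `pack_number` and `unpack_number` are made up for the example.

```python
import base64
import struct


def pack_number(n):
    """Pack an integer in range(2**64) into a URL-safe base64 string."""
    raw = struct.pack(">Q", n)  # 8 bytes, big-endian, unsigned
    return base64.urlsafe_b64encode(raw).decode("ascii")


def unpack_number(s):
    """Reverse pack_number()."""
    raw = base64.urlsafe_b64decode(s.encode("ascii"))
    return struct.unpack(">Q", raw)[0]


stamp = 20050415115959
token = pack_number(stamp)
print(token, len(token), unpack_number(token) == stamp)
```

Because the packed value is always 8 bytes, the token is always 12 characters including one `=` of padding, whatever the size of the number. That is the padding cost mentioned above.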
packing and base64. I really like Ian's suggestion above. Very simple and elegant solution. We emailed briefly about this after I had done some testing. First, I made a minor change to his code above to allow for doubles (and thus much longer number sequences):
But after I made that change, I ran into some other issues. Granted, these will most likely be edge cases, but they are interesting to note and be aware of nonetheless:
Precision is good here:
But increasing the power by one at this point results in a loss of precision:
Here are a few examples of how this changes with numbers of increasing size (the numbers are test timestamps + "random" numbers):
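The effect can be reproduced with a short Python 3 sketch (these are not the original test values). A double has a 53-bit significand, so integers up to 2**53 survive a round trip through struct's `d` format, while larger ones may not. The 17-digit value below is the introduction's timestamp with three arbitrary digits appended, purely for illustration, and `roundtrip_double` is a hypothetical helper name.

```python
import struct


def roundtrip_double(n):
    """Pack an integer as an 8-byte IEEE 754 double and read it back as an int."""
    raw = struct.pack(">d", float(n))
    return int(struct.unpack(">d", raw)[0])


samples = (
    2**53,              # largest power of two before gaps appear between integers
    2**53 + 1,          # first integer a double cannot represent
    20050415115959,     # 14-digit timestamp
    20050415115959123,  # timestamp plus three arbitrary digits (illustrative)
)
for n in samples:
    print(n, roundtrip_double(n) == n)
```

The first and third values come back unchanged; the second and fourth do not, because they fall between representable doubles.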
I asked Ian about this, and he briefly touched on C and struct internals which I won't get into ;-) Something to keep in mind, though.
Only part-way two-way... I really like your convert logic. Much cleaner. I'll update the recipe with it at some point. However, your two-way doesn't do a full two-way:
Using the order of your INT_TO_DIGIT list, 100 in base 70 would be the string '1U'; your conversion doesn't let you go from '1U' back to 100. When I get some time, I'll look into that, just for kicks (I've never needed that functionality).
Some notes about INT_TO_DIGIT: 1) strings are already indexed iterables, so you don't need a list comprehension for it; and 2) it might be a good idea to list the characters of the notation in Python sorting order, so that if someone sorted a bunch of base70 strings, they would come out in numerical order. Here they are in Python sort order:
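As a quick check in Python 3, the recipe's NOTATION70 string is already in Python sort order. The sketch below also shows a caveat: sorted strings only match numerical order when they all have the same length, so the example left-pads with the zero digit ('!'). The helper names `encode` and `encode_padded` are assumptions for the illustration.

```python
NOTATION70 = "!'()*-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"

# The alphabet is unchanged by sorting, i.e. it is in code point order.
print("".join(sorted(NOTATION70)) == NOTATION70)


def encode(n, notation=NOTATION70):
    """Encode a non-negative integer using `notation` as the digit alphabet."""
    base = len(notation)
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(notation[remainder])
        if n == 0:
            break
    return "".join(reversed(digits))


def encode_padded(n, width, notation=NOTATION70):
    """Encode n and left-pad with the zero digit so all strings share a width."""
    return encode(n, notation).rjust(width, notation[0])


numbers = [1000, 10, 10000000, 510]
padded = [encode_padded(n, 4) for n in numbers]
print(sorted(padded) == [encode_padded(n, 4) for n in sorted(numbers)])

# Without padding the order breaks: '1E' (510) sorts before '4' (10).
print(sorted([encode(10), encode(510)]))
```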
Recursion is limiting. I'm hitting recursion limits. I suggest replacing:
with:
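A minimal sketch of an iterative version in Python 3 (not the commenter's original snippet; `to_digits` is a hypothetical name). It replaces the recursive call in `_convert` with a loop, so the number of digits no longer counts against the recursion limit. It also rejects negative input: with floor division a negative number never reaches a zero quotient (`divmod(-1, 70)` is `(-1, 69)`), which is one possible way to hit the limit, though that is a guess rather than something stated in the comment.

```python
def to_digits(n, base):
    """Return the digits of non-negative integer n in `base`, most significant first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(remainder)
        if n == 0:
            break
    digits.reverse()
    return digits


print(to_digits(510, 70))
# 5001 binary digits, more than the default recursion limit allows
print(len(to_digits(2**5000, 2)))
```

The returned list can be passed to a lookup such as the recipe's `getNotation` to produce the final string.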
Any reason not to use the more general function linked to below? http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/111286
Tested various numbers with both and they give the same result. Anyone find a problem with the other one?