For those times when '20050415115959' just takes up too much space. Useful for making your numbers shorter (like timestamps in URLs).
I have worked on several projects where number strings needed to be "compressed" without losing data. Two of them were separate Zope projects in which the auto-generated unique IDs (datetime stamps with random numbers tacked on) took up too much space in the browser address bar, so we modified the unique-ID generators to use classes based on this recipe. The Base70 class was actually written for a Nevow project. Because some of the characters in NOTATION70 are not Zope-friendly, we came up with a Zope-friendly base 65 notation. We also had a base 90 notation for IDs that didn't go in URLs. Very short number strings ;-) Update: Fixed a typo pointed out by Anand Pillai in the comments.
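Since the recipe's own code is not reproduced here, the following is a minimal sketch of the idea, not the Base70 class itself. It assumes a hypothetical 70-character alphabet (the real NOTATION70 may differ) and a hypothetical function name, `to_base`; the base is simply the length of the alphabet, and digits are produced by repeated `divmod`.

```python
# Hypothetical 70-character alphabet, listed in ASCII order.
# This is NOT the recipe's NOTATION70, whose exact contents are not shown here.
NOTATION70 = "!'*,-.0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"


def to_base(n, notation=NOTATION70):
    """Encode a non-negative integer using len(notation) as the base."""
    if not isinstance(n, int):
        raise TypeError("parameter must be an integer")
    if n < 0:
        raise ValueError("parameter must be non-negative")
    base = len(notation)
    digits = []
    while True:
        n, remainder = divmod(n, base)  # peel off the least significant digit
        digits.append(notation[remainder])
        if n == 0:
            break
    return "".join(reversed(digits))  # most significant digit first


stamp = 20050415115959           # 14 decimal digits
short = to_base(stamp)           # 8 characters in base 70
print(short, len(short))
```

Swapping in a different alphabet (for example a 65-character, Zope-friendly one) changes the base automatically, because nothing else in the function depends on the number 70.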
Tags: text

You really want vowels in there? The problem with including vowels is that then your output can include natural-language words, and certain words can be offensive to certain people.
Murphy's Law: when "fuk-u" shows up in one of your URLs, the wrong person is going to notice it.
Why not a two-way process? I notice that you're only converting in one direction. Code that converts in both directions would be more useful if you ever want to use the base 70 string as a number again. (For bases up to 36, int(n, base) is the simpler way to convert back.)
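A minimal sketch of the reverse direction, using hypothetical names `to_base` and `from_base`. Decoding is Horner's rule: multiply the running total by the base and add the value of the next digit. With the alphabet 0-9 then a-z, the result can be checked against the built-in `int(s, 36)`.

```python
import string

# Base-36 alphabet in the same order int(s, 36) expects: 0-9, then a-z.
NOTATION36 = string.digits + string.ascii_lowercase


def to_base(n, notation):
    """Encode a non-negative integer using len(notation) as the base."""
    base = len(notation)
    digits = []
    while True:
        n, remainder = divmod(n, base)
        digits.append(notation[remainder])
        if n == 0:
            break
    return "".join(reversed(digits))


def from_base(s, notation):
    """Decode a string produced by to_base back into an integer."""
    base = len(notation)
    lookup = {ch: i for i, ch in enumerate(notation)}  # digit -> value
    n = 0
    for ch in s:
        n = n * base + lookup[ch]  # Horner's rule; KeyError on unknown characters
    return n


encoded = to_base(20050415115959, NOTATION36)
# For bases up to 36 the built-in int() agrees with the hand-written decoder.
assert from_base(encoded, NOTATION36) == int(encoded, 36) == 20050415115959
```

`from_base` works for any alphabet, including 70- or 90-character ones where `int()` can no longer help.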
Spelling error. I think you mean "raise TypeError, 'parameters must be numbers' " not "raise TypeError, 'parameters bust be numbers'"
-Anand
struct and base64. You can accomplish something similar with the standard struct and base64 modules. It's not quite as compact, and the numbers are padded, but it's fairly straightforward. I have it at: http://svn.colorstudy.com/home/ianb/recipes/base64unpack.py; here's the raw code:
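A minimal sketch of the struct-plus-base64 idea, written for illustration rather than copied from the linked file. It assumes unsigned integers below 2**64, packs them as 8 big-endian bytes (`">Q"`), and uses URL-safe base64 with the trailing `=` stripped. The hypothetical helper names are `pack_int` and `unpack_int`.

```python
import base64
import struct


def pack_int(n):
    """Encode an unsigned integer below 2**64 as URL-safe base64 text."""
    raw = struct.pack(">Q", n)  # always 8 bytes, big-endian, unsigned 64-bit
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def unpack_int(text):
    """Reverse pack_int, restoring the '=' padding that was stripped."""
    padded = text + "=" * (-len(text) % 4)
    raw = base64.urlsafe_b64decode(padded)
    return struct.unpack(">Q", raw)[0]


code = pack_int(20050415115959)
print(code, len(code))
```

Because the input is always 8 bytes, every result is 11 characters long, which is the padding trade-off mentioned above: small numbers get no shorter.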
packing and base64. I really like Ian's suggestion above. Very simple and elegant solution. We emailed briefly about this after I had done some testing. First, I made a minor change to his code above to allow for doubles (and thus much longer number sequences):
But after I made that change, I ran into some other issues. Granted, these will most likely be edge cases, but they are interesting to note and be aware of nonetheless:
Precision is good here:
But increasing the power by one at this point results in a loss of precision:
Here are a few examples of how this changes with numbers of increasing size (the numbers are test timestamps + "random" numbers):
I asked Ian about this, and he briefly touched on C and struct internals which I won't get into ;-) Something to keep in mind, though.
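One likely explanation, sketched here rather than taken from the thread: an IEEE 754 double has a 53-bit significand, so every integer up to 2**53 (about 9.0e15) is stored exactly, but above that, adjacent doubles are 2 or more apart and the integer gets rounded. The hypothetical helper `roundtrip_double` shows where that boundary falls.

```python
import struct


def roundtrip_double(n):
    """Convert n to float, pack it as a big-endian double, and unpack it again."""
    raw = struct.pack(">d", float(n))  # any rounding happens in float(n)
    return int(struct.unpack(">d", raw)[0])


# Every integer up to 2**53 survives the round trip...
print(roundtrip_double(2**53) == 2**53)          # True
# ...but 2**53 + 1 has no exact double and comes back as 2**53.
print(roundtrip_double(2**53 + 1) == 2**53 + 1)  # False
```

A 14-digit timestamp is well inside the safe range; appending three or more "random" digits pushes it past 2**53, which is consistent with the loss of precision described above.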
Only part-way two-way... I really like your conversion logic. Much cleaner. I'll update the recipe with it at some point. However, your version doesn't convert fully in both directions:
Using the order of your INT_TO_DIGIT list, 100 in base 70 would be the string '1U'; your conversion doesn't let you go from '1U' back to 100. When I get some time, I'll look into that, just for kicks (I've never needed that functionality).
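A minimal sketch of the missing reverse lookup. The '1U' example implies the digit order starts with 0-9 followed by A-Z (so 'U' is 30, and 1 * 70 + 30 = 100); the rest of this INT_TO_DIGIT string, including the eight trailing symbols, is a hypothetical placeholder. Building DIGIT_TO_INT with `enumerate` gives the inverse mapping.

```python
import string

# Assumed order: 0-9, A-Z, a-z, then eight placeholder symbols.
# Only the 0-9/A-Z prefix is implied by '1U' == 100; the rest is hypothetical.
INT_TO_DIGIT = (string.digits + string.ascii_uppercase
                + string.ascii_lowercase + "-_.~!*()")
DIGIT_TO_INT = {ch: i for i, ch in enumerate(INT_TO_DIGIT)}  # inverse lookup
BASE = len(INT_TO_DIGIT)  # 70


def encode(n):
    """Integer -> base 70 string."""
    digits = []
    while True:
        n, r = divmod(n, BASE)
        digits.append(INT_TO_DIGIT[r])
        if n == 0:
            break
    return "".join(reversed(digits))


def decode(s):
    """Base 70 string -> integer."""
    n = 0
    for ch in s:
        n = n * BASE + DIGIT_TO_INT[ch]
    return n


print(encode(100))   # 1U
print(decode("1U"))  # 100
```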
Some notes about INT_TO_DIGIT: 1) strings are already indexable, so you don't need a list comprehension for it; and 2) it might be a good idea to list the characters of the notation in Python sorting order, so that if someone sorted a bunch of equal-length base70 strings, they would actually come out in numerical order. Here they are in Python sort order:
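A minimal sketch of why sort order matters, using a hypothetical 70-symbol notation and a hypothetical `encode_fixed` helper. Once the digit characters are in code point order, string comparison agrees with numeric comparison, but only for strings of the same length, so this sketch left-pads every result to a fixed width with the "zero" symbol.

```python
# Hypothetical 70-symbol notation in an arbitrary order (0-9, A-Z, a-z, extras).
NOTATION = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz-_.~!*()"

# Rearranged into Python's sort order (Unicode code point order).
# A str can be indexed directly, so no list comprehension is needed.
SORTED_NOTATION = "".join(sorted(NOTATION))


def encode_fixed(n, width, notation=SORTED_NOTATION):
    """Encode n as exactly `width` digits, left-padded with the zero symbol."""
    base = len(notation)
    digits = []
    for _ in range(width):
        n, r = divmod(n, base)
        digits.append(notation[r])
    if n:
        raise ValueError("number does not fit in the requested width")
    return "".join(reversed(digits))


numbers = [4901, 5, 70, 342999, 69]
codes = [encode_fixed(n, 3) for n in numbers]
# Sorting the strings gives the same order as sorting the numbers.
assert sorted(codes) == [encode_fixed(n, 3) for n in sorted(numbers)]
print(SORTED_NOTATION)
```

Without the padding, a one-digit string such as the code for 69 would sort after longer codes for larger numbers whose first digit is smaller.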
Recursion is limiting. I'm hitting recursion limits. I suggest replacing:
with:
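The commenter's before-and-after code is not shown here, so the following is an illustrative sketch with hypothetical names. A recursive encoder uses one stack frame per output digit, so results longer than CPython's default recursion limit (1000 frames) raise RecursionError; a loop has no such limit and returns the same string.

```python
import string

NOTATION = string.digits + string.ascii_letters  # 62 symbols; any alphabet works


def to_base_recursive(n, notation=NOTATION):
    """One stack frame per output digit, so very long results hit the limit."""
    base = len(notation)
    if n < base:
        return notation[n]
    return to_base_recursive(n // base, notation) + notation[n % base]


def to_base_iterative(n, notation=NOTATION):
    """Same result, built with a loop instead of recursion."""
    base = len(notation)
    digits = []
    while True:
        n, r = divmod(n, base)
        digits.append(notation[r])
        if n == 0:
            break
    return "".join(reversed(digits))


big = 62 ** 5000  # needs 5001 digits, far deeper than the default limit
try:
    to_base_recursive(big)
except RecursionError:
    print("recursive version exceeded the recursion limit")
print(len(to_base_iterative(big)))  # 5001
```

For everyday IDs the recursion depth is tiny (a 14-digit timestamp needs 8 base 70 digits), so the iterative form mainly matters for very large numbers.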
Any reason not to use the more general function at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/111286 ?
I tested various numbers with both, and they give the same results. Has anyone found a problem with the other one?