Most websites that work with user-generated content use the text entered by the user as the URL for that specific item. Users usually enter characters such as ' ' (space), '&', '.' and others that you want to remove or convert to _ , _and_ , _dot_. So I wrote some dynamic code in which you can set up which characters you want to change.
import re

# set here all chars that need to be changed
# (named char_map so the built-in map() is not shadowed)
char_map = {' ': '_',
            '.': '_dot_',
            '&': '_and_',
            '$': '_dolar_',
            ':': '_colon_',
            ',': '_comma_'
            }
_under = re.compile(r'_+')

def parse_for_beautiful_url(text):
    # if ch does not exist in char_map, keep ch unchanged
    result = ''.join([char_map.get(ch, ch) for ch in text])
    # now collapse every run such as __ ___ ____ into a single _
    result = _under.sub('_', result)
    # remove the last underscore if it exists
    if result[-1:] == '_':
        return result[:-1]
    return result
Let's see an example.
text = 'we are go$ng to shop & run.'
print(parse_for_beautiful_url(text))
output: we_are_go_dolar_ng_to_shop_and_run_dot
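For comparison, the same character mapping can be sketched in Python 3 with `str.translate`. This is an alternative formulation under assumptions, not the code from the post: the names `CHAR_MAP` and `beautify` are made up for illustration, and the table simply repeats the entries listed above.

```python
import re

# Illustrative table (hypothetical name); same entries as the post's mapping.
CHAR_MAP = {
    ' ': '_',
    '.': '_dot_',
    '&': '_and_',
    '$': '_dolar_',
    ':': '_colon_',
    ',': '_comma_',
}

# str.maketrans accepts a dict of single characters mapped to replacement strings.
_TABLE = str.maketrans(CHAR_MAP)
_UNDERSCORES = re.compile(r'_+')


def beautify(text):
    """Replace mapped characters, collapse underscore runs, drop a trailing underscore."""
    replaced = text.translate(_TABLE)
    collapsed = _UNDERSCORES.sub('_', replaced)
    # After collapsing, at most one trailing underscore can remain.
    return collapsed.rstrip('_')


print(beautify('we are go$ng to shop & run.'))
```

Characters that are not in the table, such as '?', pass through unchanged, so the table has to list every character you care about.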
There is more than one issue with your code as posted. First, it doesn't return the claimed output but:
Second, it's not "dynamic", in that other punctuation can and will show up in the user-controlled input text. Better to do something like this:
Output:
Actually, I guess if you really want the token "_dolar_" (sp) in your URL (why, I cannot fathom), then your code is on the right track.
I think you'll find, though, that the average page "title", some or all of whose words you may want to turn into a URL, probably works better (more readable and meaningful) without text representations of the punctuation. Words matter to search engines; punctuation should not.
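A minimal sketch of that idea, keeping the words and dropping the punctuation, might look like the following. It is only an illustration: the helper name `slugify_words` is hypothetical, and it assumes ASCII titles, so any non-ASCII letters would be dropped along with the punctuation.

```python
import re

# Any run of characters other than ASCII lowercase letters and digits.
_NON_WORD = re.compile(r'[^a-z0-9]+')


def slugify_words(title):
    """Lowercase the title and join its alphanumeric runs with single underscores."""
    # Each run of spaces and punctuation becomes one underscore,
    # then leading and trailing underscores are stripped.
    return _NON_WORD.sub('_', title.lower()).strip('_')


print(slugify_words('we are go$ng to shop & run.'))
# prints: we_are_go_ng_to_shop_run
```

Because everything outside the allowed set is treated the same way, punctuation that was never anticipated cannot leak into the URL.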
The second component of my critique remains wholly valid...
Here's a better implementation than either of ours:
http://www.gooli.org/blog/unicode-and-permalinks/