Most web sites that work with user-generated content use the text entered by the user as the URL for that specific item. Users usually enter characters such as ' ' (space), '&' and '.', along with others that you will want to remove or convert to _ , _and_ , _dot_. So I wrote a configurable function in which you set up which characters you want to change.

import re

# Characters to replace, mapped to their URL-friendly substitutes.
# (Named char_map so the built-in map() is not shadowed.)
char_map = {' ' : '_',
            '.' : '_dot_',
            '&' : '_and_',
            '$' : '_dolar_',
            ':' : '_colon_',
            ',' : '_comma_'
            }

_under = re.compile(r'_+')

def parse_for_beautiful_url(text):
    # Characters that are not in char_map are kept unchanged.
    result = ''.join([char_map.get(ch, ch) for ch in text])
    # Collapse runs of underscores (__, ___, ____) into a single _.
    result = _under.sub('_', result)
    # Remove the trailing underscore, if there is one.
    if result[-1:] == '_':
        return result[:-1]
    return result

Let's see an example.

text = 'we are go$ng to shop & run.'
print(parse_for_beautiful_url(text))

output: we_are_go_dolar_ng_to_shop_and_run_dot
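
For comparison, the same character mapping can be applied with str.translate on Python 3. This is only a sketch under that assumption, not the recipe's original code: the names CHAR_MAP, _TABLE and beautiful_url are made up for illustration, and the mapping is copied from the recipe above.

```python
import re

# Mapping copied from the recipe; the names below are illustrative only.
CHAR_MAP = {
    ' ': '_',
    '.': '_dot_',
    '&': '_and_',
    '$': '_dolar_',
    ':': '_colon_',
    ',': '_comma_',
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(CHAR_MAP)
_UNDERSCORES = re.compile(r'_+')


def beautiful_url(text):
    """Replace mapped characters, collapse underscores, drop a trailing one."""
    replaced = text.translate(_TABLE)
    collapsed = _UNDERSCORES.sub('_', replaced)
    # Like the recipe, remove at most one trailing underscore.
    return collapsed[:-1] if collapsed.endswith('_') else collapsed


print(beautiful_url('we are go$ng to shop & run.'))
# prints: we_are_go_dolar_ng_to_shop_and_run_dot
```

Building the translation table once at import time avoids a dictionary lookup per character in Python code, though for short titles the difference is unlikely to matter.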

3 comments

Mike Watkins 15 years, 7 months ago

There is more than one issue with your code as posted. First, it doesn't return the claimed output but:

"this_is_#_dolar_*(#_dolar_fun-isn't_it????"

Second, it's not "dynamic", in that other punctuation can and will show up in the user-controlled input text. Better to do something like this:

word_re = re.compile(r'\b\w+\b')

def make_name(text, lower=True):
    text = ''.join([c for c in text
                if (c.isalnum() or c =='-' or c.isspace())]) if text else ''
    text = '-'.join(word_re.findall(text))
    return text.lower() if lower else text

Output:

>>> text = "a string with many ~!@#$%^&*()_-+=`/\ not-so-desirable characters"
>>> make_name(text)
u'a-string-with-many-not-so-desirable-characters'

Mike Watkins 15 years, 7 months ago

Actually, I guess if you really want the token "_dolar_" (sp) in your URL (why, I cannot fathom), then your code is on the right track.

I think you'll find, though, that the average page "title" from which you want to turn some or all words into a URL probably works better (more readable and meaningful) without text representations of its punctuation. Words matter to search engines; punctuation should not.

The second component of my critique remains wholly valid...
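
To make that second point concrete, here is a small sketch (Python 3 assumed). It re-creates the recipe's map-based approach under the hypothetical name map_based_slug and feeds it a made-up title. Characters missing from the map, such as '%' and '?', pass straight through and then have to be percent-encoded, shown here with urllib.parse.quote.

```python
import re
from urllib.parse import quote

# Same substitutions as the recipe; map_based_slug is a hypothetical name.
CHAR_MAP = {' ': '_', '.': '_dot_', '&': '_and_',
            '$': '_dolar_', ':': '_colon_', ',': '_comma_'}


def map_based_slug(text):
    """Apply the map, collapse underscores, drop one trailing underscore."""
    slug = ''.join(CHAR_MAP.get(ch, ch) for ch in text)
    slug = re.sub(r'_+', '_', slug)
    return slug[:-1] if slug.endswith('_') else slug


title = '50% off: what?'   # made-up input for illustration
slug = map_based_slug(title)
print(slug)                # prints: 50%_off_colon_what?
print(quote(slug))         # prints: 50%25_off_colon_what%3F
```

Every character the map does not list ends up in the URL as-is, so the map would have to grow with each new kind of punctuation users type.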

Mike Watkins 15 years, 7 months ago

Here's a better implementation than either of ours:

http://www.gooli.org/blog/unicode-and-permalinks/

def to_permalink(s):
    import re
    # Raw string, so \W reaches the regex engine as written.
    return re.compile(r"\W+", re.UNICODE).sub("_", s)
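
On Python 3, str patterns are Unicode-aware by default, so the re.UNICODE flag is not needed there. The variant below is a sketch under that assumption. The name to_permalink_py3 and the stripping of leading and trailing underscores are additions for illustration and are not part of the linked post.

```python
import re

# For str patterns on Python 3, \W is Unicode-aware without re.UNICODE.
_NON_WORD = re.compile(r"\W+")


def to_permalink_py3(s):
    """Replace each run of non-word characters with a single underscore."""
    # strip("_") is an addition: it drops underscores at either end.
    return _NON_WORD.sub("_", s).strip("_")


print(to_permalink_py3("Hello, world!"))           # prints: Hello_world
print(to_permalink_py3("caf\u00e9 & cr\u00e8me"))  # prints: café_crème
```

Accented letters count as word characters here, so they survive in the permalink instead of being replaced.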
Created by Mark Zitnik on Sun, 21 Sep 2008 (MIT)