Creating beautiful urls from user generated text (Python recipe) by Mark Zitnik
ActiveState Code (http://code.activestate.com/recipes/576511/)

Most web sites that work with user-generated content use the text entered by the user as the URL for that specific item. Users usually enter characters such as ' ' (space), '&' and '.', along with other characters that you want to remove or convert to _ , _and_ , _dot_. So I have written dynamic code in which you can set up which characters you want to change.

import re

# List here every character that needs to be replaced.
char_map = {' ': '_',
            '.': '_dot_',
            '&': '_and_',
            '$': '_dolar_',
            ':': '_colon_',
            ',': '_comma_',
            }

_under = re.compile(r'_+')

def parse_for_beautiful_url(text):
    # If ch does not exist in the map, keep ch unchanged.
    url = ''.join([char_map.get(ch, ch) for ch in text])
    # Collapse every run of underscores (__, ___, ____) into a single _.
    url = _under.sub('_', url)
    # Remove the trailing underscore, if there is one.
    if url[-1:] == '_':
        return url[:-1]
    return url
      

Let's see an example.

text = 'we are go$ng to shop & run.'
print(parse_for_beautiful_url(text))

output: we_are_go_dolar_ng_to_shop_and_run_dot
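The recipe works in three stages: map each character, collapse runs of underscores, then drop a trailing underscore. The sketch below traces the example input through those stages. The names `CHAR_MAP` and `trace_stages` are illustrative and not part of the recipe, and the code is written for Python 3.

```python
import re

# Same replacement table as the recipe (the '_dolar_' spelling is kept so
# the result matches the example output).
CHAR_MAP = {' ': '_', '.': '_dot_', '&': '_and_',
            '$': '_dolar_', ':': '_colon_', ',': '_comma_'}

def trace_stages(text):
    """Return the string as it looks after each of the three stages."""
    # Stage 1: replace mapped characters, keep everything else as-is.
    mapped = ''.join(CHAR_MAP.get(ch, ch) for ch in text)
    # Stage 2: collapse runs of underscores into a single underscore.
    collapsed = re.sub(r'_+', '_', mapped)
    # Stage 3: drop one trailing underscore, if present.
    trimmed = collapsed[:-1] if collapsed.endswith('_') else collapsed
    return mapped, collapsed, trimmed

for stage in trace_stages('we are go$ng to shop & run.'):
    print(stage)
# we_are_go_dolar_ng_to_shop__and__run_dot_
# we_are_go_dolar_ng_to_shop_and_run_dot_
# we_are_go_dolar_ng_to_shop_and_run_dot
```

The first printed line still contains doubled underscores and a trailing one; the second and third stages remove them.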

Tags: beautiful, url, web

3 comments

Mike Watkins 15 years, 7 months ago

There is more than one issue with your code as posted. First, it doesn't return the claimed output but:

"this_is_#_dolar_*(#_dolar_fun-isn't_it????"

Second, it's not "dynamic", in that other punctuation can and will show up in the user-controlled input text. Better to do something like this:

import re

word_re = re.compile(r'\b\w+\b')

def make_name(text, lower=True):
    # Keep only letters, digits, hyphens and whitespace.
    text = ''.join([c for c in text
                    if (c.isalnum() or c == '-' or c.isspace())]) if text else ''
    # Join the remaining words with hyphens.
    text = '-'.join(word_re.findall(text))
    return text.lower() if lower else text

Output:

>>> text = "a string with many ~!@#$%^&*()_-+=`/\ not-so-desirable characters"
>>> make_name(text)
u'a-string-with-many-not-so-desirable-characters'
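To make the second point concrete, here is a minimal side-by-side sketch on one input containing punctuation the recipe's map does not list. The sample string and the name `recipe_slug` are hypothetical, both functions are condensed re-statements of the code in this thread, and the sketch targets Python 3.

```python
import re

# Replacement table copied from the recipe.
CHAR_MAP = {' ': '_', '.': '_dot_', '&': '_and_',
            '$': '_dolar_', ':': '_colon_', ',': '_comma_'}

def recipe_slug(text):
    """Recipe approach: replace only the characters listed in CHAR_MAP."""
    slug = re.sub(r'_+', '_', ''.join(CHAR_MAP.get(ch, ch) for ch in text))
    return slug[:-1] if slug.endswith('_') else slug

def make_name(text, lower=True):
    """Allow-list approach: keep letters, digits, hyphens and whitespace."""
    text = ''.join(c for c in text if c.isalnum() or c == '-' or c.isspace())
    text = '-'.join(re.findall(r'\b\w+\b', text))
    return text.lower() if lower else text

sample = '50% off: cats & dogs!?'  # hypothetical user input
print(recipe_slug(sample))  # 50%_off_colon_cats_and_dogs!?
print(make_name(sample))    # 50-off-cats-dogs
```

The '%', '!' and '?' characters pass straight through the map-based version because nobody listed them, while the allow-list version drops anything it was not told to keep.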

Mike Watkins 15 years, 7 months ago

Actually, I guess if you really want the token "_dolar_" (sp) in your URL (why, I cannot fathom), then your code is on the right track.

I think you'll find, though, that the average page "title", from which you may want to turn some or all of the words into a URL, probably works better (more readable and meaningful) without trying to include a text representation of the punctuation. Words matter to search engines; punctuation should not.

The second component of my critique remains wholly valid...

Mike Watkins 15 years, 7 months ago

Here's a better implementation than either of ours:

http://www.gooli.org/blog/unicode-and-permalinks/

import re

def to_permalink(s):
    # Replace every run of non-word characters with a single underscore.
    return re.compile(r"\W+", re.UNICODE).sub("_", s)
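As a rough illustration of how this one-liner behaves on non-ASCII text, the sketch below runs it on a Cyrillic string under Python 3, where `str` patterns are Unicode-aware by default and the `re.UNICODE` flag is redundant but harmless. The helper `to_trimmed_permalink` is a hypothetical addition, not part of the linked post.

```python
import re

# Compile the pattern once instead of on every call.
_NON_WORD = re.compile(r'\W+', re.UNICODE)

def to_permalink(s):
    """Replace each run of non-word characters with one underscore."""
    return _NON_WORD.sub('_', s)

def to_trimmed_permalink(s):
    """Hypothetical variant: also strip leading and trailing underscores."""
    return to_permalink(s).strip('_')

print(to_permalink('Привет, мир!'))          # Привет_мир_
print(to_trimmed_permalink('Привет, мир!'))  # Привет_мир
```

Letters from any script survive because `\w` matches Unicode word characters. The plain version leaves an underscore wherever the text starts or ends with punctuation; the trimmed variant removes it, at the cost of also stripping underscores the user typed at either end.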

Created by Mark Zitnik on Sun, 21 Sep 2008 (MIT)

Other Information and Tasks

Licensed under the MIT License
Viewed 7746 times
Revision 3 (updated 15 years ago)
