"Slugify" a string so that it is ASCII and contains only alphanumeric, underscore, and hyphen characters. Useful for URLs and filenames. This is heavily based on the slugify in Django.

Note: presumes that you've imported re higher up in your module. The code is written for Python 2 (it relies on the unicode built-in).

Python, 15 lines
_slugify_strip_re = re.compile(r'[^\w\s-]')    # anything but word chars, whitespace, hyphen
_slugify_hyphenate_re = re.compile(r'[-\s]+')  # runs of hyphens and/or whitespace
def _slugify(value):
    """
    Normalizes string to ASCII, converts to lowercase, removes everything but
    word characters, whitespace and hyphens, and hyphenates whitespace runs.

    From Django's "django/template/defaultfilters.py".
    """
    import unicodedata
    if not isinstance(value, unicode):
        value = unicode(value)
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')  # drop non-ASCII
    value = unicode(_slugify_strip_re.sub('', value).strip().lower())
    return _slugify_hyphenate_re.sub('-', value)
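
The recipe above targets Python 2. As a rough sketch only, here is how the same steps might look on Python 3, where str is already Unicode and the encoded bytes have to be decoded back before the regexes run. The name slugify_py3 and the sample input are made up for illustration and are not part of the original recipe or of Django.

```python
import re
import unicodedata

# Same two patterns as the recipe: strip unwanted characters, then hyphenate.
_strip_re = re.compile(r'[^\w\s-]')
_hyphenate_re = re.compile(r'[-\s]+')


def slugify_py3(value):
    """Python 3 sketch of the recipe (hypothetical name, not Django's API)."""
    value = str(value)
    # Decompose accented characters, drop whatever is still non-ASCII,
    # then decode back to str so the str-based regexes can be applied.
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore').decode('ascii')
    # Remove punctuation, trim surrounding whitespace, lowercase.
    value = _strip_re.sub('', value).strip().lower()
    # Collapse runs of whitespace and hyphens into single hyphens.
    return _hyphenate_re.sub('-', value)


print(slugify_py3("Héllo, Wörld!  It's -- 2024"))  # hello-world-its-2024
```

Note that underscores survive, since \w matches them: slugify_py3("foo_bar baz") gives "foo_bar-baz".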

1 comment

Alan Plum 13 years, 10 months ago

If you're fine with using additional packages, unidecode can help if you want to handle non-ASCII characters more gracefully. unicodedata is a nice start, but it only helps with characters that decompose into an ASCII base letter plus combining marks; everything else is silently dropped, which creates problems for non-Latin scripts and even some Latin-script languages.
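
To make the limitation in this comment concrete, here is a small Python 3 sketch of the normalize-then-encode step on its own. The helper name ascii_fold and the sample words are chosen purely for illustration. The unidecode call at the end runs only if that third-party package happens to be installed, and its output is not shown here.

```python
import unicodedata


def ascii_fold(text):
    """NFKD-decompose, then drop whatever is still not ASCII (hypothetical helper)."""
    return unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode('ascii')


samples = ["café", "Straße", "Łódź", "Москва"]

for word in samples:
    print(repr(word), '->', repr(ascii_fold(word)))
# 'café' -> 'cafe'      (é decomposes into e + combining accent)
# 'Straße' -> 'Strae'   (ß has no decomposition, so it is dropped)
# 'Łódź' -> 'odz'       (Ł has no decomposition, so it is dropped)
# 'Москва' -> ''        (Cyrillic letters are dropped entirely)

# Optional comparison with the third-party unidecode package, if available.
try:
    from unidecode import unidecode
except ImportError:
    unidecode = None

if unidecode is not None:
    for word in samples:
        print(repr(word), '->', repr(unidecode(word)))
```

Whether transliteration is the right choice depends on the application; for slugs, losing a whole word as in the last sample is usually worse than an approximate romanization.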