"Slugify" a string so that it is ASCII and contains only alphanumeric characters, underscores and hyphens (underscores survive because \w matches them). Useful for URLs and filenames. This is heavily based on the slugify filter in Django.
Note: this presumes that you've imported re higher up in your module.
_slugify_strip_re = re.compile(r'[^\w\s-]')
_slugify_hyphenate_re = re.compile(r'[-\s]+')

def _slugify(value):
    """
    Normalizes string, converts to lowercase, removes non-alpha characters,
    and converts spaces to hyphens.
    From Django's "django/template/defaultfilters.py".
    """
    import unicodedata
    if not isinstance(value, unicode):
        value = unicode(value)
    value = unicodedata.normalize('NFKD', value).encode('ascii', 'ignore')
    value = unicode(_slugify_strip_re.sub('', value).strip().lower())
    return _slugify_hyphenate_re.sub('-', value)
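The recipe is written for Python 2: it relies on the unicode builtin, which does not exist in Python 3. A minimal sketch of a Python 3 port might look like the following. The name slugify_py3 is hypothetical and not part of the original recipe, and the only real change is decoding the ASCII bytes back to str before applying the regular expressions.

```python
import re
import unicodedata

_slugify_strip_re = re.compile(r'[^\w\s-]')
_slugify_hyphenate_re = re.compile(r'[-\s]+')


def slugify_py3(value):
    """Python 3 sketch of the recipe (hypothetical name, same steps)."""
    value = str(value)
    # Split accented characters into base letter + combining mark.
    value = unicodedata.normalize('NFKD', value)
    # Drop everything that is still non-ASCII, then return to str.
    value = value.encode('ascii', 'ignore').decode('ascii')
    # Remove punctuation, trim, and lowercase.
    value = _slugify_strip_re.sub('', value).strip().lower()
    # Collapse each run of whitespace and hyphens into one hyphen.
    return _slugify_hyphenate_re.sub('-', value)


print(slugify_py3("Héllo, Wörld!  It's me"))  # hello-world-its-me
```

Because the hyphenation pattern matches whole runs, an input such as "foo_bar -- baz" becomes "foo_bar-baz" rather than picking up repeated hyphens.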
If you're fine with using additional packages, unidecode can help if you want to handle foreign characters more gracefully. unicodedata is a nice start, but NFKD normalization only helps with characters that decompose into an ASCII base letter plus combining marks. Anything else is silently dropped, which creates problems for non-Latin scripts and even for some languages written in the Latin alphabet.
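To make the limitation concrete, here is a small sketch using only the standard library. The helper name ascii_fold is an assumption introduced for illustration. It repeats the normalize-then-encode step from the recipe and shows which characters survive. The optional last part assumes the third-party Unidecode package is installed and is skipped otherwise.

```python
import unicodedata


def ascii_fold(text):
    """NFKD-normalize, then drop whatever is still non-ASCII (hypothetical helper)."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


# 'é' decomposes into 'e' plus a combining accent, so the base letter survives.
print(ascii_fold('café'))    # cafe
# 'Ł' and 'ß' have no decomposition, so they vanish entirely.
print(ascii_fold('Łódź'))    # odz
print(ascii_fold('Straße'))  # Strae

try:
    # Third-party package (assumed installed via: pip install Unidecode).
    from unidecode import unidecode
except ImportError:
    unidecode = None

if unidecode is not None:
    # unidecode transliterates such characters instead of dropping them.
    print(unidecode('Łódź'))
```

If the dropped letters matter for your slugs, one option is to run the input through unidecode first and then apply the strip and hyphenate steps unchanged.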