I kept running into the same problem at work: converting special characters such as the French accented letter “é” to the ASCII “e” (this is called transliteration).
I wanted to avoid an external library such as Unidecode (which is great, obviously), so I ended up looking through the built-in unicodedata module. Before I had to dig too deep into the matter, I found this StackOverflow topic, which gives an interesting method for doing this and works fine for me.
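    import unicodedata

    def strip_accents(s):
        """
        Sanitize the given unicode string and remove all
        special/localized characters from it.

        Category "Mn" stands for Nonspacing_Mark
        """
        try:
            # NFD splits each accented letter into a base letter
            # followed by its combining marks; the marks are dropped.
            return ''.join(
                c for c in unicodedata.normalize('NFD', s)
                if unicodedata.category(c) != 'Mn'
            )
        except TypeError:
            # Not a unicode string: return the input untouched.
            return s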
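To see what this method does and does not cover, here is a minimal sketch that runs it on a few sample strings. The sample words are my own picks, not from the StackOverflow topic, and the function is repeated (without the try/except) so that the snippet runs on its own. The thing to notice is that only letters which decompose into a base letter plus a combining mark get converted; letters such as “ø” or “ß” have no such decomposition and come out unchanged.

```python
import unicodedata


def strip_accents(s):
    """Drop combining marks (category "Mn") after NFD decomposition."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")


# Accented letters decompose into a base letter plus a combining mark,
# so removing the marks leaves plain ASCII letters.
print(strip_accents("crème brûlée"))   # creme brulee
print(strip_accents("Ångström"))       # Angstrom

# Letters with no canonical decomposition pass through unchanged.
print(strip_accents("smørrebrød"))     # smørrebrød
print(strip_accents("straße"))         # straße
```

So if the output has to be strictly ASCII for every possible input, this approach alone is probably not enough, and a library like Unidecode remains the safer choice.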
PS: thanks to @Flameeyes for his helpful remark on the wording.