Convert special characters to ASCII in Python

A recurring problem I ran into at work was converting special characters, such as the French accented letter “é”, to their ASCII equivalent “e” (this is called transliteration).

I wanted to avoid an external library such as Unidecode (which is great, obviously), so I ended up digging around in the built-in unicodedata module. Before I had to get too deep into the matter, I found a StackOverflow topic that gives an interesting method for this, and it works fine for me.
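
To see why the method works, here is a small sketch of what NFD normalization does to a single accented letter. The variable names are mine, chosen for illustration; the only calls used are from the standard unicodedata module.

```python
import unicodedata

accented = "\u00e9"  # "é" as a single precomposed code point
decomposed = unicodedata.normalize("NFD", accented)

# NFD splits the precomposed letter into a base letter followed by a
# combining mark. The mark has category "Mn" (Nonspacing_Mark).
for ch in decomposed:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
# U+0065 LATIN SMALL LETTER E Ll
# U+0301 COMBINING ACUTE ACCENT Mn
```

Dropping every character whose category is "Mn" therefore leaves only the base letter.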

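import unicodedata


def strip_accents(s):
    """
    Sanitize the given unicode string by removing the accents and other
    combining marks from its characters.

    Category "Mn" stands for Nonspacing_Mark
    """
    try:
        # NFD splits each accented letter into a base letter plus combining marks
        return ''.join(
            c for c in unicodedata.normalize('NFD', s)
            if unicodedata.category(c) != 'Mn'
        )
    except TypeError:
        # not a unicode string: return the input unchanged
        return s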
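
A quick usage sketch, with one caveat worth knowing: letters that have no decomposition (such as “Ø” or “ß”) pass through untouched, so the result is not guaranteed to be pure ASCII. The `to_ascii` helper below is a hypothetical addition of mine, not part of the StackOverflow answer; it simply drops whatever non-ASCII characters remain, which may or may not be what you want.

```python
import unicodedata


def strip_accents(s):
    # Same technique as above: decompose, then drop nonspacing marks.
    return "".join(
        c for c in unicodedata.normalize("NFD", s)
        if unicodedata.category(c) != "Mn"
    )


def to_ascii(s):
    # Hypothetical helper: strip accents, then discard any character
    # that still falls outside ASCII.
    return strip_accents(s).encode("ascii", "ignore").decode("ascii")


print(strip_accents("éléphant à Noël"))  # elephant a Noel
print(strip_accents("Øresund straße"))   # Øresund straße (unchanged)
print(to_ascii("Øresund straße"))        # resund strae
```

If losing letters like that is not acceptable, a real transliteration library such as Unidecode is probably the better choice.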

PS: thanks to @Flameeyes for his good remark on wording.

4 Comments

    1. You’re right Flammie thx, poor choice of words indeed 🙂 corrected !

  1. How do I make this work for an entire dataset/CSV file? I know how to pass a single string through it, but what about a list, having it make the replacements, and then doing a to_csv('') from there? That'd be ideal.
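
On the CSV question above, one possible approach is to apply the function to every cell while copying the file. This is only a sketch using the standard csv module; `strip_accents_csv` is a name I made up, and the in-memory files stand in for real ones (which you would open with `newline=""` and an explicit encoding).

```python
import csv
import io
import unicodedata


def strip_accents(s):
    # Decompose, then drop nonspacing marks.
    return "".join(
        c for c in unicodedata.normalize("NFD", s)
        if unicodedata.category(c) != "Mn"
    )


def strip_accents_csv(src, dst):
    """Copy CSV rows from src to dst, stripping accents from every cell."""
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow([strip_accents(cell) for cell in row])


# In-memory demo; replace with open(path, newline="", encoding="utf-8").
source = io.StringIO("name,city\r\nRenée,Orléans\r\n", newline="")
target = io.StringIO(newline="")
strip_accents_csv(source, target)
print(target.getvalue())
```

With pandas, mapping the function over each text column (for example with `Series.map`) before calling `to_csv` should give a similar result.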
