How to replace non-ASCII characters with ASCII characters

Sometimes we need to get rid of non-ASCII characters in a string, especially when that string has to become part of a URL. So here is the simple code to do that:

// Delete every character outside the 7-bit ASCII range (U+0000 to U+007F).
String resultString = subjectString.replaceAll("[^\\x00-\\x7F]", "");
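For readers who want to try the one-liner on its own, here is a minimal self-contained sketch that wraps it in a helper. The class name NonAsciiStripper and the method stripNonAscii are illustrative names, not part of the original code. The input uses Unicode escapes so the file compiles regardless of the source encoding.

```java
import java.nio.charset.StandardCharsets;

// Minimal sketch: delete every char outside the 7-bit ASCII range (U+0000..U+007F).
// The class and method names are illustrative, not from the original post.
final class NonAsciiStripper {

    // Nothing is substituted: non-ASCII chars are simply removed.
    static String stripNonAscii(String input) {
        return input.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // "naive cafe" spelled with i-diaeresis (U+00EF) and e-acute (U+00E9)
        String subjectString = "na\u00efve caf\u00e9";
        String resultString = stripNonAscii(subjectString);
        System.out.println(resultString); // prints "nave caf"

        // Check that the result is now fully representable in US-ASCII.
        System.out.println(StandardCharsets.US_ASCII.newEncoder().canEncode(resultString)); // true
    }
}
```

Note how the accented letters vanish completely ("nave caf"), which is exactly the limitation discussed next.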

But the above code simply deletes non-ASCII characters; nothing is put in their place. If you want to keep the closest ASCII equivalents (for example, "o" for "ö"), you need to normalize the string first:

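import java.text.Normalizer;

String subjectString = "öäü";
// NFD splits each accented letter into its base letter plus a combining mark.
subjectString = Normalizer.normalize(subjectString, Normalizer.Form.NFD);
String resultString = subjectString.replaceAll("[^\\x00-\\x7F]", "");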

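This produces "oau". Normalization form NFD decomposes each accented letter into its base letter followed by a separate combining mark (for example, "ö" becomes "o" plus a combining diaeresis), so the regular expression only removes the non-ASCII combining marks and the base letters survive. That way, characters like "öäü" are mapped to "oau", which at least preserves some information. Without normalization, the resulting String would be empty. Characters that have no decomposition, such as "ß", are still removed rather than mapped.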
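The normalize-then-strip approach above can be packaged as a reusable helper. The sketch below is a hedged illustration: the class name AsciiFolder and the method toAsciiApproximation are hypothetical, and the logic is just the post's NFD plus regex combination. The extra calls in main show both the benefit (accents folded) and the limitation (letters without a decomposition are dropped), and compare against the un-normalized version.

```java
import java.text.Normalizer;

// Sketch: fold accented Latin letters to their ASCII base letters.
// The class and method names are hypothetical; the logic is the NFD + regex
// approach from the post, wrapped in a reusable helper.
final class AsciiFolder {

    static String toAsciiApproximation(String input) {
        // NFD turns, e.g., o-umlaut (U+00F6) into "o" followed by the combining
        // diaeresis U+0308, so the base letter survives as a separate ASCII char.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // The combining marks (and anything else outside ASCII) are then removed.
        return decomposed.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        System.out.println(toAsciiApproximation("\u00f6\u00e4\u00fc"));         // oau
        System.out.println(toAsciiApproximation("Cr\u00e8me br\u00fbl\u00e9e")); // Creme brulee
        // Sharp s (U+00DF) has no decomposition, so it is dropped, not mapped to "ss".
        System.out.println(toAsciiApproximation("Stra\u00dfe"));                // Strae
        // Without normalization the same umlaut input is wiped out entirely.
        System.out.println("\u00f6\u00e4\u00fc".replaceAll("[^\\x00-\\x7F]", "").isEmpty()); // true
    }
}
```

If letters such as "ß" matter for your use case, they would need an explicit replacement step before stripping; the regex alone cannot recover them.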

Uday Ogra

Connect with me at http://facebook.com/tendulkarogra and let's have some healthy discussion :)
