Remove all non-ASCII characters with Perl

This post is over 3 years old, so please keep in mind that some of its content might not be relevant anymore.

Sometimes, when you build a one-liner around a source that is not “clean”, you end up with non-ASCII characters that make parsing harder.

To get rid of all of them, you can use Perl.

cat dirty-source.txt | perl -pe 's/[^[:ascii:]]//g' > clean-output.txt

The same command can be used on a file directly too.

perl -pe 's/[^[:ascii:]]//g' dirty-source.txt

To change the file in place, add the “-i” flag. If you give it a suffix, like “-i.bak”, perl keeps a backup of the original (e.g. dirty-source.txt.bak).

Hope it helps!
Andrea
