Remove Byte Order Mark (BOM) from files recursively


OK, this wiki is not only for Yii projects.

The problem was that all my web applications ran normally on localhost, but on the server, Greek characters (or any other non-English characters) were displayed incorrectly.

So I needed to remove the BOM by hand from hundreds of Yii view files across many Yii projects. NetBeans has an option to keep the file encoding as UTF-8, but not as UTF-8 without BOM.

My previous solution was to convert each file to UTF-8 without BOM, one by one, in Notepad++, which consumed a lot of my time!

Many servers don't have this issue, but on others it matters. So, after two hours of searching, I found a fast way to do it from the command line.

(If you are on Windows, install Cygwin first.)

1) Open a shell and go to the root folder that contains the project.

2) Run this command

grep -rl $'\xEF\xBB\xBF'  /home/rootfolder/Yii_project  > file_with_boms.txt

3) Now run this one:

while IFS= read -r l; do sed -i '1 s/^\xef\xbb\xbf//' "$l"; done < file_with_boms.txt
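The `sed` expression in step 3 only looks at the first line (`1`) and only removes the three BOM bytes when they are the very first bytes of that line (`^`), so a file without a BOM passes through unchanged. Here's a small sketch to try it on throwaway files, assuming GNU sed; `strip_bom` is a hypothetical helper name, not part of the original recipe:

```shell
#!/usr/bin/env bash
# Sketch: what the sed expression from step 3 does to a single file.
# Assumes GNU sed (in-place -i without a suffix, \xHH escapes).
# strip_bom is a hypothetical helper name.
strip_bom() {
  # "1" limits the edit to the first line; "^" requires the BOM at its very start.
  # LC_ALL=C makes sed compare raw bytes regardless of the current locale.
  LC_ALL=C sed -i '1 s/^\xef\xbb\xbf//' "$1"
}

demo=$(mktemp -d)
printf '\xEF\xBB\xBF<?php echo "a";\n' > "$demo/with_bom.php"
printf '<?php echo "b";\n' > "$demo/no_bom.php"

strip_bom "$demo/with_bom.php"
strip_bom "$demo/no_bom.php"

head -c5 "$demo/with_bom.php"; echo   # prints: <?php
head -c5 "$demo/no_bom.php"; echo     # prints: <?php
rm -rf "$demo"
```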

That's it! The BOM will be removed from all files that contained it. Now you can upload your project to your server.

Note: I haven't used this method many times and I don't know whether it works properly for all cases and files, so make a backup of your project first! :)
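One simple way to follow that advice is to archive the project folder before running the commands. A minimal sketch, assuming `tar` with gzip support is installed; `backup_project` is a hypothetical helper name and the timestamped archive name is just one possible convention:

```shell
#!/usr/bin/env bash
# Sketch: archive a project directory before stripping BOMs.
# backup_project is a hypothetical helper; pass the project path, not ".".
backup_project() {
  local src=${1%/}                  # project directory, trailing slash removed
  local stamp archive
  stamp=$(date +%Y%m%d-%H%M%S)
  # The archive is written next to the project, so it is not archived itself.
  archive="${src}-backup-${stamp}.tar.gz"
  # -C switches to the parent directory so the archive stores a relative path.
  tar -czf "$archive" -C "$(dirname -- "$src")" "$(basename -- "$src")"
  printf '%s\n' "$archive"
}
```

For example, `backup_project /home/rootfolder/Yii_project` writes `/home/rootfolder/Yii_project-backup-<timestamp>.tar.gz` and prints its path; `tar -xzf <archive> -C /home/rootfolder` restores the folder.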

Total 3 comments

#17740
Maurizio Domba Cerin at 2014/07/18 07:03am

Your solution uses 2 commands and an intermediate file... to do it with a single command, you can use this:

find . -type f -exec sed '1 s/^\xEF\xBB\xBF//' -i.bak {} \; -exec rm {}.bak \;
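A variant of this one-liner could rewrite only the files that actually start with a BOM, so other files (including binaries such as images) are never passed to `sed`. This is a sketch assuming bash, GNU find and GNU sed; `strip_boms_in_tree` is a hypothetical helper name, and the byte check via `od` is my own choice, not part of the comment:

```shell
#!/usr/bin/env bash
# Sketch: strip a leading UTF-8 BOM only from files that actually start with one.
# Assumes bash, GNU find and GNU sed (-i without a suffix, \xHH escapes).
# strip_boms_in_tree is a hypothetical helper name.
strip_boms_in_tree() {
  local root=${1:-.}
  find "$root" -type f -exec bash -c '
    for f in "$@"; do
      # Skip any file whose first three bytes are not EF BB BF.
      [ "$(head -c3 -- "$f" | od -An -tx1 | tr -d " \n")" = "efbbbf" ] || continue
      # LC_ALL=C makes sed match raw bytes; only line 1 is touched.
      LC_ALL=C sed -i "1 s/^\xef\xbb\xbf//" "$f"
      printf "stripped BOM: %s\n" "$f"
    done
  ' bash {} +
}
```

Run it as `strip_boms_in_tree /home/rootfolder/Yii_project`, or with no argument from inside the project folder. Files without a BOM are only read, never rewritten, so their modification times stay as they were; `sed -i` applied to every file would rewrite all of them.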

If you only want to list the affected files, you can use:

find . -type f | while IFS= read -r file; do [ "$(head -c3 -- "$file")" == $'\xef\xbb\xbf' ] && echo "found BOM in: $file"; done

#16352
Kostas Apazidis (KonApaz) at 2014/02/13 03:40pm
Re: #16349

Thanks for your comment!

Yes, I've had issues with the Byte Order Mark many times! I get rid of this annoying three-byte sequence using this wiki whenever I forget these two commands ;)

#16349
mi.sarah at 2014/02/13 08:32am
Helped with strange characters in output

First: Thanks a lot! Glorious hint!

Under Cygwin, using the two commands, I finally found the file that was always giving me a cron job response with characters like this:

That is the BOM (Byte Order Mark) of UTF-8 files;

its original hex code is EF BB BF,

but in browsers it is displayed as:

Your 2 commands worked fine :D
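To check for yourself that a file starts with the hex code EF BB BF, you can dump its first bytes with `od`. A short sketch; `show_first_bytes` is a hypothetical helper name and the demo file is created only for illustration:

```shell
#!/usr/bin/env bash
# Sketch: print the first three bytes of a file in hex to spot a UTF-8 BOM.
# show_first_bytes is a hypothetical helper; the demo file is created on the fly.
show_first_bytes() {
  # -An: no offset column, -tx1: one hex byte per unit, -N3: read 3 bytes only
  od -An -tx1 -N3 "$1"
}

demo=$(mktemp)
printf '\xEF\xBB\xBF<?php echo "hello";\n' > "$demo"
show_first_bytes "$demo"   # prints " ef bb bf" for a file that starts with a BOM
rm -f "$demo"
```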
