How to set up Unicode
To fix problems with the display of language-specific characters once and for all, there is one solution: use Unicode (UTF-8) everywhere. If everything is set up to use Unicode, you can use almost any language in your application.
There are several places that may each need some configuration to use Unicode:
1. PHP script files
Make sure that you use an editor that can handle UTF-8, and save all your files UTF-8 encoded without a BOM. If you have older non-Unicode files in your project, open them in your editor and save them again as UTF-8. On Linux you can also use command-line tools like recode or iconv to convert many files at once.
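If you prefer to do the conversion from PHP itself, the following is a minimal sketch using PHP's iconv extension. The function name toUtf8WithoutBom and the default source encoding are assumptions made for illustration; you have to know the real source encoding of each file, since it cannot be guessed reliably.

```php
<?php
// Hypothetical helper: convert a string from a known legacy encoding to
// UTF-8 and drop a leading UTF-8 BOM if one is present.
function toUtf8WithoutBom($contents, $sourceEncoding = 'ISO-8859-1')
{
    // iconv() returns false if the input is not valid in $sourceEncoding.
    $utf8 = iconv($sourceEncoding, 'UTF-8', $contents);
    if ($utf8 === false) {
        return false;
    }
    // The UTF-8 BOM is the three bytes EF BB BF at the very start.
    if (substr($utf8, 0, 3) === "\xEF\xBB\xBF") {
        $utf8 = substr($utf8, 3);
    }
    return $utf8;
}
```

For a single file this could be used as file_put_contents($path, toUtf8WithoutBom(file_get_contents($path))), ideally after making a backup copy.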
2. Database tables
Every table in your database needs to use the UTF-8 charset for its content. How to configure this differs between database systems.
MySQL
To find out if a table uses utf8 charset you have to look at the CREATE
statement for that table. You can use phpMyAdmin's export feature and look
at the CREATE statement.
Info: Don't confuse the encoding of characters in a table with its collation. The latter is used for sorting in queries and can be changed easily with e.g. phpMyAdmin or even for a single query.
You could also issue this SQL statement:
SHOW CREATE TABLE your_tablename;
You'll see a CREATE statement with the CHARSET information at the end. It
should look like this:
CREATE TABLE IF NOT EXISTS `your_tablename` ( .... your field definitions ... ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
If your table doesn't use the UTF-8 charset yet, the easiest way to change this is
to export your table, adapt the CREATE statement's CHARSET parameter and
re-import the table into the database.
Be very careful when doing this conversion: make sure you save the file with the changed
SQL statement in UTF-8, and convert it if necessary. If this is not done carefully,
you can easily end up with messed up encodings, e.g. ISO-8859-1 encoded
characters in a table with utf8 CHARSET.
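To catch such mistakes before re-importing, you can check whether a value is valid UTF-8 at all. This is only a rough sketch: the function name isValidUtf8 is an assumption, and the check relies on the fact that preg_match() with the /u modifier fails on invalid UTF-8 input. It detects raw ISO-8859-1 bytes, but not text that was converted twice and is therefore still valid UTF-8.

```php
<?php
// Hypothetical helper: returns true only if $value is well-formed UTF-8.
// An empty pattern with the /u modifier matches any valid UTF-8 string;
// on invalid input preg_match() returns false instead of 1.
function isValidUtf8($value)
{
    return preg_match('//u', $value) === 1;
}
```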
Tip: To have MySQL create all of your tables with utf8 CHARSET by default, you can add this to your MySQL configuration (e.g. the my.cnf file):
[mysqld]
character-set-server = utf8
# for older versions:
# default-character-set = utf8
3. Database connection
When connecting to a database a client like PHP has to use a specific charset encoding. To specify the charset to use for a connection in Yii, configure it like this:
return array(
    ......
    'components'=>array(
        ......
        'db'=>array(
            'connectionString'=>'sqlite:protected/data/source.db',
            'charset'=>'utf8',
        ),
    ),
    ......
);
4. Webserver/HTTP-Header
We also need to let the browser know that our pages use UTF-8. The best place to do this is in the header of the HTTP response. How to configure this varies between server software.
Tip: If you use this approach, there's no need to add additional header information about encoding to your pages. Using the HTTP header is enough.
Apache
You can configure UTF-8 charset either in a VirtualHost section of your server
configuration or by adding this line into a .htaccess file in your DocumentRoot:
AddDefaultCharset UTF-8
5. PHP string functions
PHP needs to treat strings as UTF-8 internally in order for e.g. string length validation to work correctly. PHP's standard string functions operate on bytes rather than characters, and the native Unicode support that was planned for PHP 6 was never released.
mbstring
The alternative is to use mbstring functions instead of the non-multibyte aware counterparts. Since mbstring is a non-default extension it might not be available on every host. That's one of the reasons why Yii uses the non-multibyte functions like strlen() instead of mb_strlen() by default.
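The difference is easy to see with a short word containing Polish letters. This is a small sketch that assumes the mbstring extension is loaded; the function name byteAndCharLength is made up for illustration, and the string is written with byte escapes so the result does not depend on how the file was saved.

```php
<?php
// Hypothetical helper: compare the byte length with the character length.
function byteAndCharLength($text)
{
    return array(
        'bytes' => strlen($text),             // counts bytes
        'chars' => mb_strlen($text, 'UTF-8'), // counts UTF-8 characters
    );
}

// "Zażółć" in UTF-8: Z and a take one byte each, ż ó ł ć take two bytes each.
$word = "Za\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87";
$lengths = byteAndCharLength($word);
// $lengths['bytes'] is 10, $lengths['chars'] is 6
```

A maximum-length check of 6 based on strlen() would therefore reject this six-letter word.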
Using mbstring with Yii >= 1.1.1
Since version 1.1.1 you can use the encoding parameter of CStringValidator. If you set it to utf-8 it will use the mbstring functions for different string validation operations.
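As an illustration, a validation rule could look like the fragment below. The model class ProfileForm and its attribute displayName are hypothetical, and the fragment assumes it runs inside a Yii 1.1 application where the framework classes are already loaded; 'length' is the built-in alias for CStringValidator.

```php
<?php
// Hypothetical form model; requires the Yii 1.1 framework to be loaded.
class ProfileForm extends CFormModel
{
    public $displayName;

    public function rules()
    {
        return array(
            // With 'encoding' set, min and max are counted in characters
            // (via mbstring) instead of bytes.
            array('displayName', 'length', 'min'=>2, 'max'=>20, 'encoding'=>'utf-8'),
        );
    }
}
```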
Using mbstring with older versions of Yii
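A workaround for older releases is to use mbstring's function overloading feature. This overrides the non-multibyte aware functions with their mbstring counterparts. Note that function overloading was deprecated in PHP 7.2 and removed in PHP 8.0, so this only applies to older PHP versions. To set this up, add this to your php.ini: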
mbstring.func_overload = 7
mbstring.internal_encoding = UTF-8
As an alternative, you can also enable it for a single VirtualHost in Apache, in the corresponding configuration section:
php_admin_value mbstring.func_overload "7"
php_admin_value mbstring.internal_encoding "UTF-8"
Note: Unfortunately it's not recommended to set this in an
.htaccess file, as this may lead to undefined behavior.
Total 9 comments:
In my case, adding 'charset'=>'utf8' to the db connection config helped with Polish letters.
'charset' => 'UTF-8',
is needed for CHtml::encode
You can remove the UTF-8 BOM from the output using the ob_start function. This way you can leave the UTF-8 BOM in your source files so your editor understands it is really UTF-8.
In the /protected/config/main.php you have to add before returning the config array:
ob_start('My_OB');
function My_OB($str, $flags)
{
    //remove UTF-8 BOM
    $str = preg_replace("/\xef\xbb\xbf/","",$str);
    return $str;
}
return array( ... yii config array ...);
P.S. You don't have to call ob_end_flush(), php will do this automatically at the end of the script.
Just add the correct charset of your app to 'rootApp/protected/config/main.php', at the root of the returned array, like this:
'charset'=>'iso-8859-1',
Then change your main layout to use the charset of your app by changing the meta tag in the header section:
<meta http-equiv="Content-Type" content="text/html; charset=<?= Yii::app()->charset ?>" />
And your problems are solved, with no need for conversions etc.
Bye!
If your php.ini has a default_charset set, then everything else might just get ignored (like in my case).
Just put this at the beginning of the entry script (index.php):
ini_set('default_charset','utf-8');
I was struggling to get utf8 to work. My problem was that even though DEFAULT CHARSET=utf8 was set on all of the tables, individual fields had a latin COLLATION and who knows what CHARACTER SET...
I had to do something like this with all of the individual fields in the tables:
ALTER TABLE tbl_example DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;
I hope this helps someone.
Here's the shell script I use in Cygwin to remove the ... BOM
#!/bin/bash
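# Find every PHP file that contains a UTF-8 BOM and strip it, keeping a .bak copy.
# Reading the file list line by line keeps paths with spaces intact.
grep -rl $'\xEF\xBB\xBF' --include='*.php' /cygdrive/c/PHP-projects/toto | while IFS= read -r i; do
    echo "Processing $i"
    cp "$i" "$i.bak"
    perl -pe 's/\xEF\xBB\xBF//' "$i" > "$i.new"
    mv "$i.new" "$i"
done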

for summing this up here :)
it's often been a reason for headaches ...