How to set up Unicode

To fix problems with the display of language-specific characters once and for all, use Unicode (UTF-8) everywhere. If everything is set up to use Unicode, you can use almost every language in your application.

Several places may each need some configuration to use Unicode:

1. PHP script files

Make sure that you use an editor that can handle UTF-8, and save all your files UTF-8 encoded without a BOM. If your project contains older non-Unicode files, open them in your editor and save them again as UTF-8. On Linux you can also use command-line tools like recode or iconv to convert many files at once.
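
The same clean-up can also be scripted in PHP. The following is only a minimal sketch: the function name toUtf8WithoutBom() is hypothetical, ISO-8859-1 as the legacy encoding is an assumption about your files, and the mbstring and iconv extensions are assumed to be available. The validity check is a heuristic, since a few legacy byte sequences happen to be valid UTF-8 as well.

```php
<?php
// Hypothetical helper (not part of Yii): returns $bytes as UTF-8 without a BOM.
// $sourceEncoding is an assumption about the legacy encoding of the input.
function toUtf8WithoutBom($bytes, $sourceEncoding = 'ISO-8859-1')
{
    $bom = "\xEF\xBB\xBF";

    // Drop a leading UTF-8 BOM if present.
    if (strncmp($bytes, $bom, 3) === 0) {
        $bytes = (string) substr($bytes, 3);
    }

    // Content that is already valid UTF-8 (including plain ASCII) is returned unchanged.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }

    // Otherwise assume the legacy encoding and convert it.
    return iconv($sourceEncoding, 'UTF-8', $bytes);
}
```

To convert a file in place you could pass the result of file_get_contents() through this function and write it back with file_put_contents(). Keep a backup first, because a wrong guess about the legacy encoding produces garbled text.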

2. Database tables

Every table in your database needs to use UTF-8 charset for its content. The configuration for that might differ between database systems.

MySQL

To find out whether a table uses the utf8 charset, look at the CREATE statement for that table. You can use phpMyAdmin's export feature to see it.

Info: Don't confuse the encoding of characters in a table with its collation. The latter is used for sorting in queries and can be changed easily with e.g. phpMyAdmin or even for a single query.

You could also issue this SQL statement:

SHOW CREATE TABLE your_tablename;

You'll see a CREATE statement with the CHARSET information at the end. It should look like this:

CREATE TABLE IF NOT EXISTS `your_tablename` (
  .... your field definitions ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

If your table doesn't use the UTF-8 charset yet, the easiest way to change this is to export the table, adapt the CHARSET parameter of the CREATE statement and re-import the table into the database.

Be very careful when doing this conversion: make sure you save the file with the changed SQL statement in UTF-8, and convert it if necessary. If this is not done carefully you can easily end up with mixed-up encodings, e.g. ISO-8859-1 encoded characters in a table with utf8 CHARSET.

Tip: To have MySQL create all of your tables with utf8 CHARSET by default, you can add this to your MySQL configuration (e.g. my.cnf file):

[mysqld]
character-set-server = utf8
# for older versions use this line instead (the option was removed in MySQL 5.5):
# default-character-set = utf8

3. Database connection

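When connecting to a database, a client like PHP has to use a specific charset for the connection. To specify that charset in Yii, set the charset property of the db component. Yii applies this property only to MySQL and PostgreSQL connections, so the example below uses a MySQL connection string with placeholder values:

return array(
    ......
    'components'=>array(
        ......
        'db'=>array(
            // host and dbname are placeholders
            'connectionString'=>'mysql:host=localhost;dbname=your_database',
            'charset'=>'utf8',
        ),
    ),
    ......
);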

4. Webserver/HTTP-Header

We also need to let the browser know that our pages use UTF-8. The best place to do this is in the header of the HTTP response. How to configure this varies between different server software.

Tip: If you use this approach, there's no need to add encoding information to the pages themselves. The HTTP header is enough.

Apache

You can configure UTF-8 charset either in a VirtualHost section of your server configuration or by adding this line into a .htaccess file in your DocumentRoot:

AddDefaultCharset UTF-8
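
If you cannot change the server configuration, the same header can be sent from PHP before any output is produced. This is only a sketch: contentTypeHeader() is a hypothetical helper, not a Yii or PHP function, and text/html is an assumed content type.

```php
<?php
// Hypothetical helper: builds the Content-Type header line for a given charset.
function contentTypeHeader($mimeType = 'text/html', $charset = 'UTF-8')
{
    return 'Content-Type: ' . $mimeType . '; charset=' . $charset;
}

// Headers can only be sent before any output has been written.
if (!headers_sent()) {
    header(contentTypeHeader());
}
```

A natural place for this would be the entry script (index.php), before the application starts to render.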

5. PHP string functions

PHP needs to handle strings as UTF-8 in order for e.g. string length validation to work correctly. Native Unicode strings were planned for PHP 6, but that version was never released, so PHP's standard string functions still operate on bytes.

mbstring

The alternative is to use mbstring functions instead of the non-multibyte aware counterparts. Since mbstring is a non-default extension it might not be available on every host. That's one of the reasons why Yii uses the non-multibyte functions like strlen() instead of mb_strlen() by default.
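
The difference is easy to see with a short example. This sketch assumes the mbstring extension is loaded and that function overloading is not active. The word "Grüße" is written as escaped UTF-8 bytes so the result does not depend on the encoding of the script file itself.

```php
<?php
// "Grüße" as UTF-8 bytes: ü = C3 BC, ß = C3 9F.
$word = "Gr\xC3\xBC\xC3\x9Fe";

$byteLength = strlen($word);             // counts bytes
$charLength = mb_strlen($word, 'UTF-8'); // counts characters (needs ext/mbstring)

echo $byteLength, "\n"; // 7
echo $charLength, "\n"; // 5
```

A maximum length rule of 5 would therefore reject this five-letter word if it were checked with strlen().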

Using mbstring with Yii >= 1.1.1

Since version 1.1.1 you can use the encoding property of CStringValidator. If you set it to utf-8, the validator uses the mbstring functions for its string length checks.
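
In a model this could look like the following sketch. The attribute name title and the length limits are hypothetical. In a real model the array would be returned from the rules() method; a plain function is used here only so that the snippet runs without Yii.

```php
<?php
// Sketch of what a Yii 1.1 model's rules() method could return.
// 'title', 'min' => 3 and 'max' => 50 are illustrative values only.
function exampleRules()
{
    return array(
        // 'length' is the built-in alias for CStringValidator.
        array('title', 'length', 'min' => 3, 'max' => 50, 'encoding' => 'utf-8'),
    );
}
```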

Using mbstring with older versions of Yii

A workaround for older releases is mbstring's function overloading feature, which replaces the non-multibyte aware functions with their mbstring counterparts. Note that PHP deprecated this feature in version 7.2 and removed it in 8.0. To set it up, add this to your php.ini:

mbstring.func_overload = 7
mbstring.internal_encoding = UTF-8

As an alternative you can also enable it for a single VirtualHost in Apache, in the corresponding configuration section:

php_admin_value mbstring.func_overload "7"
php_admin_value mbstring.internal_encoding "UTF-8"

Note: Unfortunately it's not recommended to set this in an .htaccess file, as this may lead to undefined behavior.

Total 9 comments:

#180
thanks
by thomas.mery at 4:41pm on April 8, 2009.

for summing this up here :)

it's often been a reason for headaches ...

#397
thanks for tips
by piwer at 6:49am on June 21, 2009.

In my case adding 'charset'=>'utf8' to the db connection config helped with Polish letters.

#670
config...
by mech7 at 9:20am on September 22, 2009.

'charset' => 'UTF-8',

is needed for CHtml::encode

#937
Remove UTF-8 BOM from output
by Mirco at 12:31am on December 28, 2009.

You can remove the UTF-8 BOM from the output using the ob_start function. This way you can leave the UTF-8 BOM in your source files so your editor understands it is really UTF-8.

In the /protected/config/main.php you have to add before returning the config array:

ob_start('My_OB');
function My_OB($str, $flags)
{
    //remove UTF-8 BOM
    $str = preg_replace("/\xef\xbb\xbf/","",$str);
    return $str;
}
return array( ... yii config array ...);

P.S. You don't have to call ob_end_flush(), php will do this automatically at the end of the script.

#956
SETTING THE CHARSET
by solleon at 1:32am on January 7, 2010.

Just add the correct charset of your app to the root of the return array in 'rootApp/protected/config/main.php', like this:

'charset'=>'iso-8859-1',

Then change your layout 'appRoot/protected/views/layouts/main.php', or others, to use the charset of your app by changing the meta tag in the header section to this:

<meta http-equiv="Content-Type" content="text/html; charset=<?= Yii::app()->charset ?>" />

And your problems are solved, without the need for conversions etc.

Bye!

#957
SETTING THE CHARSET - CORRECT
by solleon at 1:34am on January 7, 2010.

Just add the correct charset of your app to the root of the return array in 'rootApp/protected/config/main.php', like this:

'charset'=>'iso-8859-1',

Then change your main layout to use the charset of your app by changing the meta tag in the header section:

<meta http-equiv="Content-Type" content="text/html; charset=<?= Yii::app()->charset ?>" />

And your problems are solved, without the need for conversions etc.

Bye!

#1274
php.ini
by Troto at 3:45pm on March 14, 2010.

If your php.ini has a default_charset set, then everything else might just get ignored (as in my case).

Just put this at the beginning of the entry script (index.php): ini_set('default_charset','utf-8');

#1387
Individual fields of the table
by Karolis at 4:18am on April 14, 2010.

I was struggling to get utf8 to work. My problem was that even though DEFAULT CHARSET=utf8 was set on all of the tables, individual fields still had a latin COLLATION and who knows what CHARACTER SET...

I had to do something like this for each table to convert its existing fields as well (changing only the table's DEFAULT CHARACTER SET leaves existing fields untouched): ALTER TABLE tbl_example CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

I hope this helps someone.

#1834
Shell script
by cma at 8:48am on August 26, 2010.

Here's the shell script I use in Cygwin to remove the ... BOM

#!/bin/bash
# Strip the UTF-8 BOM from every PHP file below the project directory.
for i in $(grep -rl $'\xEF\xBB\xBF' --include='*.php' /cygdrive/c/PHP-projects/toto); do
    echo "Processing $i"
    cp "$i" "$i.bak"    # keep a backup of the original file
    perl -pe 's/\xEF\xBB\xBF//' "$i" > "$i.new"
    mv "$i.new" "$i"
done

