Yii 1.1: How to set up Unicode

To fix problems with the display of special language characters once and for all, use Unicode (UTF-8) everywhere. If everything is set up to use Unicode, you can use almost any language in your application.

Info: Strictly speaking, Unicode is a character set. It lists and names characters from every major language around the world. UTF-8 is an encoding: it defines a mapping between Unicode characters and sequences of bytes. Other Unicode encodings exist, such as UTF-16, but they are far less common on the web. UTF-8 has one main advantage over the other Unicode encodings: it is backward compatible with ASCII.
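To make the difference between character and byte sequence concrete, here is a minimal sketch that prints the UTF-8 bytes of a few characters. The helper name utf8_hex() is made up for this example, and the characters are written as byte escapes so the result does not depend on how you save the file.

```php
<?php
// Hypothetical helper: return the bytes of a string as space-separated hex pairs.
function utf8_hex($text)
{
    return implode(' ', str_split(bin2hex($text), 2));
}

// Byte escapes are used so the example does not depend on the file's own encoding.
$samples = array(
    'A (U+0041)'         => "A",            // plain ASCII, one byte
    'e-acute (U+00E9)'   => "\xC3\xA9",     // two bytes in UTF-8
    'euro sign (U+20AC)' => "\xE2\x82\xAC", // three bytes in UTF-8
);

foreach ($samples as $name => $char) {
    // strlen() counts bytes, so it shows the length of the UTF-8 sequence.
    echo $name, ': ', strlen($char), ' byte(s): ', utf8_hex($char), "\n";
}
```

The ASCII letter keeps its single ASCII byte, which is what backward compatibility with ASCII means in practice.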

There are several places that may need some configuration tuning to use Unicode.

1. PHP script files

Every text file is stored on disk in a specific character encoding. Your PHP files must be saved as UTF-8 without a BOM (byte order mark), because a BOM at the start of a PHP file is sent to the browser as output. Make sure to use an editor that is capable of Unicode. If you have some older non-Unicode files in your project, open them in your editor and save them again as UTF-8.

Tip: On Windows you can for example use Notepad++, which has an Encoding menu from where you can change encodings of your files.

On Linux you can also use command line tools like recode or iconv to convert a whole batch of files. Here's a script that converts every PHP file in the directory myproject/ and its subdirectories. It writes to a temporary file first, so a failed conversion does not overwrite the original:

$ cd myproject/
$ find . -name '*.php' | while IFS= read -r f; do enc=$(file -b --mime-encoding "$f"); iconv -f "$enc" -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"; done
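If you want to check afterwards that no file still starts with a BOM, something like the following sketch could help. The function name has_utf8_bom() and the protected/ path are assumptions for illustration only; adjust them to your project.

```php
<?php
// Hypothetical helper: true if the given bytes start with the UTF-8 BOM (EF BB BF).
function has_utf8_bom($bytes)
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Assumed location of the PHP files to check; only scanned if it exists.
$dir = 'protected';
if (is_dir($dir)) {
    $files = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir));
    foreach ($files as $file) {
        // Only look at regular files with a .php extension.
        if ($file->isFile() && substr($file->getFilename(), -4) === '.php') {
            // Reading the first three bytes is enough to detect a BOM.
            $head = file_get_contents($file->getPathname(), false, null, 0, 3);
            if (has_utf8_bom($head)) {
                echo 'BOM found: ', $file->getPathname(), "\n";
            }
        }
    }
}
```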

2. PHP code and Yii application

PHP needs to use UTF-8 internally for things like string length validation to work correctly. Scripts should use the mbstring functions instead of their non-multibyte-aware counterparts.
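As a small illustration of why this matters, the sketch below compares the byte-oriented functions with their mbstring counterparts. It assumes the mbstring extension is loaded; the sample word is written as byte escapes so the file encoding does not affect the result.

```php
<?php
// "Grüße" written as UTF-8 bytes: the two non-ASCII letters take two bytes each.
$word = "Gr\xC3\xBC\xC3\x9Fe";

// strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($word);            // 7
$charLength = mb_strlen($word, 'UTF-8'); // 5

// substr() cuts in the middle of the two-byte "ü" and leaves a broken string,
// while mb_substr() cuts on character boundaries.
$brokenCut = substr($word, 0, 3);
$cleanCut  = mb_substr($word, 0, 3, 'UTF-8');

echo "bytes: $byteLength, characters: $charLength\n";
echo 'substr result is valid UTF-8: ', var_export(mb_check_encoding($brokenCut, 'UTF-8'), true), "\n";
echo 'mb_substr result is valid UTF-8: ', var_export(mb_check_encoding($cleanCut, 'UTF-8'), true), "\n";
```

A maximum length check based on strlen() would therefore reject this five-character word at a limit of six.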

By default, a Yii application already assumes that your character set is UTF-8; see CApplication::charset. This setting is used when encoding text for HTML pages, e.g. by CHtml::encode().

Yii 1.1.1 and later

Yii will try to use mbstring functions if they are available. For the string validator you should set the encoding parameter to utf-8.
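In a model this could look roughly like the sketch below. The class name CommentForm, the attribute title and the limit of 50 are made up for illustration; only the encoding option of the string validator is the point here. The snippet needs the Yii framework to be loaded, so it is not runnable on its own.

```php
<?php
// Hypothetical form model; class and attribute names are for illustration only.
class CommentForm extends CFormModel
{
    public $title;

    public function rules()
    {
        return array(
            // 'length' is the alias of CStringValidator. With 'encoding' set,
            // the length is measured in characters via mbstring, not in bytes.
            array('title', 'length', 'max' => 50, 'encoding' => 'UTF-8'),
        );
    }
}
```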

Older versions of Yii

A workaround for older releases is to use mbstring's function overloading feature. This replaces the non-multibyte-aware functions with their mbstring counterparts.

To set this up, add this to your php.ini:

mbstring.func_overload = 7
mbstring.internal_encoding = "UTF-8"

or configure it in a VirtualHost section in Apache:

php_admin_value mbstring.func_overload "7"
php_admin_value mbstring.internal_encoding "UTF-8"

Note: Unfortunately, setting this in an .htaccess file is not recommended, as it may lead to undefined behavior.

3. Database

Your database needs to know that it should store data in UTF-8. How to configure this differs between database systems.

MySQL

The charset can be defined per database, per table and per column. Use the following SQL to find out the charset of an existing database or table:

SHOW CREATE DATABASE mydatabase;
SHOW CREATE TABLE mydatabase.mytable;

Info: Don't confuse the character encoding of a table with its collation. The collation is used for comparing and sorting in queries and can be changed easily, e.g. with phpMyAdmin, or even set for a single query.

If your table doesn't use the UTF-8 charset yet, the most reliable way to change this is to export the table, modify the CHARSET parameter of the CREATE statement and re-import the table into the database.

Be very careful when doing this conversion. You need to make sure that you use the correct connection charset and that you save the file in UTF-8. If this is not done carefully, you can easily end up with messed up encodings, e.g. ISO-8859-1 encoded characters in a table with utf8 CHARSET.
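One way to spot such leftovers after an import is to check whether the values you read back are valid UTF-8. The following is only a sketch: the function names are made up, the sample values stand in for rows from your table, and the repair step assumes the broken values are really ISO-8859-1, which you have to verify for your own data.

```php
<?php
// Hypothetical helper: true if the value is a valid UTF-8 byte sequence.
function is_valid_utf8($value)
{
    return mb_check_encoding($value, 'UTF-8');
}

// Hypothetical helper: re-encode a value that is ASSUMED to be ISO-8859-1.
function latin1_to_utf8($value)
{
    return mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
}

// Sample values standing in for data read from a table.
$values = array(
    "caf\xC3\xA9", // "café" correctly stored as UTF-8
    "caf\xE9",     // "café" as ISO-8859-1 bytes, invalid as UTF-8
);

foreach ($values as $value) {
    if (is_valid_utf8($value)) {
        echo bin2hex($value), ": valid UTF-8\n";
    } else {
        echo bin2hex($value), ': invalid, converted to ', bin2hex(latin1_to_utf8($value)), "\n";
    }
}
```

Note that this only catches bytes that are invalid in UTF-8. Text that was converted twice is still valid UTF-8 and will not be flagged.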

Tip: To have MySQL create all of your tables with utf8 charset and collation by default, you can add this to your MySQL configuration (e.g. my.cnf file):

[mysqld]
character_set_server    = utf8
collation_server        = utf8_general_ci
# for older versions:
default-character-set = utf8

4. Database connection

When connecting to a database a client like PHP also has to use a specific charset encoding. To specify the charset to use for a connection in Yii, configure it like this:

return array(
    // ...
    'components' => array(
        // ...
        'db' => array(
            // ...
            'charset' => 'utf8',
        ),
    ),
);

If you have problems with the charset configuration above, you can also try to set the charset with an SQL command through the initSQLs configuration, which takes an array of statements (shown here for a MySQL connection):

'db'=>array(
        'connectionString'=>'mysql:host=localhost;dbname=mydatabase',
        'initSQLs'=>array('SET NAMES utf8'),
    ),

5. HTTP Content-Type

We also need to let the browser know that our pages use UTF-8. There are two options for this:

  • HTTP Content-Type header. This is configured in the web server but can also be set from PHP (see below).
  • Content-Type meta tag. You could add a meta tag to your HTML pages like <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />.

The recommended way is to use the HTTP header, as it overrides whatever is set in the meta tag.

Tip: If you let the web server set the header, there's no need to add additional header information about encoding to your pages. In this case you would only have to overwrite the HTTP header if your page were not in UTF-8.

Apache

You can configure the Content-Type header either in a VirtualHost section of your server or in a .htaccess file in your DocumentRoot. Add this line:

AddDefaultCharset UTF-8

Nginx

The right Content-Type header is set with this directive:

server {
   charset UTF-8;
   ...
}

PHP alternative

If you don't have access to your server configuration or don't want to modify it, you can also set the content type from PHP. Again you have different options:

  • Set default_charset to UTF-8 in your php.ini
  • Add the following PHP command to Yii's index.php: header('Content-Type: text/html; charset=utf-8');.

The drawback of this method is that it sets the header only for PHP files. So if you also serve some static content, it will not have the right Content-Type header set.

Total 10 comments

#16006
Firebreaker at 2014/01/08 10:19am
Unicode routing

How do I set up Unicode routing? Is there an option for this on the hosting side, or is it configured in the urlManager in main.php in the config dir?

#12768
Mike at 2013/04/11 03:23am
Cleanup

To the other editors: I've cleaned up and reorganized the article. I think some content was not really part of the HOWTO (e.g. the section about DB indexes). If you still think that's useful information, please add it as a comment here.

#5383
Roman Solomatin at 2011/10/08 03:14pm
Great article

Very useful stuff. Some of this knowledge I learned the hard way... over the years.

#2963
Leric at 2011/03/03 12:12am
Really helpful

It's important to remember to set the encoding parameter when using CStringValidator with non-Latin characters.

#1946
zaccaria at 2010/10/18 10:18am
set names

In my installation I even have to set initSQLs in order to have the database and the application agree on the data:

return array(
    ......
    'components'=>array(
        ......
        'db'=>array(
            'connectionString'=>'sqlite:protected/data/source.db',
            'charset'=>'utf8',
            'initSQLs'=>array('set names utf8'),
       ),
    ),
    ......
);
#151
cma at 2010/08/26 03:48am
Shell script

Here's the shell script I use in Cygwin to remove the ... BOM

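#!/bin/bash
# Strip a leading UTF-8 BOM from every PHP file that contains one; a .bak copy is kept.
for i in $(grep -rl $'\xEF\xBB\xBF' --include='*.php' /cygdrive/c/PHP-projects/toto); do
    echo "Processing $i"
    cp "$i" "$i.bak"
    # Only touch the first line, so the same bytes elsewhere in the file are left alone.
    perl -pe 's/^\xEF\xBB\xBF// if $. == 1' "$i" > "$i.new"
    mv "$i.new" "$i"
done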
#595
Karolis at 2010/04/14 12:18pm
Individual fields of the table

I was struggling to get utf8 to work. My problem was that even though DEFAULT CHARSET=utf8 was set on all of the tables, individual fields had a latin COLLATION and who knows what CHARACTER SET...

I had to do something like this for each table so that the individual fields were converted as well: ALTER TABLE tbl_example CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

I hope this helps someone.

#708
Troto at 2010/03/14 08:45pm
php.ini

If your php.ini has a default_charset set, then everything else might just get ignored (like in my case).

Just put this at the beginning of the entry script (index.php): ini_set('default_charset','utf-8');

#1023
Solleon at 2010/01/07 08:34am
SETTING THE CHARSET - CORRECT

Just add the correct charset of your app to 'rootApp/protected/config/main.php', at the root of the returned array, like this:

'charset'=>'iso-8859-1',

Then change your main layout to use the charset of your app by changing the meta tag in the head section:

<meta http-equiv="Content-Type" content="text/html; charset=<?= Yii::app()->charset ?>" />

And your problems are solved, without any need to do conversions etc.

Bye!

#1043
Mirco at 2009/12/28 06:31am
Remove UTF-8 BOM from output

You can remove the UTF-8 BOM from the output using the ob_start function. This way you can leave the UTF-8 BOM in your source files so your editor understands it is really UTF-8.

In /protected/config/main.php, add this before returning the config array:

ob_start('My_OB');
function My_OB($str, $flags)
{
    //remove UTF-8 BOM
    $str = preg_replace("/\xef\xbb\xbf/","",$str);
    return $str;
}
return array( ... yii config array ...);

P.S. You don't have to call ob_end_flush(); PHP will do this automatically at the end of the script.

  • Written by: Mike
  • Updated by: Roman Solomatin
  • Category: How-tos
  • Yii Version: 1.1
  • Created on: Feb 21, 2009
  • Last updated: Apr 26, 2013
  • Tags: i18n, unicode