How to set up Unicode

  1. PHP script files
  2. Database tables
  3. Database connection
  4. Webserver/HTTP-Header
  5. PHP string functions

To fix issues with the display of special language characters once and for all, there's a solution: use Unicode (UTF-8) everywhere. If everything is set up to use Unicode, you can use almost every language in your application.

Several places may need configuration changes to use Unicode:

1. PHP script files

Make sure that you use an editor which is capable of handling UTF-8, and save all your files UTF-8 encoded without a BOM (byte order mark). If you have some older non-Unicode files in your project, open them with your editor and save them again UTF-8 encoded. On Linux you can also use command line tools like recode or iconv to convert many files at once.
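
Where an editor or the command line tools are not at hand, the same conversion can be sketched in PHP. The helper below is hypothetical (it is not part of Yii or PHP), it requires the mbstring extension, and it assumes that any content which is not valid UTF-8 is ISO-8859-1; adjust that assumption to your project's real legacy encoding.

```php
<?php
// Hypothetical helper, not part of Yii: normalizes raw file contents
// to UTF-8 without a BOM. Requires the mbstring extension.
function toUtf8WithoutBom($bytes, $legacyEncoding = 'ISO-8859-1')
{
    // Strip a leading UTF-8 byte order mark (EF BB BF) if present.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        $bytes = substr($bytes, 3);
    }
    // Content that is already valid UTF-8 is returned unchanged.
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    // Assumption: everything else is encoded in $legacyEncoding.
    return mb_convert_encoding($bytes, 'UTF-8', $legacyEncoding);
}

// "Gr\xFC\xDFe" is the word "Grüße" as ISO-8859-1 bytes.
$converted = toUtf8WithoutBom("Gr\xFC\xDFe");
echo bin2hex($converted), "\n"; // prints "4772c3bcc39f65"
```

To convert a real file you would pass the result of file_get_contents() to the helper and write the return value back with file_put_contents(). Keep a backup, since a wrong guess about the legacy encoding produces garbled text.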

2. Database tables

Every table in your database needs to use UTF-8 charset for its content. The configuration for that might differ between database systems.

MySQL

To find out whether a table uses the utf8 charset, look at the CREATE statement for that table, for example via phpMyAdmin's export feature.

>Info: Don't confuse the encoding of characters in a table with its collation. The latter is used for sorting in queries and can be changed easily with e.g. phpMyAdmin or even for a single query.

>Note: MySQL's utf8 charset stores at most three bytes per character. Characters outside the Basic Multilingual Plane (emoji, for example) need the utf8mb4 charset, which is available since MySQL 5.5.3.

You could also issue this SQL statement:

[sql]
SHOW CREATE TABLE your_tablename;

You'll see a CREATE statement with the CHARSET information at the end. It should look like this:

[sql]
CREATE TABLE IF NOT EXISTS `your_tablename` (
  .... your field definitions ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

If your table doesn't use the UTF-8 charset yet, the easiest way to change this is to export the table, adapt the CHARSET parameter of the CREATE statement and re-import the table into the database.

Be very careful when doing this conversion: make sure you save the file with the changed SQL statement in UTF-8, and convert it if necessary. If this is not done carefully, you can easily end up with mixed-up encodings, e.g. ISO-8859-1 encoded characters in a table with utf8 CHARSET.
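
The export, adapt and re-import steps can be sketched in PHP as follows. The function name and the sample dump are hypothetical, the mbstring extension is required, and the dump is assumed to be ISO-8859-1 encoded. The sketch only rewrites the table-level `DEFAULT CHARSET` parameter; column-level charset declarations or `SET NAMES` lines in a real dump would still need to be checked by hand.

```php
<?php
// Hypothetical helper: prepares an exported SQL dump for re-import as utf8.
// Assumption: the dump file is encoded in $dumpEncoding.
function adaptDumpToUtf8($dumpSql, $dumpEncoding = 'ISO-8859-1')
{
    // Convert the dump's bytes first, so the data matches the new CHARSET.
    $utf8Sql = mb_convert_encoding($dumpSql, 'UTF-8', $dumpEncoding);
    // Then rewrite the table-level charset declaration.
    return preg_replace('/\bDEFAULT CHARSET=\w+/', 'DEFAULT CHARSET=utf8', $utf8Sql);
}

// Made-up two-line dump; 'Gr\xFC\xDFe' is "Grüße" in ISO-8859-1.
$dump = "CREATE TABLE `t` (`name` varchar(50)) ENGINE=InnoDB DEFAULT CHARSET=latin1;\n"
      . "INSERT INTO `t` VALUES ('Gr\xFC\xDFe');";

echo adaptDumpToUtf8($dump), "\n";
```

Converting the bytes and changing the CHARSET parameter in one step avoids the mismatch described above, where the declaration says utf8 but the data is still in the old encoding.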

>Tip: To have MySQL create all of your tables with utf8 CHARSET by default, you can add this to your MySQL configuration (e.g. my.cnf file):
>
>~~~
>[mysqld]
>character-set-server = utf8
># for older versions:
>default-character-set = utf8
>~~~

3. Database connection

When connecting to a database a client like PHP has to use a specific charset encoding. To specify the charset to use for a connection in Yii, configure it like this:

return array(
    ......
    'components'=>array(
        ......
        'db'=>array(
            'connectionString'=>'sqlite:protected/data/source.db',
            'charset'=>'utf8',
        ),
    ),
    ......
);
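
For comparison, here is a sketch of the same configuration for a MySQL connection. The Yii 1.1 API documentation describes the `charset` property as being used only for MySQL and PostgreSQL connections, so this is where the setting takes effect. Host, database name, username and password are placeholders, not values from this article.

```php
<?php
// Sketch of a MySQL variant of the 'db' component.
// All connection details below are placeholders.
return array(
    'components' => array(
        'db' => array(
            'connectionString' => 'mysql:host=localhost;dbname=your_database',
            'username' => 'your_user',
            'password' => 'your_password',
            // Charset used for the connection; should match the tables' charset.
            'charset' => 'utf8',
        ),
    ),
);
```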

4. Webserver/HTTP-Header

We also need to let the browser know that our pages use UTF-8. The best place to do this is in the header of the HTTP response. How to configure this varies between server software.

>Tip: If you use this approach, there's no need to add additional header information about encoding to your pages. Using the HTTP header is enough.

Apache

You can configure UTF-8 charset either in a VirtualHost section of your server configuration or by adding this line into a .htaccess file in your DocumentRoot:

AddDefaultCharset UTF-8
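
If you cannot change the server configuration, the same header can be sent from PHP with the built-in header() function. This is only a minimal sketch: the helper name `contentTypeHeader` is made up for this example, and the call must happen before any output is sent.

```php
<?php
// Hypothetical helper: builds the Content-Type header line.
function contentTypeHeader($mimeType = 'text/html', $charset = 'UTF-8')
{
    return 'Content-Type: ' . $mimeType . '; charset=' . $charset;
}

// header() only works as long as no output has been sent yet.
if (!headers_sent()) {
    header(contentTypeHeader());
}
```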

5. PHP string functions

PHP needs to treat strings as UTF-8 for things like string length validation to work correctly. PHP's built-in string functions operate on bytes, and the native Unicode support once planned for PHP 6 was never released. The alternative is to use the mbstring functions instead of their non-multibyte-aware counterparts. Since mbstring is a non-default extension, it might not be available on every host. That's one of the reasons why Yii uses the non-multibyte functions like strlen() instead of mb_strlen().
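
The difference is easy to see with a short example. This is a minimal sketch that assumes the mbstring extension is loaded and function overloading is not active; the sample word is written as explicit UTF-8 bytes so the result does not depend on how this file is saved.

```php
<?php
// "Grüße" as UTF-8 bytes: ü is C3 BC, ß is C3 9F.
$word = "Gr\xC3\xBC\xC3\x9Fe";

$byteCount = strlen($word);             // counts bytes
$charCount = mb_strlen($word, 'UTF-8'); // counts characters

echo $byteCount, " bytes, ", $charCount, " characters\n"; // prints "7 bytes, 5 characters"

// substr() cuts in the middle of the two-byte "ü" and leaves invalid UTF-8.
$brokenPrefix = substr($word, 0, 3);
// mb_substr() cuts on character boundaries and returns "Grü".
$cleanPrefix = mb_substr($word, 0, 3, 'UTF-8');

echo mb_check_encoding($brokenPrefix, 'UTF-8') ? "valid" : "invalid", "\n"; // prints "invalid"
echo mb_check_encoding($cleanPrefix, 'UTF-8') ? "valid" : "invalid", "\n";  // prints "valid"
```

A maximum-length validation rule based on strlen() would therefore reject some strings that are within the limit when counted in characters.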

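One workaround is mbstring's function overloading feature, which replaces the non-multibyte-aware functions with their mbstring counterparts. Note that this feature was deprecated in PHP 7.2 and removed in PHP 8.0, so it only applies to older PHP versions. To set it up, add this to your php.ini: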

mbstring.func_overload = 7
mbstring.internal_encoding = "UTF-8"

Alternatively, you can enable it for a single VirtualHost in Apache, in the corresponding configuration section:

php_admin_value mbstring.func_overload "7"
php_admin_value mbstring.internal_encoding "UTF-8"

>Note: Unfortunately it's not recommended to set this in an .htaccess file, as this may lead to undefined behavior.

Links

Chinese version

Category: Tutorials
Tags: i18n, unicode
Written by: Mike
Last updated by: Roman Solomatin
Created on: Feb 21, 2009
Last updated: 10 years ago