PCRE and locale

The PCRE section in the PHP manual says about \w (any word character):

So PCRE's escape sequence \w should actually include locale-specific characters (like Ä, Ö, Ü in German) if the right locale is set. I tried calling setlocale(LC_ALL, 'de') before using this 'match' validator:


array('name','match','pattern'=>'/^\w+$/i'),

But it still doesn't accept German umlauts. Any idea how to enable locale-specific character classes?

This validator uses preg_match.

Check the last comment on this page; maybe it will help you: http://php.net/manual/en/function.preg-match.php

Regarding the locale, try this (from php.net):


/* try different possible locale names for german as of PHP 4.3.0 */
$loc_de = setlocale(LC_ALL, 'de_DE@euro', 'de_DE', 'de', 'ge');
echo "Preferred locale for german on this system is '$loc_de'";

For me it outputs

So this way you can obviously make sure the correct locale was selected.
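Since setlocale() returns false when none of the requested locales is available, the check above can also be done without printing anything. The following is only a rough sketch: pickLocale() is a made-up helper name, the list of candidate names is a guess that differs between systems, and LC_CTYPE is used here instead of LC_ALL on the assumption that only character classification matters for \w.

```php
<?php
// Hypothetical helper: returns the first locale name the system accepts, or null.
function pickLocale(array $candidates): ?string
{
    // setlocale() also accepts an array of names; it returns the name that
    // was actually set, or false if none of them is installed.
    $chosen = setlocale(LC_CTYPE, $candidates);
    return $chosen === false ? null : $chosen;
}

// Candidate names are assumptions; check `locale -a` on your own system.
$locale = pickLocale(['de_DE.UTF-8', 'de_DE.utf8', 'de_DE@euro', 'de_DE', 'de', 'ge']);

if ($locale === null) {
    echo "No German locale is installed on this system\n";
} else {
    echo "Using locale '$locale'\n";
}
```

If this prints that no German locale is installed, a locale-dependent \w cannot work on that machine regardless of the pattern.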

Regarding the regex, I think you have to add the u modifier:


array('name','match','pattern'=>'/^\w+$/iu'),
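To see what the u modifier changes outside of the validator, something like the following can be run on its own. It is a sketch under a few assumptions: the script file and the test strings are UTF-8 encoded, the PHP version is recent enough that u also switches \w to Unicode-aware matching, and matchesWord() is just an illustrative helper name.

```php
<?php
// Hypothetical helper: true if $value consists only of "word" characters.
// With the u modifier, pattern and subject are treated as UTF-8.
function matchesWord(string $value, bool $utf8): bool
{
    $pattern = $utf8 ? '/^\w+$/u' : '/^\w+$/';
    return preg_match($pattern, $value) === 1;
}

// Sample inputs are made up for illustration.
foreach (['Mueller', 'Müller', 'two words'] as $name) {
    printf(
        "%s: without u = %s, with u = %s\n",
        $name,
        var_export(matchesWord($name, false), true),
        var_export(matchesWord($name, true), true)
    );
}
```

The result without u depends on the active locale and on how the string is encoded, which is exactly the system-specific part discussed in this thread.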

If there are still any problems, try putting this in the entry script (not sure if it helps, though):

mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

Thanks guys.

Your tips have put me on the right track: the right locale on my system (Gentoo) is 'de_DE.utf8'.

Before finding this out, I wondered whether we should fix this in CRegularExpressionValidator, but it's too system-specific. I won't rely on the locale either.

So I'll omit \w and use a custom character class instead:


'pattern'=>'/^[a-z0-9_äöüß]+$/ui'

This works fine.
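For anyone who wants to try the custom character class outside of a Yii rule, here is a small standalone sketch. NAME_PATTERN and isValidName() are names invented for this example, and it assumes UTF-8 input with the umlauts in precomposed form (a single code point each, not a base letter plus a combining mark).

```php
<?php
// Same pattern as in the validator rule above: ASCII letters, digits,
// underscore and the German special characters. i = case-insensitive,
// u = treat pattern and subject as UTF-8.
const NAME_PATTERN = '/^[a-z0-9_äöüß]+$/ui';

// Hypothetical helper wrapping preg_match for readability.
function isValidName(string $name): bool
{
    return preg_match(NAME_PATTERN, $name) === 1;
}

// Sample inputs are made up for illustration.
foreach (['Jäger_7', 'ÄRGER', 'José', 'two words'] as $name) {
    echo $name, ': ', isValidName($name) ? 'accepted' : 'rejected', "\n";
}
```

The upside of this approach is that it behaves the same on every server; the downside is that every additional language needs its characters added by hand.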

Coming back to this old topic:

An alternative is Unicode character properties. They are locale-independent, but still very useful if you want to match, e.g., any letter character in any language.
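A minimal sketch of what that can look like, assuming UTF-8 input and a PCRE build with Unicode property support (which should be the case for stock PHP builds, as far as I know). isWordLike() is a made-up helper name; \p{L} stands for "any letter" and \p{N} for "any number".

```php
<?php
// Hypothetical helper: accepts letters and numbers from any script, plus underscore.
function isWordLike(string $value): bool
{
    // \p{L} = any kind of letter, \p{N} = any kind of number.
    // The u modifier makes PCRE read pattern and subject as UTF-8.
    return preg_match('/^[\p{L}\p{N}_]+$/u', $value) === 1;
}

// Sample inputs are made up for illustration.
foreach (['Müller', 'Ελληνικά', 'name_42', 'two words'] as $value) {
    echo $value, ': ', isWordLike($value) ? 'accepted' : 'rejected', "\n";
}
```

In a validator rule the same pattern could be used directly, e.g. 'pattern'=>'/^[\p{L}\p{N}_]+$/u'. Note that this is much more permissive than the German-only character class, so it only fits if names in any script are acceptable.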