Is Yii multibyte safe?

I am running into trouble when using the length validator: it reports a length error on my UTF-8 input fields (which contain some umlauts) even though the length is OK.

This seems to happen because the validator uses strlen instead of mb_strlen.

I could not find a call that sets the encoding for the multibyte functions anywhere in the Yii code either (e.g. mb_internal_encoding("UTF-8")).

Is there a reason for this or is this a bug?

Any ideas?

I don't know how Yii handles multibyte strings, but if I test it like this

<?php echo strlen('öüäe'); ?>

it returns 4 as expected. Do you have the charset defined in the main config file?

If you save your file with UTF-8 encoding, this will return 7, because each of the three umlauts takes two bytes.

In PHP you have to use the multibyte string functions for UTF-8; they all start with mb_.
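To make the difference concrete, here is a small sketch that compares the two functions on the same string. The umlauts are written as explicit UTF-8 byte escapes, so the result does not depend on how the file itself is saved. It assumes the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
// "öüäe" spelled out as UTF-8 bytes: each umlaut takes two bytes, "e" takes one.
$value = "\xC3\xB6\xC3\xBC\xC3\xA4e";

// strlen() counts bytes, not characters.
$byteLength = strlen($value);

// mb_strlen() counts characters when it is told the encoding.
$charLength = mb_strlen($value, 'UTF-8');

echo $byteLength, "\n"; // 7
echo $charLength, "\n"; // 4
```

A "max length 4" rule based on strlen would therefore reject this four-character input.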

Yii does not seem to do that, I will open a bug report.

Or am I getting something wrong here, Qiang?

I think mbstring is a non-default extension for PHP, so many providers might not have it compiled in. It may be better to use the function overloading feature of this extension:

http://de.php.net/ma…ng.overload.php

Unfortunately this can't be set with ini_set(), but at least you can use a .htaccess file to enable it.

Yii uses UTF-8 as the standard charset, which is good I think.

But then every function that works on user-provided data has to be multibyte safe, otherwise UTF-8 support is broken.

I am not sure if this strlen issue is the only problem.

Could this be fixed in another way? Maybe there should be a config option in Yii to turn on multibyte string functions?

This is purely a PHP problem. The framework can't fix it if the mbstring extension is not compiled into PHP. But it can easily be fixed by activating mbstring's function overloading in the Apache VirtualHost configuration:

php_admin_value mbstring.func_overload "7"
php_admin_value mbstring.internal_encoding "UTF-8"

Hmm, the PHP manual makes it sound as if the overloading functionality is meant for adapting older software to multibyte support without having to change it. The other problem is that many providers won't allow you to change the virtual host settings for your package.

Though maybe this issue only occurs with the string validator; in that case I could just write a new one.

Quote

Hmm, the PHP manual makes it sound as if the overloading functionality is meant for adapting older software to multibyte support without having to change it.

I think there's nothing wrong with using this to fix the problem. That's what this feature is for. Real Unicode support will be available in PHP 6, AFAIK. If the framework code used mb_strlen() instead of strlen(), a lot of people might complain that errors are thrown because mbstring is missing. The described fix avoids this.

I'll add it to the Unicode cookbook article for now.
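For code that cannot rely on overloading, the "missing mbstring" concern can also be handled with a runtime check. This is only a sketch: charLength is a hypothetical helper name, not part of Yii, and the UTF-8 default is an assumption.

```php
<?php
// Hypothetical helper (not part of Yii): counts characters when mbstring is
// available and falls back to counting bytes when it is not.
function charLength($value, $encoding = 'UTF-8')
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($value, $encoding);
    }
    // Without mbstring only the byte count is available.
    return strlen($value);
}

echo charLength('abc'), "\n"; // 3 either way, since ASCII is one byte per character
```

The fallback is still wrong for multibyte input, but nothing breaks on hosts without the extension.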

I see your point; working with UTF-8 in PHP is still quite a hack as of now.

It's a good idea to mention some issues and solutions in the cookbook, thanks.

We should also provide another validator (or enhance the string validator) to check for valid UTF8 strings using mb_check_encoding(). Otherwise you can submit invalid characters which cause database exceptions (e.g. Incorrect string value: '\xFC').
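As a rough illustration of that check, the sketch here feeds mb_check_encoding() the same name once as valid UTF-8 and once as ISO-8859-1 bytes, which is where a stray \xFC typically comes from. The sample strings and variable names are made up for the example, and mbstring is assumed to be loaded.

```php
<?php
$valid   = "M\xC3\xBCller"; // "Müller" encoded as UTF-8
$invalid = "M\xFCller";     // "Müller" encoded as ISO-8859-1; a lone \xFC is not valid UTF-8

$validResult   = mb_check_encoding($valid, 'UTF-8');
$invalidResult = mb_check_encoding($invalid, 'UTF-8');

var_dump($validResult);   // bool(true)
var_dump($invalidResult); // bool(false)
```

Rejecting the second value in a validator keeps it from ever reaching the database.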

I could write a new multibyte string validator which offers this whole functionality, maybe this could help other people too. Where should I put it?

OK, here is my solution for now. I will use this validator instead of the CStringValidator (length) validator. I renamed some attributes (e.g. 'max' to 'maxlength'); feel free to use, improve, or criticize it:



<?php

class mbstring extends CValidator {
    public $maxlength; // maximum allowed string length
    public $minlength; // minimum allowed string length
    public $islength; // required exact length

    public $tooShort; // custom message for short string
    public $tooLong; // custom message for long string
    public $wrongCharset;  // custom message for wrong character set
    public $allowEmpty=true;

    protected function validateAttribute($object,$attribute) {
        mb_internal_encoding(Yii::app()->charset);
        $value = $object->$attribute;
        if($this->allowEmpty && ($value === null || $value === ''))
            return;
        if (!mb_check_encoding($value)) {
            $message=$this->wrongCharset !== null ? $this->wrongCharset : Yii::t('yii','{attribute} has wrong character set.');
            $this->addError($object,$attribute,$message);
        }
        $length = mb_strlen($value);
        if($this->minlength !== null && $length < $this->minlength) {
            $message=$this->tooShort!==null?$this->tooShort:Yii::t('yii','{attribute} is too short (minimum is {min} characters).');
            $this->addError($object,$attribute,$message,array('{min}'=>$this->minlength));
        }
        if($this->maxlength!==null && $length>$this->maxlength) {
            $message=$this->tooLong!==null?$this->tooLong:Yii::t('yii','{attribute} is too long (maximum is {max} characters).');
            $this->addError($object,$attribute,$message,array('{max}'=>$this->maxlength));
        }
        if($this->islength!==null && $length!==$this->islength) {
            $message=$this->message!==null?$this->message:Yii::t('yii','{attribute} is of the wrong length (should be {length} characters).');
            $this->addError($object,$attribute,$message,array('{length}'=>$this->islength));
        }
    }
}


How do I switch completely to the mbstring validation?

When I validate the model, the specified field is validated by both mbstring and CStringValidator, and of course CStringValidator returns the length error.

I defined the rules as follows (I redefined length property back to standard max/min):





return array(
   array('title_ru', 'mbstring', 'length', 'max' => 45),
)




In Yii 1.1 you can pass a charset to the string validator, so it actually supports multibyte now.




public function rules() {
    return array(
        array('title_ru','length','max'=>45,'encoding'=>Yii::app()->charset)
    );
}



I still wrote my own version of the validator to also check for valid characters and to not always have to pass the ‘encoding’ parameter, so here is my new mbLength class:




class mbLength extends CStringValidator {
    public $wrongCharset;  // custom message for wrong character set

    public function __construct() {
        $this->encoding = Yii::app()->charset;
    }

    protected function validateAttribute($object,$attribute) {
        $value = $object->$attribute;
        if (!$this->isCharsetCorrect($value)) {
            $message=$this->wrongCharset !== null ? $this->wrongCharset : Yii::t('yii','Wrong character set.');
            $object->$attribute = '';
            $this->addError($object,$attribute,$message);
        }
        parent::validateAttribute($object,$attribute);
    }

    public function isCharsetCorrect($string) {
        $string = (string)$string;
        $convertCS = 'UTF-8';
        $sourceCS = Yii::app()->charset;
        return $string === mb_convert_encoding ( mb_convert_encoding ( $string, $convertCS, $sourceCS ), $sourceCS, $convertCS );
    }
}



To use this you would only have to specify one validator (mbLength) in the rule:




public function rules() {
    return array(
        array('title_ru','mbLength','max'=>45)
    );
}



Thanks, that helped :)