Case-insensitive search of Unicode characters

Case-insensitive matching of Unicode characters does not work, and this behavior is known SQLite limitation. As sqlite.org suggests this situation can be solved overriding like(), upper() and lower() functions (as needed) using sqlite3_create_function().

I’ve seen some php, pdo solutions but I don’t know how to implement it in yii.

Any help much appreciated.

Thanks

Marcus

I’ll help you telling that you need coding standard. Try to call you database tables and fields all in lowercase. Them will work in any case: sensitive or insensitive. Windows or Linux.

Sensorario thank you for your answer, but I don’t think any db naming convention will solve SQLite problem with Unicode content and searching / sorting (which is by design by the way) . Surely I wasn’t precise enough so: To keep sqlite library smallest possible developers made a decision not to implement full Unicode support and result is pretty poor collation functionality. There are only three possibilities regarding Collation when creating / altering a table field: BINARY, NOCASE and RTRIM.

NOCASE is made available to aid case insensitive matching using LIKE operator and lowering field content and pattern to match before evaluation. That’s all nice but it only works for ASCII characters.

To be precise any NON-ASCII character will not be lowered and thus only exact match will be returned. To simplify search term ANDRIJEVIĆ will be matched (exact) using one of the following ‘ANDRIJEVIĆ’, ‘andrijeviĆ’ (and of course combinations) but not andrijević, where lowercase variant of non-ascii character is used for match patter while field content is uppercase .

As I mentioned there is solution / implementation (thanks to blog.amartynov.ru) which works:

function lexa_ci_utf8_like($mask, $value) {

[indent]$mask = str_replace(


    [indent]array("%", "_"),


    array(".*?", "."),


    preg_quote($mask, "/")[/indent]


);


$mask = "/^$mask$/ui";


return preg_match($mask, $value);[/indent]

}

$pdo->sqliteCreateFunction(‘like’, “lexa_ci_utf8_like”, 2);

After this it’s possible to query any-case-macro combination using like function as usual and it’ll work for Unicode characters as well.

So, I’d like to know is it possible to extend regular Active Record functionality using code similar to one shown.

I’m not sure if this makes problem any clearer but still…

Thank you for your patience…

Marcus