

Solutions for UTF-8 problems (Always in Multi-byte language)

Started by eyesofkids, December 30, 2005, 05:06:28 AM


eyesofkids

I use SMF on my Chinese website (with Joomla).
However, there are persistent, serious problems with Chinese substrings and with some special characters.
I decided to fix them and have resolved most of them.
I hope the official team can add these solutions to the next version, if possible.

1. Substring problems:
Because the functions substr and strlen cannot handle UTF-8 characters correctly (they operate on bytes, not characters), I looked for a replacement and found a good solution in the UTF-8 library of hxxp:wiki.splitbrain.org/wiki:dokuwiki [nonactive].
I used utf8_substr and utf8_strlen to replace some of the calls.
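
To illustrate the difference, here is a minimal sketch of character-aware replacements. The names sketch_utf8_strlen and sketch_utf8_substr are hypothetical and this is not the DokuWiki code; it only assumes PHP 7 or later with PCRE UTF-8 support, and that the file is saved as UTF-8.

```php
<?php
// Hypothetical helpers for illustration only; not the DokuWiki implementation.

// Count characters (code points) rather than bytes.
function sketch_utf8_strlen(string $s): int
{
    // With the /u modifier, "." matches one whole UTF-8 character.
    // preg_match_all returns false when $s is not valid UTF-8.
    $n = preg_match_all('/./us', $s);
    return $n === false ? strlen($s) : $n;
}

// Return $length characters starting at character offset $start (both >= 0).
function sketch_utf8_substr(string $s, int $start, int $length): string
{
    // Split the string into an array of single characters.
    $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        return substr($s, $start, $length); // invalid UTF-8: fall back to bytes
    }
    return implode('', array_slice($chars, $start, $length));
}

$title = "中文論壇";                          // four characters, twelve bytes in UTF-8
var_dump(strlen($title));                     // int(12): bytes
var_dump(sketch_utf8_strlen($title));         // int(4): characters
var_dump(substr($title, 0, 4));               // first character plus one stray byte
var_dump(sketch_utf8_substr($title, 0, 2));   // the first two characters, intact
```

Splitting into an array is simple but not fast for long strings; it is only meant to show why byte offsets break multi-byte text.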

2. Search result substring problems
This is the same problem as in 1. I added the pattern modifier '/u' to the preg_match_all call.
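
As a rough sketch of why the modifier matters for search excerpts: the helper below grabs a few characters of context around a search term. The function name and the pattern are hypothetical and are not taken from SMF's search code.

```php
<?php
// Hypothetical excerpt helper; the pattern and names are assumptions, not SMF code.
function sketch_search_excerpts(string $body, string $term, int $context = 2): array
{
    $quoted = preg_quote($term, '/');
    // Without /u, ".{0,N}" counts bytes and can cut a multi-byte character in half.
    // With /u, it counts whole characters.
    $pattern = '/.{0,' . $context . '}' . $quoted . '.{0,' . $context . '}/us';
    if (preg_match_all($pattern, $body, $matches) === false) {
        return []; // e.g. the body is not valid UTF-8
    }
    return $matches[0];
}

$body = "歡迎來到台灣的中文論壇，請先登入。";
var_dump(sketch_search_excerpts($body, "論壇"));
// One excerpt: two characters on each side of the term.
```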

3. Special utf-8 characters.
This problem occurs when some UTF-8 characters, which can contain the byte 0xA0, conflict with the \xA0 in Sub.php.
To handle it, the '/u' pattern modifier should be added to the preg_replace call.
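
A small sketch of the conflict, assuming PHP 7 or later. The pattern here is a stand-in written for illustration, not the actual expression from SMF; it only shows how \xA0 behaves with and without the modifier.

```php
<?php
// Illustrative pattern only; the real expression in SMF may differ.
$text = "\u{4E60}";   // one CJK character, UTF-8 bytes E4 B9 A0

// Byte mode: \xA0 matches the single byte 0xA0, which here is the
// last byte of the character, so the character gets corrupted.
$byteMode = preg_replace('/\xA0/', ' ', $text);

// UTF-8 mode: \xA0 means the code point U+00A0 (no-break space),
// which $text does not contain, so nothing is replaced.
$utf8Mode = preg_replace('/\xA0/u', ' ', $text);

var_dump($byteMode === $text);   // bool(false)
var_dump($utf8Mode === $text);   // bool(true)

// A real no-break space is still replaced in UTF-8 mode.
var_dump(preg_replace('/\xA0/u', ' ', "a\u{00A0}b"));   // string(3) "a b"
```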

Eddy Chang
--------------
TaiwanJoomla.com

taka

Quote from: eyesofkids on December 30, 2005, 05:06:28 AM
I hope the official team can add these solutions to the next version, if possible.
I second that.  These two issues are major roadblocks to SMF being used in Asia.  The good news is that it is not difficult to make it work with UTF-8 while it continues to work for European languages that use the ISO-8859-1 charset encoding.

I would like to propose wrappers for the string functions.  In each of these, we can call the appropriate multibyte string function.  AFAIK, the mb_* functions are widely adopted, and major hosting service providers have configured PHP with them.  For more detail, see the documentation on php.net.
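
Something along these lines, as a rough sketch only. The wrapper names are hypothetical and this is not the patch itself; it assumes the charset would come from the forum's language settings, and it falls back to the byte functions when mbstring is not available.

```php
<?php
// Hypothetical wrapper names; $charset is assumed to come from the forum settings.
function sketch_strlen(string $s, string $charset = 'UTF-8'): int
{
    if (function_exists('mb_strlen')) {
        return mb_strlen($s, $charset);   // counts characters in $charset
    }
    return strlen($s);                    // byte count; only right for single-byte charsets
}

function sketch_substr(string $s, int $start, int $length, string $charset = 'UTF-8'): string
{
    if (function_exists('mb_substr')) {
        return mb_substr($s, $start, $length, $charset);
    }
    return substr($s, $start, $length);   // may split a multi-byte character
}

// ISO-8859-1 text keeps working because every character is one byte.
var_dump(sketch_strlen("caf\xE9", 'ISO-8859-1'));   // int(4)
// UTF-8 text is counted by character when mbstring is present.
var_dump(sketch_strlen("中文論壇"));
```

Callers would then use the wrappers everywhere instead of strlen and substr, so the charset decision lives in one place.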

http://us3.php.net/manual/en/ref.mbstring.php

I will post a patch for 1.1RC1 as soon as possible.

spiros

Hello guys,

You might be interested in the discussion about new UTF problems in RC2
(taka is already aware, thank you Taka for your help)

http://www.simplemachines.org/community/index.php?topic=63235.0

Maybe you should change the topic title to

Solutions for RC1

