Topic: [g330] SMF, php 5.4, htmlspecialchars and non utf-8 languages.

Offline digger

  • Sr. Member
  • ****
  • Posts: 761
  • Gender: Male
    • realdigger on GitHub
    • SMF Russian Community
[g330] SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« on: February 10, 2013, 03:49:54 PM »
Since PHP 5.4, htmlspecialchars() has a new default encoding: the function now assumes UTF-8 if the third parameter is not given. But SMF has many hardcoded htmlspecialchars() calls that do not go through $smcFunc and do not specify an encoding, so non-English, non-UTF-8 forums run into trouble.
All htmlspecialchars() calls should be replaced with the proper $smcFunc equivalent.
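
One rough way to see how many direct calls are involved is to scan the sources for them. The sketch below is only an illustration: the function name find_bare_htmlspecialchars_calls is made up here, it works on plain text (so it will also flag calls inside comments or strings), and it reports every direct call, including ones that already pass an encoding.

```php
<?php
// Hypothetical helper: returns the 1-based line numbers on which
// htmlspecialchars( is called directly rather than via $smcFunc.
function find_bare_htmlspecialchars_calls($source)
{
    $hits = array();
    foreach (explode("\n", $source) as $index => $line) {
        // \b keeps un_htmlspecialchars( from matching, and requiring "(" right
        // after the name skips htmlspecialchars_decode( and
        // $smcFunc['htmlspecialchars'](, where a quote follows the name.
        if (preg_match('~\bhtmlspecialchars\s*\(~', $line) === 1) {
            $hits[] = $index + 1;
        }
    }
    return $hits;
}
```

Each reported line still needs a manual look to decide whether the call should go through $smcFunc or simply be given an explicit encoding.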
« Last Edit: April 12, 2013, 07:31:32 PM by Labradoodle-360 »

Offline emanuele

  • SMF Super Hero
  • *******
  • Posts: 14,156
  • Gender: Male
  • THERE'S JUST ME
Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #1 on: February 10, 2013, 04:10:52 PM »
Exactly what instances of htmlspecialchars are creating issues?


Take a peek at what I'm doing! ;D



Need support in Italian?

Help us help you: explain your problem clearly. No, "it doesn't work" is not an explanation!!
1) What you do,
2) what you expect,
3) what you get.

Offline digger

Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #2 on: February 10, 2013, 04:49:39 PM »
Quote
Exactly what instances of htmlspecialchars are creating issues?

For example, I don't see Cyrillic filenames in the "Attachments and Avatars - Browse Files - Attachments" admin area. I only see something like "548x730 54.64КБ".

In the ManageAttachments.php file, find:
Code: [Select]
$link .= sprintf(\'>%1$s</a>\', preg_replace(\'~&amp;#(\\\\d{1,7}|x[0-9a-fA-F]{1,6});~\', \'&#\\\\1;\', htmlspecialchars($rowData[\'filename\'])));
and replace it with:
Code: [Select]
$link .= sprintf(\'>%1$s</a>\', preg_replace(\'~&amp;#(\\\\d{1,7}|x[0-9a-fA-F]{1,6});~\', \'&#\\\\1;\', htmlspecialchars($rowData[\'filename\'], false, \'cp1251\')));
or:
Code: [Select]
$link .= sprintf(\'>%1$s</a>\', preg_replace(\'~&amp;#(\\\\d{1,7}|x[0-9a-fA-F]{1,6});~\', \'&#\\\\1;\', htmlspecialchars($rowData[\'filename\'], false, \'\')));
Now I see Cyrillic filenames like "_душ.JPG 548x730   54.64КБ".

There are many other similar instances in the sources. htmlspecialchars() returns an empty string if it is called without the encoding parameter and the input string is not valid UTF-8.
As a result, users can't change some Cyrillic values in text fields such as the forum title in the admin area, can't use Cyrillic smiley codes, and don't see Cyrillic filenames of attachments or avatars.
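
The effect on the filename line can be reproduced outside SMF. This is a minimal sketch, not SMF's code: escape_filename is a made-up name, the byte string is "_душ" hand-encoded as cp1251, and ENT_COMPAT is passed explicitly (the `false` in the snippets above is treated as 0, i.e. ENT_NOQUOTES) so the result does not depend on the PHP version's default flags.

```php
<?php
// Hypothetical helper mirroring the ManageAttachments.php line: escape the
// filename in the given charset, then restore numeric entities such as
// &#169; that htmlspecialchars() turned into &amp;#169;.
function escape_filename($filename, $charset)
{
    $escaped = htmlspecialchars($filename, ENT_COMPAT, $charset);
    return preg_replace('~&amp;#(\d{1,7}|x[0-9a-fA-F]{1,6});~', '&#$1;', $escaped);
}

// "_душ <1> &#169;.JPG" with the Cyrillic letters encoded as cp1251 bytes.
$cp1251_name = "_\xE4\xF3\xF8 <1> &#169;.JPG";

$with_cp1251 = escape_filename($cp1251_name, 'cp1251'); // bytes kept, < and > escaped
$with_utf8   = escape_filename($cp1251_name, 'UTF-8');  // invalid UTF-8 input: empty string
```

With 'UTF-8' the whole name is discarded, which matches the listing that shows only the dimensions and size.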
« Last Edit: February 10, 2013, 05:12:47 PM by digger »

Offline Arantor

  • Resident Overthinker
  • SMF Friend
  • SMF Legend
  • *
  • Posts: 71,254
    • StoryBB/StoryBB on GitHub
Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #3 on: February 11, 2013, 12:14:41 AM »
In fact it's pretty much every instance of htmlspecialchars that doesn't refer back to $smcFunc.

There is even a bug on Mantis about this, from years back. While there are some that should not be changed to smcFunc instances (I'm thinking primarily strlen here, where you need bytes not characters), I cannot envisage a case where bare htmlspecialchars() should be called without awareness of encoding.
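
To illustrate the strlen point, here is a small sketch of the byte/character difference. The string is "Душ" hand-encoded as UTF-8, and counting matches of a /u pattern is just one way to count characters without relying on the mbstring extension.

```php
<?php
// "Душ" encoded as UTF-8: three characters, two bytes each.
$utf8 = "\xD0\x94\xD1\x83\xD1\x88";

// strlen() counts bytes, which is what you want for storage limits.
$bytes = strlen($utf8);

// With the /u modifier, "." matches one whole UTF-8 character at a time,
// so the number of matches is the number of characters.
$chars = preg_match_all('/./su', $utf8);
```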
Don’t try to tell me that some power can corrupt a person. You haven’t had enough to know what it’s like.

No good deed goes unpunished / No act of charity goes unresented.

Offline digger

Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #4 on: February 11, 2013, 02:11:05 PM »
If SMF is meant to be a multilingual forum, this is a critical bug.
Non-UTF-8 forums can't use many functions with the current version of PHP. Either this should be fixed, or the developers should drop support for non-UTF-8 encodings and clearly say so.

Offline Arantor

Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #5 on: February 11, 2013, 02:18:57 PM »
Dropping non UTF-8 support is a massive undertaking, but entirely doable.

Offline digger

Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #6 on: February 12, 2013, 07:34:50 PM »
Nobody cares

Offline Arantor

Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #7 on: February 12, 2013, 07:35:45 PM »
I do, but I'm not really in a position to do anything about it here. Elsewhere, that's another story entirely.

Offline IchBin™

  • SMF Friend
  • SMF Super Hero
  • *
  • Posts: 11,115
  • Gender: Male
  • I don't speak German.
Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #8 on: February 12, 2013, 11:07:28 PM »
Quote
Nobody cares

One can only care so much when what they do here is volunteer free time out of their own personal life. Nobody gets paid to cater to your every issue or suggestion. If you feel that strongly about it and have a fix to apply, go to GitHub and propose a pull request to fix it in the next version.

https://github.com/SimpleMachines/SMF2.1#readme
IchBin™        TinyPortal
Coding Guidelines       

Offline theymos

  • Newbie
  • *
  • Posts: 2
Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #9 on: March 23, 2013, 01:02:25 AM »
This is a particular problem for forums using ISO-8859-1 because htmlspecialchars will throw away all of its input if it contains a non-breaking space character (because a string containing 0xA0 alone is not valid UTF-8). The non-breaking space character is very common, so this causes many problems: email notifications for PMs will frequently be blank; the file editor "randomly" removes lines; etc.
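
A quick way to see this outside SMF (a sketch only; the sample string is invented, and the flags are passed explicitly so the result does not depend on a given PHP version's defaults):

```php
<?php
// 0xA0 is a non-breaking space in ISO-8859-1, but a lone 0xA0 byte
// is not a valid UTF-8 sequence.
$text = "Hello\xA0world <b>";

// preg_match() with the /u modifier returns false on invalid UTF-8,
// so this is a cheap validity check.
$is_valid_utf8 = preg_match('//u', $text) === 1;

// Treated as UTF-8, the whole string is thrown away.
$as_utf8 = htmlspecialchars($text, ENT_COMPAT, 'UTF-8');

// Treated as ISO-8859-1, the 0xA0 byte passes through untouched.
$as_latin1 = htmlspecialchars($text, ENT_COMPAT, 'ISO-8859-1');
```

Adding the ENT_SUBSTITUTE flag (available since PHP 5.4) would replace the invalid byte instead of returning an empty string, but that changes the text rather than preserving it, so naming the right encoding is the safer fix.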

By the way, SMF should IMO not convert multiple spaces to non-breaking spaces, especially in [code] segments. Doing so produces different characters than the poster intended, which can cause problems. I'd keep the multiple spaces but put them in <span style="white-space:pre"> so they aren't collapsed.
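
Something along these lines is what I have in mind. It is only a sketch: preserve_space_runs is a made-up name, not an SMF function, and it assumes the input is already-escaped text with no HTML tags in it, since it would also wrap spaces inside tag attributes.

```php
<?php
// Hypothetical helper: wrap each run of two or more spaces in a span that
// stops the browser from collapsing it, leaving the spaces themselves as-is.
function preserve_space_runs($text)
{
    return preg_replace('~ {2,}~', '<span style="white-space:pre">$0</span>', $text);
}
```

Copying text back out of the page then yields ordinary spaces rather than 0xA0 characters.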

Offline Oldiesmann

  • SMF Friend
  • SMF Super Hero
  • *
  • Posts: 24,866
  • Gender: Male
  • Ask me about the function DB :)
    • oldiesmann on Facebook
    • Oldiesmann on GitHub
    • https://www.linkedin.com/in/michaeleshom on LinkedIn
    • @oldiesmann on Twitter
    • Archie Comics Fan Forum
Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #10 on: March 23, 2013, 01:18:42 PM »
The reason why SMF has not gone "UTF8-only"  is because we still support older versions of MySQL which do not have character set / collation support.

Even with SMF 2.1, we will still be supporting MySQL versions as old as 4.0.18 (though I have no idea why).

At this point we will definitely try to fix as many issues like this as possible, but we can't rewrite half of SMF to support only UTF8 and still expect to get 2.1 final out by the end of the year.
Michael Eshom
Cincy Space - now open!

Offline Arantor

Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #11 on: March 23, 2013, 01:43:37 PM »
MySQL 5.0 stable came out around the same time as SMF 1.1. You really have no reason to have < 5.0 compatibility. There are already issues from even-earlier MySQL support (TYPE vs ENGINE) so going 5.0+ only will fix some of that.

Quote
but we can't rewrite half of SMF to support only UTF8 and still expect to get 2.1 final out by the end of the year.

It's 3 days work to gut the innards and replace it with UTF-8 only, a week tops. (And before anyone tells me otherwise... I already did this. It took 2 days and sporadic bug fixes thereafter, totalling no more than 3 days work for me.)

Your biggest problem there is the upgrader, not the core of SMF.

Offline Oldiesmann

Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #12 on: March 23, 2013, 01:54:23 PM »
Quote
MySQL 5.0 stable came out around the same time as SMF 1.1. You really have no reason to have < 5.0 compatibility. There are already issues from even-earlier MySQL support (TYPE vs ENGINE) so going 5.0+ only will fix some of that.

Quote
but we can't rewrite half of SMF to support only UTF8 and still expect to get 2.1 final out by the end of the year.

It's 3 days work to gut the innards and replace it with UTF-8 only, a week tops. (And before anyone tells me otherwise... I already did this. It took 2 days and sporadic bug fixes thereafter, totalling no more than 3 days work for me.)

Your biggest problem there is the upgrader, not the core of SMF.

Given the flak I received from emanuele about wanting to push a major feature improvement into 2.1, I sincerely doubt we'll see any changes to support newer versions of MySQL and/or drop support for non-UTF8 languages for that version. One can dream though :)

Offline redone

  • SMF Friend
  • SMF Hero
  • *
  • Posts: 8,939
  • Gender: Male
Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #13 on: March 23, 2013, 02:04:08 PM »
Would it make sense for Arantor to share his fix? I would not consider this a "new feature", though versions typically do get feature-frozen for obvious reasons.

Seems fairly common sense to me. Maybe I am crazy! ;)

~redone

Offline Study Force

  • SMF Hero
  • ******
  • Posts: 3,656
    • StudyForcePS on Facebook
    • @studyforceps on Twitter
Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #14 on: March 23, 2013, 02:08:58 PM »
Hi Digger, a bit off topic, but how did you get ulogin to work on your smf website?

Offline Arantor

Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #15 on: March 23, 2013, 02:09:39 PM »
The fix involves making hundreds and hundreds of changes to SMF to make it UTF-8 only. It is not a simple fix, and providing even the diff would be useless, as a great many changes had already occurred by that time.

Offline ^HeRaCLeS^

  • SMF Hero
  • ******
  • Posts: 3,656
  • ♥ Valen ♥
    • AdkTeam.net on Facebook
    • @adk_team on Twitter
    • SmfPersonal
Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #16 on: April 12, 2013, 04:42:21 PM »
In my modifications I solved this as follows:

Code: [Select]
htmlspecialchars($string, ENT_QUOTES, $context['character_set']);
I don't know if it's the best way, but it works.
^HeRaCLeS^
*¤×• Don't bother sending me a PM, because I only give support on the forum •×¤*

SMFPersonal

Offline digger

Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #17 on: April 12, 2013, 04:48:42 PM »
Quote
Hi Digger, a bit off topic, but how did you get ulogin to work on your smf website?

I just installed the ulogin mod, without any problems.

Offline Matthew K.

  • SMF Super Hero
  • *******
  • Posts: 12,430
  • Gender: Male
    • matthew.kerle on Facebook
    • @matthew_kerle on Twitter
Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #18 on: April 12, 2013, 05:04:06 PM »
Quote
In my modifications I solved this as follows:

Code: [Select]
htmlspecialchars($string, ENT_QUOTES, $context['character_set']);
I don't know if it's the best way, but it works.

And why not just use $smcFunc['htmlspecialchars'](), which takes the character set into account automatically? As was more or less already stated, that is the reason for using $smcFunc['htmlspecialchars']() over "plain" htmlspecialchars().
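
To show the two styles side by side, here is a self-contained sketch. The $context and $smcFunc arrays below are stand-ins written for this example so it runs outside SMF; the real $smcFunc['htmlspecialchars'] is set up by SMF itself and may do more than this closure does.

```php
<?php
// Stand-ins for SMF's globals (assumptions, not SMF's actual definitions).
$context = array('character_set' => 'ISO-8859-1');
$smcFunc = array(
    // Simplified stand-in: always escapes using the forum's character set.
    'htmlspecialchars' => function ($string, $quote_style = ENT_COMPAT) use ($context) {
        return htmlspecialchars($string, $quote_style, $context['character_set']);
    },
);

// "café" in ISO-8859-1, plus some characters that need escaping.
$string = "caf\xE9 \"<b>\"";

// Style 1: plain call with the encoding spelled out every time.
$explicit = htmlspecialchars($string, ENT_QUOTES, $context['character_set']);

// Style 2: the helper picks the character set up for you.
$via_smcfunc = $smcFunc['htmlspecialchars']($string, ENT_QUOTES);
```

Both give the same output here; the point of the helper is that the character set cannot be forgotten at the call site.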

Offline ^HeRaCLeS^

Re: SMF, php 5.4, htmlspecialchars and non utf-8 languages.
« Reply #19 on: April 12, 2013, 05:18:54 PM »
Labradoodle-360: I have a question.
If using that code is better, why is it not used throughout SMF?
