YaBB Community and Support Forum
UTF-8 support (Read 5,459 times)
 Jan 29th, 2011 at 5:06pm
Michael Prager 
Boardmod Team
Development Team
Posts: 975
Germany


UTF-8 support
I've familiarized myself a bit with the internals of Unicode and the UTF-8 encoding. Good references on the subject I've found:

http://www.unicode.org/versions/Unicode5.2.0/
http://www.cl.cam.ac.uk/~mgk25/unicode.html
http://perldoc.perl.org/perluniintro.html
http://perldoc.perl.org/utf8.html
http://perldoc.perl.org/perlunitut.html
http://perldoc.perl.org/perlunicode.html

UTF-8 looks like the best choice to me, as it is fully supported by modern browsers as well as by Perl (as of 5.8) and MySQL (5.0). So we can encode everything in it and stop bothering about encodings. Since it is byte-compatible with ASCII (though not with the upper half of Latin1), it won't have much impact on disk usage for English-based forums.
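
As a rough illustration of the disk-usage point, here is a small sketch (in Python for brevity rather than YaBB's Perl; the sample strings are made up). It shows that ASCII characters take one byte each in UTF-8, while characters outside ASCII take two or more.

```python
# Sample strings are illustrative only, not taken from any real board.
samples = {
    "english": "Hello forum",
    "german": "Grüße",
    "russian": "Привет",
}


def utf8_size(text: str) -> int:
    """Number of bytes the text occupies when stored as UTF-8."""
    return len(text.encode("utf-8"))


for name, text in samples.items():
    print(f"{name}: {len(text)} characters, {utf8_size(text)} bytes in UTF-8")
```

For plain English text the byte count equals the character count, so an English-only board should see essentially no growth.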

I've played a little with the current code and was shocked how easy it was to get working results: set $yycharset to "UTF-8" in Languages/English/Main.pl and create a database with utf8 as the default encoding. That's it.

But this also reveals a serious problem with the current multi-language support in YaBB:
the encoding of data written to and read from the database (whether MySQL or flatfile) always depends on the current user's selected language! So if I have two languages installed for my forum (for example English and Russian) and one user posts something in Russian, every other user with the English language setting will see only corrupted data. That's because the data was saved to the database as KOI8-U but loaded as Latin1. So any YaBB forum out there using multiple languages will probably have mixed or broken encodings.
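
The effect described above can be reproduced in a few lines. This is only a sketch in Python (not YaBB code), and the sample word is an assumption chosen for illustration: bytes written as KOI8-U come back as garbage when read as Latin1, while a single fixed storage encoding round-trips regardless of the reader's language.

```python
original = "Привет"  # what the Russian-speaking user typed (sample text)

# Saved under the Russian language setting (KOI8-U, as in the example above).
stored_bytes = original.encode("koi8_u")

# Loaded under the English language setting: the same bytes read as Latin1.
seen_as_latin1 = stored_bytes.decode("latin-1")

# Loaded under the Russian language setting: the bytes are read back correctly.
seen_as_koi8u = stored_bytes.decode("koi8_u")

print(repr(seen_as_latin1))  # unreadable accented Latin characters
print(repr(seen_as_koi8u))   # the original text

# With one fixed storage encoding, the reader's language no longer matters.
utf8_bytes = original.encode("utf-8")
assert utf8_bytes.decode("utf-8") == original
```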
 
 Reply #1 - Jan 29th, 2011 at 5:13pm
Captain John 
Ex Member


Re: UTF-8 support
Michael Prager wrote on Jan 29th, 2011 at 5:06pm:
That's because the data was saved to the database in KOI8-U while it was loaded in Latin1.

Isn't this the cure? Writing English to UTF: http://www.yabbforum.com/community/YaBB.pl?num=1284419291/15#15
 
 
 Reply #2 - Jan 29th, 2011 at 5:36pm
Michael Prager 
Re: UTF-8 support
Yes, we have to make sure data is saved only in one single format (UTF-8). I'm not sure if Carsten's code will solve the problem, as there is no way to tell in what encoding things were stored in the database before the patch.
« Last Edit: Jan 29th, 2011 at 5:37pm by Michael Prager »  
 Reply #3 - Jan 29th, 2011 at 6:28pm
Corey Chapman 
YaBB Administrator
Posts: 10,015
Rock Hill, South Carolina


YaBB 2.5
Re: UTF-8 support
Perhaps the converter we'll need for moving the data to the database can solve this issue. However, I have experienced what Michael is talking about. All foreign languages except a couple look like gibberish symbols (not characters of that language) to me while I'm using the English language pack. If I change to that language's pack, I can see that language fine and the rest are gibberish. This is not new to YaBB; it's been an ongoing issue with all forums, which is why they have all debated encoding over the past couple of years.
« Last Edit: Jan 29th, 2011 at 6:29pm by Corey Chapman »  
 Reply #4 - Jan 30th, 2011 at 8:03pm
Michael Prager 
Re: UTF-8 support
I've completed my work on the UTF-8 implementation. Because I can't log in to SourceForge's SVN right now (their password reset function is broken for me), I have uploaded my changes to Git. As soon as I get access again, I'll commit them to /trunk/.

Download:
YaBB 3 SVN rev 323 with full UTF-8 support

Changelog

YaBB now works entirely with Unicode. That means it outputs HTML in UTF-8, reads user data in UTF-8, and stores its data in UTF-8 (flatfile and MySQL).

The following should be noted:
  • No other encoding is supported anymore, but there is no need for others anyway.
  • YaBB now requires the "Encode" module to be installed.
  • YaBB now requires Perl 5.8.1 or above, because Perl has fully supported UTF-8 only since that version.
  • All Language and Help files have to be UTF-8 encoded from now on.
  • This does not yet contain functionality to convert existing board data in other encodings to UTF-8. This is no problem for boards whose stored data is pure ASCII (typical for English-only boards), because ASCII is a subset of UTF-8. Latin1/ISO-8859-1 is compatible only in that ASCII range: bytes above 0x7F (umlauts, the non-breaking space 0xA0) are not valid UTF-8 on their own. So be careful with boards that have data stored in other encodings; those will break without proper conversion.
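
Whether existing data needs conversion can be checked with a strict UTF-8 decode. The following is a sketch in Python, not the actual YaBB implementation; `is_valid_utf8` and the sample strings are assumptions for illustration. Only the ASCII range is shared byte for byte between Latin1 and UTF-8, so Latin1 text with accented characters fails the check.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8 without errors."""
    try:
        data.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical legacy data as it might have been written under a Latin1 setting.
ascii_only = "Hello forum".encode("latin-1")
with_umlaut = "Grüße".encode("latin-1")

print(is_valid_utf8(ascii_only))   # pure ASCII is already valid UTF-8
print(is_valid_utf8(with_umlaut))  # Latin1 bytes above 0x7F are not
```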

 
 Reply #5 - Jan 30th, 2011 at 8:18pm
Jet Li 
Legacy Dev Team
Development Team
Posts: 6,588
Hong Kong


Re: UTF-8 support
Thanks Michael, but I get this error when I upload it to my dev board.

Code:
System Information

An Error Has Occurred! utf8 "\xA0" does not map to Unicode at ./Sources/Subs.pl line 2243.  

 
 Reply #6 - Jan 30th, 2011 at 8:37pm
Michael Prager 
Re: UTF-8 support
Probably some non-Latin1 encoded data is causing the issue there. I guess we need to catch those errors instead of just dying. It should work with a fresh install though.
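
Catching the error instead of dying could look roughly like this. It is a Python sketch, not the code in Subs.pl; `decode_lines` and the pipe-separated sample lines are hypothetical. Each line is decoded strictly, and a line that fails is reported and shown with replacement characters rather than aborting the whole page.

```python
from typing import List, Tuple


def decode_lines(raw_lines: List[bytes]) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Decode lines as UTF-8, collecting problems instead of raising."""
    decoded: List[str] = []
    problems: List[Tuple[int, str]] = []
    for number, raw in enumerate(raw_lines, start=1):
        try:
            decoded.append(raw.decode("utf-8"))
        except UnicodeDecodeError as err:
            # Remember which line was bad so an admin can be told about it.
            problems.append((number, str(err)))
            # Fall back to a lossy decode; invalid bytes become U+FFFD.
            decoded.append(raw.decode("utf-8", errors="replace"))
    return decoded, problems


# Hypothetical file content; the second line contains a stray Latin1 0xA0 byte.
lines = [b"General board|12|340", b"Caf\xa0 board|3|17"]
text, problems = decode_lines(lines)

for number, message in problems:
    print(f"line {number}: {message}")
```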

Btw: I resolved my SourceForge account problem; SVN is updated.
 
 Reply #7 - Jan 30th, 2011 at 9:34pm
Michael Prager 
Re: UTF-8 support
I'm curious though... I tried to stuff all kinds of invalidly encoded garbage into my test board, but it never breaks; no error at all. What does the file that causes that error look like?
 
 Reply #8 - Jan 30th, 2011 at 9:44pm
Jet Li 
Re: UTF-8 support
Maybe it happens at the Board Index. If I visit a topic via the user profile, it works. I only get the error if I visit the Message Index or Board Index, or run Rebuild Message Index.
 
 Reply #9 - Jan 30th, 2011 at 10:53pm
Michael Prager 
Re: UTF-8 support
Ah ok, writing 0xfefeffff to forum.totals does produce the error. Investigating...
 
 Reply #10 - Jan 31st, 2011 at 5:00pm
Jet Li 
Re: UTF-8 support
OK, I will wait. The Boy has the same issue on his test forum. See the post on the dev board.
 
 Reply #11 - Jan 31st, 2011 at 6:54pm
Michael Prager 
Re: UTF-8 support
OK, I've thought about it. The way it currently works is very strict: it will raise an error when incorrectly encoded data is encountered. But that's actually not a bad thing, because the user notices immediately if something got corrupted.

The question is how to convert existing data. On-the-fly conversion is probably not an option, because the user has to specify what the encoding of the old data is. It may even differ between the boards of a forum (like the Russian board on this forum). So we probably need a separate converter that offers a nice interface to specify the encoding. Where do we place such a converter? Should we put it in Setup.pl just like the Y1 converter? That way we wouldn't have to bother about wrong formats within the YaBB main code and could handle all the conversion separately.
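
The core of such a converter is small once the source encoding is known. Below is a sketch in Python rather than Perl; the board names, the per-board encodings, and `convert_to_utf8` are all hypothetical and only illustrate the idea of an admin-specified encoding per board.

```python
from typing import Dict


def convert_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode raw bytes from the admin-specified legacy encoding to UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical: the encoding the admin picked for each board in the converter.
board_encodings: Dict[str, str] = {
    "general": "iso-8859-1",
    "russian": "koi8_u",
}

# Hypothetical legacy data, as it might sit in each board's files.
legacy_data: Dict[str, bytes] = {
    "general": "Grüße aus Deutschland".encode("iso-8859-1"),
    "russian": "Привет".encode("koi8_u"),
}

converted = {
    board: convert_to_utf8(raw, board_encodings[board])
    for board, raw in legacy_data.items()
}

for board, data in converted.items():
    print(board, data.decode("utf-8"))
```

A wrong choice of source encoding usually still decodes without an error for single-byte encodings, so a real converter would probably want a preview step before writing anything back.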
 
 Reply #12 - Jan 31st, 2011 at 7:04pm
Jet Li 
Re: UTF-8 support
Michael Prager wrote on Jan 31st, 2011 at 6:54pm:
Should we put it in Setup.pl just like the Y1 converter? That way we wouldn't have to bother about wrong formats within the YaBB main code - handle all the conversion separately.

That would be nice. On the converter page the user could choose between two conversions:

YaBB 1.x to YaBB 3.x
YaBB 2.x to YaBB 3.x

or

YaBB 1.x to YaBB 2.x
YaBB 2.x to YaBB 3.x
« Last Edit: Jan 31st, 2011 at 7:06pm by Jet Li »  
 Reply #13 - Feb 1st, 2011 at 1:37am
Captain John 
Re: UTF-8 support
Info:
Of the 8 downloadable languages available for YaBB (other than English), the charsets are:

1 is Windows-1251 (Russian)
1 is Windows-1256 (Arabic)
5 are ISO-8859-1 (Danish, Spanish, Finnish, Deutsche & Deutsche_Du)
1 is ISO-8859-2 (Polish)

« Last Edit: Feb 1st, 2011 at 6:15pm by N/A »  
 
 Reply #14 - Feb 1st, 2011 at 3:12am
Michael Prager 
Re: UTF-8 support
OK great, I'll put those into the encoding selection list. It might help the user to select a language instead of an encoding though, or at least to see a table that shows which language file used which encoding.
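
A language-to-encoding table along those lines could be as simple as the following. This is a Python sketch, not YaBB code: the language names and charsets come from the list in the previous post, while `encoding_for_language` and the ISO-8859-1 default for unlisted languages such as English are assumptions.

```python
import codecs

# Charsets of the downloadable language packs, as listed in the previous post.
LANGUAGE_ENCODINGS = {
    "Russian": "windows-1251",
    "Arabic": "windows-1256",
    "Danish": "iso-8859-1",
    "Spanish": "iso-8859-1",
    "Finnish": "iso-8859-1",
    "Deutsche": "iso-8859-1",
    "Deutsche_Du": "iso-8859-1",
    "Polish": "iso-8859-2",
}


def encoding_for_language(language: str, default: str = "iso-8859-1") -> str:
    """Look up the legacy charset for a language pack (default is an assumption)."""
    return LANGUAGE_ENCODINGS.get(language, default)


# Sanity check: every charset name must be known to the codec registry.
for language, encoding in LANGUAGE_ENCODINGS.items():
    codecs.lookup(encoding)  # raises LookupError for an unknown name

print(encoding_for_language("Russian"))
print(encoding_for_language("English"))
```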
 