Hot Topic (More than 10 Replies) Convert to UTF-8 (Read 4,958 times)
Carsten
Ex Member


Re: Convert to UTF-8
Reply #15 - Sep 15th, 2010 at 1:19pm
@blackcatnc - to get details on what I am thinking of, you'll have to be a little patient. Remember, I'm only Danish and need a little time for coding and testing ;)

Try this for a start:

In Subs.pl find
Code
	$_[0] =~ s/\[ch(\d{3,})\]/ $1>127 ? "\&#$1;" : '' /egis; 


and add after
Code
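	if ($yycharset =~ m/utf-8/i) {
		use Encode qw(decode_utf8 from_to);
		# Valid UTF-8 is left alone. LEAVE_SRC keeps decode_utf8 from emptying $_[0].
		if (eval { decode_utf8($_[0], Encode::FB_CROAK | Encode::LEAVE_SRC); 1 }) {}
		# Anything else is assumed to be iso-8859-1 and is converted in place.
		else { from_to($_[0], "iso-8859-1", "utf8"); }
	} 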



Remember - this is very early and temporary code.
  
 
JonB
YaBB Administrator
YaBB Next Team
Operations Team
Beta Testers
Support Team



Posts: 4,038
Location: Land of the Blazing Sun!

YaBB 2.6.1
Re: Convert to UTF-8
Reply #16 - Sep 15th, 2010 at 6:56pm
blackcatnc -

First, this isn't about your word, but about how things work. In the 'olive branch' PM I sent you, I explained that YaBB (from my research) uses transliteration (read that as on-the-fly transcoding) and doesn't work through wholesale conversion, so Carsten's reply (and proposed method) should not have been a surprise. You may also extend that logic to 'YaBB is really an adaptive transcoder', in that it works with what it's got. I'm not going to dig out LaTeX here and make the cute little dot triangle; I'll just say "Therefore, normally, a conversion is not anticipated". Your desire to convert character sets makes the case that we may need a mechanism that works correctly with the character set adaptations already built into the core code. I think your request might be the first. If you look down in the Language Specific Support Boards, you will find working Chinese and Russian forums. Frankly, I'm unsure how often conversions would be an issue, but I think that is worth discussing too.

Finally - I can say that one of the other posters on this topic had already broached the 'unthinkable': do we need or want to consider this complete change in method? So, I guess we (YaBB as a team project) are neither too long in the tooth nor too cantankerous to have our working assumptions challenged.

How's that for a quote from a 'support person'?

;)





« Last Edit: Sep 15th, 2010 at 6:58pm by JonB »  

I find your lack of faith disturbing.
 
Carsten
Ex Member


Re: Convert to UTF-8
Reply #17 - Sep 15th, 2010 at 8:04pm
Lots of arguments, quotes and fancy words. I'm a simple guy - so let me try and boil this topic down to something even I can understand.

blackcatnc has an existing forum using iso-8859-1 encoding.

Now he wants to change to UTF-8 encoding.

Question: Will YaBB display all the old iso-8859-1 encoded data correctly?

Answer: No - a lot of characters (from non-English languages) will need conversion to display correctly - period.

Solution: You can do one of two things: convert the entire base of data (displayed names, signatures, board descriptions...), or convert 'on the fly'. I've provided a (first try) method to do the latter. It tests whether the string is valid UTF-8: if it is, nothing happens; otherwise the string is converted from iso-8859-1.
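
As a rough standalone illustration of that test-then-convert idea, here is a minimal sketch that can be run outside YaBB. The helper name to_utf8_if_needed is made up for this example and is not part of YaBB; it assumes the input is a byte string that is either UTF-8 or iso-8859-1.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of YaBB): return $bytes unchanged if it is
# already valid UTF-8, otherwise treat it as iso-8859-1 and convert it.
sub to_utf8_if_needed {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() may modify its argument when CHECK is set
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return $bytes if $ok;
    return encode('UTF-8', decode('iso-8859-1', $bytes));
}

my $latin1 = "bl\xE5b\xE6r";            # "blåbær" as iso-8859-1 bytes
my $utf8   = "bl\xC3\xA5b\xC3\xA6r";    # the same word as UTF-8 bytes

print to_utf8_if_needed($latin1) eq $utf8 ? "converted\n"  : "unexpected\n";
print to_utf8_if_needed($utf8)   eq $utf8 ? "left alone\n" : "unexpected\n";
```

Keep in mind that this is a heuristic: an iso-8859-1 string whose bytes happen to form valid UTF-8 would be left unconverted.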
« Last Edit: Sep 15th, 2010 at 8:07pm by »  
 
blackcatnc
YaBB Newcomer



Posts: 13
Re: Convert to UTF-8
Reply #18 - Sep 15th, 2010 at 8:42pm
Thank you Carsten. That sums up everything. We are now on the same page. :) I agree with the consensus. I understand your on-the-fly approach, and it certainly solves the issue.

My alternate approach was to convert the member, board, and message data files to UTF-8 in one mass conversion. I just didn't know whether this would be a problem for YaBB, since the flat-file format delimiters and everything else would be included in the conversion. That was my original question in post 1.

Anyone else with this same issue, please note: if you might want to switch to other forum software in the future, you should probably choose the file conversion option. Otherwise, with on the fly, you would first have to convert your files to valid UTF-8 before you could use any available converters, because your data files would contain mixed encodings.
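
For anyone who wants to script the file conversion option, here is a minimal sketch, assuming the data files are line-oriented text and that any line which is not valid UTF-8 is iso-8859-1. The helper name convert_file is made up for this example. ASCII bytes are identical in both encodings, so ASCII delimiters pass through untouched. Back up your files before trying anything like this.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: convert one file in place from iso-8859-1 to UTF-8,
# line by line, skipping lines that are already valid UTF-8.
sub convert_file {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my @lines = <$in>;
    close $in;

    for my $line (@lines) {
        my $copy = $line;    # decode() may modify its argument when CHECK is set
        next if eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
        $line = encode('UTF-8', decode('iso-8859-1', $line));
    }

    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} @lines;
    close $out or die "Cannot close $path: $!";
}

# Usage: perl convert.pl file1.txt file2.txt ...
convert_file($_) for @ARGV;
```

Checking each line separately means a file that already holds a mix of UTF-8 and iso-8859-1 lines should come out as UTF-8 throughout.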
  
 
Carsten
Ex Member


Re: Convert to UTF-8
Reply #19 - Sep 15th, 2010 at 11:34pm
Yep - you can choose to convert the whole charade to valid UTF-8 now or you can use 'on the fly' and postpone conversion till when/if you decide to change to other forum software.
« Last Edit: Sep 15th, 2010 at 11:35pm by »  
 
blackcatnc
YaBB Newcomer



Posts: 13
Re: Convert to UTF-8
Reply #20 - Sep 16th, 2010 at 12:46am
If YaBB3 uses a MySQL database, you would also need a single encoding, unless YaBB adds some magic code such as Carsten's to account for the mix and handle it all for you. Something to keep in mind. :)

I have converted all files to UTF-8 and everything is working appropriately. To anyone else in this situation: UTFCast is a good free utility for batch file conversions. It can leave the UTF-8 BOM out (some converters automatically put it in); a BOM will cause YaBB to malfunction.
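
If you are not sure whether a converter added a BOM, a check along these lines may help. It is only a sketch, and the helper name strip_utf8_bom is made up for this example. The UTF-8 BOM is the three bytes EF BB BF at the very start of the file.

```perl
use strict;
use warnings;

# Hypothetical helper: strip a leading UTF-8 BOM (bytes EF BB BF) from a
# byte string and report whether one was found.
sub strip_utf8_bom {
    my ($bytes) = @_;
    my $had_bom = ($bytes =~ s/\A\xEF\xBB\xBF//) ? 1 : 0;
    return ($bytes, $had_bom);
}

my ($clean, $had_bom) = strip_utf8_bom("\xEF\xBB\xBFhello");
print $had_bom ? "BOM removed\n" : "no BOM\n";
```

To use it on a real file, read the contents in raw (binary) mode so the three bytes are seen as they are on disk.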

I'm surprised this is a rare request here. I have seen it often in similar software that started with one encoding and moved to UTF-8 as standard over the years. WordPress and SMF come to mind off the top of my head.

Hopefully this topic will be useful to others in the future. I'm glad we got it all sorted out. :)
  
 