Convert to UTF-8 (Read 4,184 times)
JonB
YaBB Administrator
YaBB Next Team
Operations Team
Beta Testers
Support Team
*****
Offline



Posts: 3,785
Location: Land of the Blazing Sun!

YaBB 2.6.0
Re: Convert to UTF-8
Reply #5 - Sep 14th, 2010 at 6:19pm
Umm - not quite correct -

That's not true if you use any special characters or mix in words from a language other than English for any reason.

should be "mix in words from a NON-LATINATE character set". ISO-8859-1 is NOT English; it's Latin-1, which supports most Western European languages. (I guess you didn't read that Wikipedia page you quoted me very well.)

and basically - "whatever"

I said I would ask the core person for this question, and I did.

Of course, you could always just set up another YaBB board, move your stuff in, and see what happens.

I know you want to argue, I don't.

See ya


 
  

I find your lack of faith disturbing.
 
blackcatnc
YaBB Newcomer
*
Offline



Posts: 13
Re: Convert to UTF-8
Reply #4 - Sep 14th, 2010 at 5:16pm
UTF-8 is only backwards compatible with ASCII. The example word I just gave you in my post above uses a character value above 127. It's a false assumption to think all your ISO-8859-1 data is ASCII. That's not true if you use any special characters or mix in words from a language other than English for any reason. Half of the byte values (everything from 128 to 255) will NOT be compatible:

http://en.wikipedia.org/wiki/ISO-8859-1#ISO-8859-1
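
As a rough check of the "half" claim, here is a minimal Python sketch (Python is my choice for illustration, not something YaBB uses) that re-encodes each of the 256 possible ISO-8859-1 byte values as UTF-8 and counts how many keep the same byte representation. The variable names are hypothetical.

```python
# Compare every ISO-8859-1 byte value with its UTF-8 representation.
same = []
different = []
for value in range(256):
    latin1_bytes = bytes([value])
    # Decode as ISO-8859-1, then encode the same character as UTF-8.
    utf8_bytes = latin1_bytes.decode("iso-8859-1").encode("utf-8")
    if utf8_bytes == latin1_bytes:
        same.append(value)
    else:
        different.append(value)

print(len(same), len(different))  # 128 128
print(hex(min(different)))        # 0x80
```

Values 0 to 127 come out unchanged; every value from 128 upward turns into a two-byte sequence.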

Terminology aside (I consider it a flat file database), that's why I was asking about the rest of the data. The problem here is the text data strings encoded in ISO-8859-1. For them to work properly in UTF-8, those strings must be converted. It looks like the flat file format's delimiting characters are all plain ASCII, so I suspect the whole thing might work if you converted all files to UTF-8. However, I only know about the few files I looked at and do not have overall YaBB development experience.
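
To illustrate why ASCII delimiters should survive a whole-file conversion, here is a small Python sketch. The record layout and the "|" delimiter are assumptions made up for the example; I have not checked them against every YaBB file type.

```python
# Hypothetical record: three fields separated by "|" (assumed layout).
record_latin1 = "Re: pokémon|blackcatnc|Café talk".encode("iso-8859-1")

# Convert the whole record in one step, delimiters included.
record_utf8 = record_latin1.decode("iso-8859-1").encode("utf-8")

# Split on the delimiter byte before and after the conversion.
fields_before = record_latin1.split(b"|")
fields_after = record_utf8.split(b"|")

# Decode each field with the encoding it is actually stored in.
texts_before = [field.decode("iso-8859-1") for field in fields_before]
texts_after = [field.decode("utf-8") for field in fields_after]

print(texts_before == texts_after)  # True
```

This works because UTF-8 multi-byte sequences only use byte values of 128 and above, so a converted character can never be mistaken for an ASCII delimiter.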

Your data will show up wrong if you just start serving UTF-8 and don't convert the text data to match. As I said, the same behavior can be duplicated by forcing your browser to UTF-8 encoding right now. You'll see the word will not show up correctly. After that, new data will be entered as UTF-8, and you may never be able to correct it because your strings are now stored in multiple character encodings.
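
A minimal Python sketch of the mixed-encoding situation, assuming one old line written as ISO-8859-1 and one new line written as UTF-8 in the same file. The file content and the helper name `try_decode` are hypothetical.

```python
old_post = "pokémon".encode("iso-8859-1")  # written before the switch
new_post = "pokémon".encode("utf-8")       # written after the switch
mixed_file = old_post + b"\n" + new_post + b"\n"

def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None

# Strict UTF-8 decoding fails because of the old line.
as_utf8 = try_decode(mixed_file, "utf-8")

# ISO-8859-1 accepts any byte, so it "works" but garbles the new line.
as_latin1 = try_decode(mixed_file, "iso-8859-1")

print(as_utf8)                  # None
print(as_latin1.splitlines())   # ['pokémon', 'pokÃ©mon']
```

Neither encoding reads the whole file correctly, which is why the conversion has to happen before any UTF-8 input is accepted.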
« Last Edit: Sep 14th, 2010 at 5:18pm by blackcatnc »  
 
JonB
Re: Convert to UTF-8
Reply #3 - Sep 14th, 2010 at 4:17pm
Here's a thought - and a question

You say you have had a YaBB forum for some years, and yet you say 'database'. YaBB is a text-based, flat-file system. This has a fair number of implications, particularly since everything is a string, full stop.

Quote:
Thus was invented the brilliant concept of UTF-8. UTF-8 was another system for storing your string of Unicode code points, those magic U+ numbers, in memory using 8 bit bytes. In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 6 bytes.

This has the neat side effect that English text looks exactly the same in UTF-8 as it did in ASCII, so Americans don't even notice anything wrong. Only the rest of the world has to jump through hoops. Specifically, Hello, which was U+0048 U+0065 U+006C U+006C U+006F, will be stored as 48 65 6C 6C 6F, which, behold! is the same as it was stored in ASCII, and ANSI, and every OEM character set on the planet. Now, if you are so bold as to use accented letters or Greek letters or Klingon letters, you'll have to use several bytes to store a single code point, but the Americans will never notice. (UTF-8 also has the nice property that ignorant old string-processing code that wants to use a single 0 byte as the null-terminator will not truncate strings).


read-um this:
http://www.joelonsoftware.com/articles/Unicode.html
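
The byte values in the quoted passage can be checked directly. This is just an illustrative Python sketch; the variable names are my own.

```python
hello = "Hello"

# The same text encoded three ways.
ascii_bytes = hello.encode("ascii")
latin1_bytes = hello.encode("iso-8859-1")
utf8_bytes = hello.encode("utf-8")

# Space-separated hex, as written in the quote.
hex_form = utf8_bytes.hex(" ").upper()
print(hex_form)  # 48 65 6C 6C 6F

# An accented letter is one byte in ISO-8859-1 but two in UTF-8.
accented = "é"
latin1_len = len(accented.encode("iso-8859-1"))
utf8_len = len(accented.encode("utf-8"))
print(latin1_len, utf8_len)  # 1 2
```

So plain English text is byte-for-byte identical in all three encodings, and the difference only shows up once a character above 127 appears.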

To be sure I'm 'righto', and not 'wrongo' (and thus also the good Captain) I'm going to consult a guru -

Good Luck with your forum.

« Last Edit: Sep 14th, 2010 at 4:29pm by JonB »  

 
blackcatnc
Re: Convert to UTF-8
Reply #2 - Sep 14th, 2010 at 12:32pm
No, it can't. Look at this example.

Say you have the word "pokémon" in your database. It's ISO-8859-1.

You cannot switch to UTF-8 and have that display correctly. Go ahead and try it in your browser. This forum is currently serving ISO-8859-1.

A conversion needs to be done with the existing data to convert it to UTF-8. Then you can use UTF-8 from that point forward. That's typically how I've seen it work with every other database and charset issue I've seen.
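
Here is a minimal Python sketch of both halves of that argument: what the stored bytes look like when read as UTF-8 without conversion, and what a one-time conversion does. The variable names are hypothetical, and nothing here is YaBB code.

```python
# The word as it sits on disk today, in ISO-8859-1.
stored = "pokémon".encode("iso-8859-1")
print(stored)  # b'pok\xe9mon'

# Reading those bytes as UTF-8 without converting: the 0xE9 byte is
# invalid on its own, so it becomes the replacement character U+FFFD.
shown = stored.decode("utf-8", errors="replace")

# The one-time conversion: decode with the old charset, encode with the new.
converted = stored.decode("iso-8859-1").encode("utf-8")
print(converted)  # b'pok\xc3\xa9mon'

# After conversion the data reads correctly as UTF-8.
restored = converted.decode("utf-8")
```

The browser test described above shows the same thing as `shown`: a placeholder glyph where the é should be.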

If you have text data stored in one character set, you can't switch character sets without converting the data. In fact, it technically corrupts your database if you do, because you then start inputting UTF-8 where you had ISO-8859-1. Now you've got two different character encodings in your database, and it's probably impossible to fix after that.

From what you've told me, it seems like YaBB isn't ready for UTF-8 yet. In this day and age, with a global community, UTF-8 should probably be the standard.
  
 
Captain John
Ex Member


Re: Convert to UTF-8
Reply #1 - Sep 14th, 2010 at 2:57am
No ... UTF-8 should be able to handle the ISO encoding without problems, but acceptance of UTF-8 characters has only recently been enabled in the newest version: for display names and usernames, and I believe passwords now.
  
 
blackcatnc
Convert to UTF-8
Sep 13th, 2010 at 11:08pm
I have had a YaBB forum for several years in ISO-8859-1 and am looking to move to UTF-8.

Surprisingly, it seems information on UTF-8 with YaBB is a bit sparse. From reading the forums I've managed to figure out:

1. YaBB seems to support UTF-8.
2. UTF-8 is enabled by editing $yycharset in Admin.lng and Main.lng.

That seems to work. However, what do I do about the database message history in ISO-8859-1 format? Can I just download and batch convert all files to UTF-8 and reupload?

How about members and board files?
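
For what it's worth, the "download, batch convert, reupload" idea could be sketched roughly like this in Python. This is only a sketch under stated assumptions: every file under the source folder is assumed to be ISO-8859-1 text (attachments, avatars, and other binary files would have to be excluded first), the function names `convert_file` and `convert_tree` are made up, and it writes to a separate output folder so the original download stays untouched as a backup.

```python
from pathlib import Path

def convert_file(src, dst):
    """Re-encode one file from ISO-8859-1 to UTF-8.

    Returns True if the bytes changed (the file held non-ASCII text).
    """
    raw = src.read_bytes()
    # Bytes in, bytes out: no newline translation takes place.
    converted = raw.decode("iso-8859-1").encode("utf-8")
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_bytes(converted)
    return converted != raw

def convert_tree(src_root, dst_root):
    """Convert every file under src_root into dst_root.

    Returns the relative paths of the files whose bytes changed.
    """
    changed = []
    for src in sorted(src_root.rglob("*")):
        if src.is_file():
            rel = src.relative_to(src_root)
            if convert_file(src, dst_root / rel):
                changed.append(rel)
    return changed
```

A call might look like `convert_tree(Path("forum_backup"), Path("forum_utf8"))`, with both folder names being placeholders. Note that ISO-8859-1 decoding never fails, so this cannot detect a file that is already UTF-8; running it twice on the same data would double-encode it.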
  