April 6, 2012

UTF-8 charset issues from MySQL in PHP

Question by Nick

This is really doing my nut...

All relevant PHP output scripts set headers (in this case only one file, the main PHP script):

header("Content-type: text/html; charset=utf-8");

HTML meta is set in head:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

All MySQL tables and related columns are set to:

utf8_unicode_ci     Unicode (multilingual), case-insensitive

I have been writing a class to do some translation. When the class writes to a file using fopen, fputs, etc., everything works great: the correct chars appear in my output files (which are written as PHP arrays and saved to the filesystem as .php or .htm files). eval() brings back the .htm files correctly, as does just including the .php files when I want to use them. All good.

The problem comes when I try to create translation entries in my DB. My DB connection class has the following line added directly after the initial connection:

 mysql_query("SET NAMES utf8, character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'");

Instead of seeing the correct chars, I get the usual crud you would expect from using the wrong charset in the DB. E.g.:

Propriétés

instead of:

propriétés

Don't even get me started on Russian, Japanese, etc. chars! But then using UTF-8 should not make any single language's charset an issue...

What have I missed? I know it's not the PHP, as the site shows the correct chars from the included translation .php or .htm files; it's only when I am dealing with the MySQL DB that I have these issues. phpMyAdmin shows the entries with the wrong chars, so I assume it's happening when the PHP "writes" to MySQL. I have checked similar questions here on Stack Overflow, but none of the answers (all of which I had already taken care of) give me any clues...

Also, does anyone have thoughts on the speed difference between include $filename and eval(file_get_contents($filename))?

Answer by Sebastián Grignoli

You say that you are seeing "the usual crud you would expect using the wrong charset". But that crud is in fact what you get by calling utf8_encode() on a string that is already UTF-8, so chances are that you are not using the "wrong encoding" anywhere, but encoding into UTF-8 more times than necessary.

You may want to take a look at a library I made to fix that kind of problem:

http://stackoverflow.com/a/3521340/290221
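
The double encoding described in this answer can be reproduced in a few lines. This is only a minimal sketch, not the linked library: it assumes the mbstring extension is available and uses mb_convert_encoding() from ISO-8859-1 to UTF-8 as a stand-in for utf8_encode(), which performs the same conversion. The variable names are made up for illustration.

```php
<?php
// "propriétés" as UTF-8 bytes, written with escapes so that the encoding
// of this source file does not matter.
$utf8 = "propri\xC3\xA9t\xC3\xA9s";

// Encoding to UTF-8 a second time: every byte of the already UTF-8 string
// is treated as one ISO-8859-1 character and re-encoded.
$double = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');

echo $double, "\n"; // shown as "propriÃ©tÃ©s" on a UTF-8 terminal or page

// Undoing exactly one layer of encoding recovers the original bytes.
$fixed = mb_convert_encoding($double, 'ISO-8859-1', 'UTF-8');

var_dump($fixed === $utf8);
```

If the stored data looks like the second string, the likely cause is an extra conversion somewhere between PHP and MySQL rather than a wrong charset on the tables.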

Answer by Starx

There is a PHP function, mysql_set_charset('utf8');, for that. Call it right after connecting, before you run any other query.
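
As a rough sketch of the same idea: the old mysql_* extension used in this thread was removed in PHP 7, so the example below swaps in mysqli and its set_charset() method, which is my substitution and not what the answer itself names. The helper name open_utf8_connection and the credentials in the usage comment are hypothetical.

```php
<?php
// Hypothetical helper: opens a mysqli connection and sets its character set
// before any query is run on it.
function open_utf8_connection(
    string $host,
    string $user,
    string $pass,
    string $db,
    string $charset = 'utf8'
): mysqli {
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $link = new mysqli($host, $user, $pass, $db);

    // set_charset() informs both the server and the client library of the
    // connection encoding, so string escaping agrees with it as well.
    $link->set_charset($charset);

    return $link;
}

// Usage (placeholder credentials, not taken from the thread):
// $link = open_utf8_connection('localhost', 'user', 'secret', 'translations');
```

Setting the charset inside the connection helper means every later query inherits it, which is the point of calling it first.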

Author: Nabin Nepal (Starx)
