“Groovy, baby!”, as international man of mystery likes to say.
It seems converting character and text data to utf8 character set is a common task for MySQL, especially during an upgrade. In fact, I had trouble with it during server and WordPress database upgrade for this blog site. I wrote about it in this post, where I explained how to do it step-by-step using a command line tool such as mysql, taking advantage of some nifty code generation trick with information_schema.
One drawback of that method is that it’s a manual process, therefore time-consuming and error-prone.
I spent some time today to cook up a simple Perl script to automate this task. See below. Remember to change MyDatabase to your database name (there are 3 places that need to be changed), and adjust the user name and password accordingly. This script also prints out sql statements used during the process so you will know what has been done to the database.
I have tested this successfully on my own WordPress blog. Hopefully it will help somebody out there. Enjoy!
[sourcecode language=”perl”]
#!/usr/bin/perl
# MyUtf8Converter.pl – convert all character data to utf8 character set
use strict;
use warnings;
use DBI;
# data source name, username, password, connection attributes
my $dsn = “DBI:mysql:MyDatabase:localhost”;
my $user_name = “MySQLUserName”;
my $password = “MySQLPassword”;
my %conn_attrs = (RaiseError => 1, PrintError => 0, AutoCommit => 1);
# connect to database
my $dbh = DBI->connect ($dsn, $user_name, $password, \%conn_attrs);
# Convert char types first. Use information_schema to generate alter table code to convert data to binary first.
my $sth = $dbh->prepare (“SELECT CONCAT(‘ALTER TABLE ‘, table_name, ‘ MODIFY ‘, column_name, ‘ ‘, REPLACE(column_type, ‘char’, ‘binary’)) FROM information_schema.columns WHERE table_schema = ‘MyDatabase’ and data_type LIKE ‘%char%’”);
$sth->execute ();
# Run alter table stamement. First convert char types to binary types, then convert those to char types utf8 character set
while (my ($val) = $sth->fetchrow_array ())
{
print $val, “\n”;
$dbh->do ($val);
$val =~ s/binary/char/;
$val = $val . ” CHARACTER SET utf8″;
print $val, “\n”;
$dbh->do ($val);
}
$sth->finish ();
# Now we convert text data. Use information_schema to generate alter table code to convert data to blob first.
$sth = $dbh->prepare (“SELECT CONCAT(‘ALTER TABLE ‘, table_name, ‘ MODIFY ‘, column_name, ‘ ‘, REPLACE(column_type, ‘text’, ‘blob’)) FROM information_schema.columns WHERE table_schema = ‘MyDatabase’ and data_type LIKE ‘%text%’”);
$sth->execute ();
# Run alter table stamement. First convert text types to blob types, then convert those to text types utf8 character set
while (my ($val) = $sth->fetchrow_array ())
{
print $val, “\n”;
$dbh->do ($val);
$val =~ s/blob/text/;
$val = $val . ” CHARACTER SET utf8″;
print $val, “\n”;
$dbh->do ($val);
}
$sth->finish ();
$dbh->disconnect ();
[/sourcecode]
14 responses to “Perl script to convert MySQL character set to utf8”
you rock, man.
Thanks Chuck. Long time no see. Hope all is well.
ok I did sort of the same but I do get after changing a TEXT field to BLOB than the data was to big?? while the data war almost nothing
any tips where to look? corrupt unicode?
Hi Bastiaan,
In your case, I doubt it is a matter of corrupt unicode. When the character set is UTF8, MySQL uses 3 bytes to store one character, instead of 1 when the character is Latin1. I suppose that is why the size grown after the conversion?
Hi, I’m getting this in my error log:
malformed header from script. Bad header=ALTER TABLE wp_comments MODIFY: convert.pl
(convert.pl being the name of the script)
The only other change I made to the script other than changing the database info is that I changed “localhost” to the correct hostname, since I was getting an error on that too.
Do you know what the new error means?
Thanks!
Hmmm, not sure about that SL.
I tried it again on my own machine and it worked. Hopefully you’ve fixed this problem. If not, send me the script and I can take a look.
Hi,
I tried it on my localhost(wamp, perl installed correctly.) but got “500 Internal Server Error”
By the way can you convert it to php, it will be very useful.
Hi Emre,
Sorry, at this point I don’t have enough time for a proper php code.
You can try the manual approach documented here and see what happens:
http://www.haidongji.com/2008/11/11/convert-character-set-to-utf8-in-mysql/
Hope it helps.
If there are foreign keys defined on in your table (column values referenced from other tables), conversion won’t work for the referenced column. drop f-keys first, then recreate. can be tricky, though.
Thanks for pointing that out, knb.
this is a chunk of gold
Thanks Ипотпал. Glad you liked it. Cheers!
[…] get me started, I thought it would be fun to rewrite a Perl utility I wrote before with Python. That script converts MySQL character sets to utf8, a very common task for wikis and […]
[…] If you have shell access, you could also try using the script detailed here or here. […]