Git Product home page Git Product logo

mysql-fix-utf8's Introduction

mysql-fix-utf8

Convert a MySQL database from Latin-1 to utf8mb4 (UTF-8)

This script is an implementation of the steps described in the excellent blog post by Stephen Balukoff Getting out of MySQL Character Set Hell

Problem

You have a MySQL database that contains "broken special characters". Perhaps the database was running for a long time behind a Ruby or PHP application without much thought given to character encoding. Special characters appear corrupted in some places, and correctly in other places. You now want to fix your application and database to correctly handle special characters using Unicode. You've made the necessary changes in your application, but you still have to address the problem in your MySQL database.

MySQL has historically used latin1 as its default character encoding. Using latin1 can result in international and special characters being encoded and displayed incorrectly in some instances, especially if parts of your application expect a different encoding. Using UTF-8 consistently, everywhere, is usually the recommended way to handle character data and avoid such problems. Fixing these problems after they've had time to grow can be a challenge. This script converts a MySQL database from latin1 to utfmb4 by dropping and recreating the database, then looping through every table and every column to correct characters.

MySQL introduced character encoding utf8, then later introduced utf8mb4. utf8 uses up to 3 bytes per character, so it does not support the entire Unicode character set. utf8mb4 uses up to 4 bytes per character so it supports a much wider range of characters.

Using this script

First, read Getting out of MySQL Character Set Hell and How to support full Unicode in MySQL databases. Do your research and decide on the best approach. Make any changes necessary for your application to use UTF-8 (which might include adding "encoding: utf8mb4" to your database connection properties).

To use this script you will need a linux shell, Perl, and MySQL 5.5.3 or higher.

Make a copy of your production database into a test environment. Do not run this script on a live database. Using the copy of your database:

  1. Edit mysql_fix_utf8.sh and change the lines at the top to provide the name of your database, password and username. The user must have the ability to drop and create databases (many shared hosting services restrict this ability). If you want to start with an existing mysqldump backup file instead of a MySQL database comment out the mysqldump line.

  2. If you need to apply any changes (ALTER TABLE... etc.) as part of the process add these to prepare_utf8_fix.sql. For example, you might have to change a column from TEXT to MEDIUMTEXT to allow for up to 4 bytes per character instead of only 1 byte per character.

  3. If you need to exclude certain tables or columns from the double-encoding fix modify procedure_fix_utf8.sql where indicated in the comments.

  4. Run mysql_fix_utf8.sh: chmod u+x mysql_fix_utf8.sh && ./mysql_fix_utf8.sh

Test and examine the effect of the script on your data. Once you're satisfied, decide how you want to apply the change on your production database.

YMMV

Naturally any situation will have its own unique set of variables. This script might do what you need, or you may need to modify it for your situation, or this script might be an example that helps you create your own script. This script can complete in a few minutes on a small database with a few hundred thousand rows. On a larger database you might want a script that uses an incremental approach.

The shell script mysql_fix_utf8.sh uses bash shell to run commands on mysql and mysqldump. It also uses Perl to do a find-and-replace. The decision to use bash and Perl was purely for my own convenience.

Disclaimer

This code is public domain. It is provided with no warranty or guarantee of any kind. Use at your own risk.

mysql-fix-utf8's People

Contributors

vincentml avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.