Git Product home page Git Product logo

mongo2mysql's Introduction

mongo2mysql

This tool is a general purpose command line utility that will take a mongoDB and convert it to a mysql database. Right now this is mostly the case, but I ran into the clown car that is the nodeBB database structure and had to hard code in some kludges for its schema. Perhaps some dedicated data export specialist can refactor those kludges in the future.

Features

This is a fairly basic tool, but there are some features of note.

  • Creates the mysql schema and imports the data automatically from a given mongoDB.
  • Infers database types from the values in rows, automatically making schema changes as necessary.
  • Flattens deeply nested object structures.
  • Creates child tables from array values and ginormous child objects.
  • Reasonable percent complete and time estimation so you don't pull your hair out wondering what's taking so long.

Installation

You are going to need php, composer, mysql, mongoDB, and the php-mongo extension to use this tool.

  1. Clone this repository locally.
  2. Run composer install to bring in dependencies.
  3. Symlink bin/mongo2mysql into a directory in your path. Protip: Make yourself a ~/bin directory for all your little command line tools.
  4. Run mongo2mysql --help to see what to do next.

Restoring a MongoDB

Generally, you'll need to restore a mongoDB database to the same computer as your mysql database. To do this you can use the mongorestore command line utility.

  1. Unzip the bson files in the backup.
  2. The directory of the bson files will become the name of the mongoDB so rename it if you want to.
  3. Call mongorestore dirname/ to restore the files.

Tips

  • The _id column will be set to the primary key of the export table, but otherwise no indexes are added. Make sure to add your own indexes where needed after the conversion.

  • You might see some tables with a double underscore (__) in their names. These are child tables that have been created because a table had an array or giant object column. When you encounter one of these tables you'll see a primary key consisting of _parentid and _index columns. The _index column is automatically added from the array index or object key.

  • There is a _num column added to each export table. This number starts at one and increments up and can be used to help transition to integer primary keys instead of the hex IDs that mongoDB uses.

  • Since this is a command line application you should copy and paste your export command into a text file or task so that you can reproduce the export exactly between test and final exports.

Limitations

Since mongoDB is schemaless it really can give developers enough rope to hang themselves with. As such some databases may not export properly or at all. Here are some things to keep in mind.

  • Rows with two many keys won't be exported. You'll see a message printed when a row is skipped including that row's _id attribute so you can look up the row and see what's wrong.

  • The type checking isn't exhaustive at this point, but new types can be added easily.

  • Indexes aren't transferred. Make sure you add necessary indexes before processing resulting data.

  • The export does a schema check on every row of every mongoDB collection which is a way of saying this code isn't really optimized. The tool was implemented against a database with about 8 million rows and the export takes about an hour.

mongo2mysql's People

Contributors

initvector avatar tburry avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Forkers

isabella232

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.