
wp-md

WordPress to Jekyll Markdown extractor and converter.

I needed this because I no longer had my website, only an SQL dump.

You will need to create a MySQL database and load your SQL dump into it, but that's not hard.

This set of scripts queries your MySQL database, then converts and renames the posts into individual Jekyll Markdown posts, following Jekyll conventions for the header (front matter) and the file naming.

You'll still want to review and edit the posts afterward: you may have HTML that the 'convert.sed' script does not handle. But the bulk of the work will be done.


environment

If you would like to share what is needed for your platform, open a pull request and update this README.


If you are wondering: this is shell, awk and sed. It just evolved that way, but it's a nice way to build things. Everything is encapsulated and clear, and nothing is huge. New queries go in their own .sql files, and new conversion patterns go in the convert.sed file. By default it has a nice, sustainable, maintainable architecture. I doubt any other language would be faster.


Why?

I needed this because I no longer had my WordPress site; I only had the last SQL dump.

How?

The first step is to set up a MySQL server and load your SQL dump into it. It's really not hard.

I use Arch Linux, so I just followed the directions for MariaDB, the preferred flavor of MySQL on Arch.

There are tons of articles on how to set up MySQL and restore an SQL dump, so I'm not going to talk about that here. Once your database is up and running, all you really need to do is run the 'gen-posts' script.
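For orientation only, the database step boils down to two client calls. This is a minimal sketch assuming the standard mysql (or MariaDB) command-line client is on your PATH and your MySQL user is allowed to create databases; load_dump and the dump filename are hypothetical names, not part of wp-md.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL helper: create an empty database and restore a WordPress
# SQL dump into it. Assumes the mysql/mariadb client and a user that may
# create databases. load_dump is not part of wp-md.

load_dump() {
  local user=$1 db=$2 dump=$3
  # -p makes the client prompt for the password instead of taking it on the command line.
  mysql -u "$user" -p -e "CREATE DATABASE IF NOT EXISTS \`$db\`" || return
  # Feed the dump to the client, which replays its CREATE and INSERT statements.
  mysql -u "$user" -p "$db" < "$dump"
}

# Example (placeholder names): load_dump eric ericgebhart wordpress-backup.sql
```

After that, the database name you passed is the one to give gen-posts with -d.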


gen-posts -u <userid> -d <database-name>

For me, that was gen-posts -u eric -d ericgebhart

Or like this, if you want to extract your pages into a different place:

gen-posts -u <userid> -d <database-name> -q select_pages.sql -w wp_pages -m md_pages

Or if you don't want to query again but want to reconvert:

gen-posts -n

For a verbose explanation of all the options:

gen-posts -h

With my measly 22 posts, this runs in a second or two. Then I have to check them all, edit any weirdness, and add patterns for the annoying leftovers to the convert.sed script.


The process

The steps done by gen-posts are as follows:

  • Query the database and create all_posts.txt using the select-posts.sql query.
  • Use awk to split all_posts.txt into individual files in the wp_posts directory, named in the pattern wp_post<#>.
  • Use html-md.sh to rename and convert the posts, with convert.sed doing the cleanup and the conversion to Markdown, and place the resulting Markdown posts in the md_posts directory. These will have a nice name in the form of -<Title>.md
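The awk step above can be sketched like this. It is an assumption about how such a split could work, not a copy of the line in gen-posts: it treats the `*** N. row ***` separator lines that the mysql client prints for each row of a \G query as boundaries, and writes each row to wp_post1, wp_post2, and so on. split_posts is a hypothetical name.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL sketch of the split step: one output file per \G row.

split_posts() {
  local all=$1 dir=$2
  mkdir -p "$dir"
  awk -v dir="$dir" '
    # A separator line starts a new row: close the previous file, open the next.
    /^\*+ [0-9]+\. row \*+$/ { if (out) close(out); n++; out = dir "/wp_post" n; next }
    # Every other line belongs to the current row (text before the first row is skipped).
    out { print > out }
  ' "$all"
}

# Example: split_posts all_posts.txt wp_posts
```

Closing each file before opening the next keeps awk from running out of file descriptors on large blogs.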

The Files

The program files should be pretty obvious, but just in case.

gen-posts

The master of all. It uses the query in select-posts.sql to get everything into a file named all_posts.txt. Then it uses a line of awk to break all_posts.txt into individual WordPress posts, and lets html-md do the rest. There are options for just about everything you would want: run an alternate query, skip the query entirely, or put things in different directories. A basic command looks something like this:

gen-posts -u eric -d ericgebhart

A more complex command could be something like this.

gen-posts -u eric -d ericgebhart -q select_pages.sql -w wp_pages -m md_pages

Or, to just redo the conversion from HTML to Markdown:

gen-posts -n

or maybe

gen-posts -n -w wp_pages -m md_pages

For a single file, which implies no query, this works:

gen-posts -f <wp-post-filename>

Add -m directory-name if needed.
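gen-posts itself is not reproduced here, but as a hypothetical sketch of how its flags could fit together, here is a getopts loop covering the options described above. The flag letters and the defaults (select-posts.sql, wp_posts, md_posts) come from this README; the function and variable names, and the rule that -u and -d are only required when a query will actually run, are assumptions.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL sketch of gen-posts-style option parsing.
# Results are left in global variables so the rest of a script can read them.

parse_gen_posts_args() {
  # Defaults taken from the file and directory names used in this README.
  user="" db="" query="select-posts.sql"
  wp_dir="wp_posts" md_dir="md_posts"
  run_query=1 single_file="" show_help=0

  local opt OPTIND=1   # a local OPTIND lets the function be called more than once
  while getopts ":u:d:q:w:m:f:nh" opt; do
    case $opt in
      u) user=$OPTARG ;;
      d) db=$OPTARG ;;
      q) query=$OPTARG ;;
      w) wp_dir=$OPTARG ;;
      m) md_dir=$OPTARG ;;
      n) run_query=0 ;;                        # reconvert only, no query
      f) single_file=$OPTARG; run_query=0 ;;   # a single file implies no query
      h) show_help=1
         echo "usage: gen-posts -u user -d db [-q query.sql] [-w wp_dir] [-m md_dir] | -n | -f file"
         return 0 ;;
      :) echo "option -$OPTARG needs an argument" >&2; return 2 ;;
      \?) echo "unknown option -$OPTARG" >&2; return 2 ;;
    esac
  done

  # Only a real query needs database credentials.
  if (( run_query )) && [[ -z $user || -z $db ]]; then
    echo "-u and -d are required unless -n or -f is given" >&2
    return 2
  fi
}
```

For example, parse_gen_posts_args -n -w wp_pages -m md_pages leaves run_query at 0 and points the two directories at wp_pages and md_pages.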

select-posts.sql and select-pages.sql

There are two SQL queries: select-posts.sql and select-pages.sql. They are pretty basic, so you may want to roll your own; I'm sure you can find a post on the internet about the various ways to query posts by author and other things. The important parts are that the query is terminated with '\G', which makes MySQL print each row vertically as one "label: value" line per column, and that it returns the title, the date and the content.

If you dive into that, take a look at the resulting all_posts.txt file; you'll want your results to look like that.

The gen-posts script has a '-q' option to pass any query you like.
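If you do roll your own, a select-posts.sql-style query might look like the sketch below. It assumes the default wp_ table prefix and published posts only; the Date, Title and Content aliases are a guess, chosen to match the Content: label mentioned under convert.sed. It is written to a separate, hypothetical select-posts-example.sql so it does not clobber the real file.

```bash
#!/usr/bin/env bash
# Sketch of a post query. Assumes the default "wp_" table prefix;
# the column aliases are assumptions, not necessarily what wp-md uses.
cat > select-posts-example.sql <<'EOF'
SELECT post_date    AS Date,
       post_title   AS Title,
       post_content AS Content
FROM wp_posts
WHERE post_type = 'post'
  AND post_status = 'publish'
ORDER BY post_date\G
EOF

# Run it (placeholder names):
#   mysql -u eric -p ericgebhart < select-posts-example.sql > all_posts.txt
```

For pages, the same query with post_type = 'page' would do, and gen-posts -q can point at whichever file you end up with.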

html-md

The rest of this is done with awk and sed. The html-md script extracts the date and title to create the filename, and builds the Jekyll header for the post. It uses the convert.sed file for cleanup and the conversion from HTML to Markdown. Finally, the output is piped through fmt, which reflows each paragraph from one continuous line of text into lines broken with newlines. The default line length is 75 characters.

If tags and category fields were added, this is the script that would need to change.
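A rough, hypothetical sketch of that step for a single wp_post file: pull Date and Title out of the \G labels, build a Jekyll filename (Jekyll expects posts named YYYY-MM-DD-title.md), write the front matter, then clean the body and reflow it with fmt -w 75. The label names, the slug rule and the two-rule sed stand-in for convert.sed are assumptions, not the real html-md.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL sketch of an html-md-style conversion of one wp_post<#> file.

html_md() {
  local in=$1 outdir=$2 date title slug out
  # \G output right-aligns the labels, so allow leading spaces before them.
  date=$(sed -n 's/^ *Date: //p' "$in" | head -n 1)
  title=$(sed -n 's/^ *Title: //p' "$in" | head -n 1)

  # Slug: collapse each run of non-alphanumerics to one '-', trim the ends.
  slug=$(printf '%s' "$title" | tr -cs '[:alnum:]' '-')
  slug=${slug#-}; slug=${slug%-}
  # Keep only the YYYY-MM-DD part of "YYYY-MM-DD HH:MM:SS".
  out="$outdir/${date%% *}-$slug.md"

  mkdir -p "$outdir"
  {
    # Jekyll front matter (titles containing '"' would need escaping).
    printf -- '---\nlayout: post\ntitle: "%s"\ndate: %s\n---\n\n' "$title" "$date"
    # Body: everything from the Content: line on, with ^M removed, the label
    # and <p> tags stripped (stand-in for convert.sed), reflowed to 75 columns.
    sed -n '/^ *Content: /,$p' "$in" |
      tr -d '\r' |
      sed -E 's/^ *Content: //; s#</?p>##g' |
      fmt -w 75
  } > "$out"
  printf '%s\n' "$out"
}
```

For example, html_md wp_posts/wp_post1 md_posts writes the post and prints its new path.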

convert.sed

This is just a list of sed commands that swap HTML tags for Markdown or delete them altogether. It also gets rid of the ^M (carriage return) characters that are everywhere, and the Content: label from the query. URLs are fixed,

 tags are left because they work.

My HTML was pretty simple, YMMV.

If you run into things I didn't cover, which is likely, just add new patterns to 'convert.sed'.
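To give a feel for what such patterns look like, here are a few rules in the spirit of convert.sed, inlined with sed -E so the sketch runs on its own. They are illustrative, not the project's actual file, and the \r escape assumes GNU sed. In order, the rules drop carriage returns, drop the Content: label, delete paragraph tags, and convert bold, italic, h2 headings and simple links.

```bash
#!/usr/bin/env bash
# ILLUSTRATIVE convert.sed-style rules (not the real file). Assumes GNU sed.
# [^<]* means these only handle tags with no nested markup inside them.

html_to_md() {
  sed -E \
    -e 's/\r//g' \
    -e 's/^ *Content: //' \
    -e 's#</?p>##g' \
    -e 's#<(strong|b)>([^<]*)</(strong|b)>#**\2**#g' \
    -e 's#<(em|i)>([^<]*)</(em|i)>#*\2*#g' \
    -e 's/<h2>([^<]*)<\/h2>/## \1/g' \
    -e 's#<a href="([^"]*)">([^<]*)</a>#[\2](\1)#g'
}

# Example: html_to_md < wp_posts/wp_post1
```

Each new pattern you run into becomes one more line like these; anything with nested tags is usually quicker to fix by hand.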


For more details on how this actually works, read the code, and also read my post.


Please, if you make improvements, open a pull request so other people can enjoy the fruits of your labor.
