How I Rescued My Blog from Google’s Cache
Detail-oriented readers may have noticed the sudden appearance of a full-fledged category list, which, if you explore it, leads to a fully populated set of archives stretching back once again to the misty pre-9/11 world of innocence and laughter.
After the cut: nerdy details of how I recovered, just in case anyone else has a sudden blog database failure.
- Once I realized that my database backups were useless, I started frantically saving cached pages from Google. I ran searches like inurl:greengabbro.net/2002/05 to save entries one month at a time; this was a huge pain in the ass and could probably be done more efficiently with some clever wgettage. In the end, though, I had HTML files for all my individual entry pages.
- I wrote a Perl script to read through the HTML, pull out the important bits, and save them in Movable Type’s import/export format. This was the trickiest part of the process, as I’m not a native regex speaker. On top of the regexing, I kludged together a system of toggle variables and scratch files to make sure everything was written out in the right order. I expect this system to be somewhat irritating to anyone trying to read the script (including myself, after a week or so), but if you’re interested, here it is.
Because I had 5 years of archives to contend with, and the WordPress importer can sometimes choke on large files, I split the import job into 5 files.
- I used the WordPress import process just as anyone migrating from Movable Type would. My first couple of attempts didn’t quite work, because I’d forgotten some essential parts of the Movable Type format when I wrote my script, but once I shook those bugs out, everything worked like a charm.
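The month-by-month searching from the first step is easy to script, at least as far as generating the queries goes. What follows is a minimal Python sketch that builds one inurl: query per month between two dates. The function name month_queries and the example date range are hypothetical, and the sketch doesn't fetch anything. It only produces the search strings you would then run by hand or feed to a downloader.

```python
from datetime import date


def month_queries(site: str, start: date, end: date) -> list[str]:
    """Return one inurl: query per month from start's month to end's month, inclusive."""
    queries = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        # Archive URLs are assumed to look like site/YYYY/MM, as in inurl:greengabbro.net/2002/05
        queries.append(f"inurl:{site}/{year:04d}/{month:02d}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return queries


if __name__ == "__main__":
    # Hypothetical range: substitute your own blog's first and last archive months.
    for query in month_queries("greengabbro.net", date(2001, 11, 1), date(2002, 2, 1)):
        print(query)
```

This prints four queries, one each for 2001/11, 2001/12, 2002/01, and 2002/02. It assumes dated archive paths, so a blog with a different permalink scheme would need a different pattern.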
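The "pull out the important bits" step boils down to regex extraction from each saved page. This is not the original Perl script. It is a Python sketch of the same idea, and the CSS class names in PATTERNS are assumptions: check what markup your own old templates actually produced and adjust the patterns to match.

```python
import re

# Hypothetical markup: replace these with whatever your saved pages actually contain.
PATTERNS = {
    "title": re.compile(r'<h3 class="title">(.*?)</h3>', re.S),
    "date": re.compile(r'<h2 class="date">(.*?)</h2>', re.S),
    "body": re.compile(r'<div class="entry-body">(.*?)</div>', re.S),
}


def extract_entry(html: str) -> dict:
    """Pull each field named in PATTERNS out of one saved entry page.

    Fields that aren't found come back as empty strings, so a missing
    piece shows up as a blank rather than crashing the whole run.
    """
    entry = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(html)
        entry[field] = match.group(1).strip() if match else ""
    return entry


if __name__ == "__main__":
    page = (
        '<h2 class="date">May 3, 2002</h2>'
        '<h3 class="title">Hello</h3>'
        '<div class="entry-body">\n<p>Hi there.</p>\n</div>'
    )
    print(extract_entry(page))
```

The non-greedy `(.*?)` stops at the first closing tag. That keeps the body from swallowing the rest of the page, but an entry containing its own nested `</div>` would be cut short. For messier pages, a real HTML parser is more forgiving than regexes.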
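Movable Type's import/export format is plain text. Single-line fields such as AUTHOR, TITLE, CATEGORY, and DATE come first, one per line. Multi-line fields such as BODY come after them, each terminated by a line of five dashes, and each entry ends with a line of eight dashes. This is a minimal Python sketch of a writer for one entry. The function name to_mt is hypothetical, and it covers only a few fields, leaving out optional ones such as STATUS, EXTENDED BODY, and COMMENT. Forgetting a piece like this is precisely how an import goes wrong.

```python
from datetime import datetime

SECTION_END = "-----"    # terminates each multi-line field (e.g. BODY)
ENTRY_END = "--------"   # terminates each entry


def to_mt(title: str, author: str, when: datetime, body: str, category: str = "") -> str:
    """Format one entry in Movable Type's import/export format."""
    lines = [f"AUTHOR: {author}", f"TITLE: {title}"]
    if category:
        lines.append(f"CATEGORY: {category}")
    # MT dates look like 05/03/2002 03:04:05 PM (12-hour clock with AM/PM)
    lines.append("DATE: " + when.strftime("%m/%d/%Y %I:%M:%S %p"))
    lines += [SECTION_END, "BODY:", body.strip(), SECTION_END, ENTRY_END]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_mt("Hello", "me", datetime(2002, 5, 3, 15, 4, 5), "<p>Hi there.</p>", "Misc"))
```

Concatenating the output of several calls gives a file with many entries. Note that `%p` depends on the locale. In the default C locale it yields AM/PM.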
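Splitting the export so the WordPress importer doesn't choke can also be automated. What follows is a sketch that spreads already-formatted MT entries over a fixed number of files, as evenly as possible. The function name split_export and the import-N.txt file names are hypothetical, and the author's actual split (five files for five years) may have been done differently. If the entries are in chronological order, each file holds one contiguous stretch of the archive.

```python
from pathlib import Path


def split_export(entries: list[str], parts: int, outdir: Path) -> list[Path]:
    """Write MT-format entries into at most `parts` files of near-equal size."""
    outdir.mkdir(parents=True, exist_ok=True)
    base, extra = divmod(len(entries), parts)  # the first `extra` files get one more entry
    paths, start = [], 0
    for i in range(parts):
        count = base + (1 if i < extra else 0)
        if count == 0:  # fewer entries than parts: stop early
            break
        chunk = entries[start:start + count]
        start += count
        path = outdir / f"import-{i + 1}.txt"
        path.write_text("".join(chunk), encoding="utf-8")
        paths.append(path)
    return paths
```

For example, 12 entries split into 5 parts gives files of 3, 3, 2, 2, and 2 entries. Keeping each entry intact matters more than exact balance, since a file cut mid-entry would fail to import.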