How I Rescued My Blog from Google’s Cache

Detail-oriented readers may have noticed the sudden appearance of a full-fledged category list, which, if you explore, is the gateway to a fully populated set of archives stretching once again back to the misty pre-9/11 world of innocence and laughter.

After the cut: nerdy details of how I recovered, just in case anyone else has a sudden blog database failure.

  1. Once I realized that my database backups were useless, I started frantically saving cached pages from Google. I ran through searches to save entries one month at a time; this was a huge pain in the ass and could probably be done more effectively with some clever wgettage. In the end, though, I had HTML files for all my individual entry pages.
  2. I wrote a Perl script to read through the HTML, pull out the important bits, and save them in Movable Type’s import/export format. This was the trickiest part of the process, as I’m not a native regex speaker. In addition to the regexing, I kludged together a system of toggle variables and scratch files to make sure everything was written out in the right order. I expect this system to be somewhat irritating to anyone trying to read the script (including myself, after a week or so) but nevertheless, if you’re interested, here it is.

    Because I had 5 years of archives to contend with, and the WordPress importer can sometimes choke on large files, I split the import job into 5 files.

  3. I used the WordPress import process just like anyone else would if they were migrating from Movable Type. My first couple of attempts didn’t work quite right, as I’d forgotten some essential parts of the Movable Type format when I wrote my script, but once I shook those bugs out everything worked like a charm.
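
For anyone who wants a starting point for step 2, here is a rough sketch of the general idea, written in Python rather than Perl. It is not my actual script. The HTML patterns (`entry-title`, `entry-date`, `entry-body`), the date layout in the saved pages, and the function names are all made up for illustration, so you'd have to adapt them to whatever your own cached pages look like. Splitting by year is also just one guess at how to keep each import file small.

```python
import re
from datetime import datetime
from html import unescape

# Hypothetical markup: these patterns assume each saved page wraps the title,
# date, and body in the elements below. Real templates will differ, and the
# body pattern stops at the first </div>, so nested divs would need more care.
TITLE_RE = re.compile(r'<h2 class="entry-title">(.*?)</h2>', re.S)
DATE_RE = re.compile(r'<span class="entry-date">(.*?)</span>', re.S)
BODY_RE = re.compile(r'<div class="entry-body">(.*?)</div>', re.S)


def parse_entry(html_text):
    """Pull title, date, and body out of one saved entry page.

    Returns None if any of the three pieces is missing.
    """
    title = TITLE_RE.search(html_text)
    date = DATE_RE.search(html_text)
    body = BODY_RE.search(html_text)
    if not (title and date and body):
        return None
    return {
        "title": unescape(title.group(1).strip()),
        # Assumed date layout in the saved page, e.g. "2004-03-05 14:30".
        "date": datetime.strptime(date.group(1).strip(), "%Y-%m-%d %H:%M"),
        "body": body.group(1).strip(),
    }


def to_mt(entry, author):
    """Format one entry as a Movable Type import record.

    Metadata ends with five dashes, each multi-line field ends with five
    dashes, and the whole entry ends with eight dashes.
    """
    lines = [
        f"AUTHOR: {author}",
        f"TITLE: {entry['title']}",
        # 24-hour time, with the AM/PM suffix left off.
        f"DATE: {entry['date'].strftime('%m/%d/%Y %H:%M:%S')}",
        "-----",
        "BODY:",
        entry["body"],
        "-----",
        "--------",
    ]
    return "\n".join(lines) + "\n"


def split_by_year(entries):
    """Group entries by year so each import file stays a manageable size."""
    by_year = {}
    for entry in entries:
        by_year.setdefault(entry["date"].year, []).append(entry)
    return by_year
```

To use it, you'd run `parse_entry` over each saved file, group the results with `split_by_year`, and write the `to_mt` output for each year to its own file. Categories and comments would need their own patterns and their own fields in the record, and forgetting pieces like those is probably the sort of thing that tripped up my first imports.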


  1. Rasmus wrote:

    Welcome to the wondrous world of WordPress.

    I can’t believe you went through Google’s cache like that. When I lost my database a while back (also using MT and afterwards switching to WP), I just said fekkit! and moved on. Maybe I shouldn’t have …

  2. yami wrote:

    Oh, I was using WordPress before, it’s just that the Movable Type import format was most convenient.

    Had I ever been able to keep a real journal other than this blog, I wouldn’t’ve bothered. But, it was four years’ worth of sentimental blah-de-blah…
