How I Rescued My Blog from Google's Cache

Detail-oriented readers may have noticed the sudden appearance of a full-fledged category list. Explore it and you’ll find the gateway to a fully populated set of archives, stretching once again back to the misty pre-9/11 world of innocence and laughter.

After the cut: nerdy details of how I recovered, just in case anyone else has a sudden blog database failure.

Once I realized that my database backups were useless, I started frantically saving cached pages from Google. I ran through searches like inurl:greengabbro.net/2020/05 to save entries one month at a time; this was a huge pain in the ass and could probably be done more effectively with some clever wgettage. In the end, though, I had HTML files for all my individual entry pages.
I wrote a Perl script to read through the HTML, pull out the important bits, and save them in Movable Type’s import/export format. This was the trickiest part of the process, as I’m not a native regex speaker. In addition to the regexing, I kludged together a system of toggle variables and scratch files to make sure everything was written out in the right order. I expect this system to be somewhat irritating to anyone trying to read the script (including myself, after a week or so), but nevertheless, if you’re interested, here it is.
Because I had 5 years of archives to contend with, and the WordPress importer can sometimes choke on large files, I split the import job into 5 files.
I used the WordPress import process just as anyone migrating from Movable Type would. My first couple of attempts didn’t work quite right, as I’d forgotten some essential parts of the Movable Type format when I wrote my script, but once I shook those bugs out everything worked like a charm.
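
For anyone attempting the same rescue, the extract-and-convert steps above can be sketched roughly as follows. This is not the Perl script mentioned earlier; it is a minimal Python illustration under several assumptions. The three regexes target made-up markup (`entry-title`, `published`, `entry-content`), so they would need rewriting for whatever the saved pages actually contain. The record layout follows my understanding of the Movable Type import format (five hyphens between sections, eight between entries, dates as MM/DD/YYYY with a 24-hour time). Splitting by year is one guess at a sensible way to keep each import file small.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical markup: real cached pages will use different tags and classes.
TITLE_RE = re.compile(r'<h2 class="entry-title">(.*?)</h2>', re.S)
DATE_RE = re.compile(r'<abbr class="published" title="([^"]+)">')
# Non-greedy match stops at the first </div>, so nested divs would truncate the body.
BODY_RE = re.compile(r'<div class="entry-content">(.*?)</div>', re.S)


def parse_entry(html):
    """Pull the title, date, and body out of one saved entry page."""
    title = TITLE_RE.search(html)
    date = DATE_RE.search(html)
    body = BODY_RE.search(html)
    if not (title and date and body):
        raise ValueError("page does not match the assumed markup")
    return {
        "title": title.group(1).strip(),
        "date": datetime.fromisoformat(date.group(1)),
        "body": body.group(1).strip(),
    }


def to_mt(entry, author):
    """Render one entry as a record in the Movable Type import format."""
    return "\n".join([
        f"AUTHOR: {author}",
        f"TITLE: {entry['title']}",
        "STATUS: Publish",
        f"DATE: {entry['date'].strftime('%m/%d/%Y %H:%M:%S')}",
        "-----",          # five hyphens end a section
        "BODY:",
        entry["body"],
        "-----",
        "--------",       # eight hyphens end the entry
        "",
    ])


def group_by_year(entries, author):
    """Return {year: import text}, with entries in date order within each year."""
    chunks = defaultdict(list)
    for entry in sorted(entries, key=lambda e: e["date"]):
        chunks[entry["date"].year].append(to_mt(entry, author))
    return {year: "".join(records) for year, records in chunks.items()}
```

Each value returned by `group_by_year` can then be written to its own text file and fed to the importer one file at a time. Forgetting a separator line is exactly the kind of bug that makes the first few import attempts fail, so it is worth checking one small file before running the lot.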

Comments

Rasmus wrote:

Welcome to the wondrous world of WordPress.

I can’t believe you went through Google’s cache like that. When I lost my database a while back (also using MT and afterwards switching to WP), I just said fekkit! and moved on. Maybe I shouldn’t have …

Posted 24 Jun 2020 at 3:22 am

yami wrote:

Oh, I was using WordPress before, it’s just that the Movable Type import format was most convenient.

Had I ever been able to keep a real journal other than this blog, I wouldn’t’ve bothered. But, it was four years’ worth of sentimental blah-de-blah…

Posted 25 Jun 2020 at 12:25 am

Green Gabbro
