Converting a WordPress Site to static HTML

I have a few old sites I created in WordPress but no longer update aside from installing newer versions of WordPress. I don’t want to run out of date software, but I also don’t want to take down the content.

It’s been a decade since Fanboy’s Convention List was last updated, but there are blog posts with well established URLs. (Besides, I still have dreams of one day reviving the site.)

My solution is to first, create backups of everything, and then spider the site, capture all the generated HTML and put the static pages back to replace them.

I’ve known about GNU Wget for just about forever, but only as an alternative to curl. What I discovered is that Wget has a —mirror option which allows you to download the entire site. It has a lot of options you’ll want to look into (so do look at the docs) but what I finally settled on for my purposes was

wget --mirror --page-requisites --wait=2 https://www.fanboyslist.com

Note: The --wait=2 makes Wget wait two seconds between requests. If you’re using this to mirror someone else’s site, consider using a higher value in order to avoid overloading their server. Badly behaved spiders can wreak havoc on sites with dynamically generated pages and may be blocked as a result.

Fun fact: although you may associate Wget with Linux, it’s also available for Windows. There are some differences in what characters are used for outputting file names, it should otherwise work the same way.

On the first pass, instead of directories mirroring the site structure, there were a bunch of files with names like index.html?p=257 (on Windows, this would show up as index.html%3Dp=257). Turns out that at some point, the site’s permalinks were turned off and WordPress had reverted to query string parameters.

Fix the permalinks, make sure categories and tags will have names instead of parameters.

The next pass has the directory structure, but still had the files with query strings. Digging in a bit, WordPress is generating “shortlinks” in the form

https://www.fanboyslist.com/blog/?p=257

Shortlinks are a microformat, meant to provide a shorter link for when you’re manually typing the URL. But this site doesn’t provide a means for manually discovering them, and I’m trying to remove the mechanism for resolving them, so that’s not needed in the static page (for my purposes, the canonical URL is much more useful).

One Google search later, I found a comment on a support thread about disabling shortlinks. In a nutshell, add this line to the end of the theme’s functions.php file:

remove_action('wp_head', 'wp_shortlink_wp_head', 10, 0);

(Note: I’m trying to remove this entire WordPress installation, so I’m going to modify the theme’s file. On an installation you were planning to keep, this should go in a child theme.)

While we’re fiddling with functions.php, remove the headers for the REST API, from https://wordpress.stackexchange.com/a/211469

remove_action( 'wp_head', 'rest_output_link_wp_head'              );
remove_action( 'wp_head', 'wp_oembed_add_discovery_links'         );
remove_action( 'template_redirect', 'rest_output_link_header', 11 );

Next, let’s get rid of the bit where the site is loading support for emoji (this site predates most US use of emoji). Here’s a nice little snippet from: https://www.netmagik.com/how-to-disable-emojis-in-wordpress/

/**
 * Disable the emoji's
 */
function disable_emojis() {
	remove_action( 'wp_head', 'print_emoji_detection_script', 7 );
	remove_action( 'admin_print_scripts', 'print_emoji_detection_script' );
	remove_action( 'wp_print_styles', 'print_emoji_styles' );
	remove_action( 'admin_print_styles', 'print_emoji_styles' );	
	remove_filter( 'the_content_feed', 'wp_staticize_emoji' );
	remove_filter( 'comment_text_rss', 'wp_staticize_emoji' );	
	remove_filter( 'wp_mail', 'wp_staticize_emoji_for_email' );
	
	// Remove from TinyMCE
	add_filter( 'tiny_mce_plugins', 'disable_emojis_tinymce' );
}
add_action( 'init', 'disable_emojis' );

/**
 * Filter out the tinymce emoji plugin.
 */
function disable_emojis_tinymce( $plugins ) {
	if ( is_array( $plugins ) ) {
		return array_diff( $plugins, array( 'wpemoji' ) );
	} else {
		return array();
	}
}

Remove the individual RSS feeds for each post’s comments

add_filter( 'feed_links_show_comments_feed', '__return_false' );

And then, a pile of other things to remove comes from this answer on Stack Overflow.

remove_action( 'wp_head', 'feed_links_extra', 3 ); // Display the links to the extra feeds such as category feeds
remove_action( 'wp_head', 'feed_links', 2 ); // Display the links to the general feeds: Post and Comment Feed
remove_action( 'wp_head', 'rsd_link' ); // Display the link to the Really Simple Discovery service endpoint, EditURI link
remove_action( 'wp_head', 'wlwmanifest_link' ); // Display the link to the Windows Live Writer manifest file.
remove_action( 'wp_head', 'index_rel_link' ); // index link
remove_action( 'wp_head', 'parent_post_rel_link', 10, 0 ); // prev link
remove_action( 'wp_head', 'start_post_rel_link', 10, 0 ); // start link
remove_action( 'wp_head', 'adjacent_posts_rel_link', 10, 0 ); // Display relational links for the posts adjacent to the current post.

And then, one item that isn’t in functions.php, Get rid off all the <link rel="pingback".... lines by installing the bye-bye-pingback plugin.

The theme was pretty old, based on Kubrik from around 2009 and some of the changes actually needed to be done via changes to theme files (e.g. remove the blog’s overall RSS feed), but with all these changes in place, I can now run Wget one last time and get a clean copy of the blog.

Generating Images from HTML

Editing images is hard. Moving things to the right location, adding other elements, going back to the first one and readjusting the location or size. And if you want to create multiple images with just a slightly different bit of text, or a different subject in the foreground…..

I’ve known people who can create masterpieces of art with PhotoShop and the like, but I’ve never developed the knack.

A while back, it occurred to me that I could do all kinds of fancy “this-goes-in-front-of-that” and rearranging things in a web page. Then I could just take a screenshot, do a little cropping and resizing (the secret to some of my best photos) and voilà, exactly the kind of image I wanted! And if I needed to make several such photos, well, web pages are just plain text and very easy to edit.

That’s great for a small number of images, but if you want to make a bunch of images (say, social media previews for 40 biographies), that would get tedious quickly.

The best way to deal with tedious tasks is automation.

So I created an automated HTML to Image Generator (it really needs a better name).

The idea behind it is you start off with a simple web page like this one:

<body>
	<div class="container">
		<img src="image/frog.png">
		<p class="name">Green</p>
	</div>
</body>

which creates a page looking like this:

Replace a few elements in the HTML with placeholders

<body>
	<div class="container">
		<img src="{{image}}">
		<p class="name">{{name}}</p>
	</div>
</body>

and then create a set of data files containing other values for those those placeholders. For example, this data file

{
    "name": "Yellow",
    "image": "image/sun.png",
    "colorCode": "#cccc00"
}

Creates this image

You can check out the whole thing in the HTML to Image Generator’s GitHub repository.

I hope someone finds it useful, and if you have suggestions for a better name, leave a message in the comments below.

Rubber Duck Debugging

There is a bit of dev folklore about a developer who had the experience of people coming to ask for help with problems they had encountered. What kept happening was  that they would stop midway through explaining the problem and walk away with a solution. All without the dev saying a thing.

After this happened a few times, the dev realized his participation in the process might not be required. To test this theory, he put a rubber duck on his desk.

The rule was, if you wanted to ask a question, you first had to explain the problem to the duck. Amazingly, explaining the problem to the duck had the same success rate as explaining the problem to the dev.

This practice has become known as “Rubber Duck Debugging.”

I’m not saying I’ve ever engaged in rubber duck debugging, but just yesterday I stopped partway through entering a support ticket and implemented the solution without any involvement from the support team….