Converting a WordPress Site to static HTML

I have a few old sites I created in WordPress but no longer update aside from installing newer versions of WordPress. I don’t want to run out-of-date software, but I also don’t want to take down the content.

It’s been a decade since Fanboy’s Convention List was last updated, but there are blog posts with well-established URLs. (Besides, I still have dreams of one day reviving the site.)

My solution is to first create backups of everything, then spider the site, capture all the generated HTML, and put the static pages up in place of the WordPress installation.

I’ve known about GNU Wget for just about forever, but only as an alternative to curl. What I discovered is that Wget has a --mirror option which allows you to download the entire site. It has a lot of options you’ll want to look into (so do look at the docs) but what I finally settled on for my purposes was

wget --mirror --page-requisites --wait=2 https://www.fanboyslist.com

Note: The --wait=2 makes Wget wait two seconds between requests. If you’re using this to mirror someone else’s site, consider using a higher value in order to avoid overloading their server. Badly behaved spiders can wreak havoc on sites with dynamically generated pages and may be blocked as a result.

Fun fact: although you may associate Wget with Linux, it’s also available for Windows. There are some differences in which characters are used in output file names, but it should otherwise work the same way.

On the first pass, instead of directories mirroring the site structure, there were a bunch of files with names like index.html?p=257 (on Windows, where Wget substitutes @ for the question mark, this would show up as index.html@p=257). Turns out that at some point, the site’s permalinks were turned off and WordPress had reverted to query string parameters.

The fix: turn permalinks back on (under Settings > Permalinks) and make sure categories and tags will have names instead of parameters.

The next pass had the directory structure, but still had the files with query strings. Digging in a bit, it turned out WordPress was generating “shortlinks” in the form

https://www.fanboyslist.com/blog/?p=257

Shortlinks are a microformat, meant to provide a shorter link for when you’re manually typing the URL. But this site doesn’t provide a means for manually discovering them, and I’m trying to remove the mechanism for resolving them, so that’s not needed in the static page (for my purposes, the canonical URL is much more useful).

One Google search later, I found a comment on a support thread about disabling shortlinks. In a nutshell, add this line to the end of the theme’s functions.php file:

remove_action('wp_head', 'wp_shortlink_wp_head', 10, 0);

(Note: I’m trying to remove this entire WordPress installation, so I’m going to modify the theme’s file. On an installation you were planning to keep, this should go in a child theme.)

While we’re fiddling with functions.php, remove the headers for the REST API, from https://wordpress.stackexchange.com/a/211469

remove_action( 'wp_head', 'rest_output_link_wp_head'              );
remove_action( 'wp_head', 'wp_oembed_add_discovery_links'         );
remove_action( 'template_redirect', 'rest_output_link_header', 11 );

Next, let’s get rid of the bit where the site is loading support for emoji (this site predates most US use of emoji). Here’s a nice little snippet from: https://www.netmagik.com/how-to-disable-emojis-in-wordpress/

/**
 * Disable the emoji's
 */
function disable_emojis() {
	remove_action( 'wp_head', 'print_emoji_detection_script', 7 );
	remove_action( 'admin_print_scripts', 'print_emoji_detection_script' );
	remove_action( 'wp_print_styles', 'print_emoji_styles' );
	remove_action( 'admin_print_styles', 'print_emoji_styles' );	
	remove_filter( 'the_content_feed', 'wp_staticize_emoji' );
	remove_filter( 'comment_text_rss', 'wp_staticize_emoji' );	
	remove_filter( 'wp_mail', 'wp_staticize_emoji_for_email' );
	
	// Remove from TinyMCE
	add_filter( 'tiny_mce_plugins', 'disable_emojis_tinymce' );
}
add_action( 'init', 'disable_emojis' );

/**
 * Filter out the tinymce emoji plugin.
 */
function disable_emojis_tinymce( $plugins ) {
	if ( is_array( $plugins ) ) {
		return array_diff( $plugins, array( 'wpemoji' ) );
	} else {
		return array();
	}
}

Remove the individual RSS feeds for each post’s comments

add_filter( 'feed_links_show_comments_feed', '__return_false' );

And then, a pile of other things to remove comes from this answer on Stack Overflow.

remove_action( 'wp_head', 'feed_links_extra', 3 ); // Display the links to the extra feeds such as category feeds
remove_action( 'wp_head', 'feed_links', 2 ); // Display the links to the general feeds: Post and Comment Feed
remove_action( 'wp_head', 'rsd_link' ); // Display the link to the Really Simple Discovery service endpoint, EditURI link
remove_action( 'wp_head', 'wlwmanifest_link' ); // Display the link to the Windows Live Writer manifest file.
remove_action( 'wp_head', 'index_rel_link' ); // index link
remove_action( 'wp_head', 'parent_post_rel_link', 10, 0 ); // prev link
remove_action( 'wp_head', 'start_post_rel_link', 10, 0 ); // start link
remove_action( 'wp_head', 'adjacent_posts_rel_link', 10, 0 ); // Display relational links for the posts adjacent to the current post.

And then, one item that isn’t in functions.php: get rid of all the <link rel="pingback".... lines by installing the bye-bye-pingback plugin.

The theme was pretty old, based on Kubrick from around 2009, and some of the cleanup actually had to be done by editing theme files (e.g. removing the blog’s overall RSS feed). But with all these changes in place, I can now run Wget one last time and get a clean copy of the blog.
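
A quick way to confirm the copy really is clean is to look for leftover files with a query string in the name. This is only a sketch: it assumes Wget wrote the mirror into a directory named after the host (its default when mirroring), and list_query_string_files is a made-up helper name, not something Wget provides.

```shell
# list_query_string_files: hypothetical helper that prints every mirrored
# file whose name still contains a literal "?" (e.g. index.html?p=257).
list_query_string_files() {
    mirror_dir=$1
    # [?] matches a literal question mark; a bare ? would match any character.
    find "$mirror_dir" -type f -name '*[?]*'
}
```

Running list_query_string_files www.fanboyslist.com should print nothing once the shortlinks and other query-string URLs are gone.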

Generating Images from HTML

Editing images is hard. Moving things to the right location, adding other elements, going back to the first one and readjusting the location or size. And if you want to create multiple images with just a slightly different bit of text, or a different subject in the foreground…

I’ve known people who can create masterpieces of art with Photoshop and the like, but I’ve never developed the knack.

A while back, it occurred to me that I could do all kinds of fancy “this-goes-in-front-of-that” and rearranging things in a web page. Then I could just take a screenshot, do a little cropping and resizing (the secret to some of my best photos) and voilà, exactly the kind of image I wanted! And if I needed to make several such photos, well, web pages are just plain text and very easy to edit.

That’s great for a small number of images, but if you want to make a bunch of images (say, social media previews for 40 biographies), that would get tedious quickly.

The best way to deal with tedious tasks is automation.

So I created an automated HTML to Image Generator (it really needs a better name).

The idea behind it is you start off with a simple web page like this one:

<body>
	<div class="container">
		<img src="image/frog.png">
		<p class="name">Green</p>
	</div>
</body>

which creates a page looking like this:

Replace a few elements in the HTML with placeholders

<body>
	<div class="container">
		<img src="{{image}}">
		<p class="name">{{name}}</p>
	</div>
</body>

and then create a set of data files containing other values for those placeholders. For example, this data file

{
    "name": "Yellow",
    "image": "image/sun.png",
    "colorCode": "#cccc00"
}

Creates this image
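
To make the substitution step concrete, here is a rough shell sketch of the idea using sed. It is not how the generator itself works (that is a Node project), and render_template is a hypothetical helper name; it also assumes the values contain no |, & or backslash characters, which sed would treat specially.

```shell
# render_template: hypothetical helper that fills in the {{name}} and
# {{image}} placeholders of a template file and prints the result.
# Usage: render_template TEMPLATE_FILE NAME IMAGE
render_template() {
    template=$1
    name=$2
    image=$3
    # "|" is the sed delimiter here so image paths containing "/" need no escaping.
    sed -e "s|{{name}}|$name|g" -e "s|{{image}}|$image|g" "$template"
}
```

For example, render_template template.html Yellow image/sun.png would print the page with the data file’s name and image values in place (template.html being whatever file holds the placeholder version of the page).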

You can check out the whole thing in the HTML to Image Generator’s GitHub repository.

I hope someone finds it useful, and if you have suggestions for a better name, leave a message in the comments below.

Rubber Duck Debugging

There is a bit of dev folklore about a developer who kept having people come to ask for help with problems they had encountered. What kept happening was that they would stop midway through explaining the problem and walk away with a solution. All without the dev saying a thing.

After this happened a few times, the dev realized his participation in the process might not be required. To test this theory, he put a rubber duck on his desk.

The rule was, if you wanted to ask a question, you first had to explain the problem to the duck. Amazingly, explaining the problem to the duck had the same success rate as explaining the problem to the dev.

This practice has become known as “Rubber Duck Debugging.”

I’m not saying I’ve ever engaged in rubber duck debugging, but just yesterday I stopped partway through entering a support ticket and implemented the solution without any involvement from the support team….

Home Assistant: Text to Speech and URLs

This is one of those “In case I run into this again” type of posts, with the hope that it might help someone else too.

I’ve been trying to get Home Assistant’s text to speech integration working, but when I try to play anything via the developer tools or even a smart speaker’s entity card, all I get is a beep but no speech. I hadn’t had much use for it until recently, but I know it was working at one time, so something must have changed.

What I finally figured out is that my Home Assistant instance was misconfigured. Under Configuration > General, there are two URL settings. One is “External URL”, which is the URL to use for accessing your Home Assistant instance from outside your house. The other is “Internal URL” which is the URL to use from devices which are on your home network.

A few months ago, I set up Let’s Encrypt with DuckDNS so I could securely use the Home Assistant companion app from outside the house. This had the side effect of making it so the assistant could only be contacted via https. It’s still on port 8123 though, so there’s really no place to redirect from.

What does all of this have to do with text to speech? The TLS certificate associated with my setup only works for the name I set up with DuckDNS, so I’ve been using that name and hadn’t noticed that Home Assistant’s “Internal URL” was set to the Raspberry Pi’s IP address instead of the DuckDNS name. So when my smart speaker attempted to retrieve the audio file from that URL, the connection failed.

I updated the internal URL to match the DuckDNS name, and voilà! I can now play speech through my smart speakers.

Turning off Web Proxy Auto-Discovery Protocol (WPAD)

Along with blocking some trackers, running my own DNS with Pi-hole gives me the “super power” of being able to see what DNS queries my computers are doing. This morning, I happened to notice that my desktop PC had made a bunch of lookups for “wpad.lan”.

Pi-hole appends “.lan” to the name of any machine on the local network, but that’s not a name I recognized. So what’s going on here?

Googling for “wpad.lan” led me to discover that WPAD is a protocol for automatically discovering and configuring proxy servers. Most operating systems have it off by default, but Windows defaults it to on. More worrying, leaving proxy auto-discovery turned on is a security risk. Not so much on a home or corporate network (indeed, it’s likely helpful for corporate networks, which is perhaps why it’s on by default), but if you have it on and connect to a public network (e.g. a coffee shop, library, etc.) an attacker may be able to see all the details of your http requests (not breaking https, but working around it).

The desktop PC isn’t super-portable, so I’m not too concerned about unfamiliar WiFi, but apparently this is even a risk if you’re using a VPN, so I definitely want to lock down the laptops.

A bit more digging led me to a How-To Geek article summarizing the problem and including detailed instructions on how to turn off the auto-discovery.

In a nutshell:

  1. Launch the settings app
  2. Go to “Network & Internet”
  3. In the left navigation, choose “Proxy”
  4. Turn off the slider for “Automatically detect settings.”

Troubleshooting puppeteer in WSL2

I’m working on a small project to generate image files from HTML using a web browser. This is something I’ve toyed with for a while, but never really dug into canvas far enough. Once I discovered the puppeteer package for node, the dream seemed suddenly within reach.

Everything was going along fine, until I got to the point of actually trying to launch the headless browser. Then my program started crashing with the message:

(node:4279) UnhandledPromiseRejectionWarning: Error: Failed to launch the browser process!
/mnt/c/Users/blair/git/image-gen/node_modules/puppeteer/.local-chromium/linux-818858/chrome-linux/chrome: error while loading shared libraries: libnss3.so: cannot open shared object file: No such file or directory

The message included a link to a troubleshooting guide, which did mention some tips for Windows, but that was the Windows GUI environment and I’m using Ubuntu 20.04, running under the Windows Subsystem for Linux (WSL2). That meant it was either fix it myself, fire up a VM, or install node under Windows (which would mean losing the node version manager tool).

One of my main reasons for doing Node development in Linux is the ability to use nvm. A VM is much too heavy a solution for my tastes, so I wanted to see if I could get it working. And off to Google I went.

Searching for the error message is my usual first step, but although it turned up plenty of other people having problems (plus a few open GitHub issues from several years ago), it didn’t offer any solutions. Finally, a search for “puppeteer wsl2 libnss3.so” led to a comment on an issue from last June where someone got it running by installing a bunch of packages manually.

One of the nice things about WSL is if you break your installation badly, it’s fairly trivial to remove it and reinstall a new copy. So it was fairly low risk to try installing the missing pieces to see if I could get it to work.

The error message even gave me a starting point: “error while loading shared libraries: libnss3.so: cannot open shared object file: No such file or directory.”

There’s a page at https://packages.ubuntu.com/ which allows you to search for which package a library comes from. I started by putting libnss3 in the keyword field and specifying focal (aka “20.04”) as the distribution, and began the iterative process of looking up and installing the missing package, trying my program again, and then looking up the next missing library. Happily, all it took was a half-dozen tries before my script started working again.

Here’s the list:

libnss3
libatk-adaptor
libcups2
libxkbcommon0
libgtk-3-0
libgbm1
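
Rather than discovering the missing libraries one crash at a time, ldd can list everything the loader fails to find in a single pass. The following is a sketch, assuming it runs from the project directory and that the Chromium path matches the one in the error message above; missing_libs and install_chrome_deps are hypothetical helper names.

```shell
# Path copied from the error message above; adjust for your own project.
CHROME_BIN=node_modules/puppeteer/.local-chromium/linux-818858/chrome-linux/chrome

# missing_libs: hypothetical helper that prints the shared libraries a binary
# needs but the dynamic loader cannot find.
missing_libs() {
    # ldd marks unresolved libraries with "not found"; "|| true" keeps the
    # function from failing when grep has nothing to report.
    ldd "$1" | grep 'not found' || true
}

# install_chrome_deps: installs the six packages listed above in one go.
install_chrome_deps() {
    sudo apt update &&
        sudo apt install -y libnss3 libatk-adaptor libcups2 \
            libxkbcommon0 libgtk-3-0 libgbm1
}
```

Run missing_libs "$CHROME_BIN" before and after install_chrome_deps; an empty result the second time suggests the loader can now find everything, though a different setup may well need a different list of packages.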

Full-disclosure: midway through, it occurred to me that the reason the packages were missing might be because WSL isn’t a GUI environment and therefore doesn’t have a browser installed. Running sudo apt install -y chromium-browser didn’t solve the problem, but it is possible that this installed some additional packages which I was then able to avoid installing manually.

Now to see if I can get it to render a page.

Notes on switching to Mac

A friend recently announced her job was requiring her to use a Mac, but she’d only ever used Windows and could anyone help her get started?

A similar work-related transition caused me to add Mac to my skillset a couple years ago and this request for assistance was the final push I needed to get my notes organized; here they are in a form that will perhaps help others as well.

I’m keyboard-oriented, so a lot of this focuses on using the keyboard and keyboard shortcuts.

Keyboard Navigation

One of the biggest changes from Windows to Mac is that for most things, where Windows uses the control (Ctrl) key, Mac uses the command (Cmd) key. It’s the one that looks like a square with loops on the corners. (If you plug a Windows keyboard into a Mac, you’ll use the “Windows” key as the command key.)

The control key does still get used, but it tends to be more dependent on the individual program.

Here’s a quick list of common keyboard shortcuts. Apple has a longer list in a support document at https://support.apple.com/en-us/HT201236.

Function                 Windows             Mac
Copy                     Ctrl-C              Cmd-C
Cut                      Ctrl-X              Cmd-X
Paste                    Ctrl-V              Cmd-V
Move to start of line    Home                Cmd-left arrow
Move to end of line      End                 Cmd-right arrow
Move to previous word    Ctrl-left arrow     Option-left arrow
Move to next word        Ctrl-right arrow    Option-right arrow
Undo                     Ctrl-Z              Cmd-Z
Redo (undo the undo)     Ctrl-Y              Shift-Cmd-Z

Switching between programs

On Windows, you can “Alt-Tab” to switch between programs. On Mac, you use Command-Tab to switch between programs, but it doesn’t work the way Windows does. If you have multiple copies of Word open, Command-Tab will bring them ALL to the foreground.

To switch between instances of the same program (e.g. switching between a meeting agenda and a report) use Command-` (That’s the key in the far upper-left of the keyboard, usually between Escape and Tab. It’s also known as the “backtick” or accent key. The “uppercase” version of that key is the tilde.)

Navigating the file system

On Windows, you navigate the file system with Windows Explorer. On Mac, it’s the Finder. This is the blue “smiley face” which appears in the “Dock.” (When I started using Mac, this was at the bottom of the screen, with the Finder icon on the left. Your mileage may vary.)

Launching Programs

There are at least two ways to launch applications.

I find the fastest way to launch a program is by holding down the command key and pressing the space bar. This causes a prompt to appear where you can type the name of the program you want to run. As soon as you’ve typed enough for the program name to be selected, hit the Enter key to launch it. (This is the “Spotlight Search.”)

Alternatively, in the Finder, the area on the left includes an “Applications” tab. If you click on that, you’ll be presented with a list of installed applications.

Once a program has been launched, it will appear in the dock. You can right click on the application and choose to have it remain in the dock, even if it’s not running.

Macworld has a list of five ways to launch an app at:
https://www.macworld.com/article/3108469/5-ways-to-launch-mac-apps-from-the-keyboard.html

Screenshots

Mac keyboards don’t have a print screen button. If you plug in a Windows keyboard, the print screen button won’t do anything.

To take a screenshot in Mac, hold down the Command and Shift keys and then press the 4. You then use the mouse to select the area of the screen you wish to capture. Afterward, a thumbnail image will appear at the bottom right of the screen for 5-10 seconds. Click on the thumbnail to access the full-size image which you can then perform some rudimentary editing on before using Command-C to copy it into another program. (This is similar to the Windows-Shift-S functionality recently added to Windows 10.)

Along with Cmd-Shift-4, Apple’s list of keyboard shortcuts says you can also use Cmd-Shift-3 and (in newer versions of the OS) Cmd-Shift-5. (This latter apparently gives you an ability to record the screen which I wasn’t aware of before writing this.)

Apple offers a support article on screenshots at https://support.apple.com/en-us/HT201361

Program preferences

In Windows, programs are free to use whatever conventions they wish to launch program settings (generally a “Settings” item in the “File” menu, or sometimes “Preferences” under the “Edit” menu).

On Mac, program preferences are always (almost always?) accessible via a “Preferences” item in the menu that carries the program’s name. This may also be accessed via the Command-Comma keyboard shortcut.

Accessing the Menu Bar

As mentioned at the beginning, I’m keyboard-oriented, but I’ve not found a reliable way to get to the menu bar from the keyboard. According to an article on c|net titled “Access menus via the keyboard in OSX”, you can use Control-F2.

Unfortunately, on newer MacBooks equipped with a Touch Bar, the function keys aren’t always available. As an alternative, you can use Command-Shift-/ (aka “Command-?”) to get into the Help search menu item. I find that to be enough of a hassle that using the mouse is easier.

Helpful Bookmarks

RDP connection to Linux

Going down one rabbit hole or another last night, I somewhat randomly found an article detailing how to install and access a graphical desktop UI on the Windows Subsystem for Linux.

The gist of it is

  1. Update your packages.
  2. Install the xfce4 package and optionally, xfce4-goodies (one imagines this would work for other desktops as well)
  3. Install xrdp
  4. Change the port (the default RDP port is used for connecting to the shell)
  5. Start xrdp
  6. Launch an RDP connection to localhost, using the new port number.
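
As a rough sketch, steps 1 through 5 might look like this on Ubuntu under WSL. The port number 3390 is an arbitrary example rather than a value from the article, the function names are made up for illustration, and the sed pattern assumes xrdp’s config file contains a line reading exactly port=3389.

```shell
# set_xrdp_port: hypothetical helper that changes xrdp's listening port.
# Usage: set_xrdp_port INI_FILE NEW_PORT
set_xrdp_port() {
    ini_file=$1
    new_port=$2
    if [ -w "$ini_file" ]; then
        sed -i "s/^port=3389\$/port=$new_port/" "$ini_file"
    else
        # System config files are normally owned by root.
        sudo sed -i "s/^port=3389\$/port=$new_port/" "$ini_file"
    fi
}

# setup_xrdp_desktop: hypothetical wrapper for steps 1-5 in the list above.
setup_xrdp_desktop() {
    # 1. Update your packages.
    sudo apt update && sudo apt upgrade -y || return 1
    # 2 and 3. Install the desktop and xrdp.
    sudo apt install -y xfce4 xfce4-goodies xrdp || return 1
    # 4. Change the port (3390 is only an example).
    set_xrdp_port /etc/xrdp/xrdp.ini 3390 || return 1
    # 5. Start xrdp.
    sudo service xrdp start
}
```

After calling setup_xrdp_desktop, step 6 happens on the Windows side, for example with mstsc /v:localhost:3390.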

It’s a neat trick, but I’m not sure how much use I have for it. Most of what I do with WSL (e.g. running various Linux utilities) is command-line oriented. A GUI just adds extra steps. Plus, because of the way WSL works, you have to restart xrdp any time you restart Windows. That’s already a nuisance with XAMPP.

But that one step, installing xrdp. I might have a use case for that. I keep a couple Linux VMs around for things where I do want a GUI, and it’s also a nuisance having to launch the Hyper-V manager in order to connect. If I could just leave the VM running in the background and RDP to it as needed…. that would be helpful.

FR, Gridview, and puns

I follow @Cassidoo on Twitter and spotted this thread she started.

I know a few folks who think puns are for children and not groan adults, but me, I’ve always enjoyed a good play on words. She continues on for a few more tweets, all playing off CSS units of measurement, and other people chimed in with their own.

As I said, I like a good play on words, but one thing was bothering me: “What’s this ‘fr’ thing?” One Google search later, I now know it’s a unit of measure meaning an automatically calculated fraction of the leftover space in a grid container. It’s used for sizing grid tracks (flexbox has its own flex-grow property for the same job) and solves some problems where you accidentally use more than 100% of the available space.

You can read about it in the spec, but I found this introduction to the fr CSS unit to be quite helpful.

I do believe I may have a few dozen uses for this.

Home Assistant and TP-Link

Last week, I spotted this tweet from the official Home-Assistant account.

In short, what’s happened is that TP-Link issued a firmware update that turns off the ability to control their smart plugs (and, one assumes, smart switches) from a device on the local network (e.g. Home Assistant), leaving the cloud-based API, and their official Kasa app, as the only way to control the devices.

I use TP-Link smart plugs myself. Currently to automate some lamps in the living room, but I’ll also be using them soon to automate the Christmas lights. (Sure, I could use a lamp timer, but I want the lights to go on right at sunset, not “sometime near sunset.”) For me, key parts of the value proposition were (a) it worked with Home Assistant and (b) it didn’t require using someone else’s cloud (i.e. my usage patterns remain private).

Digging into it a bit… Turns out that there really is a legit security flaw with these devices. I haven’t seen any official details from TP-Link, but I found other reports of problems (Which?, October 2020; Fernando Gont, March 2017) involving weak encryption and the ability for other people to control the device.

So, it’s a legitimate concern. Ideally, the fix would be a locally accessible API with authentication. Turning off local access altogether is rather “ham-fisted.”

Home Assistant has issued an alert that the TP-Link integration is “broken” with a link to a user-community discussion, though the alert isn’t really as obvious as one might hope….

Now that I know about the problem, I’ll have to weigh the risks of leaving the firmware out of date against losing my automations. I like the TP-Link plugs, they’ve been pretty reliable over the past several years, and the Home Assistant integration is about as simple as they come (you add a plug to your network, Home Assistant adds it to the list of devices…. easy peasy).

Ultimately, this comes down to the risks of using a “black box” product, where there is no official support for Home Assistant. Fortunately, there is a bit of good news in this. TP-Link seems to value the Home Assistant community and in response to the uproar is working on a fix to restore the local-control functionality.

The question is, do I trust them not to break it again?