A tale of two uses of mod_rewrite

By Mattias Kihlström

It was the spring of hope, it was the winter of despair. “Just put everything from the same year into the same directory,” I said to myself. The reason was that I wanted the URLs to look like /year/title-of-page, but I didn’t want title-of-page to be a directory, for fear of mixing up files in my code editor if they were all named index.html. A page published in 2012 and titled “A new hope” would end up as the file a-new-hope.html in the 2012 directory.

Adding a .html suffix to URLs using mod_rewrite

In order to make the URLs work without the .html suffix I created a .htaccess file in the site root directory and added the following lines:

Options +FollowSymLinks
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.+)$ $1.html  [L]

These rules are interpreted by mod_rewrite, a module that is a standard part of almost every Apache web server. The first two lines are needed for mod_rewrite to work at all in a .htaccess file: per-directory rewriting requires FollowSymLinks to be enabled, and RewriteEngine On switches the rewrite engine on. The first RewriteCond checks that the request is not for a directory, and the second RewriteCond checks that a file with the requested name plus .html actually exists before the RewriteRule does the work of adding .html to the request. The [L] flag at the end tells mod_rewrite that, if this rule matches, it should stop processing any further rules in the current pass. Note that each RewriteCond applies only to the single RewriteRule that immediately follows it.
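For reference, here is the same rule set with each step commented, tracing a request for /2012/a-new-hope. The document root /var/www/site in the comments is a hypothetical path used only for illustration.

```apache
# Per-directory (.htaccess) rewriting needs FollowSymLinks enabled
Options +FollowSymLinks
# Switch the rewrite engine on for this directory
RewriteEngine On
# Skip requests that map to an existing directory, e.g. /2012
RewriteCond %{REQUEST_FILENAME} !-d
# Only continue if "<requested path>.html" exists as a regular file,
# e.g. /var/www/site/2012/a-new-hope.html (hypothetical document root)
RewriteCond %{REQUEST_FILENAME}\.html -f
# Internally serve 2012/a-new-hope.html for 2012/a-new-hope;
# this is not a redirect, so the address bar keeps /2012/a-new-hope
RewriteRule ^(.+)$ $1.html [L]
```

In a .htaccess file, a rewritten request is run through the rules again. On that second pass the request is for 2012/a-new-hope.html, and since 2012/a-new-hope.html.html does not exist, the second RewriteCond fails and rewriting stops there instead of looping.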

Two years later

As long as the content is nothing but text, putting HTML files in a year directory looks fine. But what about pages containing images or making use of separate example files? Can you imagine what it might look like after a year? After two? I wouldn’t exactly call it a zen-like experience to look inside the year directories. Maybe what I said to myself two years ago wasn’t such a great idea after all.

New goals

Cleaner directories. More structure. Those were my new goals. The first step toward cleaning up and adding structure was to create, inside each year directory, one subdirectory per old HTML file, named after that file (sans .html). What was once a file called a-new-hope.html in the 2012 directory became a file called index.html in an a-new-hope directory inside the 2012 directory.

Wow. Such structure. So clean.

After the move to the new directory structure, the URLs can stay the same as before. The URL for the example above still works as /2012/a-new-hope (by default, Apache’s mod_dir redirects it to /2012/a-new-hope/, with a trailing slash, and then serves index.html). Is this it, then? Of course not. You have probably already realized that anybody who bookmarked /2012/a-new-hope.html will now get a 404 Not Found error. Not good.

Redirect .html files to subdirectories using mod_rewrite

Once again mod_rewrite lends a helping hand. For clarity, and since there are only a few years of content in this case, I have chosen to hard-code the names of the year directories instead of matching them with a regular expression. The new version of the .htaccess file looks like this:

Options +FollowSymLinks
RewriteEngine On
RewriteBase /
RewriteRule ^2012/([^/]+)\.html$ 2012/$1/	[R=301,L]
RewriteRule ^2013/([^/]+)\.html$ 2013/$1/	[R=301,L]
RewriteRule ^2014/([^/]+)\.html$ 2014/$1/	[R=301,L]

RewriteBase / tells mod_rewrite to prefix relative substitutions, such as 2012/$1/, with the site root. In the pattern, [^/]+ matches one or more characters other than a slash, so it matches files directly in the specified year directory but not in any subdirectory, and \.html$ requires the request to end in .html. $1 is replaced by whatever matched the first pair of parentheses (the file name without .html in this case). Finally, the R=301 flag turns the rule into an external redirect with status 301 (Moved Permanently), letting the browser know that the move is permanent, and L stops processing any further rules.
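If the number of year directories keeps growing, the hard-coded rules could be collapsed into a single rule that matches the year with a regular expression. This is a sketch, not the configuration used on this site, and it assumes that every year directory is named with exactly four digits.

```apache
Options +FollowSymLinks
RewriteEngine On
RewriteBase /
# Assumption: year directories are always four digits, e.g. 2012
# $1 is the year, $2 is the old file name without .html,
# so 2012/a-new-hope.html is redirected to /2012/a-new-hope/
RewriteRule ^([0-9]{4})/([^/]+)\.html$ $1/$2/ [R=301,L]
```

Like the hard-coded version, this rule would also redirect any other .html file sitting directly in a year directory, such as a hypothetical 2012/index.html, to a directory that does not exist, so it is only safe once all such files have been moved into their own subdirectories.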

And what about my fear of mixing up files with the same name in my code editor? Well, I soon learned that in Sublime Text this is rarely a problem. If two open files have the same name, the unique part of each file’s path is added to the corresponding tab in the editor.

This is far, far better…

Further reading