Samuel Sjöberg's weblog

Setting up a decent 404 page

Today I realised that a decent 404 page is a must. The custom error page provided by my host was too simple to even consider. It didn't even show which address you were trying to reach.

My solution was to replace the given file with an SSI file, which in turn includes a PHP script that searches for possible matches.

The PHP script calculates the soundex string for the requested URI and returns a list of possible entries. At the moment I'm only doing this with entries since I figure that's the most common thing people might misspell.
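To get a feel for what soundex matching does, here is a minimal sketch in plain PHP. The helper name find_similar_slugs and the sample slugs are made up for illustration; the real comparison on this site happens in MySQL, whose SOUNDEX() can return longer strings than PHP's four-character soundex(), so the two are not guaranteed to agree on every input.

```php
<?php
// Hypothetical helper (not part of the site's code): return every slug
// whose soundex code equals the soundex code of the requested string.
function find_similar_slugs($requested, array $slugs) {
   $wanted = soundex($requested);
   $matches = array();
   foreach ($slugs as $slug) {
      if (soundex($slug) === $wanted) {
         $matches[] = $slug;
      }
   }
   return $matches;
}

// Made-up slugs, for illustration only.
$slugs = array('decent', 'robert', 'page');

// A dropped letter still sounds the same, so 'decnt' finds 'decent'.
print_r(find_similar_slugs('decnt', $slugs));
```

Soundex keeps the first letter and encodes the consonants that follow, so a typo in the very first character will usually not match anything.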

The first thing I did was to include the following in the error document:

<!--#include virtual="/soundex.php?url=$REQUEST_URI"-->

The problem is that I'm using mod_rewrite, so REQUEST_URI breaks if I redirect to the error document when nothing is found in the database. My solution was to write a PHP function that grabs the error document and replaces the include tag above with the output from my soundex script.

<?php
function report_404($soundex = true) {
   header('Status: 404 Not Found');
   // Load the static error document and fill in the requested address.
   $str = file_get_contents('error-doc/404.shtml');
   $str = str_replace('<!--#echo var="REQUEST_URI"-->',
              $_SERVER['REQUEST_URI'], $str);
   if ($soundex) {
      // Capture the output of soundex.php instead of printing it directly.
      ob_start();
      include('soundex.php');
      $matches = ob_get_contents();
      ob_end_clean();
      // Swap the SSI include tag for the captured list of matches.
      $str = str_replace('<!--#include virtual="/soundex.php'.
                 '?url=$REQUEST_URI"-->', $matches, $str);
   }
   echo $str;
   exit;
}
?>
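The output buffering trick used in report_404 can be tried on its own. The following is only a sketch: capture_output, print_matches and the <!--MATCHES--> placeholder are hypothetical names, not something the site actually uses.

```php
<?php
// Hypothetical helper: run a callback and return whatever it echoed,
// instead of letting the output go straight to the browser.
function capture_output($callback) {
   ob_start();
   call_user_func($callback);
   return ob_get_clean(); // returns the buffer contents and discards the buffer
}

// Stand-in for a script that echoes its result, like soundex.php does.
function print_matches() {
   echo '<ul><li>example</li></ul>';
}

// Replace a placeholder in a template with the captured output.
$template = '<p>Not found.</p><!--MATCHES-->';
$page = str_replace('<!--MATCHES-->', capture_output('print_matches'), $template);
echo $page, "\n";
```

ob_get_clean() does the same job as the ob_get_contents() and ob_end_clean() pair above, in a single call.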

Let's have a look at soundex.php. It must be able to handle both $_SERVER['REQUEST_URI'] and $_GET['url'] since it can be included with either SSI or PHP.

<?php
//inclusion of db-files left out...
 
if (isset($_GET['url']))
   $url = strip_tags(trim($_GET['url']));
else
   $url = strip_tags(trim($_SERVER['REQUEST_URI']));
 
$url = parse_url($url);
$url = basename($url['path']);
 
$sql = "SELECT e.entry_title, e.entry_url,
   CONCAT(DATE_FORMAT(e.entry_date, '%Y'), '/',
   DATE_FORMAT(e.entry_date, '%m')) as entry_archive
   FROM entries e
   WHERE e.entry_status = 'published'
      AND SOUNDEX(e.entry_url) = SOUNDEX('". mysql_real_escape_string($url) ."')
   ORDER BY e.entry_url ASC";
 
$result = sql($sql) or die(mysql_error());
 
if (mysql_num_rows($result)) {
   echo "<h2>Possible matches</h2>\n".
        "<p>The following entries were found ".
        "that have a similar title.</p>\n";
 
   echo "<ul>\n";
 
   while ($row = mysql_fetch_assoc($result)) {
      echo '<li><a href="http://samuelsjoberg.com/archive/'.
           $row['entry_archive'] .'/'. $row['entry_url'] .
           '" title="'. htmlspecialchars($row['entry_title']) .
           '">http://samuelsjoberg.com/archive/'.
           $row['entry_archive'] .'/'. $row['entry_url'] .
           "</a></li>\n";
   }
   echo "</ul>\n";
}
?>

As you can see, the script is pretty simple. It compares the soundex string of the last part of the address (after the rightmost slash) with those of the URLs stored in the database and echoes a list of the matches found.
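The "last part of the address" step can be tested in isolation. This is a small sketch, assuming a hypothetical helper called last_path_segment; it uses the same parse_url and basename calls as the script above, plus a guard for addresses that have no usable path.

```php
<?php
// Hypothetical helper: return the part of a request URI after the
// rightmost slash, ignoring any query string.
function last_path_segment($uri) {
   $path = parse_url($uri, PHP_URL_PATH);
   if (!is_string($path)) {
      return ''; // malformed URI or no path component
   }
   return basename($path);
}

// Example request with a query string attached (made-up address).
echo last_path_segment('/archive/2004/11/decent-404-page?x=1'), "\n";
```

Note that basename ignores a trailing slash, so a request for /archive/2004/11/ yields "11" rather than an empty string.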

I hope you grasped the concept of all this. Now, try it out and see for yourself. What do you think: is it useful, or is the search too sensitive to typos? I haven't tested it much yet.

About this post

Created 12th November 2004 20:03 CET. Filed under PHP.
