Protecting yourself from "offline browsers"

Using the Apache Rewrite Engine

Written by Paul Bourke
April 2001


There are a number of software packages, both freeware and commercial, that will automatically copy all the pages and associated files from a remote WWW server. These are often called "offline browsers": a user who finds a site or group of pages they like makes a local copy for later exploration. There are some legitimate reasons why people want to do this, as well as some not so legitimate ones. The most commonly quoted reason relates to limited internet connectivity: instead of staying online and browsing the site in "human time", the computer can quickly copy the site or group of pages, which can then be browsed at the user's leisure without being online.

There are, however, cases where the content creator does not want this to happen. Some reasons are given below; they mostly apply to large sites, and most are based on the likelihood that only a small percentage of the downloaded files will ever be looked at.

The following, which relies on the Apache WWW server, is a straightforward and reliable way of stopping commonly used offline browsers from copying a site. After being modified for your local site, the directives should be placed in the .htaccess file in the directory they are intended to protect.

RewriteEngine on
# Each pattern is a regular expression anchored with ^, so it matches a
# User-Agent header that begins with the given name. The [OR] flag chains
# the conditions so that any single match triggers the rule.
# (testing purposes) RewriteCond %{HTTP_USER_AGENT}  ^Mozilla       [OR]
RewriteCond %{HTTP_USER_AGENT}  ^FAST\-WebCrawler       [OR]
RewriteCond %{HTTP_USER_AGENT}  ^ia_archiver            [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Dart                   [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Pockey                 [OR]
RewriteCond %{HTTP_USER_AGENT}  ^NetMechanic            [OR]
RewriteCond %{HTTP_USER_AGENT}  ^SuperBot               [OR]
RewriteCond %{HTTP_USER_AGENT}  ^QRVA                   [OR]
RewriteCond %{HTTP_USER_AGENT}  ^WebMiner               [OR]
RewriteCond %{HTTP_USER_AGENT}  ^WebCopier              [OR]
RewriteCond %{HTTP_USER_AGENT}  ^WebDownloader          [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Web\ Downloader        [OR]
RewriteCond %{HTTP_USER_AGENT}  ^WebMirror              [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Offline                [OR]
RewriteCond %{HTTP_USER_AGENT}  ^WebZIP                 [OR]
RewriteCond %{HTTP_USER_AGENT}  ^WebReaper              [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Anarchie               [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Mass\ Down             [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Slurp                  [OR]
RewriteCond %{HTTP_USER_AGENT}  ^BlackWidow             [OR]
RewriteCond %{HTTP_USER_AGENT}  ^WebStripper            [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Wget                   [OR]
RewriteCond %{HTTP_USER_AGENT}  ^WebHook                [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Scooter                [OR]
RewriteCond %{HTTP_USER_AGENT}  ^Teleport
# Serve this page in place of whatever was requested; change the path to suit your site.
RewriteRule ^.*$ /pbourke/errors/robots.html     [L]
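
Before relying on the list, it can help to check which User-Agent strings it would catch. The following Python sketch is not part of the Apache configuration; it only mimics the prefix test that the conditions above perform, under the assumption that each condition is meant to match a header beginning with the listed name. The function name is_blocked and the sample User-Agent strings are illustrative choices, not taken from any particular tool's documentation.

```python
import re

# Names taken from the RewriteCond lines above. A User-Agent is treated as
# blocked when it begins with any one of them (case-sensitive).
BLOCKED_PREFIXES = [
    "FAST-WebCrawler", "ia_archiver", "Dart", "Pockey", "NetMechanic",
    "SuperBot", "QRVA", "WebMiner", "WebCopier", "WebDownloader",
    "Web Downloader", "WebMirror", "Offline", "WebZIP", "WebReaper",
    "Anarchie", "Mass Down", "Slurp", "BlackWidow", "WebStripper",
    "Wget", "WebHook", "Scooter", "Teleport",
]

# One anchored alternation, mirroring the chain of [OR] conditions.
BLOCKED_RE = re.compile(
    "^(?:" + "|".join(re.escape(p) for p in BLOCKED_PREFIXES) + ")"
)


def is_blocked(user_agent: str) -> bool:
    """Return True if the header starts with one of the listed names."""
    return BLOCKED_RE.search(user_agent) is not None


if __name__ == "__main__":
    # Illustrative header values only.
    samples = [
        "Wget/1.6",
        "Teleport Pro/1.29",
        "Mozilla/4.0 (compatible; MSIE 5.5)",
    ]
    for ua in samples:
        print(ua, "->", "blocked" if is_blocked(ua) else "allowed")
```

Because every pattern is anchored at the start of the header, a User-Agent that merely contains one of these names further along the string is not matched.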

Note

Finally, this document is not intended to imply that these offline browsers are inherently undesirable. There are, however, circumstances where their behavior is undesirable.