Features of WWW Servers

or

What to expect from your service provider

Written by Paul Bourke
September 1997


The World Wide Web (WWW) has matured significantly over the last few years. Perhaps the most noticeable change has been the improvement in page design, due mainly to the involvement of trained graphic designers. Another improvement has been the growing variety of media types, from simple animated GIFs, through advanced interactive content requiring client-based plugins, to Java applets. An often ignored improvement is the powerful extensions being added to the software that distributes the information to the user, namely the WWW server. The server is a combination of hardware and software that is normally remote from the designer of the content; that is, it is controlled by the Internet Service Provider (ISP). Because of this remoteness the designer often misses out on many potentially useful, time-saving, and powerful capabilities. Furthermore, they may be accepting inferior service without being aware of the alternatives.

This document describes many of the features that are now standard in the better WWW servers. It will be useful for those wishing to know "what they are missing out on". It should also help when choosing an ISP, by indicating what questions to ask and how to differentiate between service providers. The topics are discussed in no particular order of importance, as this will differ depending on a particular user's requirements.

Examples will be given where possible. Although general, they will refer to New Zealand and Australia, as it is in those two countries that I have administrative and configuration control over servers.

The Virtual Domain

The first decision when embarking on a WWW presence is the choice of a domain name. This is normally related in some way to the content of the pages being designed, for example, the company name. The importance of this domain name should not be underestimated; it will often relate directly to your "presence". If your domain name is abc.co.nz then your URL will be something like
http://abc.co.nz/
and you can create as many email addresses as you like of the form
yourname@abc.co.nz

It is critical that any reference to this domain name is reflected in the information the client receives. An unacceptable but common practice among service providers is to rely on a relatively new feature of some browsers called "Non-IP Virtual Hosting". While this is a nice feature, it is by no means widespread among WWW browsers. The result is that many users will fail to access your pages and will instead be directed to a WWW page not related to your URL.

Server Side Parsing

This is a very powerful technique for varying the content of a page. Instead of the server blindly sending each page to the browser, it looks at the content of the page and acts on directives included within the page. The directives are normally transparent because they are based on the comment capabilities of html <!-- ... -->

A great time-saving feature is the ability to include other files such as standard headers and footers. If a standard piece of HTML appears on every page, it can be a big job to make a change. Using the file include facility means you only have one copy of, say, the footer; when you update it, all pages that include that footer will reflect the updated information. As an example, the links and copyright shown at the bottom of this page are included using the line <!--#include virtual="../tableofcontents.html" -->
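To make the include mechanism concrete, here is a minimal sketch of what a server might do when it parses a page. It is only an illustration: the function name expand_includes and the dictionary of fragments are assumptions of mine, and a real server would read the included files from disk and support many more directives.

```python
import re

# Matches directives such as <!--#include virtual="footer.html" -->
INCLUDE_RE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand_includes(page, fragments):
    """Replace each include directive in `page` with the named fragment.

    `fragments` is a hypothetical dict mapping paths to HTML text; it
    stands in for the files a real server would read from disk.
    """
    def substitute(match):
        path = match.group(1)
        # Unknown paths become a visible marker instead of raising an error.
        return fragments.get(path, "[include not found: %s]" % path)

    return INCLUDE_RE.sub(substitute, page)


# One shared footer, referenced from a page by a single directive.
fragments = {"footer.html": "<p>Copyright 1997</p>"}
page = '<h1>Home</h1>\n<!--#include virtual="footer.html" -->'
expanded = expand_includes(page, fragments)
print(expanded)
```

Changing the single footer entry changes every page that includes it, which is the whole point of the facility.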

Server side includes also allow any of the environment variables to be displayed. For example, your IP address is 54.205.153.63, you are reading this file on Thursday, 19-Jan-2017 14:28:19 EST, the browser you are using is CCBot/2.0 (http://commoncrawl.org/faq/), and this document was last modified on Sunday, 19-Jun-2016 04:33:41 EDT.

Many of the topics discussed below can be implemented with server side includes. More recent servers can also perform logical operations on the environment variables. For example, a page could include different elements depending on any of the environment variables available, such as the day of the week, the browser being used, or the domain the viewer is from. It is common, say, to provide special links on a page if it is accessed from within your organisation.
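The decision about special links for your own organisation can be sketched as follows. This is an illustration under stated assumptions rather than the syntax of any particular server: REMOTE_HOST is the standard CGI variable holding the client host name, while the function name, the ".abc.co.nz" suffix, and the "/staff/" link are hypothetical.

```python
def extra_links(env, internal_suffix=".abc.co.nz"):
    """Return extra HTML only for visitors from inside the organisation.

    `env` is a dict of environment variables as a server would set them.
    The suffix and the link target are made-up examples.
    """
    host = env.get("REMOTE_HOST", "").lower()
    if host.endswith(internal_suffix):
        return '<a href="/staff/">Staff pages</a>'
    # Everyone else gets nothing extra.
    return ""


inside = extra_links({"REMOTE_HOST": "pc12.abc.co.nz"})
outside = extra_links({"REMOTE_HOST": "dialup.example.com"})
print(repr(inside), repr(outside))
```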

Server side parsing has been built into servers from the early days but was only "turned on" for certain files. One convention was to perform the parsing if the file name ended in .shtml instead of .html. This is a clumsy alternative and there is little reason why server side parsing shouldn't be available for all documents.

Spelling corrections

It has been possible for a while for the server to check for common mistakes made by users when typing URLs and to make corrections automatically. The most common mistake was incorrect capitalisation, which was easily fixed by the server. In recent server releases the types of correction have been extended to include a single inserted letter, a single omitted letter, transposed letters, and mistyped characters. For example, all of the URLs below should work!
http://silver.wasp.uwa.edu.au/~pbourke/index.html (correct url)
http://silver.wasp.uwa.edu.au/~pbourke/ndex.html (missing i in index)
http://silver.wasp.uwa.edu.au/~pbourke/idnex.html (swapped n and d)
http://silver.wasp.uwa.edu.au/~pbourke/Index.html (incorrect case)
http://silver.wasp.uwa.edu.au/~pbourke/inddex.html (extra d)
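The kind of matching involved can be sketched as below. This is not how any particular server implements it; it is a minimal illustration, with function names of my own choosing, that accepts a requested name if it differs from exactly one existing file by case or by a single inserted, omitted, mistyped, or transposed character.

```python
def within_one_edit(a, b):
    """True if a and b differ by at most one insertion, deletion,
    substitution, or swap of two adjacent characters."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    # Skip the common prefix; i ends at the first differing position.
    i = 0
    while i < len(a) and i < len(b) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        if a[i + 1:] == b[i + 1:]:
            return True  # one mistyped character
        return (i + 1 < len(a) and a[i] == b[i + 1] and a[i + 1] == b[i]
                and a[i + 2:] == b[i + 2:])  # two adjacent characters swapped
    # Lengths differ by one: drop one character from the longer string.
    longer, shorter = (a, b) if len(a) > len(b) else (b, a)
    return longer[i + 1:] == shorter[i:]


def correct_name(requested, existing):
    """Return the existing name the request most plausibly meant, or None."""
    if requested in existing:
        return requested
    lowered = requested.lower()
    matches = [name for name in existing
               if within_one_edit(lowered, name.lower())]
    # Only correct when the answer is unambiguous.
    return matches[0] if len(matches) == 1 else None


files = ["index.html", "contact.html"]
for attempt in ["ndex.html", "idnex.html", "Index.html", "inddex.html"]:
    print(attempt, "->", correct_name(attempt, files))
```

Refusing to guess when more than one file matches is a design choice of this sketch; a server could instead offer the visitor a list of candidates.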

CGI Access

CGIs are programs that run on the server to extend its functionality, as opposed to Java which runs on the client. Perhaps the most common CGI is one which takes the content of a form, massages it, and sends the result to an email address. Access counters are another common example. The big problem with such programs is restricting them so they don't adversely affect the server. As a result most service providers don't allow users to create and install their own custom CGIs. This isn't always an issue, as the service provider usually supplies a collection of generally useful and checked CGI scripts in a part of the server accessible by all, normally a directory called "cgi-bin/".

There are many situations where this is unacceptable, for example, if the page designer wants their own search engine, shopping trolley, or customised form handler.

This problem has now been addressed by the most widely used WWW servers and is generally referred to as Secure CGI. This allows page designers to install Perl or compiled C programs in their own WWW page directories in such a way that they can't damage other parts of the server.
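As an illustration of the form-to-email CGI mentioned earlier, the "massaging" step might look something like the sketch below. It is written in Python purely for illustration, the function name and subject line are my own, and it only builds the message; actually handing it to a mail system is left out.

```python
from email.message import EmailMessage
from urllib.parse import parse_qs


def form_to_message(body, recipient):
    """Turn a URL-encoded form body into an email message (not sent here)."""
    fields = parse_qs(body, keep_blank_values=True)
    # One "name: value" line per form field, in alphabetical order.
    lines = ["%s: %s" % (name, ", ".join(values))
             for name, values in sorted(fields.items())]
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = "Form submission"  # hypothetical subject line
    msg.set_content("\n".join(lines))
    return msg


# A form body as the browser would submit it.
msg = form_to_message("name=Jane+Doe&comment=Nice+site%21", "yourname@abc.co.nz")
print(msg["To"])
print(msg.get_content())
```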

Error files

A common omission when configuring virtual servers is the handling of error (incorrect URL) conditions. The ISP normally has an error handling facility, but in many cases it is important to ensure the viewers of your pages aren't redirected to unrelated and potentially competitive information. It is easy to test whether viewers of your pages get a sensible error page by requesting a nonexistent URL within your domain. For example, if your domain is abc.co.nz, try the URL http://abc.co.nz/thisdoesntexist.html

A correctly configured WWW server should support customised error handling for each virtual domain. Indeed there should be no way by which a URL based upon your domain name should return anything other than a page related to your domain. The domain belongs to you, not your ISP!
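In outline, per-domain error handling is just a lookup keyed on the domain the request was addressed to. The sketch below assumes the server already knows that domain; the table, the paths, and the function name are all hypothetical.

```python
# Hypothetical table: one custom error page per virtual domain.
ERROR_PAGES = {
    "abc.co.nz": "/errors/notfound.html",
}

# Generic page used only when a domain has not supplied its own.
ISP_ERROR_PAGE = "/notfound.html"


def error_page_for(domain):
    """Return the error page to serve for a bad URL within `domain`."""
    return ERROR_PAGES.get(domain.lower(), ISP_ERROR_PAGE)


print(error_page_for("abc.co.nz"))
print(error_page_for("unconfigured.co.nz"))
```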

Logging

A WWW server records every access as well as every incorrect attempt to access a site. Make sure you have access to these logs, generally referred to as "access logs" and "error logs". While many of the errors will be a result of users making typing errors, others will be errors made by the page designer. The error logs are a valuable diagnostic tool for authors of sites with a large number of pages.

A well organised ISP will not only be able to supply "raw" log files, they will also supply software, preferably online, which calculates meaningful statistics derived from the raw logs. This is particularly important in situations where the owners of the site are paying on a volume basis. As an example, the following describes custom log software available online to customers: logit.cgi.

There are two other log files most servers can produce, referred to as "referer logs" and "agent logs". The first records where viewers were before they came to your page, which is great for determining where your page is linked from. The second records what browser the client is using, which is valuable if you are concerned about customising for specific browsers.

Servers can also generate what are known as "custom" log files. These are in whatever format the user wishes and would normally be written automatically to the user's home directory. For example you may wish to analyse the accesses to your site using a spreadsheet. You may only want the following statistics:

   filename bytesent datetime browsertype useripname
The above can be generated continuously and automatically in your home directory, from where you can ftp it for analysis. All the environment variables can be recorded in custom logs, as well as information on the data transfer such as the success, number of bytes, and duration.
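If your server only offers raw logs, the same five columns can be extracted afterwards. The sketch below assumes the widely used "combined" log layout (host, identity, user, time, request, status, bytes, referer, agent); the sample line is made up for illustration and the function name is my own.

```python
import re

# One line of a "combined" format access log, field by field.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def custom_record(line):
    """Return 'filename bytes datetime browser host' separated by tabs,
    or None if the line is not in the expected layout."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    # Servers write "-" when no bytes were sent; record that as 0.
    sent = "0" if m.group("bytes") == "-" else m.group("bytes")
    return "\t".join([m.group("path"), sent, m.group("time"),
                      m.group("agent"), m.group("host")])


# Made-up sample line, not taken from a real log.
line = ('203.0.113.9 - - [10/Sep/1997:13:55:36 +1200] '
        '"GET /index.html HTTP/1.0" 200 2326 '
        '"http://abc.co.nz/" "Mozilla/3.01 (Macintosh; I; PPC)"')
record = custom_record(line)
print(record)
```

Tab-separated output was chosen here because most spreadsheets import it directly.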

Language Handling

There are any number of WWW servers which can choose from a number of pages distinguished by language. This choice is not based upon the country domain the client is viewing from but rather on the preferred language set within the browser's configuration or preferences. Choosing by country domain would obviously be undesirable: I speak English, and if I happen to be browsing from Germany I don't want to get pages in German by default.

The different WWW servers handle this in various ways, the important feature being that if you supply pages in multiple languages then the server chooses the appropriate one automatically. Of course it is up to the designer of the page content if they also want links on the page to the various language versions.
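The browser states its language preferences in the Accept-Language request header, for example "de;q=0.5, en-GB, fr;q=0.8". How a given server uses it varies; the following is only a simplified sketch of the selection, with a function name and default of my own, and it ignores the "*" wildcard.

```python
def preferred_language(accept_language, available, default="en"):
    """Pick the best available language for an Accept-Language header."""
    choices = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a language without a q value has full weight
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort by descending quality, then by order of appearance.
        choices.append((-quality, position, tag))
    for neg_quality, _, tag in sorted(choices):
        if neg_quality == 0:
            continue  # q=0 means "not acceptable"
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # fall back from en-gb to en
        if primary in available:
            return primary
    return default


print(preferred_language("de;q=0.5, en-GB, fr;q=0.8", ["de", "fr"]))
```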

Browser Dependency

How many times have you seen links on a WWW page directing the viewer to choose the pages they view based on the browser they are using? Putting aside the debate on whether it is acceptable practice to design pages that work on some browsers and not on others, this redirection can be done directly by the server. In the past it was accomplished by routing all pages through a CGI which checked the environment variable "HTTP_USER_AGENT" and forwarded the appropriate page depending on the result.

Now this can be accomplished transparently by the server. Of course, as with the redirection based on language, the designer may still choose to give the user the choice after the default decision has been made, although this is probably unnecessary.
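The test the old CGI approach performed on HTTP_USER_AGENT can be sketched as below. The file naming scheme (index.ie.html and so on) and the function name are hypothetical. Note that Internet Explorer also announces itself as "Mozilla/... (compatible; MSIE ...)", so it must be checked for first.

```python
def page_variant(user_agent, base="index"):
    """Choose which version of a page to send for a given browser."""
    agent = user_agent.lower()
    # Check MSIE first: its agent string also begins with "Mozilla/".
    if "msie" in agent:
        return base + ".ie.html"
    if agent.startswith("mozilla/"):
        return base + ".netscape.html"
    # Any other browser gets the plain version.
    return base + ".html"


for agent in ["Mozilla/2.0 (compatible; MSIE 3.02; Windows 95)",
              "Mozilla/3.01 (Macintosh; I; PPC)",
              "Lynx/2.7"]:
    print(agent, "->", page_variant(agent))
```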

Shifting House

or

Redirection

It is not uncommon to change domain name, and this can arise in many different ways, for example if you relocate or mirror your pages in another country.

Almost all WWW servers can redirect requests to a new URL, for example, if you changed your domain name from abc.co.nz to abc.com, you wouldn't want all attempts to link to http://abc.co.nz/ to suddenly fail (they may exist as links in directory services). Servers can automatically handle the remapping required, indeed they can map any URL to any other URL.
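The remapping itself is simple. The sketch below, using the abc.co.nz to abc.com example, returns the HTTP status 301 ("Moved Permanently") together with the new URL. The table and function name are my own, and for brevity any port number in the old URL is dropped.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical table of domains that have moved.
HOST_MOVES = {"abc.co.nz": "abc.com"}


def redirect_for(url):
    """Return (status, new_url) if the URL's domain has moved, else None."""
    parts = urlsplit(url)
    new_host = HOST_MOVES.get((parts.hostname or "").lower())
    if new_host is None:
        return None
    # Keep the path and query so deep links keep working.
    new_url = urlunsplit((parts.scheme, new_host, parts.path,
                          parts.query, parts.fragment))
    return 301, new_url


print(redirect_for("http://abc.co.nz/products/list.html?page=2"))
print(redirect_for("http://abc.com/"))
```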

Mirrors

It is relatively common to operate a number of similar if not identical servers, each with a different domain, eg: abc.co.nz and abc.com. This is done for a variety of reasons: access is often faster within a country, and access is often cheaper within a country. In these circumstances it is common to let the user choose which server they want to access by listing the options on the first page they encounter.

As with multiple languages, most servers can perform this remapping based on the country of origin of the browser, thus relieving the user of making the choice.
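One simple way to guess the country of origin is from the last part of the client's host name. That is an assumption of this sketch, not a description of how any particular server does it, and the table and function name are hypothetical. Clients with no country code in their name fall through to the default mirror.

```python
# Hypothetical table: country code of the client -> nearest mirror.
MIRRORS = {"nz": "abc.co.nz"}
DEFAULT_MIRROR = "abc.com"


def mirror_for(client_host):
    """Choose a mirror from the top-level domain of the client's host name."""
    tld = client_host.lower().rstrip(".").rsplit(".", 1)[-1]
    return MIRRORS.get(tld, DEFAULT_MIRROR)


print(mirror_for("dialup.example.co.nz"))
print(mirror_for("host.example.de"))
```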

Private Parts

Often it is desirable to have part of your WWW page accessible to only one person such as an administrator or to a group as in an intranet environment. This possibility has been built into servers almost from the beginning in the form of a user name and password being requested before some pages on the server can be viewed. Normally the protection is assigned to the contents of a directory but it can be also assigned to a single file. In addition, creating such protected directories need not require intervention by the administrators of the server, individual page maintainers can create protected directories and add/delete/update the attributes of the account holders.

As an example try accessing this directory.
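Behind the user name and password prompt, the browser sends an Authorization header containing "Basic" followed by the base64 encoding of "user:password". A minimal sketch of the check follows. The accounts dictionary and function name are hypothetical, and passwords are held in plain text here only to keep the example short; real servers store them in hashed form.

```python
import base64
import hmac


def check_basic_auth(header, accounts):
    """True if the Authorization header names a known user and password."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:
        # Covers both malformed base64 and undecodable bytes.
        return False
    user, separator, password = decoded.partition(":")
    if not separator or user not in accounts:
        return False
    # Constant-time comparison avoids leaking how much of a guess matched.
    return hmac.compare_digest(password.encode("utf-8"),
                               accounts[user].encode("utf-8"))


# What a browser would send for user "admin" with password "secret".
header = "Basic " + base64.b64encode(b"admin:secret").decode("ascii")
print(check_basic_auth(header, {"admin": "secret"}))
```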

The Fantasy of the Secure Server

Having a secure server relates to how information between the client and the server is encoded or, more correctly, encrypted. It is normally discussed with regard to how information entered through forms, such as credit card details, is transmitted. While there are many situations where such encryption is desirable and even necessary, the threat of data interception is greatly exaggerated. Further, the most likely source of data insecurity is employees of the ISP; encrypting the information on its way to the server does nothing to address this greatest source of risk.

The encryption on secure servers comes in various flavours, reflected in the length of the keys used. If you decide you do need a secure server, make sure it uses 128 bit keys and further ensure your customers are going to use browsers that support those keys; browsers exported from the USA may not do so.

Damn www.

Make sure your site works with or without the leading "www."; it is totally superfluous. If your domain is abc.co.nz, then both of the following should work identically
http://abc.co.nz
and
http://www.abc.co.nz
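In server terms this amounts to treating the two names as one. A minimal sketch, with a function name of my own choosing:

```python
def canonical_host(host):
    """Reduce a host name to a single form, ignoring case and a leading www."""
    host = host.lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host


# Both forms should identify the same site.
print(canonical_host("www.abc.co.nz") == canonical_host("abc.co.nz"))
```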