|
The World Wide Web (WWW) has matured significantly over the last few years.
Perhaps the most noticeable change has been the improvement in
page design due mainly to the involvement of trained graphic designers.
Another improvement has been the growing variety of media
types, from simple animated GIFs and advanced interactive content requiring
client-based plugins to Java applets. An often ignored improvement
has occurred in the powerful extensions being added to the software
distributing the information to the user, namely the WWW server. This
is a combination of hardware and software normally remote to the designer
of the content, that is, it is controlled by the Internet Service Provider
(ISP). Because of this remoteness the designer often misses out on
many potentially useful, time-saving, and powerful capabilities. Furthermore,
they may be accepting inferior service without being aware of the alternatives.
This document will describe many of the features that are now
standard in the better WWW servers. It will be useful for those wishing
to know "what they are missing out on". It should also help when choosing an
ISP by giving some indication of what questions to ask and how to
differentiate between service providers. The topics are
discussed in no particular order of importance, as this will differ
depending on a particular user's requirements.
Examples will be given where possible. Although general, they
will refer to New Zealand and Australia, as it is in those two countries
that I have administrative and configuration control over servers.
The Virtual Domain
|
The first decision when embarking on a WWW presence is the
choice of a domain name. This is normally related in some way to the
content of the pages being designed, for example, the company name.
The importance of this domain name should not be underestimated; it
will often relate directly to your "presence". If your domain name
is abc.co.nz then your URL will be something like
http://abc.co.nz/
and you can create as many email addresses as you like of the form
yourname@abc.co.nz
It is critical that any reference to this domain name is reflected
in the information the client receives. A common and unacceptable practice
among some service providers is to rely on a
relatively new feature of some browsers called
"Non-IP Virtual Hosting", in which the browser tells the server which
domain name it is asking for. While this is a nice feature, it is by no means
widespread among WWW browsers. The result is that many users will
fail to reach your pages and will instead be sent to a WWW page not related
to your URL.
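The failure described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular server is written: the table of domains, the directory names and the function name are all made up for the example. The server looks at the Host header sent by the browser, and a browser that does not send one ends up at the provider's own pages.

```python
# Hypothetical table mapping virtual domains to document directories.
VIRTUAL_HOSTS = {
    "abc.co.nz": "/home/abc/www",
    "xyz.co.nz": "/home/xyz/www",
}
DEFAULT_ROOT = "/home/isp/www"  # the provider's own pages (assumed path)


def document_root(headers):
    """Pick the document directory from the request's Host header."""
    # Drop any ":port" suffix and ignore case before the lookup.
    host = headers.get("Host", "").split(":")[0].lower()
    # A browser that sends no Host header falls through to the default.
    return VIRTUAL_HOSTS.get(host, DEFAULT_ROOT)
```

Calling document_root with an empty set of headers returns the provider's directory, which is exactly the behaviour to watch out for.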
|
Server Side Parsing
|
This is a very powerful technique for varying the content of a page.
Instead of the server blindly sending each page to the browser, it
looks at the content of the page and acts on directives included
within the page. The directives are normally transparent because they
are based on the comment capabilities of HTML
<!-- ... -->
A great time-saving feature is the ability to include other files such
as standard headers and footers. If a standard piece of HTML is included
on every page it can be a big job to make a change. Using the file include
facility means you keep only one copy of, say, the footer. When you update
it, all pages that include that footer reflect the updated information.
As an example, the links and copyright shown at the bottom of this page
are included using the line
<!--#include virtual="../tableofcontents.html" -->
Server side includes provide for any of the
environment variables to be displayed,
so for example your IP address is
38.107.179.236,
you are reading this file on
Sunday, 12-Feb-2012 09:21:25 EST,
the browser you are using is
CCBot/1.0 (+http://www.commoncrawl.org/bot.html),
and this document was last modified on
Sunday, 20-Feb-2011 05:46:57 EST.
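To make the mechanism concrete, here is a minimal sketch in Python of what the server does when it parses a page. It handles only the include and echo directives, the function name and arguments are my own, and printing "(none)" for an unset variable is an assumption modelled on common server behaviour. A real server also confines included paths to the site's own directories, which this sketch does not attempt.

```python
import re
from pathlib import Path

# The two directive forms handled by this sketch.
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')
ECHO = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')


def parse_page(html, base_dir, variables):
    """Expand include directives, then echo directives, in a page."""
    # Includes go first so that echo directives inside an included
    # file are expanded as well.
    html = INCLUDE.sub(
        lambda m: (Path(base_dir) / m.group(1)).read_text(), html)
    # Unset variables are shown as "(none)" (an assumption, see above).
    return ECHO.sub(lambda m: variables.get(m.group(1), "(none)"), html)
```

The variables argument would be a dictionary of the environment variables for the current request, for example the client address and the modification date of the file.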
Many of the topics discussed below can be implemented by server
side includes. The more recent servers can also perform
logical operations on the global variables. For example, a page could
include different elements depending on any of the
environment variables available,
such as the day of the week, the browser being used, or the domain the viewer
is from. It is common, say, to provide special links on a page if it is
accessed from within your organisation.
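The decision behind such a conditional element might look like the following Python sketch. The variable name REMOTE_HOST follows the usual CGI convention, while the organisation domain, the link and the function name are invented for the example.

```python
def extra_links(variables, organisation="abc.co.nz"):
    """Return staff-only links when the viewer is inside the organisation."""
    host = variables.get("REMOTE_HOST", "").lower()
    # Match the domain itself or any host beneath it, but not a
    # different domain that merely ends with the same letters.
    inside = host == organisation or host.endswith("." + organisation)
    return '<a href="/staff/">Staff pages</a>' if inside else ""
```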
Server side parsing has been built into servers from the early days
but was only "turned on" for certain files. One convention was to
perform the parsing if the file name ended in .shtml instead of .html.
This is a clumsy alternative and there is little reason why
server side parsing shouldn't be available for all documents.
|
Spelling corrections
|
It has been possible for a while for the server to automatically check
for common mistakes made by users when typing URLs and automatically make
corrections. The most common mistake was incorrect capitalisation, which
was easily fixed by the server. In recent server releases the
types of correction have been extended to include a single inserted letter,
a single omitted letter, transposed letters, and mistyped characters.
For example, all of the URLs below should work!
http://silver.wasp.uwa.edu.au/~pbourke/index.html (correct url)
http://silver.wasp.uwa.edu.au/~pbourke/ndex.html (missing i in index)
http://silver.wasp.uwa.edu.au/~pbourke/idnex.html (swapped n and d)
http://silver.wasp.uwa.edu.au/~pbourke/Index.html (incorrect case)
http://silver.wasp.uwa.edu.au/~pbourke/inddex.html (extra d)
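The kinds of correction listed above can be sketched in Python as follows. This is my own illustration of the matching rules, not the code any server actually uses, and the function names are hypothetical. A request is corrected only when exactly one existing file is a plausible match.

```python
def one_edit_apart(a, b):
    """True if a and b differ by one inserted, omitted or mistyped
    character, or by two adjacent characters being swapped."""
    if a == b:
        return False
    if len(a) == len(b):
        diff = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diff) == 1:
            return True  # one mistyped character
        # Two adjacent positions holding each other's characters.
        return (len(diff) == 2 and diff[1] == diff[0] + 1
                and a[diff[0]] == b[diff[1]] and a[diff[1]] == b[diff[0]])
    if abs(len(a) - len(b)) == 1:
        short, longer = sorted((a, b), key=len)
        # Removing one character from the longer name gives the shorter.
        return any(longer[:i] + longer[i + 1:] == short
                   for i in range(len(longer)))
    return False


def correct_name(requested, existing):
    """Return the file the request most plausibly meant, or None."""
    if requested in existing:
        return requested
    # Capitalisation mistakes are tried first.
    same = [n for n in existing if n.lower() == requested.lower()]
    if len(same) == 1:
        return same[0]
    near = [n for n in existing
            if one_edit_apart(requested.lower(), n.lower())]
    return near[0] if len(near) == 1 else None
```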
|
CGI Access
|
CGIs are programs that run on the server to extend its functionality,
as opposed to Java which runs on the client. Perhaps the most common CGI
is one which takes the content of a form, massages it, and sends the
result to an email address. Another common CGI is the access counter.
The big problem with such programs on the server is restricting them
so they don't adversely affect the server. As a result most service
providers don't allow users to create and install their own custom
CGIs. This isn't always an issue, as the service provider supplies
a collection of generally useful and checked CGI scripts in a part of the server
accessible by all, normally a directory called "cgi-bin/".
There are many situations where this is unacceptable, for example,
if the page designer wants their own search engine, shopping trolley,
or customised form handler.
This problem has now been addressed by the most widely used WWW servers
and is generally referred to as Secure CGI. This allows page
designers to install Perl or compiled C programs in their own WWW
page directories in such a way that they can't damage other parts
of the server.
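As an illustration of the form handler mentioned at the start of this section, the following Python sketch shows the "massaging" step only. It assumes the form data arrives URL-encoded, as it does in the QUERY_STRING variable of a CGI handling a GET request. The function name and the message layout are my own, and actually sending the mail is left out.

```python
from urllib.parse import parse_qs


def form_to_message(query_string, recipient):
    """Turn URL-encoded form data into the text of an email message."""
    fields = parse_qs(query_string)
    # The recipient comes from the site's configuration, never from
    # the form itself, so a visitor cannot redirect the mail.
    lines = [f"To: {recipient}", "Subject: Form submission", ""]
    for name in sorted(fields):
        lines.append(f"{name}: {', '.join(fields[name])}")
    return "\n".join(lines)
```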
|
Error files
|
A common omission when configuring virtual servers is the behaviour under
error (incorrect URL) conditions. The ISP normally has an error handling
facility, but in many cases it is important to ensure the viewers of your
pages aren't redirected to unrelated and potentially competing information.
It is easy to test whether viewers of your pages get a sensible error
page by using a nonexistent URL within your domain. For example, if
your domain is abc.co.nz, try the URL
http://abc.co.nz/thisdoesntexist.html
A correctly configured WWW server should support customised error handling
for each virtual domain. Indeed there should be no way by which a URL
based upon your domain name should return anything other than a page
related to your domain. The domain belongs to you, not your ISP!
|
Logging
|
A WWW server records every access as well as every incorrect attempt
to access a site. Make sure you have access to these logs, generally
referred to as "access logs" and "error logs". While many of the errors
will be the result of users making typing errors, others will be
errors made by the page designer. The error logs are a valuable
diagnostic tool for authors of sites with a large number of pages.
A well organised ISP will not only be able to supply "raw" log files,
they will also supply, preferably online, software which calculates meaningful
statistics derived from the raw logs. This is particularly important
in situations where the owners of the site are paying on a volume basis.
As an example, the following describes custom log software available online
to customers:
logit.cgi.
There are two other log files most servers can produce, referred
to as "referer logs" and "agent logs". The first records where viewers
were before they came to your page, which is great for determining where your
page is linked from. The second records what browser the client is using,
which is valuable if you are concerned about customising for specific
browsers.
Servers can also generate what are known as "custom" log files. These are
in whatever format the user wishes and would normally be written automatically
to the user's home directory. For example, you may wish to analyse the
accesses to your site using a spreadsheet. You may only want the following
statistics:
filename bytesent datetime browsertype useripname
The above can be generated continuously and automatically in your home
directory, from where you can ftp it for analysis. All the
environment variables can be recorded in
custom logs, as well as information on the data transfer such as the
success, number of bytes, and duration.
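If your provider only supplies raw logs, the five fields listed above can be extracted with a short script. The Python sketch below assumes the raw log is in the widely used "combined" format, which may not match your server, and the function name is my own. It writes the fields separated by tabs so a spreadsheet can read them.

```python
import re

# One line of a "combined" format access log (assumed format).
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def custom_record(line):
    """Return filename, bytes sent, time, browser and host separated
    by tabs, or None if the line is not in the expected format."""
    m = COMBINED.match(line)
    if m is None:
        return None
    # Servers write "-" when no bytes were sent.
    size = "0" if m["bytes"] == "-" else m["bytes"]
    return "\t".join([m["path"], size, m["time"], m["agent"], m["host"]])
```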
|
Language Handling
|
There are any number of WWW servers which can choose from a number of
pages distinguished by language. This choice is not based upon
the country domain the client is viewing from but rather on the
preferred language set within the browser's configuration or preferences.
Choosing by country domain is obviously undesirable: I speak English and if
I happen to be browsing from Germany I don't want to get pages in German
by default.
The different WWW servers handle this in various ways, the important
feature being that if you supply pages in multiple languages then the
server chooses the appropriate one automatically. Of course,
it is up to the designer of the page content whether they also want links
on the page to the various language versions.
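The browser passes its language preferences to the server in the Accept-Language header, for example "de;q=0.5, en-GB, fr;q=0.8". How a server might choose between the available versions is sketched below in Python. This is a simplification under my own assumptions: the function name is hypothetical, and wildcards and the finer matching rules of the standard are ignored.

```python
def preferred_language(accept_language, available, default="en"):
    """Choose the best available language for an Accept-Language header."""
    choices = []
    for item in accept_language.split(","):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a language with no q value is fully preferred
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        choices.append((quality, tag))
    # Try the most preferred language first; the sort is stable, so
    # equal preferences keep the order the browser gave.
    for quality, tag in sorted(choices, key=lambda c: -c[0]):
        if quality <= 0:
            continue
        if tag in available:
            return tag
        prefix = tag.split("-")[0]  # "en-gb" falls back to "en"
        if prefix in available:
            return prefix
    return default
```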
|
Browser Dependency
|
How many times have you seen links on a WWW page directing
the viewer to choose the pages they view based on the browser they are
using? Putting aside the debate on whether it is acceptable practice
to design pages that work on some browsers and not on others, this
redirection can be done directly by the server. It was accomplished in
the past by routing all pages through a CGI which checked the global
variable "HTTP_USER_AGENT" and forwarded the appropriate page depending
on the result.
Now this can be accomplished transparently by the server. Of course, as
with the redirection based on language, the designer may still choose
to give the user the choice after the default decision has been made,
although this is probably unnecessary.
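The older CGI approach described above amounts to something like the following Python sketch. The page names and the browser markers are invented for the example; only the variable HTTP_USER_AGENT comes from the CGI convention.

```python
# Hypothetical page variants, keyed by text found in the browser's
# identification string. The first match wins.
VARIANTS = [
    ("MSIE", "index-ie.html"),
    ("Lynx", "index-text.html"),
]


def page_for_browser(environ, default="index.html"):
    """Choose the page to send based on HTTP_USER_AGENT."""
    agent = environ.get("HTTP_USER_AGENT", "")
    for marker, page in VARIANTS:
        if marker in agent:
            return page
    return default
```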
|
Shifting House or Redirection
|
It is not uncommon to change domain name, and this can arise in many different
ways, for example if you relocate or mirror your pages in another country.
Almost all WWW servers can redirect requests to a new URL. For example,
if you changed your domain name from abc.co.nz to abc.com, you wouldn't
want all attempts to link to http://abc.co.nz/
to suddenly fail (they may exist as links in directory services). Servers
can automatically handle the remapping required; indeed they can map any
URL to any other URL.
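In HTTP terms the server answers a request for the old address with a "moved permanently" status (301) and a Location header giving the new URL, which the browser then follows. A minimal Python sketch of that mapping, with a hypothetical table and function name, is:

```python
# Hypothetical table of old domain names and their replacements.
MOVED = {"abc.co.nz": "abc.com"}


def redirect(host, path):
    """Return (status, headers) for a moved domain, or None if the
    request should be served normally."""
    new_host = MOVED.get(host.lower())
    if new_host is None:
        return None
    # Keep the path so links to individual pages still work.
    return 301, {"Location": f"http://{new_host}{path}"}
```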
|
Mirrors
|
It is relatively common to operate a number of similar if not identical
servers each with different domains, eg: abc.co.nz and abc.com. This
is done for a variety of reasons: access is often faster within a country,
and access is often cheaper within a country. In these circumstances it is
common to let the user choose which server they want to access by listing
the options on the first page they encounter.
As with multiple languages, most servers can make this choice automatically,
in this case based on the country of origin of the browser, thus relieving
the user of making the choice.
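One simple way to guess the country of origin is from the last part of the client's host name. The Python sketch below assumes that approach; the table and the function name are hypothetical, and host names that are unknown or carry no country code fall back to a default server.

```python
# Hypothetical table of country codes and the mirror serving each.
MIRRORS = {"nz": "abc.co.nz"}


def mirror_for(client_hostname, default="abc.com"):
    """Choose a mirror from the country code of the client's host name."""
    country = client_hostname.rsplit(".", 1)[-1].lower()
    return MIRRORS.get(country, default)
```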
|
Private Parts
|
Often it is desirable to have part of your WWW page accessible to only
one person such as an administrator or to a group as in an intranet
environment. This possibility has been built into servers almost from
the beginning in the form of a user name and password being requested
before some pages on the server can be viewed.
Normally the protection is assigned to the contents of
a directory but it can also be assigned to a single file. In addition,
creating such protected directories need not require intervention by
the administrators of the server, individual page maintainers can
create protected directories and add/delete/update the attributes
of the account holders.
As an example try accessing this directory.
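The user name and password normally reach the server in an Authorization header using the "Basic" scheme, which is simply the text "user:password" encoded in base64. The following Python sketch shows the check a server might make. The function names and the account table are my own, and storing a plain SHA-256 digest of each password is a simplification for the example rather than a recommendation.

```python
import base64
import hashlib
import hmac


def parse_basic_auth(header):
    """Return (user, password) from an Authorization header, or None."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True)
        text = decoded.decode("utf-8")
    except ValueError:  # covers bad base64 and bad UTF-8
        return None
    user, sep, password = text.partition(":")
    return (user, password) if sep else None


def is_authorised(header, accounts):
    """accounts maps user names to SHA-256 hex digests of passwords."""
    credentials = parse_basic_auth(header)
    if credentials is None:
        return False
    user, password = credentials
    expected = accounts.get(user, "")
    actual = hashlib.sha256(password.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how much matched.
    return hmac.compare_digest(expected, actual)
```

Note that base64 is an encoding, not encryption, so the password crosses the network in a form anyone intercepting it can read.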
|
The Fantasy of the Secure Server
|
Having a secure server relates to how information between the client and
the server is encoded or, more correctly, encrypted.
It is normally discussed with regard to how information entered through
forms, such as credit card details, is transmitted. While there are
many situations where such encryption is desirable and even necessary,
the threat of data interception is greatly exaggerated. Further, the
most likely source of data insecurity resides with employees of the
ISP; encrypting the information on the way to the server does nothing
to address this greatest source of risk.
The encryption on secure servers comes in various flavours reflected
in the length of the keys used. If you decide you do need a secure
server, make sure it uses 128 bit keys and further ensure your customers
are going to use browsers that support those keys; browsers exported
from the USA may not do so.
|
Damn www.
|
Make sure your site works with or without the leading "www."; it is totally
superfluous. If your domain is abc.co.nz, then both of the following should
work identically
http://abc.co.nz
and
http://www.abc.co.nz
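On the server side this amounts to treating the two names as the same site before any other lookup is done. A tiny Python sketch, with a function name of my own choosing:

```python
def canonical_host(host):
    """Treat "www.abc.co.nz" and "abc.co.nz" as the same site."""
    host = host.lower().rstrip(".")
    return host[4:] if host.startswith("www.") else host
```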
|
|