# Least Squares

Written by Paul Bourke

June 1997

One often has a number of data points and needs to describe the
behaviour of the data with a function of a known form. In general the
data points are noisy samples, and the function's parameters cannot be
chosen so as to describe every sample exactly. The aim then is to find the
parameters of the function that "best" describe the samples.

The method of least squared error is one common choice of the "best" estimate.
The squared error is the sum of the squared differences between each sample
and the value predicted by the function.
The parameter values that minimise this error are taken as the best estimates.
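As a minimal sketch (not part of the original article; the function name `squared_error` is my own), the squared error for a candidate function can be computed directly from its definition:

```python
def squared_error(f, xs, ys):
    # Sum of squared differences between each sample y and the
    # value f(x) predicted by the candidate function.
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))

# Noisy samples scattered around the line y = x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.1, 0.9, 2.2, 2.8]

# Error of the candidate line f(x) = x; a smaller E means a better fit.
E = squared_error(lambda x: x, xs, ys)
```

Least squares fitting amounts to searching, over the parameters of f, for the values that minimise this quantity.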

The first example will illustrate the key points behind the technique
by applying it to a problem that can readily be worked out manually.

### Example 1

Consider finding the equation of a straight line given n samples
(x_{i}, y_{i}). We will fit the function f(x) = ax + c,
the usual form of a linear function, where a is the slope and
c is where the line intercepts the y axis.
The key features and conventions are illustrated below.

The general expression for the least square error E is defined as

E = (f(x_{0}) - y_{0})^2 + (f(x_{1}) - y_{1})^2 + ... + (f(x_{n-1}) - y_{n-1})^2

In this particular case it is

E = (ax_{0} + c - y_{0})^2 + (ax_{1} + c - y_{1})^2 + ... + (ax_{n-1} + c - y_{n-1})^2

This function will have a minimum where the first derivatives with
respect to a and c are 0. Setting each derivative to zero, and
cancelling the common factor of 2, gives

dE/da = 0 =
(ax_{0} + c - y_{0})x_{0} +
(ax_{1} + c - y_{1})x_{1} +
... +
(ax_{n-1} + c - y_{n-1})x_{n-1}
dE/dc = 0 =
(ax_{0} + c - y_{0}) +
(ax_{1} + c - y_{1}) +
... +
(ax_{n-1} + c - y_{n-1})

These can be rewritten as

a (x_{0}x_{0} + x_{1}x_{1} +
... + x_{n-1}x_{n-1}) +
c (x_{0} + x_{1} + ... + x_{n-1}) =
x_{0}y_{0} + x_{1}y_{1} +
... + x_{n-1}y_{n-1}
a (x_{0} + x_{1} +
... + x_{n-1}) +
c (1 + 1 + ... + 1) =
y_{0} + y_{1} +
... + y_{n-1}

From the above it might be observed that if we define the vectors
**x** = (x_{0}, x_{1}, x_{2}, ... x_{n-1})
**y** = (y_{0}, y_{1}, y_{2}, ... y_{n-1})

**u** = (1,1,1,... 1)

then we can write the problem as two simultaneous equations
in the two unknowns a and c, as follows

a **x**.**x** + c **x**.**u** = **x**.**y**
a **u**.**x** + c **u**.**u** = **u**.**y**

where "." denotes the vector dot product.
Solving these two equations for a and c is straightforward. In general,
any least squares fit of a function that is linear in its unknown
parameters, however many there are, can be formulated as a set of
simultaneous equations, with as many equations as there are unknowns
or parameters to be fitted.
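The two equations above can be solved directly. Here is a minimal Python sketch (the function name `fit_line` is my own; the 2x2 system is solved by Cramer's rule rather than general elimination, which is adequate at this size):

```python
def fit_line(xs, ys):
    # Build the two normal equations from dot products:
    #   a x.x + c x.u = x.y
    #   a u.x + c u.u = u.y
    n = len(xs)
    xx = sum(x * x for x in xs)              # x.x
    xu = sum(xs)                             # x.u = u.x
    xy = sum(x * y for x, y in zip(xs, ys))  # x.y
    uu = n                                   # u.u
    uy = sum(ys)                             # u.y
    # Solve the 2x2 system by Cramer's rule.
    det = xx * uu - xu * xu
    a = (xy * uu - xu * uy) / det
    c = (xx * uy - xy * xu) / det
    return a, c

# Samples lying exactly on y = 2x + 1 recover a = 2, c = 1.
a, c = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```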

### Example 2

To fit a plane f(x,y) = ax + by + c given samples
**x** = (x_{0}, x_{1}, x_{2}, ... x_{n-1})
**y** = (y_{0}, y_{1}, y_{2}, ... y_{n-1})

**z** = (z_{0}, z_{1}, z_{2}, ... z_{n-1})

and applying the same derivation as in the case of the line, we get
3 simultaneous equations in 3 unknowns

a **x**.**x** + b **x**.**y** + c **x**.**u** = **x**.**z**

a **y**.**x** + b **y**.**y** + c **y**.**u** = **y**.**z**

a **u**.**x** + b **u**.**y** + c **u**.**u** = **u**.**z**
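The coefficient matrix of this system is just a table of dot products. A minimal sketch (the function names `dot` and `plane_normal_equations` are my own, not from the article) that builds the 3x3 system:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def plane_normal_equations(xs, ys, zs):
    # Rows of the 3x3 system for the unknowns (a, b, c):
    #   a x.x + b x.y + c x.u = x.z
    #   a y.x + b y.y + c y.u = y.z
    #   a u.x + b u.y + c u.u = u.z
    us = [1.0] * len(xs)
    A = [[dot(xs, xs), dot(xs, ys), dot(xs, us)],
         [dot(ys, xs), dot(ys, ys), dot(ys, us)],
         [dot(us, xs), dot(us, ys), dot(us, us)]]
    rhs = [dot(xs, zs), dot(ys, zs), dot(us, zs)]
    return A, rhs

# Four samples taken from the plane z = 2x + 3y + 1.
A, rhs = plane_normal_equations([0.0, 1.0, 0.0, 1.0],
                                [0.0, 0.0, 1.0, 1.0],
                                [1.0, 3.0, 4.0, 6.0])
```

Note that the matrix is symmetric (x.y = y.x), which holds for normal equations in general.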

These and larger sets of simultaneous equations can be readily solved
using standard Gaussian elimination;
see here for more details and source code.
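For completeness, here is a small Gaussian elimination solver (a sketch of my own, not the article's source code; it uses partial pivoting for numerical stability):

```python
def gauss_solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(A)
    # Work on an augmented copy so the inputs are not modified.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        # Pivot: swap in the row with the largest entry in this column.
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k]
                              for k in range(r + 1, n))) / M[r][r]
    return x

# A 3x3 system of the kind produced by a plane fit.
x = gauss_solve([[2.0, 1.0, 2.0], [1.0, 2.0, 2.0], [2.0, 2.0, 4.0]],
                [9.0, 10.0, 14.0])
```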