| x | y |
|---|---|
| 3 | 14 |
| 4 | 20 |
| 6 | 27 |
| 8 | 41 |
| 12 | 63 |
| 15 | 73 |
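We could write this as a matrix equation
$$\begin{pmatrix} 3 & 1 \\ 4 & 1 \\ 6 & 1 \\ 8 & 1 \\ 12 & 1 \\ 15 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} 14 \\ 20 \\ 27 \\ 41 \\ 63 \\ 73 \end{pmatrix}.$$
This equation has no solution. This is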
unfortunate, but not surprising, since we are trying to find a single
line that passes through 6 different points. In general, we will write
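our matrix equation as $A\mathbf{a} = \mathbf{y}$ where
$$A = \begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix}, \quad \mathbf{a} = \begin{pmatrix} a \\ b \end{pmatrix}, \quad \text{and} \quad \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},$$
and the $x_i$ and $y_i$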
are the data values. As long as we have more than two data points, we
probably won't be able to find an exact solution. Algebraically, this
corresponds to the fact that A doesn't have an inverse matrix,
since it isn't square. Since we can't find an exact solution, we will
try to find an approximate solution. Geometrically, since there is no
line that passes through all our data points, we will find a line that
comes close to all the points. We measure the error of our line using
the sum of squared errors,
$$SSE = \sum_{i=1}^{n} (a x_i + b - y_i)^2.$$
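As a small illustration, the sum of squared errors of a candidate line y = ax + b can be computed directly from the table. This is only a sketch: the function name `sse` and the two trial lines are assumptions chosen for illustration, not part of the original notes.

```python
# Data points from the table above.
xs = [3, 4, 6, 8, 12, 15]
ys = [14, 20, 27, 41, 63, 73]

def sse(a, b, xs, ys):
    """Sum of squared errors of the line y = a*x + b over the data."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys))

# Two hypothetical trial lines; the smaller value is the better fit.
error_first = sse(5, 0, xs, ys)    # the guess y = 5x
error_second = sse(4, 5, xs, ys)   # the guess y = 4x + 5
print(error_first, error_second)   # 24 194
```

The guess y = 5x is much closer to the data than y = 4x + 5, and minimizing this quantity over all choices of a and b is what picks out the best line.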
To minimize SSE, we differentiate with respect to both variables a and
b and set the results equal to 0. This will give us the following two
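equations
$$\frac{\partial\, SSE}{\partial a} = \sum_{i=1}^{n} 2x_i(a x_i + b - y_i) = 0, \qquad \frac{\partial\, SSE}{\partial b} = \sum_{i=1}^{n} 2(a x_i + b - y_i) = 0.$$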
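Collecting the a and b terms together in these equations gives us the system of two equations in two unknowns (a and b)
$$\begin{aligned} a\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i &= \sum_{i=1}^{n} x_i y_i, \\ a\sum_{i=1}^{n} x_i + bn &= \sum_{i=1}^{n} y_i. \end{aligned}$$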
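For the six data points in the table, the coefficients of this system are just four sums. The sketch below computes them in Python; the variable names are assumptions made for illustration.

```python
# Data points from the table above.
xs = [3, 4, 6, 8, 12, 15]
ys = [14, 20, 27, 41, 63, 73]
n = len(xs)

# The four sums that appear as coefficients in the two equations.
sum_x = sum(xs)
sum_y = sum(ys)
sum_xx = sum(x * x for x in xs)
sum_xy = sum(x * y for x, y in zip(xs, ys))

# The system has the form
#   sum_xx * a + sum_x * b = sum_xy
#   sum_x  * a + n     * b = sum_y
print(f"{sum_xx}a + {sum_x}b = {sum_xy}")   # 494a + 48b = 2463
print(f"{sum_x}a + {n}b = {sum_y}")         # 48a + 6b = 238
```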
Recalling our matrix equation was $A\mathbf{a} = \mathbf{y}$, we observe
that this pair of equations can be written in the form
$A^TA\mathbf{a} = A^T\mathbf{y}$, where
$A^T$ is the transpose matrix formed by flipping
$A$ about its main diagonal,
$$A^T = \begin{pmatrix} x_1 & x_2 & \cdots & x_n \\ 1 & 1 & \cdots & 1 \end{pmatrix}.$$
We now solve the equation by
multiplying both sides on the left by
$(A^TA)^{-1}$ to get $\mathbf{a} = (A^TA)^{-1}A^T\mathbf{y}$. For our initial
example, this gives a = 559/110 and b = -163/165, and you can check that the line
y = (559/110)x - 163/165 does pass very near all the data
points (a graphing calculator can be quite useful here).
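If a graphing calculator is not handy, the matrix computation can be checked with a short Python sketch. It uses exact fractions from the standard library. The helper names `transpose`, `matmul`, and `inverse_2x2` are hypothetical, written here only for illustration.

```python
from fractions import Fraction

xs = [3, 4, 6, 8, 12, 15]
ys = [14, 20, 27, 41, 63, 73]

def transpose(M):
    """Flip a matrix (list of rows) about its main diagonal."""
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(m * n for m, n in zip(row, col)) for col in zip(*N)] for row in M]

def inverse_2x2(M):
    """Exact inverse of a 2 x 2 matrix with nonzero determinant."""
    (p, q), (r, s) = M
    det = Fraction(p * s - q * r)
    return [[s / det, -q / det], [-r / det, p / det]]

A = [[x, 1] for x in xs]      # n x 2 matrix with rows (x_i, 1)
y = [[v] for v in ys]         # n x 1 column of y values
At = transpose(A)
AtA = matmul(At, A)           # [[494, 48], [48, 6]]
Aty = matmul(At, y)           # [[2463], [238]]

coeffs = matmul(inverse_2x2(AtA), Aty)
a, b = coeffs[0][0], coeffs[1][0]
print(a, b)                   # 559/110 -163/165

# Compare each data value with the value the fitted line predicts.
for x, v in zip(xs, ys):
    print(x, v, float(a * x + b))
```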
This process is called linear regression. Your students will use linear regression to fit lines to data in their science classes, so it is nice to see how this is an application of the matrices we've been studying. But it would be nicer if we had some tie to the algebra of matrices, this being an algebra class and all. Fortunately, there is such a connection. We have avoided non-square matrices, because the field laws don't apply to them. We can't even define addition and multiplication between arbitrary non-square matrices. But while the non-square matrix $A$ can't have an inverse, it does have what is called a pseudoinverse. A pseudoinverse is a matrix $B$ so that $BAB = B$ and $ABA = A$. The pseudoinverse of an $n \times 2$ matrix $A$ (with $n > 2$) is $B = (A^TA)^{-1}A^T$, provided $A^TA$ is invertible (which holds whenever the $x_i$ are not all equal). So our approximate solution to $A\mathbf{a} = \mathbf{y}$ is $\mathbf{a} = B\mathbf{y}$, where $B$ is the pseudoinverse of $A$. In other words, we find an approximate solution by multiplying through by an approximate inverse.
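The two defining properties can be checked for the matrix A from our example. The following is a minimal sketch using exact fractions, so the equalities hold exactly and not just up to rounding. The helper functions are hypothetical names introduced for illustration.

```python
from fractions import Fraction

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    return [[sum(m * n for m, n in zip(row, col)) for col in zip(*N)] for row in M]

def inverse_2x2(M):
    (p, q), (r, s) = M
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

xs = [3, 4, 6, 8, 12, 15]
A = [[Fraction(x), Fraction(1)] for x in xs]    # 6 x 2
At = transpose(A)
B = matmul(inverse_2x2(matmul(At, A)), At)      # 2 x 6 candidate pseudoinverse

print(matmul(matmul(B, A), B) == B)   # True: BAB = B
print(matmul(matmul(A, B), A) == A)   # True: ABA = A

BA = matmul(B, A)   # 2 x 2
AB = matmul(A, B)   # 6 x 6
identity_6 = [[int(i == j) for j in range(6)] for i in range(6)]
print(BA == [[1, 0], [0, 1]])   # True: B undoes A on the left
print(AB == identity_6)         # False: A B is not the identity
```

So B behaves like an inverse on one side only, which is the sense in which it is an approximate inverse.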
The same ideas we've used for linear regression apply to forms other than $y = ax + b$. For example, if we want to fit a quadratic curve $y = ax^2 + bx + c$ to a set of $n$ data points, we could write this as a matrix equation for an $n \times 3$ matrix in the same fashion we used an $n \times 2$ matrix for linear regression. We would use the pseudoinverse to find the values of $a$, $b$, and $c$ that minimize SSE in the exact same way. Because of these applications, some advanced statistics classes deal with the algebra of pseudoinverses. For this class, we will just leave them as an example of what can be done in more complicated algebraic structures, and as an example of the applications of matrices to a standard topic in secondary mathematics.
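As a sketch of the quadratic case, we can reuse the data from our linear example and take the rows of the $n \times 3$ matrix to be $(x_i^2, x_i, 1)$. That row layout is an assumption chosen to match $y = ax^2 + bx + c$. The code below solves $A^TA\mathbf{a} = A^T\mathbf{y}$ by elimination instead of forming the inverse explicitly, and the helper `solve_square` is a hypothetical name.

```python
from fractions import Fraction

xs = [3, 4, 6, 8, 12, 15]
ys = [14, 20, 27, 41, 63, 73]

def solve_square(M, rhs):
    """Solve M z = rhs exactly by Gauss-Jordan elimination (M square, invertible)."""
    size = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(size):
        # Swap in a row with a nonzero entry in this column, then scale it to 1.
        pivot_row = next(r for r in range(col, size) if aug[r][col] != 0)
        aug[col], aug[pivot_row] = aug[pivot_row], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        # Clear this column in every other row.
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]

A = [[x * x, x, 1] for x in xs]                                     # n x 3
AtA = [[sum(row[i] * row[j] for row in A) for j in range(3)] for i in range(3)]
Aty = [sum(row[i] * v for row, v in zip(A, ys)) for i in range(3)]
a, b, c = solve_square(AtA, Aty)

def sse(predict):
    """Sum of squared errors of a prediction function over the data."""
    return sum((predict(x) - v) ** 2 for x, v in zip(xs, ys))

sse_quadratic = sse(lambda x: a * x * x + b * x + c)
sse_line = sse(lambda x: Fraction(559, 110) * x - Fraction(163, 165))
print(a, b, c)
print(float(sse_quadratic), float(sse_line))
```

Since every line is also a quadratic with a = 0, the quadratic fit can never have a larger SSE than the best line.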