Linear Least-Squares Data Fitting with Orthogonal Polynomials
G.J. Fee
Centre for Experimental and Constructive Mathematics
Department of Mathematics
Simon Fraser University, Burnaby, B.C., Canada, V5A 1S6
gjfee@cecm.sfu.ca
1 Abstract
We consider the problem of fitting data to a polynomial. We review Maple's least-squares methods, which include the normal equations method (NE), the orthogonal factorization method (QR), and the singular value decomposition method (SVD), which are available in the following Maple packages: stats , linalg and LinearAlgebra . We implement three orthogonal polynomial methods in this paper and compare them with the other methods.
The inputs to our weighted linear least-squares problem are as follows:
a positive integer which is the number of data points,
a real vector with
components
,
, ... ,
which define the
values of the independent variable called
,
a real vector with
components
,
, ... ,
which define the
corresponding values of the dependent variable called
,
a real vector with
positive components
,
, ... ,
which contain the
corresponding estimated standard deviations of each of the
dependent variables. It is more common to be given a vector
of
weights, but we will calculate the vector of weights by the formula
for
from
to
. Notice that we have assumed that the independent vector values
,
, ... ,
are exact, so we don't need a vector containing the estimated standard deviations in the values of the independent variable. In the general case of a weighted linear least-squares problem, we require some basis functions so we can express the output as a linear combination of the given basis functions. As we are concerned with polynomial fitting, we just require
the maximum degree of the polynomial fit. We assume
.
Given any real-valued function
that is defined at each of our data points
for
from 1 to
. We define the error vector
by
for
from 1 to
. If
also depends on some parameters then the weighted least-squares algorithm chooses the parameters such that the weighted sum of squares of the errors
is minimized. In general, this leads to a system of non-linear equations to determine the parameters. If the parameters enter linearly, such as the undetermined coefficients in a linear combination of basis functions, then the weighted least-squares approach leads to a linear system of equations to find the parameters. If we measure each error
in terms of the number of standard deviations then we just divide
by
, and the square of this leads to the formula for the weights given above. We think it would be easier for an experimentalist to estimate the standard deviation in a measurement than to estimate the weight that should be used in a weighted least-squares problem, so our input includes the vector of estimated standard deviations, rather than the vector of weights. In the case of given weights, the the solution of the
weight equation for the
standard deviation is
.
Example 1:
There are 201 equally spaced values on the interval
for the vector of independent variable values, that is
for
from 1 to 201. All estimated standard deviations are 1, so all weights are also 1 and we have a conventional unweighted least-squares problem. There are 201 simulated measurements, that is
for
from 1 to 201, where
is the value of a random variable selected from a standard normal distribution with mean 0 and standard deviation 1. The chosen function is
. The problem is to fit a degree 40 polynomial to this data, where the fit is in the least-squares sense. We use the standard monomial basis functions so that
for
from 1 to 41.
This leads to a design matrix
with 201 rows and 41 columns. The symmetric matrix
has an infinity norm condition number equal to 0.9e61 =
. Using standard double precision arithmetic, all the above methods produce unacceptable results.
We can always increase the precision to overcome the ill-conditioning, but a better approach is to use an orthogonal polynomial basis instead of using the standard monomial basis functions. Our first method uses shifted Legendre polynomials as basis functions. Our second method uses discrete Chebyshev polynomials. Both of these orthogonal polynomials are defined in the book Handbook of Mathematical Functions , by M. Abramowitz and I. Stegun.
The discrete Chebyshev polynomials have been generalized to the case of arbitrary positive weights and arbitrary independent variable values. Our third method is a Maple implementation of the algorithm in the book Chebyshev Polynomials in Numerical Analysis by L. Fox and I.B. Parker. The orthogonal polynomial methods produce acceptable results, using double precision arithmetic.
>
>
2 The Standard solution methods
There are three standard solution methods: the normal equations (NE) method, the QR decomposition method, and the singular value decomposition (SVD) method. They are listed in the order of numerical quality of the result. The normal equations method squares the condition number of the problem, so suffers the most roundoff error, but it is the fastest method. The SVD method gives the best numerical results, but it is the slowest. The QR decomposition method is between the other two in both numerical quality and execution speed. The above three methods can handle any basis functions, not just polynomials.
We review the normal equations method. Choose
linearly independent basis functions
,
, ...,
. For the case of a polynomial we could choose the standard monomial basis functions
. Create the function
as a linear combination of the basis functions, where
,
, ...,
are the
parameters. Evaluate the function
at each of the
data points to compute the error vector
. Scale each error component by dividing it by its corresponding estimated standard deviation. Then we form the sum of squares of the scaled errors. A necessary condition for a minimum of this sum is that all the first partial derivatives with respect to the parameters must be set equal to zero. This generates the following symmetric linear system of equations:
where
is the vector of dependent variable observations,
is a diagonal matrix which contains the reciprocals of the corresponding estimated standard deviations along the diagonal,
is the
by
matrix such that
,
denotes the transpose of the matrix
,
since
is a diagonal matrix, and
is the vector of unknown parameters. These equations are called the normal equations. They can be solved by gaussian elimination, by performing an
decomposition of the coefficient matrix, or since the coefficient matrix is positive definite, we can solve the symmetric linear system, by doing a Cholesky decomposition of the coefficient matrix, that is express it as
where
is a lower triangular matrix, and then solve for the vector
.
The QR algorithm first forms the matrix
and the scaled vector
. We see now that we have a conventional unweighted least-squares problem with coefficient matrix
and right hand side vector
. The QR algorithm first expresses
where
has orthonormal columns and
is upper triangular. This can be obtained by the Gram-Schmidt algorithm, or better by the modified Gram-Schmidt algorithm or best by Householder reflectors. Then we form the vector
and solve the upper triangler linear system with coefficient matrix
and the newly created vector on the right hand side, for the unknown parameters.
The singular value decomposition first expresses the above matrix
as
where
is an orthogonal matrix or a rectangular matrix with orthonormal columns,
is a diagonal rectangular matrix with the singular values of
along the diagonal in descending order, and
is an orthogonal matrix. The singular value decomposition can be computed by using Householder reflectors to reduce
to bidiagonal form, then a modified eigenvalue QR algorithm can be used to reduce the size of the off diagonal elements so that they are all less than the given off diagonal tolerance. The solution vector is
where
is obtained from
by replacing the large diagonal elements by their reciprocals, where large means any number that is greater than the given singular value tolerance.
All the above methods can be implemented so that they have a computational complexity of
floating point operations and a memory complexity of
floating point numbers.
>
3 Maple's least-squares procedures
Maple [1] contains severval least-squares procedures, such as,
stats[fit,leastsquare],
linalg[leastsqrs],
LinearAlgebra[LeastSquares],
CurveFitting[LeastSquares].
Maple also has the QR and SVD algorithms in
evalf(Svd()),
linalg[QRdecomp],
LinearAlgebra[QRDecomposition] and
LinearAlgebra[SingularValues]
The normal equations method is used in both the stats and linalg packages. The QR algorithm is the default algorithm used in the LinearAlgebra package. The CurveFitting package just calls the LinearAlgebra package, but it does allow for weights.
We ran severval least-squares procedures at a few settings of
Digits.
The time column is the CPU time measured in seconds with Maple's
time()
procedure. The
error
column is the 2-norm of the difference between the computed solution at the given setting of
Digits
and the computed solution at
Digits:=128;
. The
cond(A,2)
column is the 2-norm condition number of the design matrix
A
, which is the largest singular value of
A
divided by the smallest singular value of
A
. The
column is the 2-norm condition number for the matrix product of the transpose of
A
times
A
. The condition number columns would be the same for all algorithms, since the same design matrix
A
was common input to all of the procedures. All tests were run on a DEC ALPHA computer, using a 64-bit version of Maple 7. Here are some timings for Example 1.
>
#stat[fit,leastsquare]:
Mat1 := Matrix([[''Digits'',time,`error`],
[16, 9.672, 4.32332e+4],
[32, 19.980, 3.01829e+4],
[48, 38.445, 1.06767e+4],
[64, 62.935, 3.52278e-3],
[80, 100.505, 6.72389e-19]]):
Mat1;
>
#linalg[leastsqrs]:
Mat2 := Matrix([[''Digits'',time,`error`,`cond(A,2)`,`cond(A^t.A,2)`],
[16, 6.348, 4.19421e+4, 2.24853e18, 1.12199e18],
[32, 8.263, 2.98281e+4, 1.76026e30, 1.61285e34],
[48, 9.830, 1.43902e+4, 1.76022e30, 5.24675e49],
[64, 10.435, 2.28023e-2, 1.76022e30, 3.09846e60],
[80, 10.562, 2.18639e-19, 1.76022e30, 3.09837e60]]):
Mat2;
>
#linalg[QRdecomp]:
Mat3 := Matrix([[''Digits'',time,`error`],
[16, 167.526, 5.21543e+4],
[32, 257.615, 1.68712e+0],
[48, 363.538, 2.41451e-16],
[64, 644.346, 1.59864e-32],
[80, 542.131, 1.42191e-48]]):
Mat3;
>
#evalf(Svd()):
Mat4 := Matrix([[''Digits'',time,`error`],
[16, 172.816, 3.30814e+4],
[32, 267.279, 2.29236e+1],
[48, 352.360, 1.62756e-15],
[64, 583.362, 1.04760e-31],
[80, 489.131, 1.14070e-47]]):
Mat4;
>
#LinearAlgebra[normal equations]:
Mat5 := Matrix([[''Digits'',time,`error`],
[16, 1.950, 4.87161e+4],
[32, 2.363, 2.95537e+4],
[48, 3.134, 1.34027e+4],
[64, 3.761, 1.00734e-2],
[80, 5.328, 3.90552e-19]]):
Mat5;
>
#LinearAlgebra[LeastSquares]:
Mat6 := Matrix([[''Digits'',time,`error`],
[16, 2.144, 3.84802e+4],
[32, 2.803, 1.57575e+1],
[48, 3.735, 1.47829e-15],
[64, 6.467, 2.12608e-31],
[80, 7.977, 1.26784e-47]]):
Mat6;
>
#LinearAlgebra[SingularValues]:
Mat7 := Matrix([[''Digits'',time,`error`],
[16, 28.193, 3.30826e+4],
[32, 44.920, 9.45847e+0],
[48, 132.777, 1.78400e-15],
[64, 236.261, 1.02603e-31],
[80, 221.756, 5.59193e-48]]):
Mat7;
>
>
4 Least-Squares using Legendre Polynomials
The ill-conditioning in the natural-basis least-squares methods can be cured by using an orthogonal polynomial basis. The Legendre polynomials form an orthogonal system of polynomials with respect to the inner product defined by
where
and
are Lebesque measurable real valued functions defined on the real interval [-1,1]. The Legendre polynomials [2] can be defined according to the following recurrence relation:
,
,
.
We say functions
and
are orthogonal if the inner product <
,
>=0. The Legendre polynomials form an orthogonal system of polynomials, that is
for
<
where
and
are distinct non-negative integers. The Legendre polynomials are not orthonormal. The normalization condition is
They also satisfy the equation
for all non-negative integers
.
The linear transformation
of the argument to a Legendre polynomial translates the interval of orthogonality from [-1,1] to
. For Example the shifted Legendre polynomials
are orthogonal on the interval [0,1]. In general the set of translated Legendre polynomials
form an orthogonal system on the interval
.
We now use the normal equations approach to solve Example 1, by using the shifted Legendre polynomials as our basis functions instead of the standard monomials. We use the above recurrence relation to compute the values of the Legendre polynomials at any given argument. The hf under the Digits column denotes the use of hardware floating point arithmetic using Maple's evalhf() procedure. The results are:
>
#normal equations using shifted Legendre polynomials:
Mat8 := Matrix([[''Digits'',time,`error`,`cond(A,2)`,`cond(A^t.A,2)`],
[hf, .062, 7.81493e-11, 1.18022e1, 1.39292e2],
[16, 2.632, 1.37536e-10, 1.18022e1, 1.39292e2],
[32, 3.287, 2.11346e-26, 1.18022e1, 1.39292e2]
]):
Mat8;
>
>
5 Least-Squares using Chebyshev Polynomials
We now use the Chebyshev polynomials of the first kind instead of the Legendre polynomials. The Chebyshev polynomials form an orthogonal system of polynomials with respect to the weighted inner product defined by
where
is the weight function
. The Chebyshev polynomials [2] can be defined according to the following recurrence relation:
,
,
.
The normalization conditions are <
> =
and <
>=
, for positive integer
. They also satisfy the equation
for all non-negative integers
.
We now use the normal equations approach to solve Example 1, by using the shifted Chebyshev polynomials as our basis functions instead of the standard monomials. This algorithm can be found in [3] in chapter 12 section 2. The results are:
>
#normal equations using shifted Chebyshev polynomials:
Mat9 := Matrix([[''Digits'',time,`error`,`cond(A,2)`,`cond(A^t.A,2)`],
[hf, .045, 7.91818e-11, 6.33780e0, 4.01677e1],
[16, 2.170, 1.25604e-10, 6.33780e0, 4.01677e1],
[32, 2.813, 1.21656e-26, 6.33780e0, 4.01677e1]
]):
Mat9;
>
>
6 The Discrete Chebyshev polynomial solution
We have seen that the classical orthogonal polynomials can overcome the ill-conditioning in the least-squares problem, but an even better approach is to use discrete orthogonal polynomials [4] instead of using the standard monomial or classical orthogonal polynomial basis functions. If we have a standard least-squares problem with all weights unity and
for
from 1 to
, then the appropriate orthogonal polynomials are the discrete Chebyshev [2] polynomials. They have been generalized [5] to the case of arbitrary positive weights and arbitrary independent variable values. The solution to the weighted least-squares problem is
where
in which
is the degree
, monic generalized discrete Chebyshev polynomial. We define a discrete weighted inner product <
> by
<
> =
then the
form an orthogonal system of polynomials with respect to the above inner product. This means that <
> = 0 for
<
. The coefficients in the orthogonal polynomial expansion are given by
= <
> / <
>
in which
is the vector of dependent variable values and we extend the definition of the inner product so that
<
> =
A practical method to compute the values of the monic generalized discrete Chebyshev polynomials is to use the following [5] three term recurrence relation.
,
,
,
in which the recurrence relation coefficients are given by
= <
> / <
>
= <
> / <
>
One of the advantages of the discrete orthogonal polynomial approach is that we can find a lower degree least-squares polynomial, by dropping the higher degree terms, whereas the other methods using the standard monomial basis functions or the classical orthogonal polynomials require a recomputation. The other methods have a computational time complexity of
floating point operations, whereas the discrete orthogonal polynomial approach has a time complexity of
to compute the degree
least-squares polynomial that fits
data points. If we want to compute all the polynomial fits up to degree
, then the standard approach would have a time complexity of
, whereas the time complexity of the discrete orthogonal polynomial approach would remain at
. The memory space complexity of the standard methods is
floating point numbers, but the discrete orthogonal polynomial method requires just
numbers.
The discrete orthogonal polynomial method finds all the least-squares fiited polynomials up to degree 40 for the data given in Example 1, using double precision arithmetic and produces acceptable results.
>
#monic discrete Chebyshev polynomials:
Mat10 := Matrix([[''Digits'',time,`error`],
[hf, .075, 1.00096e-09],
[16, 1.174, 6.48027e-09],
[32, 1.344, 4.90378e-25]
]):
Mat10;
If we use orthonormal discrete Chebyshev polynomials instead of monic discrete Chebyshev polynomials to solve the least-squares problem, then the CPU time increases slightly as show in the following table.
>
#orthonormal discrete Chebyshev polynomials:
Mat11 := Matrix([[''Digits'',time,`error`],
[hf, .089, 7.26114e-10],
[16, 1.610, 2.72974e-09],
[32, 1.859, 2.21080e-25]
]):
Mat11;
The reason we have introduced orthonormal polynomials, was to avoid exponent overflow problems, whch occur in Example 2, while using monic discrete Chebyshev polynomials
Example 2:
There are 10001 equally spaced values on the interval
for the vector of independent variable values, that is
for
from 1 to 10001. All estimated standard deviations are 1.0e-6. There are 10001 simulated measurements, that is
for
from 1 to 10001, where
is the value of a random variable selected from a standard normal distribution with mean 0 and standard deviation 1.0e-6. The chosen function is
, where
is Airy's
function, which is denoted by
AiryAi(x)
in Maple. The problem is to fit a degree 429 polynomial to this data, where the fit is in the least-squares sense.
The error column is the 2-norm of the difference between the computed least-squares polynomial and the exact solution
. The results are:
>
#orthonormal polynomials for Example 2:
Mat12 := Matrix([[''Digits'',time,`error`],
[hf, 48.418, 7.48767e-07]
]):
Mat12;
>
>
References
[1] M.B. Monagan, K.O. Geddes, K.M. Heal, G. Labahn, S.M. Vorkoetter and J. MacCarron, Maple 6 Programming Guide , Waterloo Maple Inc., Waterloo, Canada (2000).
[2] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions , Dover Publications, New York (1972).
[3] Ward Cheney and David Kincaid, Numerical Mathematics and Computing , Brooks/Cole, Monterey, California (1980).
[4] D.G. Kleinbaum, L.L. Kupper and K.E. Muller, Applied Regression Analysis and Other Multivariable Methods , PWS-KENT Publishing, Second edition (1988)
[5] L. Fox and I.B. Parker, Chebyshev Polynomials in Numerical Analysis , Oxford University Press (1968).
>