Polynomial Curve Fitting

We begin by introducing a simple regression problem.



Suppose we observe a real-valued input variable x and we wish to use this observation to predict the value of a real-valued target variable t.


The data for this example is generated from the function sin(2πx), with random noise included in the target values.


Now suppose that we are given a training set comprising N observations of x, written $\mathbf{x} \equiv (x_1, \ldots, x_N)^{\mathrm{T}}$, together with corresponding observations of the values of t, denoted $\mathbf{t} \equiv (t_1, \ldots, t_N)^{\mathrm{T}}$.


Figure 1.2 shows a plot of a training set comprising N = 10 data points. The input data set x in Figure 1.2 was generated by choosing values of $x_n$, for n = 1, ..., N, spaced uniformly in the range [0, 1], and the target data set t was obtained by first computing the corresponding values of the function sin(2πx) and then adding a small level of random noise having a Gaussian distribution (the Gaussian distribution is discussed in Section 1.2.4) to each such point in order to obtain the corresponding value $t_n$.
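The data-generation recipe described above can be sketched in a few lines of NumPy. This is only an illustration: the function name `make_training_set`, the noise standard deviation of 0.3, and the fixed seed are assumptions, since the text says only that the noise is "small" and Gaussian.

```python
import numpy as np

def make_training_set(n_points=10, noise_std=0.3, seed=0):
    """Return (x, t): inputs evenly spaced on [0, 1] and noisy sin(2*pi*x) targets."""
    rng = np.random.default_rng(seed)        # seed is an assumption, for repeatability
    x = np.linspace(0.0, 1.0, n_points)      # x_n spaced uniformly in [0, 1]
    noise = rng.normal(0.0, noise_std, size=n_points)  # noise_std is an assumption
    t = np.sin(2.0 * np.pi * x) + noise      # t_n = sin(2*pi*x_n) + Gaussian noise
    return x, t

x, t = make_training_set()
print(x.shape, t.shape)
```

Setting `noise_std=0.0` returns points lying exactly on sin(2πx), which is a convenient sanity check.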


Our goal is to exploit this training set in order to make predictions of the value t of the target variable for some new value x of the input variable. As we shall see later, this involves implicitly trying to discover the underlying function sin(2πx). This is intrinsically a difficult problem as we have to generalize from a finite data set. Furthermore the observed data are corrupted with noise, and so for a given x there is uncertainty as to the appropriate value for t.


For the moment, however, we shall proceed rather informally and consider a simple approach based on curve fitting. In particular, we shall fit the data using a polynomial function of the form

$$y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \cdots + w_M x^M = \sum_{j=0}^{M} w_j x^j$$

where M is the order of the polynomial, and $x^j$ denotes x raised to the power of j. The polynomial coefficients $w_0, \ldots, w_M$ are collectively denoted by the vector w. Note that, although the polynomial function y(x,w) is a nonlinear function of x, it is a linear function of the coefficients w. Functions, such as the polynomial, which are linear in the unknown parameters have important properties and are called linear models.
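To make the "nonlinear in x, linear in w" remark concrete, here is a minimal sketch that evaluates an order-M polynomial and checks both properties numerically. The function name `polynomial` and the coefficient vectors `w_a` and `w_b` are hypothetical, chosen only for illustration.

```python
import numpy as np

def polynomial(x, w):
    """Evaluate y(x, w) = w[0] + w[1]*x + ... + w[M]*x**M for scalar or array x."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    powers = np.arange(w.size)                 # exponents 0, 1, ..., M
    return (x[..., np.newaxis] ** powers) @ w  # sum over j of w_j * x**j

# Hypothetical coefficient vectors, chosen only for illustration.
w_a = np.array([1.0, 2.0, 3.0])
w_b = np.array([0.5, -1.0, 4.0])
x = np.linspace(0.0, 1.0, 5)

# Linear in w: combining coefficient vectors combines the outputs the same way.
lhs = polynomial(x, 2.0 * w_a + 3.0 * w_b)
rhs = 2.0 * polynomial(x, w_a) + 3.0 * polynomial(x, w_b)
print(np.allclose(lhs, rhs))   # True

# Nonlinear in x: scaling the input does not scale the output.
print(np.allclose(polynomial(2.0 * x, w_a), 2.0 * polynomial(x, w_a)))   # False
```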


The values of the coefficients will be determined by fitting the polynomial to the training data. This can be done by minimizing an error function that measures the misfit between the function y(x,w), for any given value of w, and the training set data points. One simple choice of error function, which is widely used, is given by the sum of the squares of the errors between the predictions $y(x_n, \mathbf{w})$ for each data point $x_n$ and the corresponding target values $t_n$, so that we minimize

$$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2$$

where the factor of 1/2 is included for later convenience. For the moment we simply note that it is a nonnegative quantity that would be zero if, and only if, the function y(x,w) were to pass exactly through each training data point.
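As a rough illustration, the sum-of-squares error can be computed as below, assuming the conventional factor of 1/2 in front of the sum. The function name `sum_of_squares_error` and the three data points are hypothetical; `np.vander` is used only as a compact way to build the powers of each input.

```python
import numpy as np

def sum_of_squares_error(w, x, t):
    """E(w) = 1/2 * sum_n (y(x_n, w) - t_n)**2 for a polynomial with coefficients w."""
    w = np.asarray(w, dtype=float)
    # Row n of the design matrix holds x_n**0, x_n**1, ..., x_n**M.
    design = np.vander(np.asarray(x, dtype=float), N=w.size, increasing=True)
    residuals = design @ w - np.asarray(t, dtype=float)
    return 0.5 * np.sum(residuals ** 2)

# Hypothetical data lying exactly on the line t = 1 + 2x.
x = np.array([0.0, 0.5, 1.0])
t = 1.0 + 2.0 * x

print(sum_of_squares_error([1.0, 2.0], x, t))  # 0.0: the curve passes through every point
print(sum_of_squares_error([0.0, 0.0], x, t))  # 7.0 = 0.5 * (1 + 4 + 9)
```

The first call shows the zero-error case described in the text; any other coefficient vector gives a strictly positive value on this data.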


We can solve the curve fitting problem by choosing the value of w for which E(w) is as small as possible. Because the error function is a quadratic function of the coefficients w, its derivatives with respect to the coefficients will be linear in the elements of w, and so the minimization of the error function has a unique solution, denoted by w*, which can be found in closed form. The resulting polynomial is given by the function y(x,w*).
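One possible way to compute the minimizer numerically is to treat the problem as linear least squares on a matrix of powers of x. The sketch below is not necessarily how the original figures were produced: it uses `np.linalg.lstsq`, and the function name `fit_polynomial`, the noise level of 0.3, the seed, and the choice of order 3 are all assumptions. The solution is unique when the inputs are distinct and there are at least M + 1 of them, which holds here.

```python
import numpy as np

def fit_polynomial(x, t, order):
    """Return the coefficients w* minimizing the sum-of-squares error."""
    # Row n of the design matrix holds x_n**0, x_n**1, ..., x_n**order.
    design = np.vander(x, N=order + 1, increasing=True)
    w_star, *_ = np.linalg.lstsq(design, t, rcond=None)
    return w_star

# Synthetic training set; the noise level (0.3) and seed are assumptions.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.3, size=x.size)

order = 3  # hypothetical choice of M
w_star = fit_polynomial(x, t, order)

# Because E(w) is quadratic, its gradient design.T @ (design @ w - t) is
# linear in w and should vanish at the minimizer.
design = np.vander(x, N=order + 1, increasing=True)
gradient = design.T @ (design @ w_star - t)
print(np.allclose(gradient, 0.0, atol=1e-8))
```

Setting the gradient to zero gives a linear system in w, which is why the solution is available in closed form; `lstsq` solves that system in a numerically stable way instead of forming the matrix inverse explicitly.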




