linear regression

By 习惯了千姿百态 | Published 2018-08-04 13:17

1.hypothesis

one variable: h_{\theta}(x)=\theta_0+\theta_1x
multivariable: h_{\theta}(x)=\theta_0+\theta_1x_1+\theta_2x_2+...
general: h_{\theta}(x)=\theta^Tx=\theta_0x_0+\theta_1x_1+\theta_2x_2+...+\theta_nx_n \quad (x_0\equiv1)
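
As an illustration, the general hypothesis can be computed in vectorized form. The sketch below uses NumPy with illustrative data, assuming the design matrix already contains the x_0 = 1 bias column (the function name and numbers are not from the original post):

```python
import numpy as np

def hypothesis(theta, X):
    """h_theta(x) = theta^T x for every row of the design matrix X.
    X is m x (n+1) with a leading column of ones (x_0 = 1); theta has shape (n+1,)."""
    return X @ theta

# illustrative data: 3 samples, 1 feature, bias column prepended
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
theta = np.array([0.5, 1.5])
print(hypothesis(theta, X))     # -> [3.5 5.  6.5]
```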

2.cost function

J(\theta_0,\theta_1,...,\theta_n)=\frac{1}{2m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})^2
m: the number of training samples
n: the number of features

X: the design matrix of training samples (m×(n+1))
y: the outputs of the training samples (m×1)

h_\theta(x): the hypothesis
\theta_j: the parameters
  We need to find the \theta that minimizes J(\theta); there are two ways to do this.
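
A minimal sketch of J(\theta) computed in vector form, under the same assumptions as above (the function name and sample data are illustrative, not from the original post):

```python
import numpy as np

def compute_cost(theta, X, y):
    """J(theta) = 1/(2m) * sum_i (h_theta(x_i) - y_i)^2, computed in vector form."""
    m = len(y)
    residuals = X @ theta - y
    return residuals @ residuals / (2 * m)

# illustrative data
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0])
print(compute_cost(np.zeros(2), X, y))   # cost with theta = 0
```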

3.normal equation

\theta=(X^TX)^{-1}X^Ty
Derivation
To find the minimum, we set the partial derivatives to zero:
\frac{\partial}{\partial\theta_j}J(\theta)=0
Expanding the cost function:
\begin{aligned} J(\theta)&=\frac{1}{2m}(X\theta-y)^T(X\theta-y)\\ &=\frac{1}{2m}(y^Ty-y^TX\theta-\theta^T X^Ty+\theta^TX^TX\theta) \\ \therefore \frac{\partial J(\theta)}{\partial\theta}&=\frac{1}{2m}\left(\frac{\partial y^Ty}{\partial\theta}-\frac{\partial{y^TX\theta}}{\partial\theta}-\frac{\partial\theta^TX^Ty}{\partial\theta}+\frac{\partial\theta^TX^TX\theta}{\partial\theta}\right)=0 \end{aligned}

We know the following matrix-calculus identities:
\frac{dAB}{dB}=A^T (1)
\frac{dX^TAX}{dX}=AX+A^TX (2)
\frac{dB^TA}{dB}=\frac{d(B^TA)^T}{dB}=\frac{dA^TB}{dB}=A (3)
Equation (2) becomes \frac{dX^TAX}{dX}=2AX when A is a symmetric matrix (as X^TX is).
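
Identity (2) can be sanity-checked numerically by comparing AX + A^TX against a finite-difference gradient of X^TAX. The sketch below uses randomly generated values and is only an illustration, not part of the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # a generic (not necessarily symmetric) matrix
x = rng.standard_normal(n)

closed_form = A @ x + A.T @ x     # identity (2): d(x^T A x)/dx = Ax + A^T x

# forward-difference approximation of the gradient of f(x) = x^T A x
eps = 1e-6
numeric = np.array([
    ((x + eps * np.eye(n)[i]) @ A @ (x + eps * np.eye(n)[i]) - x @ A @ x) / eps
    for i in range(n)
])

print(np.allclose(closed_form, numeric, atol=1e-4))   # True
```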

Obviously, \frac{\partial y^Ty}{\partial \theta}=0, since it does not depend on \theta.
Using equation (1): \frac{\partial{y^TX\theta}}{\partial\theta}=(y^TX)^T=X^Ty
Using equation (3): \frac{\partial\theta^T X^T y}{\partial \theta}=X^Ty
Using equation (2), with A=X^TX symmetric: \frac{\partial\theta^TX^TX\theta}{\partial \theta}=2X^TX\theta

In summary:
\begin{aligned} \frac{\partial J(\theta)}{\partial\theta} =\frac{1}{2m}(2X^TX\theta-2X^Ty)=0 \end{aligned}
\Rightarrow \theta=(X^TX)^{-1}X^Ty
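
A short sketch of the normal equation in NumPy. Solving the linear system X^TX\theta = X^Ty is used here instead of forming the inverse explicitly, which is a standard numerical choice rather than part of the derivation above; the data is illustrative:

```python
import numpy as np

def normal_equation(X, y):
    """theta = (X^T X)^{-1} X^T y, computed by solving the linear system
    (X^T X) theta = X^T y rather than inverting X^T X explicitly."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# illustrative data: y = 2x - 1
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0])
print(normal_equation(X, y))    # -> approximately [-1.  2.]
```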

4.gradient descent

repeat until convergence {
\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta_0,...,\theta_n)
}
\Rightarrow
repeat until convergence {
\begin{aligned} \theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} \end{aligned}
}
(all \theta_j are updated simultaneously, for j=0,...,n)
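
A minimal batch gradient descent sketch matching the update rule above; the learning rate, iteration count, and sample data are arbitrary illustrative choices:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, num_iters=5000):
    """Batch gradient descent: theta_j := theta_j - alpha * (1/m) * sum_i (h(x_i) - y_i) * x_ij,
    with every component of theta updated simultaneously on each pass."""
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m   # vectorized form of the summed update term
        theta = theta - alpha * gradient
    return theta

# illustrative data: y = 2x - 1, so theta should approach [-1, 2]
X = np.array([[1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.0, 5.0, 7.0])
print(gradient_descent(X, y))   # -> approximately [-1.  2.]
```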
Feature scaling and mean normalization
x_j^{(i)}:=\frac{x_j^{(i)}-\mu_j}{S_j}
where \mu_j is the mean of feature j and S_j is its standard deviation (or the range, max−min).
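
A sketch of mean normalization applied to each feature column (the bias column of ones would be added afterwards and is not scaled); here S_j is taken to be the standard deviation, and the names and data are illustrative:

```python
import numpy as np

def feature_normalize(X):
    """x_j := (x_j - mu_j) / S_j for every feature column, where mu_j is the
    column mean and S_j the column standard deviation. Returns mu and S so
    that new inputs can be scaled with the same statistics."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# illustrative data: two features with very different ranges
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])
X_norm, mu, sigma = feature_normalize(X)
print(X_norm)
```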
