2. Cost Function (代价函数)

Author: 玄语梨落 | Published 2020-08-15 08:32

    Cost Function

    Model Representation

    m = Number of training examples
    x's = "input" variable / features
    y's = "output" variable / "target" variable

    h stands for hypothesis; h represents a function that maps from the x's to the predicted y's

    How do we represent h?

    h_\Theta (x)=\Theta_0+\Theta_1 x
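
    As a quick illustration (the function name hypothesis and the sample values are my own, not from the notes), the hypothesis is just a linear function of x:

```python
import numpy as np

# Sketch (not from the notes): the hypothesis h_Theta(x) = Theta_0 + Theta_1 * x
# written as a plain Python function.
def hypothesis(theta0, theta1, x):
    """Predict y for input x with the linear hypothesis."""
    return theta0 + theta1 * np.asarray(x)

# With Theta_0 = 1 and Theta_1 = 2, the prediction for x = 3 is 1 + 2*3 = 7.
print(hypothesis(1.0, 2.0, [0.0, 1.0, 3.0]))  # -> [1. 3. 7.]
```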

    Cost Function

    • Goal: minimize the cost function
    • The cost function (squared error function, squared error cost function) is the best tool for solving regression problems

    Hypothesis: h_\Theta(x)=\Theta_0+\Theta_1x
    Cost Function: J(\Theta_0,\Theta_1)=\frac{1}{2m}\sum_{i=1}^{m}(h_\Theta(x^{(i)})-y^{(i)})^2
    Goal:\mathop{minimize}\limits_{\Theta_0,\Theta_1}J(\Theta_0,\Theta_1)
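
    A minimal sketch of this cost function in NumPy, assuming a toy data set and the helper name compute_cost (both illustrative, not from the notes):

```python
import numpy as np

# Sketch of the squared-error cost J(Theta_0, Theta_1) defined above.
# The name compute_cost and the toy data are illustrative assumptions.
def compute_cost(theta0, theta1, x, y):
    m = len(x)                            # m = number of training examples
    errors = theta0 + theta1 * x - y      # h_Theta(x^(i)) - y^(i) for all i
    return np.sum(errors ** 2) / (2 * m)  # (1 / 2m) * sum of squared errors

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(0.0, 1.0, x, y))  # perfect fit: cost = 0.0
print(compute_cost(0.0, 0.5, x, y))  # worse fit: cost > 0
```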

    Cost Function

    Simplified:
    h_\Theta(x)=\Theta_1 x

    For linear regression:

    • With one parameter, the cost function is a bow-shaped (convex) curve
    • With two parameters, the 3D cost function surface is, like the one-parameter case, also bow-shaped (a bowl)

    contour plot or contour figure: a plot of the level curves of J(\Theta_0,\Theta_1)
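
    To see the bowl shape and the contour figure concretely, one rough sketch (the toy data and grid ranges are my own assumptions) is to evaluate J over a grid of (\Theta_0,\Theta_1) values; a surface plot of this grid is the bowl, and its level curves form the contour plot:

```python
import numpy as np

# Sketch (my own illustration): evaluate J on a grid of (Theta_0, Theta_1)
# values. A surface plot of this grid gives the bowl shape; its level
# curves give the contour figure mentioned above.
def compute_cost(theta0, theta1, x, y):
    m = len(x)
    errors = theta0 + theta1 * x - y
    return np.sum(errors ** 2) / (2 * m)

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])          # data generated by y = x, so the minimum is at (0, 1)

theta0_vals = np.linspace(-2.0, 2.0, 5)
theta1_vals = np.linspace(-1.0, 3.0, 5)
J_grid = np.array([[compute_cost(t0, t1, x, y) for t1 in theta1_vals]
                   for t0 in theta0_vals])
print(J_grid.round(2))                 # the smallest entry sits near Theta_0 = 0, Theta_1 = 1
```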

    Gradient descent

    Gradient descent can be applied to more general cost functions, not only those with two parameters.

    Outline:
    Start with some \Theta_0,\Theta_1
    Keep changing \Theta_0,\Theta_1 to reduce J(\Theta_0,\Theta_1)
    Until we hopefully end up at a minimum

    Gradient descent algorithm:

    repeat until convergence { \Theta_j:=\Theta_j-\alpha\frac{\partial}{\partial\Theta_j}J(\Theta_0,\Theta_1) \quad (\text{for } j=0 \text{ and } j=1) }

    \alpha is a number called the learning rate, which controls the size of the step we take in gradient descent.

    Correct: Simultaneous update

    temp0:=\Theta_0 - \alpha\frac{\partial}{\partial\Theta_0}J(\Theta_0,\Theta_1)
    temp1:=\Theta_1 - \alpha\frac{\partial}{\partial\Theta_1}J(\Theta_0,\Theta_1)
    \Theta_0:=temp0
    \Theta_1:=temp1

    Incorrect:

    temp0:=\Theta_0 - \alpha\frac{\partial}{\partial\Theta_0}J(\Theta_0,\Theta_1)
    \Theta_0:=temp0
    temp1:=\Theta_1 - \alpha\frac{\partial}{\partial\Theta_1}J(\Theta_0,\Theta_1)
    \Theta_1:=temp1

    Update the parameters simultaneously.
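
    A small sketch of the simultaneous update (the helper gradient_step and the placeholder gradients are illustrative, not from the notes); the point is that both temporary values are computed from the old parameters before either \Theta is overwritten:

```python
# Sketch of the simultaneous update rule (the helper name gradient_step and
# the placeholder gradients are illustrative). Both partial derivatives are
# evaluated with the OLD Theta_0 and Theta_1 before either one is overwritten.
def gradient_step(theta0, theta1, grad0, grad1, alpha):
    temp0 = theta0 - alpha * grad0(theta0, theta1)
    temp1 = theta1 - alpha * grad1(theta0, theta1)   # still uses the old theta0
    return temp0, temp1                              # assign only after both temps exist

# Placeholder partial derivatives, just to make the step runnable.
g0 = lambda t0, t1: t0 + t1
g1 = lambda t0, t1: t0 - t1
print(gradient_step(1.0, 2.0, g0, g1, alpha=0.1))    # -> (0.7, 2.1)
```

    The "Incorrect" version above overwrites \Theta_0 first, so the \Theta_1 update sees the new \Theta_0 and computes a different (wrong) step.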

    Gradient descent's characteristics

    \Theta_1:=\Theta_1-\alpha\frac{\partial}{\partial\Theta_1}J(\Theta_1)

    If \alpha is too small, gradient descent can be slow.
    If \alpha is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
    As we approach a local minimum, gradient descent will automatically take smaller steps. So there is no need to decrease \alpha over time.
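
    A toy demonstration of these three cases (entirely my own example, using J(\Theta_1)=\Theta_1^2, whose derivative is 2\Theta_1):

```python
# Toy example (my own): gradient descent on J(theta) = theta^2, whose
# derivative is 2 * theta, to show how the learning rate alpha behaves.
def descend(theta, alpha, steps=10):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta   # theta := theta - alpha * dJ/dtheta
    return theta

print(descend(1.0, alpha=0.01))  # too small: ~0.82, still far from the minimum at 0
print(descend(1.0, alpha=0.1))   # reasonable: ~0.11, close to 0
print(descend(1.0, alpha=1.5))   # too large: 1024.0, it has diverged
```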

    Gradient Descent For Linear Regression

    For linear regression, J(\Theta_0,\Theta_1) is a convex function, so gradient descent converges to the global minimum (there are no other local optima).
    "Batch" Gradient Descent: Each step of gradient descent uses all the training examples.
