
M.L. - Classification and Representation

Author: 爱上梦的鱼 | Published on 2017-03-31 12:45

    1. Logistic Regression (a classification algorithm)


    Linear regression is not well suited to some classification problems, such as classifying whether an email is spam or not, or judging whether a tumor is malignant based on its size.

    So there is another algorithm, logistic regression, which takes several features xi as input and whose output y can take only two values: zero or one.

    Hypothesis Representation


    In linear regression the hypothesis output is θ'x, which can be larger than 1 or smaller than 0, so we apply the sigmoid (logistic) function g(z) = 1 / (1 + e^(-z)) so that the hypothesis h(x) = g(θ'x) always lies between 0 and 1.
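
    A minimal Python (NumPy) sketch of this hypothesis, assuming theta and x are 1-D arrays of the same length:

        import numpy as np

        def sigmoid(z):
            # Squashes any real number into the interval (0, 1).
            return 1.0 / (1.0 + np.exp(-z))

        def hypothesis(theta, x):
            # h(x) = g(theta' * x), read as P(y = 1 | x; theta).
            return sigmoid(np.dot(theta, x))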

    Decision Boundary



    The decision boundary is the line that separates the area where y = 0 and where y = 1. It is created by our hypothesis function.

    The decision boundary can be linear or nonlinear, and sometimes it is even a complicated curve.

    From the shape of the sigmoid function, if we define:

    g(z) >= 0.5  ->  predict y = 1;

    g(z) < 0.5   ->  predict y = 0;

    then the region z >= 0 corresponds to y = 1, so z = 0 is the boundary.

    So, if z = θ'x, then θ'x = 0 is the boundary that divides the space into two regions, y = 0 and y = 1. For example, θ'x = θ0*x0 + θ1*x1 + θ2*x2 gives a linear boundary.
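
    A small worked example of such a linear boundary, using illustrative parameters that are not from the article: with θ = [-3, 1, 1] and x0 = 1, the boundary θ'x = 0 is the line x1 + x2 = 3.

        import numpy as np

        # Illustrative parameters: theta' * x = -3 + x1 + x2,
        # so the decision boundary is the line x1 + x2 = 3.
        theta = np.array([-3.0, 1.0, 1.0])

        def predict(x1, x2):
            x = np.array([1.0, x1, x2])           # x0 = 1 is the bias feature
            return 1 if theta @ x >= 0 else 0     # y = 1 when theta' * x >= 0

        print(predict(4, 4))  # 1: the point lies above the line x1 + x2 = 3
        print(predict(1, 1))  # 0: the point lies below the line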

    Cost Function


    We cannot use the same cost function that we use for linear regression because the Logistic Function will cause the output to be wavy, causing many local optima. In other words, it will not be a convex function.

    So we define the cost function of logistic regression as follows:

    Cost(h(x), y) = -log(h(x))      if y = 1

    Cost(h(x), y) = -log(1 - h(x))  if y = 0

    We can rewrite the cost equation into the form:

    Cost(h(x), y) = -y*log(h(x)) - (1 - y)*log(1 - h(x))
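
    Averaged over all m training examples this gives J(θ) = -(1/m) Σ [ y*log(h(x)) + (1 - y)*log(1 - h(x)) ]. A vectorized Python (NumPy) sketch, assuming X is the m-by-n design matrix and y a vector of 0/1 labels:

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def cost(theta, X, y):
            # J(theta) = -(1/m) * sum( y*log(h) + (1 - y)*log(1 - h) )
            m = len(y)
            h = sigmoid(X @ theta)
            return -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m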

    Gradient Descent


    The update rule has the same form as gradient descent for linear regression: repeatedly update θj := θj - (α/m) Σ (h(x^(i)) - y^(i)) xj^(i) for every j.

    A vectorized implementation is:

    θ := θ - (α/m) X'(g(Xθ) - y)
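
    A minimal batch gradient-descent loop matching that update, in Python (NumPy); the learning rate and iteration count are illustrative assumptions, not values from the article:

        import numpy as np

        def gradient_descent(X, y, alpha=0.1, iterations=1000):
            # Repeats theta := theta - (alpha/m) * X' * (g(X*theta) - y).
            m, n = X.shape
            theta = np.zeros(n)
            for _ in range(iterations):
                h = 1.0 / (1.0 + np.exp(-(X @ theta)))
                theta -= (alpha / m) * (X.T @ (h - y))
            return theta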

    Advanced Optimization


    "Conjugate gradient", "BFGS", and "L-BFGS" are more sophisticated, faster ways to optimize θ that can be used instead of gradient descent. We suggest that you should not write these more sophisticated algorithms yourself (unless you are an expert in numerical computing) but use the libraries instead, as they're already tested and highly optimized. Octave provides them.

    2. Multi-class Classification: One-vs-all


    If we have more than two categories, then instead of y = {0, 1} we expand our definition so that y = {0, 1, ..., n}. We divide our problem into n+1 binary classification problems (+1 because the index starts at 0).

    hθ^(i)(x) = P(y = i | x; θ)    for i = 0, 1, ..., n

    To summarize:

    Train a logistic regression classifier hθ^(i)(x) for each class i to predict the probability that y = i.

    To make a prediction on a new x, pick the class i that maximizes hθ^(i)(x).
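
    A compact sketch of the one-vs-all scheme in Python (NumPy); the plain gradient-descent trainer and the integer label encoding (classes 0..n) are assumptions made for illustration:

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def train_one_vs_all(X, y, num_classes, alpha=0.1, iterations=1000):
            # One binary classifier per class: label i versus everything else.
            m, n = X.shape
            all_theta = np.zeros((num_classes, n))
            for i in range(num_classes):
                target = (y == i).astype(float)   # 1 for class i, 0 otherwise
                theta = np.zeros(n)
                for _ in range(iterations):
                    h = sigmoid(X @ theta)
                    theta -= (alpha / m) * (X.T @ (h - target))
                all_theta[i] = theta
            return all_theta

        def predict_one_vs_all(all_theta, X):
            # Pick the class whose classifier reports the highest probability.
            return np.argmax(sigmoid(X @ all_theta.T), axis=1)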

    3. Problem: Over-fitting


    The hypothesis function may fit the examples in the training set very well but still fail to predict unseen data well.

    [Figure: three hypotheses with different numbers of features]

    As shown in the picture above, the first curve uses too few features, so it does not fit the data well; this is called "under-fitting" or "high bias". The second curve fits just right. The last curve fits every example in the training set, but it is an unreasonably complicated shape that is unlikely to predict unseen data well; this condition is called "over-fitting" or "high variance".

    What are the causes of over-fitting?

    1) too many features

    2) too complicated a hypothesis function

    How to solve it?

    1) reduce the number of features

    2) regularization

    - Keep all the features, but reduce the magnitude of the parameters θj.

    - Regularization works well when we have a lot of slightly useful features.

    Cost Function


    J(θ) = (1/(2m)) [ Σ (h(x^(i)) - y^(i))^2 + λ Σ θj^2 ]    (the second sum runs over j = 1, ..., n)

    Here λ is the regularization parameter: it controls how heavily the parameters θj are penalized. If λ is chosen too large, the hypothesis becomes too smooth and may under-fit.

    Regularized Linear Regression


    Regularization changes the form of both gradient descent and the normal equation.

    Gradient Descent

    θ0 := θ0 - (α/m) Σ (h(x^(i)) - y^(i)) x0^(i)

    θj := θj (1 - α*λ/m) - (α/m) Σ (h(x^(i)) - y^(i)) xj^(i)    (j = 1, 2, ..., n)

    Normal Equation

    θ = (X'X + λ⋅L)^(-1) X'y, where L is the identity matrix with its top-left entry (the one for θ0) set to 0.

    Recall that if m < n, then X'X is non-invertible. However, when we add the term λ⋅L, then X'X+ λ⋅L becomes invertible.
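
    A direct NumPy sketch of that regularized normal equation; the variable names and the default λ are illustrative assumptions:

        import numpy as np

        def normal_equation_regularized(X, y, lam=1.0):
            # theta = (X'X + lambda * L)^(-1) X'y, where L is the identity
            # matrix with its top-left entry (the bias term) set to 0.
            n = X.shape[1]
            L = np.eye(n)
            L[0, 0] = 0.0
            return np.linalg.solve(X.T @ X + lam * L, X.T @ y)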

    Regularized Logistic Regression


    We can regularize logistic regression in a similar way to how we regularize linear regression.

    J(θ) = -(1/m) Σ [ y^(i) log(h(x^(i))) + (1 - y^(i)) log(1 - h(x^(i))) ] + (λ/(2m)) Σ θj^2    (the regularization sum runs over j = 1, ..., n)

    So the gradient descent update changes as follows:

    θ0 := θ0 - (α/m) Σ (h(x^(i)) - y^(i)) x0^(i)

    θj := θj - α [ (1/m) Σ (h(x^(i)) - y^(i)) xj^(i) + (λ/m) θj ]    (j = 1, 2, ..., n)
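
    Putting the pieces together, a minimal regularized cost-and-gradient routine for logistic regression in Python (NumPy); the default λ and the variable names are assumptions for illustration:

        import numpy as np

        def sigmoid(z):
            return 1.0 / (1.0 + np.exp(-z))

        def regularized_cost_and_grad(theta, X, y, lam=1.0):
            # Cross-entropy cost plus (lambda / 2m) * sum(theta_j^2) for j >= 1.
            m = len(y)
            h = sigmoid(X @ theta)
            reg = (lam / (2 * m)) * np.sum(theta[1:] ** 2)
            J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m + reg
            grad = (X.T @ (h - y)) / m
            grad[1:] += (lam / m) * theta[1:]     # theta_0 is not regularized
            return J, grad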
