Logistic Regression

Author: 习惯了千姿百态 | Published 2018-08-07 23:24

    1. Logistic regression model

    1.1 classification

    We want 0\le h_\theta(x)\le1:
    h_\theta(x)=g(\theta^Tx)
    g(z)=\frac{1}{1+e^{-z}}
    h_\theta(x)=P(y=1|x;\theta) -- the estimated probability that y=1 on input x, parameterized by \theta
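
    A minimal Octave sketch of the hypothesis (the names sigmoid, hypothesis, X, theta are illustrative; X is assumed to be an m x (n+1) design matrix whose first column is all ones):

    sigmoid = @(z) 1 ./ (1 + exp(-z));            % g(z) = 1 / (1 + e^(-z))
    hypothesis = @(theta, X) sigmoid(X * theta);  % h_theta(x) = g(theta' x) for every row of X

    hypothesis([0; 1], [1, 2])   % example: g(0*1 + 1*2) = g(2), roughly 0.88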

    1.2 cost function

    Cost(h_\theta(x),y)=\left\{ \begin{aligned} &\ -log(h_\theta(x))\qquad\ \ \ if \ y=1 \\ &\ -log(1-h_\theta(x))\quad if \ y=0 \end{aligned} \right.
    J(\theta)=\frac{1}{m} \sum_{i=1}^m Cost(h_\theta(x^{(i)}),y^{(i)})

    (figures omitted: the cost curve -log(h_\theta(x)) for y=1 and -log(1-h_\theta(x)) for y=0)

    simplified cost function:
    Cost(h_\theta(x),y)=-ylog(h_\theta(x))-(1-y)log(1-h_\theta(x))

    gradient descent:
    repeat {
    \theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
    }
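
    As a vectorized Octave sketch (X, y and alpha are assumed to be the design matrix, label vector and learning rate; sigmoid is the anonymous function from the sketch in 1.1), the simplified cost and one simultaneous update of all \theta_j are:

    m = length(y);                                        % number of training examples
    h = sigmoid(X * theta);                               % m x 1 vector of h_theta(x^(i))
    J = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));   % simplified cost J(theta)
    theta = theta - (alpha/m) * (X' * (h - y));           % simultaneous update of all theta_j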

    1.3 Optimization algorithms

    • Gradient descent
    • Conjugate gradient
    • BFGS
    • L-BFGS
      For all of these, we only need to supply a function that returns the cost and its gradient:
    function [jVal, gradient] = costFunction(theta)
      jVal = [...code to compute J(theta)...];
      gradient = [...code to compute derivative of J(theta)...];
    end
    

    Then we can use Octave's "fminunc()" optimization algorithm along with the "optimset()" function, which creates an object containing the options we want to send to "fminunc()":

    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(2,1);
    [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
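
    The template above leaves the actual computation elided; a hedged, concrete version for unregularized logistic regression (X and y are assumed training data, passed in through an anonymous function because fminunc expects a function of theta alone) might look like:

    function [jVal, gradient] = costFunction(theta, X, y)
      m = length(y);
      h = 1 ./ (1 + exp(-X * theta));                         % h_theta for all m examples
      jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));  % J(theta)
      gradient = (1/m) * (X' * (h - y));                      % vector of dJ/dtheta_j
    end

    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(size(X, 2), 1);
    [optTheta, functionVal, exitFlag] = ...
        fminunc(@(t) costFunction(t, X, y), initialTheta, options);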
    

    1.4 multiclass classification

    Train a logistic regression classifier h_\theta^{(i)}(x) for each class i to predict the probability that y = i. To make a prediction on a new x, pick the class i that maximizes h_\theta^{(i)}(x).
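
    A possible one-vs-all prediction sketch in Octave, assuming all_theta is a K x (n+1) matrix whose k-th row holds the parameters of the classifier for class k:

    probs = sigmoid(X * all_theta');       % m x K matrix; probs(i, k) = h_theta^(k)(x^(i))
    [~, prediction] = max(probs, [], 2);   % index of the class with the largest probability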

    1.5 how to solve overfitting

    • reduce number of features
    • regularization
    1.5.1 Regularized Linear Regression

    Keeping the parameter values \theta_0,\theta_1,\dots,\theta_n small gives:
    ① a simpler hypothesis
    ② a model that is less prone to overfitting

    J(\theta)=\frac{1}{2m}\big[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\big]
    If \lambda is too large, the result is underfitting, because \theta_1,\dots,\theta_n are all driven close to 0.

    gradient descent:
    repeat{
    \theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}
    \theta_j:=\theta_j-\alpha\big[\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\big]\quad (j=1,\dots,n)
    }
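
    The same regularized update as a vectorized Octave sketch (lambda is the regularization parameter; theta(1) holds \theta_0 and is left unpenalized):

    m = length(y);
    h = X * theta;                                 % linear regression hypothesis
    grad = (1/m) * (X' * (h - y));                 % unregularized gradient
    grad = grad + (lambda/m) * [0; theta(2:end)];  % penalize theta_1..theta_n only
    theta = theta - alpha * grad;                  % simultaneous update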

    Normal Equation:

    \theta=\big(X^TX+\lambda L\big)^{-1}X^Ty,\qquad L=\begin{bmatrix}0&&&\\&1&&\\&&\ddots&\\&&&1\end{bmatrix}
    where L is an (n+1)\times(n+1) matrix; the 0 in the top-left corner means \theta_0 is not regularized.
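
    In Octave this could be written as (a sketch; X, y and lambda assumed as above):

    n = size(X, 2) - 1;                          % number of features (bias column excluded)
    L = eye(n + 1);  L(1, 1) = 0;                % identity with the theta_0 entry zeroed
    theta = pinv(X' * X + lambda * L) * X' * y;  % regularized normal equation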

    1.5.2 Regularized Logistic Regression

    J(\theta)=-\frac{1}{m}\big[\sum_{i=1}^m y^{(i)}log(h_\theta(x^{(i)}))+(1-y^{(i)})log(1-h_\theta(x^{(i)}))\big]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2
    Note: \theta_0 is not regularized (the regularization sum starts at j=1).

    gradient descent:
    repeat{
    \theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}
    \theta_j:=\theta_j-\alpha\big[\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\big]\quad (j=1,\dots,n)
    }
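
    A hedged Octave sketch of the regularized cost and gradient in the form expected by fminunc (the name costFunctionReg is illustrative):

    function [jVal, gradient] = costFunctionReg(theta, X, y, lambda)
      m = length(y);
      h = 1 ./ (1 + exp(-X * theta));
      jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h)) ...
             + (lambda/(2*m)) * sum(theta(2:end).^2);               % theta_0 is not penalized
      gradient = (1/m) * (X' * (h - y)) + (lambda/m) * [0; theta(2:end)];
    end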


    Appendices

    The derivation of the cost function.
    First:
    h_\theta(x)=P(y=1|x;\theta)=1-P(y=0|x;\theta) -- the estimated probability that y=1 on input x, parameterized by \theta
    h_\theta(x)=g(z)=\frac{1}{1+e^{-z}},\qquad z=\theta^Tx
    so we can get the general formula:
    P(y|x;\theta)=g(z)^y(1-g(z))^{1-y}\qquad (y\in\{0,1\})----(1)
    then use the Maximum likelihood estimation(MLE):
    note the likelihood L(\theta)=\prod_{i=1}^m P(y^{(i)}|x^{(i)};\theta)----(2)
    substitute equation (1) into equation (2):
    L(\theta)=\prod_{i=1}^m g(z^{(i)})^{y^{(i)}}(1-g(z^{(i)}))^{1-y^{(i)}}----(3)

    Taking the natural logarithm of both sides of equation (3):
    ln(L(\theta))=\sum_{i=1}^m\big[y^{(i)}ln(g(z^{(i)}))+(1-y^{(i)})ln(1-g(z^{(i)}))\big]----(4)

    We know that the goal of MLE is to find the \theta that maximizes equation (4), so we define
    J(\theta)=-\frac{1}{m}ln(L(\theta))=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}ln(g(z^{(i)}))+(1-y^{(i)})ln(1-g(z^{(i)}))\big]----(5)
    Minimizing J(\theta) is then equivalent to maximizing the log-likelihood.

    Next, we compute the partial derivative \frac{\partial J}{\partial \theta_j}, using g'(z)=g(z)(1-g(z)) and \frac{\partial z^{(i)}}{\partial \theta_j}=x_j^{(i)}:
    \begin{aligned} \frac{\partial J}{\partial \theta_j}&=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}\frac{1}{g(z^{(i)})}+(1-y^{(i)})\frac{-1}{1-g(z^{(i)})}\big]\frac{\partial g(z^{(i)})}{\partial \theta_j} \\ &=-\frac{1}{m}\sum_{i=1}^m\big[\frac{y^{(i)}}{g(z^{(i)})}-\frac{1-y^{(i)}}{1-g(z^{(i)})}\big]g(z^{(i)})(1-g(z^{(i)}))x_j^{(i)} \\ &=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}(1-g(z^{(i)}))-(1-y^{(i)})g(z^{(i)})\big]x_j^{(i)} \\ &=\frac{1}{m}\sum_{i=1}^m\big(g(z^{(i)})-y^{(i)}\big)x_j^{(i)}=\frac{1}{m}\sum_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)} \end{aligned}
    So we can then update the parameters with:
    \theta_j:=\theta_j-\alpha\frac{\partial J}{\partial \theta_j}
    which is exactly the gradient descent rule given in section 1.2.
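
    One way to sanity-check this derivative is a numerical gradient check, sketched here in Octave against the illustrative costFunction from section 1.3 (X, y and theta assumed in scope):

    J_of = @(t) costFunction(t, X, y);   % scalar cost as a function of theta only
    epsilon = 1e-4;
    numGrad = zeros(size(theta));
    for j = 1:length(theta)
      e = zeros(size(theta));  e(j) = epsilon;
      numGrad(j) = (J_of(theta + e) - J_of(theta - e)) / (2 * epsilon);  % central difference
    end
    % numGrad should closely match (1/m) * X' * (sigmoid(X*theta) - y) from the derivation above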
