Logistic Regression

Author: 习惯了千姿百态 | Published 2018-08-07 23:24

    1. Logistic regression model

    1.1 classification

    We want 0\le h_\theta(x)\le1:
    h_\theta(x)=g(\theta^Tx)
    g(z)=\frac{1}{1+e^{-z}}
    h_\theta(x)=P(y=1|x;\theta) -- the estimated probability that y=1 on input x, parameterized by \theta
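
    A minimal Octave sketch of the hypothesis (the names sigmoid, hypothesis, X, theta are illustrative; X is assumed to be an m x (n+1) design matrix whose first column is all ones):

    sigmoid = @(z) 1 ./ (1 + exp(-z));            % g(z) = 1 / (1 + e^(-z))
    hypothesis = @(theta, X) sigmoid(X * theta);  % h_theta(x) = g(theta' x) for every row of X

    hypothesis([0; 1], [1, 2])   % example: g(0*1 + 1*2) = g(2), roughly 0.88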

    1.2 cost function

    Cost(h_\theta(x),y)=\left\{ \begin{aligned} &\ -log(h_\theta(x))\qquad\ \ \ if \ y=1 \\ &\ -log(1-h_\theta(x))\quad if \ y=0 \end{aligned} \right.
    J(\theta)=\frac{1}{m} \sum_{i=1}^m Cost(h_\theta(x^{(i)}),y^{(i)})

    (figures omitted: the cost curve -log(h_\theta(x)) for y=1 and -log(1-h_\theta(x)) for y=0)

    simplified cost function:
    Cost(h_\theta(x),y)=-ylog(h_\theta(x))-(1-y)log(1-h_\theta(x))

    gradient descent:
    repeat {
    \theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
    }
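
    As a vectorized Octave sketch (X, y and alpha are assumed to be the design matrix, label vector and learning rate; sigmoid is the anonymous function from the sketch in 1.1), the simplified cost and one simultaneous update of all \theta_j are:

    m = length(y);                                        % number of training examples
    h = sigmoid(X * theta);                               % m x 1 vector of h_theta(x^(i))
    J = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));   % simplified cost J(theta)
    theta = theta - (alpha/m) * (X' * (h - y));           % simultaneous update of all theta_j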

    1.3 Optimization algorithms

    • Gradient descent
    • Conjugate gradient
    • BFGS
    • L-BFGS
      For all of these, we only need to supply a function that returns the cost and its gradient:
    function [jVal, gradient] = costFunction(theta)
      jVal = [...code to compute J(theta)...];
      gradient = [...code to compute derivative of J(theta)...];
    end
    

    Then we can use Octave's "fminunc()" optimization algorithm along with the "optimset()" function, which creates an object containing the options we want to send to "fminunc()":

    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(2,1);
    [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
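
    The template above leaves the actual computation elided; a hedged, concrete version for unregularized logistic regression (X and y are assumed training data, passed in through an anonymous function because fminunc expects a function of theta alone) might look like:

    function [jVal, gradient] = costFunction(theta, X, y)
      m = length(y);
      h = 1 ./ (1 + exp(-X * theta));                         % h_theta for all m examples
      jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));  % J(theta)
      gradient = (1/m) * (X' * (h - y));                      % vector of dJ/dtheta_j
    end

    options = optimset('GradObj', 'on', 'MaxIter', 100);
    initialTheta = zeros(size(X, 2), 1);
    [optTheta, functionVal, exitFlag] = ...
        fminunc(@(t) costFunction(t, X, y), initialTheta, options);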
    

    1.4 multiclass classification

    Train a logistic regression classifier h_\theta^{(i)}(x) for each class i to predict the probability that y = i. To make a prediction on a new x, pick the class i that maximizes h_\theta^{(i)}(x).
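
    A possible one-vs-all prediction sketch in Octave, assuming all_theta is a K x (n+1) matrix whose k-th row holds the parameters of the classifier for class k:

    probs = sigmoid(X * all_theta');       % m x K matrix; probs(i, k) = h_theta^(k)(x^(i))
    [~, prediction] = max(probs, [], 2);   % index of the class with the largest probability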

    1.5 how to solve overfitting

    • reduce number of features
    • regularization
    1.5.1 Regularized Linear Regression

    Keeping the parameter values \theta_0,\theta_1,\dots,\theta_n small gives:
    ① a simpler hypothesis
    ② a model that is less prone to overfitting

    J(\theta)=\frac{1}{2m}\big[\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^n\theta_j^2\big]
    If \lambda is too large, the result is underfitting, because \theta_1,\dots,\theta_n are all driven close to 0.

    gradient descent:
    repeat{
    \theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}
    \theta_j:=\theta_j-\alpha\big[\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\big]\quad (j=1,\dots,n)
    }
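
    The same regularized update as a vectorized Octave sketch (lambda is the regularization parameter; theta(1) holds \theta_0 and is left unpenalized):

    m = length(y);
    h = X * theta;                                 % linear regression hypothesis
    grad = (1/m) * (X' * (h - y));                 % unregularized gradient
    grad = grad + (lambda/m) * [0; theta(2:end)];  % penalize theta_1..theta_n only
    theta = theta - alpha * grad;                  % simultaneous update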

    Normal Equation:

    \theta=\big(X^TX+\lambda L\big)^{-1}X^Ty,\qquad L=\begin{bmatrix}0&&&\\&1&&\\&&\ddots&\\&&&1\end{bmatrix}
    where L is an (n+1)\times(n+1) matrix; the 0 in the top-left corner means \theta_0 is not regularized.
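
    In Octave this could be written as (a sketch; X, y and lambda assumed as above):

    n = size(X, 2) - 1;                          % number of features (bias column excluded)
    L = eye(n + 1);  L(1, 1) = 0;                % identity with the theta_0 entry zeroed
    theta = pinv(X' * X + lambda * L) * X' * y;  % regularized normal equation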

    1.5.2 Regularized Logistic Regression

    J(\theta)=-\frac{1}{m}\big[\sum_{i=1}^m y^{(i)}log(h_\theta(x^{(i)}))+(1-y^{(i)})log(1-h_\theta(x^{(i)}))\big]+\frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2
    Note: \theta_0 is not regularized (the regularization sum starts at j=1).

    gradient descent:
    repeat{
    \theta_0:=\theta_0-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}
    \theta_j:=\theta_j-\alpha\big[\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}+\frac{\lambda}{m}\theta_j\big]\quad (j=1,\dots,n)
    }
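
    A hedged Octave sketch of the regularized cost and gradient in the form expected by fminunc (the name costFunctionReg is illustrative):

    function [jVal, gradient] = costFunctionReg(theta, X, y, lambda)
      m = length(y);
      h = 1 ./ (1 + exp(-X * theta));
      jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h)) ...
             + (lambda/(2*m)) * sum(theta(2:end).^2);               % theta_0 is not penalized
      gradient = (1/m) * (X' * (h - y)) + (lambda/m) * [0; theta(2:end)];
    end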


    Appendices

    The derivation of the cost function.
    First:
    h_\theta(x)=P(y=1|x;\theta)=1-P(y=0|x;\theta) -- the estimated probability that y=1 on input x, parameterized by \theta
    h_\theta(x)=g(z)=\frac{1}{1+e^{-z}},\qquad z=\theta^Tx
    so we can get the general formula:
    P(y|x;\theta)=g(z)^y(1-g(z))^{1-y}\qquad (y\in\{0,1\})----(1)
    then use the Maximum likelihood estimation(MLE):
    note the likelihood L(\theta)=\prod_{i=1}^m P(y^{(i)}|x^{(i)};\theta)----(2)
    substitute equation (1) into equation (2):
    L(\theta)=\prod_{i=1}^m g(z^{(i)})^{y^{(i)}}(1-g(z^{(i)}))^{1-y^{(i)}}----(3)

    Taking the natural logarithm of both sides of equation (3):
    ln(L(\theta))=\sum_{i=1}^m\big[y^{(i)}ln(g(z^{(i)}))+(1-y^{(i)})ln(1-g(z^{(i)}))\big]----(4)

    We know that the goal of MLE is to find the \theta that maximizes equation (4), so we define
    J(\theta)=-\frac{1}{m}ln(L(\theta))=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}ln(g(z^{(i)}))+(1-y^{(i)})ln(1-g(z^{(i)}))\big]----(5)
    Minimizing J(\theta) is then equivalent to maximizing the log-likelihood.

    Next, we compute the partial derivative \frac{\partial J}{\partial \theta_j}, using g'(z)=g(z)(1-g(z)) and \frac{\partial z^{(i)}}{\partial \theta_j}=x_j^{(i)}:
    \begin{aligned} \frac{\partial J}{\partial \theta_j}&=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}\frac{1}{g(z^{(i)})}+(1-y^{(i)})\frac{-1}{1-g(z^{(i)})}\big]\frac{\partial g(z^{(i)})}{\partial \theta_j} \\ &=-\frac{1}{m}\sum_{i=1}^m\big[\frac{y^{(i)}}{g(z^{(i)})}-\frac{1-y^{(i)}}{1-g(z^{(i)})}\big]g(z^{(i)})(1-g(z^{(i)}))x_j^{(i)} \\ &=-\frac{1}{m}\sum_{i=1}^m\big[y^{(i)}(1-g(z^{(i)}))-(1-y^{(i)})g(z^{(i)})\big]x_j^{(i)} \\ &=\frac{1}{m}\sum_{i=1}^m\big(g(z^{(i)})-y^{(i)}\big)x_j^{(i)}=\frac{1}{m}\sum_{i=1}^m\big(h_\theta(x^{(i)})-y^{(i)}\big)x_j^{(i)} \end{aligned}
    So we can then update the parameters with:
    \theta_j:=\theta_j-\alpha\frac{\partial J}{\partial \theta_j}
    which is exactly the gradient descent rule given in section 1.2.
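
    One way to sanity-check this derivative is a numerical gradient check, sketched here in Octave against the illustrative costFunction from section 1.3 (X, y and theta assumed in scope):

    J_of = @(t) costFunction(t, X, y);   % scalar cost as a function of theta only
    epsilon = 1e-4;
    numGrad = zeros(size(theta));
    for j = 1:length(theta)
      e = zeros(size(theta));  e(j) = epsilon;
      numGrad(j) = (J_of(theta + e) - J_of(theta - e)) / (2 * epsilon);  % central difference
    end
    % numGrad should closely match (1/m) * X' * (sigmoid(X*theta) - y) from the derivation above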
