6. Classification

Author: 玄语梨落 | Published 2020-08-17 13:57

    Classification

    Logistic Regression: 0\le h_\theta(x)\le 1

    Hypothesis Representation

    Want 0\le h_\theta(x)\le 1
    Hypothesis: h_\theta(x)=g(\theta^Tx), where g(z)=\frac{1}{1+e^{-z}}
    The sigmoid function g(z)=\frac{1}{1+e^{-z}} is also called the logistic function.
    h_\theta(x)=P(y=1|x;\theta): the probability that y=1 given x, parameterized by \theta
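    As a small illustration (not part of the original notes), the sigmoid hypothesis is easy to write as a vectorized Octave function; the name 'sigmoid' here is just a placeholder for this sketch.

    % Minimal sketch: vectorized sigmoid g(z) = 1/(1 + e^(-z)).
    function g = sigmoid(z)
      g = 1 ./ (1 + exp(-z));    % elementwise, so z may be a scalar, vector or matrix
    end

    % The hypothesis is then h_theta(x) = sigmoid(theta' * x) for one example,
    % or sigmoid(X * theta) for a design matrix X with one example per row.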

    Decision Boundary

    h_\theta(x)\ge 0.5 means z\ge 0, i.e. \theta^Tx\ge 0.
    The line where h_\theta(x)=0.5 is called the decision boundary.
    The decision boundary is a property of the hypothesis, not a property of the data set.
    We use the data set to fit \theta; each \theta defines a decision boundary.
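    As a concrete, made-up example: with two features and \theta=[-3;1;1], we have \theta^Tx=-3+x_1+x_2, so the decision boundary is the line x_1+x_2=3. The Octave snippet below, using these illustrative values only, classifies a single point against that boundary.

    % Illustrative only: theta = [-3; 1; 1] gives the boundary x1 + x2 = 3.
    theta = [-3; 1; 1];
    x = [1; 2; 2];                     % one example, with intercept term x0 = 1
    h = 1 / (1 + exp(-theta' * x));    % g(theta' * x) = g(1), about 0.73
    prediction = (h >= 0.5)            % predicts y = 1 because theta' * x >= 0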

    Cost Function

    How to fit the parameters \theta for logistic regression.
    Linear Regression:
    J(\theta)=\frac{1}{m}\sum_{i=1}^m\frac{1}{2}(h_\theta(x^{(i)})-y^{(i)})^2 \newline Cost(h_\theta(x^{(i)}),y^{(i)})=\frac{1}{2}(h_\theta(x^{(i)})-y^{(i)})^2 \newline J(\theta)=\frac{1}{m}\sum_{i=1}^mCost(h_\theta(x^{(i)}),y^{(i)})
    When used for logistic regression, this squared-error cost is non-convex, so logistic regression uses the following cost instead:
    Cost(h_\theta(x),y)=\left\{ \begin{aligned} &-\log(h_\theta(x))(y=1) \\ &-\log(1-h_\theta(x))(y=0) \end{aligned}\right.
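    A quick numeric check (not from the notes) of why this cost makes sense: when y=1 the penalty grows without bound as h_\theta(x) approaches 0 and goes to zero as h_\theta(x) approaches 1, and symmetrically for y=0.

    % Illustrative values only.
    h = [0.99 0.5 0.01];
    cost_if_y1 = -log(h)        % approx [0.01  0.69  4.61]: confident wrong answers cost a lot
    cost_if_y0 = -log(1 - h)    % approx [4.61  0.69  0.01]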

    The topic of convexity analysis is beyond the scope of this course.

    Simplified cost function and gradient descent

    Cost(h_\theta(x),y)=\left\{ \begin{aligned} &-\log(h_\theta(x))(y=1) \\ &-\log(1-h_\theta(x))(y=0) \end{aligned}\right. \newline Cost(h_\theta(x),y)=-y\log(h_\theta(x))-(1-y)\log(1-h_\theta(x))

    This form follows from the principle of maximum likelihood estimation.
    Cost function:
    J(\theta)=-\frac{1}{m}[\sum_{i=1}^my^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))]
    Want \min_\theta J(\theta):
    Repeat {
    \theta_j:=\theta_j-\alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}
    }
    Although this update rule looks identical to the one for linear regression, h_\theta(x) is different in logistic regression, so the two algorithms are not the same thing. A vectorized sketch of the loop is given below.
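    The following vectorized Octave sketch shows one way this loop could look; the design matrix X (with a leading column of ones), the label vector y, the learning rate alpha, and num_iters are assumed inputs that the notes themselves do not define.

    % Minimal sketch: batch gradient descent for logistic regression.
    % X is m x (n+1), y is m x 1, theta is (n+1) x 1.
    m = length(y);
    for iter = 1:num_iters
      h = 1 ./ (1 + exp(-X * theta));                        % h_theta(x) for every example
      J = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));    % J(theta), as defined above
      grad = (1/m) * X' * (h - y);                           % vector of partial derivatives
      theta = theta - alpha * grad;                          % simultaneous update of all theta_j
    end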

    Advanced Optimization

    Given \theta, we have code that can compute

    • J(\theta)
    • \frac{\partial}{\partial \theta_j}J(\theta)

    Optimization algorithms:

    • Gradient descent
    • Conjugate gradient
    • BFGS
    • L-BFGS

    Advantages:

    • No need to manually pick \alpha
    • Often faster than gradient descent.

    Disadvantages:

    • More complex

    An example:

    • the function
    function [jVal, gradient] = costFunction(theta)
      % Example cost J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2, minimized at theta = [5; 5]
      jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;

      % Gradient: partial derivatives of J with respect to theta1 and theta2
      gradient = zeros(2, 1);
      gradient(1) = 2 * (theta(1) - 5);
      gradient(2) = 2 * (theta(2) - 5);
    end

    code in Octave:

    options = optimset('GradObj', 'on', 'MaxIter', 100);   % supply the gradient, run at most 100 iterations
    initialTheta = zeros(2, 1);
    [optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options)
    

    The function 'fminunc' is not gradient descent, but it plays a similar role: it minimizes J(\theta) using the cost value and gradient we supply.
    \theta must have at least 2 parameters (i.e. be at least 2-dimensional) when using 'fminunc'. For more information, run 'help fminunc'.

    Multiclass classification

    One-versus-all classification (also called one-versus-rest)

    Train a logistic regression classifier h_\theta^{(i)}(x) for each class i to predict the probability that y=i.
    On a new input x, to make a prediction, pick the class i that maximizes h_\theta^{(i)}(x).
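    A minimal sketch of the prediction step in Octave, assuming a matrix all_theta with one row of fitted parameters per class (these names are not defined in the notes):

    % Illustrative only: all_theta is num_classes x (n+1), X is m x (n+1) with a leading column of ones.
    probs = 1 ./ (1 + exp(-X * all_theta'));     % h_theta^(i)(x) for every class i and example
    [maxProb, predictions] = max(probs, [], 2);  % pick the class i with the largest h_theta^(i)(x)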
