Gradient Descent for Logistic Regression

Author: ForgetThatNight | Published 2018-07-15 17:19

We will build a logistic regression model to predict whether a student is admitted to university. Suppose you are the administrator of a university department and you want to determine each applicant's chance of admission based on the results of two exams. You have historical data from previous applicants that can serve as a training set for logistic regression. For each training example you have the applicant's scores on the two exams and the admission decision. We will build a classification model that estimates the probability of admission from the exam scores.

The data

34.62365962451697,78.0246928153624,0
30.28671076822607,43.89499752400101,0
35.84740876993872,72.90219802708364,0
60.18259938620976,86.30855209546826,1
79.0327360507101,75.3443764369103,1
45.08327747668339,56.3163717815305,0
61.10666453684766,96.51142588489624,1
75.02474556738889,46.55401354116538,1
76.09878670226257,87.42056971926803,1
84.43281996120035,43.53339331072109,1
95.86155507093572,38.22527805795094,0
75.01365838958247,30.60326323428011,0
82.30705337399482,76.48196330235604,1
69.36458875970939,97.71869196188608,1
39.53833914367223,76.03681085115882,0
53.9710521485623,89.20735013750205,1
69.07014406283025,52.74046973016765,1
67.94685547711617,46.67857410673128,0
70.66150955499435,92.92713789364831,1
76.97878372747498,47.57596364975532,1
67.37202754570876,42.83843832029179,0
89.67677575072079,65.79936592745237,1
50.534788289883,48.85581152764205,0
34.21206097786789,44.20952859866288,0
77.9240914545704,68.9723599933059,1
62.27101367004632,69.95445795447587,1
80.1901807509566,44.82162893218353,1
93.114388797442,38.80067033713209,0
61.83020602312595,50.25610789244621,0
# The three essentials
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import os
path = 'data' + os.sep + 'LogiReg_data.txt'
# The first row of the data file is not a header, so suppress it and supply our own column names:
# column 1 is the Exam 1 score, column 2 the Exam 2 score, column 3 whether the applicant was admitted
pdData = pd.read_csv(path, header=None, names=['Exam 1', 'Exam 2', 'Admitted'])
pdData.head()
pdData.shape
# Output: (100, 3)
# subset of rows with Admitted == 1, i.e. the *positive* examples
positive = pdData[pdData['Admitted'] == 1]
# subset of rows with Admitted == 0, i.e. the *negative* examples
negative = pdData[pdData['Admitted'] == 0]

fig, ax = plt.subplots(figsize=(10,5))  # figure width and height
# plot the two classes separately
ax.scatter(positive['Exam 1'], positive['Exam 2'], s=30, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam 1'], negative['Exam 2'], s=30, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
Output: a scatter plot of the two exam scores, with admitted and not-admitted applicants marked separately.

Logistic regression -- essentially a binary classification task

Goal: build a classifier

The parameters θ play the same role as a and b in y = ax + b: one weight for each exam score plus an intercept term.

Set a threshold of 0.5 and use it to decide the admission result: a predicted probability above 0.5 is classified as admitted.
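As a concrete illustration of this decision rule, here is a minimal sketch with made-up numbers (theta_demo and the exam scores are hypothetical, not taken from the dataset or from the model fitted below):

import numpy as np
theta_demo = np.array([-5.0, 0.05, 0.04])      # hypothetical [intercept, weight for Exam 1, weight for Exam 2]
x_demo = np.array([1.0, 60.0, 80.0])           # [1, Exam 1 score, Exam 2 score]; the leading 1 multiplies the intercept
prob = 1 / (1 + np.exp(-x_demo @ theta_demo))  # sigmoid of the linear combination, here sigmoid(1.2) ≈ 0.77
print('admit' if prob > 0.5 else 'reject')     # prints 'admit' because 0.77 > 0.5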

Modules to implement

  • sigmoid : maps values to probabilities

  • model : returns the predicted value

  • cost : computes the loss for the given parameters

  • gradient : computes the gradient for each parameter

  • descent : performs the parameter updates

  • accuracy : computes the accuracy

The `sigmoid` function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
nums = np.arange(-10, 10, step=1)  # 20 evenly spaced values from -10 to 9
fig, ax = plt.subplots(figsize=(12,4))
ax.plot(nums, sigmoid(nums), 'r')
Output: plot of the sigmoid curve.
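A quick sanity check of the limiting behaviour (input values chosen only for illustration):

print(sigmoid(0))     # exactly 0.5
print(sigmoid(10))    # ≈ 0.99995, approaching 1
print(sigmoid(-10))   # ≈ 0.000045, approaching 0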

The prediction function

def model(X, theta):
    
    return sigmoid(np.dot(X, theta.T))  # matrix product via NumPy

Prediction function: h_θ(x) = g(θᵀx) = 1 / (1 + e^(-(θ₀ + θ₁x₁ + θ₂x₂)))
# ideally wrapped in try/except so that re-running the cell does not raise an error (insert fails if the column already exists)
# add a column named 'Ones', filled with 1s, for the intercept term
pdData.insert(0, 'Ones', 1)
# set X (training data) and y (target variable)
orig_data = pdData.values  # convert the pandas DataFrame to a NumPy array for the computations below
cols = orig_data.shape[1]
X = orig_data[:,0:cols-1]
y = orig_data[:,cols-1:cols]

# initialize the parameter array theta as a 1x3 placeholder of zeros
theta = np.zeros([1, 3])

Check the input

X[:5]

Output: each row has a leading 1, then the Exam 1 score, then the Exam 2 score

array([[  1.        ,  34.62365962,  78.02469282],
       [  1.        ,  30.28671077,  43.89499752],
       [  1.        ,  35.84740877,  72.90219803],
       [  1.        ,  60.18259939,  86.3085521 ],
       [  1.        ,  79.03273605,  75.34437644]])

Check the target

y[:5]

Output

array([[ 0.],
       [ 0.],
       [ 0.],
       [ 1.],
       [ 1.]])

Input

theta

Output

Out[12]:
array([[ 0.,  0.,  0.]])

Input

X.shape, y.shape, theta.shape

Output

((100, 3), (100, 1), (1, 3))
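These shapes are exactly what the matrix product inside model needs: X is (100, 3) and theta.T is (3, 1), so model(X, theta) is a (100, 1) column of probabilities. With theta still all zeros, every probability is sigmoid(0) = 0.5:

model(X, theta)[:5]   # five rows, each equal to 0.5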
The loss function

Define the loss function, i.e. the average negative log-likelihood:

D(h_θ(x), y) = -y·log(h_θ(x)) - (1 - y)·log(1 - h_θ(x))
J(θ) = (1/n) Σᵢ D(h_θ(xᵢ), yᵢ)

def cost(X, y, theta):
    left = np.multiply(-y, np.log(model(X, theta)))          # the -y·log(h) term
    right = np.multiply(1 - y, np.log(1 - model(X, theta)))  # the (1-y)·log(1-h) term
    return np.sum(left - right) / len(X)

Input

cost(X, y, theta)

Output

0.69314718055994529
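This is exactly the value we should expect: with θ = 0 the model predicts 0.5 for every sample, so each term of the loss is -log(0.5) = log 2:

np.log(2)   # 0.6931471805599453, the same as the initial cost above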
Computing the gradient

∂J/∂θⱼ = (1/n) Σᵢ (h_θ(xᵢ) - yᵢ)·xᵢⱼ

def gradient(X, y, theta):
    grad = np.zeros(theta.shape)           # placeholder for the gradient
    error = (model(X, theta) - y).ravel()  # prediction error for each sample
    for j in range(len(theta.ravel())):    # for each parameter θ0, θ1, θ2
        term = np.multiply(error, X[:, j])
        grad[0, j] = np.sum(term) / len(X)

    return grad
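The loop above can also be written as a single matrix product. A minimal vectorized sketch (not part of the original notebook) that should return the same values:

def gradient_vectorized(X, y, theta):
    # (1/n) · Xᵀ(h_θ(X) - y), transposed to match theta's (1, 3) layout
    return (X.T @ (model(X, theta) - y)).T / len(X)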

Gradient descent

We will compare three gradient-descent variants; first, define three different stopping strategies:

STOP_ITER = 0   # stop after a fixed number of iterations
STOP_COST = 1   # stop when the change in the cost becomes small
STOP_GRAD = 2   # stop when the norm of the gradient becomes small

def stopCriterion(type, value, threshold):
    # the three stopping strategies
    if type == STOP_ITER:        return value > threshold
    elif type == STOP_COST:      return abs(value[-1]-value[-2]) < threshold
    elif type == STOP_GRAD:      return np.linalg.norm(value) < threshold
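For example, with illustrative values:

stopCriterion(STOP_ITER, 5001, 5000)          # True: the iteration count exceeds the threshold
stopCriterion(STOP_COST, [0.70, 0.69], 1e-6)  # False: the cost is still changing by much more than 1e-6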
import numpy.random
# shuffle the data to break any ordering in the file and improve generalization
def shuffleData(data):
    np.random.shuffle(data)  # in-place shuffle of the rows
    cols = data.shape[1]
    X = data[:, 0:cols-1]
    y = data[:, cols-1:]
    return X, y
import time
# thresh is the stopping threshold: an iteration count for STOP_ITER, a cost change for STOP_COST, a gradient norm for STOP_GRAD
# batchSize: n for batch gradient descent, 1 for stochastic, anything in between for mini-batch
def descent(data, theta, batchSize, stopType, thresh, alpha):
    # solve by gradient descent

    init_time = time.time()
    i = 0  # iteration counter
    k = 0  # position of the current batch
    X, y = shuffleData(data)
    grad = np.zeros(theta.shape)  # the computed gradient
    costs = [cost(X, y, theta)]   # history of the loss

    
    while True:
        grad = gradient(X[k:k+batchSize], y[k:k+batchSize], theta)
        k += batchSize  # advance by one batch
        if k >= n:
            k = 0
            X, y = shuffleData(data)  # reshuffle once all the data has been used
        theta = theta - alpha*grad       # parameter update
        costs.append(cost(X, y, theta))  # record the new loss
        i += 1

        if stopType == STOP_ITER:       value = i
        elif stopType == STOP_COST:     value = costs
        elif stopType == STOP_GRAD:     value = grad
        if stopCriterion(stopType, value, thresh): break

    return theta, i-1, costs, grad, time.time() - init_time
def runExpe(data, theta, batchSize, stopType, thresh, alpha):
    theta, iter, costs, grad, dur = descent(data, theta, batchSize, stopType, thresh, alpha)
    # exam scores in the raw data are far larger than 2, so this test distinguishes original from scaled data
    name = "Original" if (data[:,1]>2).sum() > 1 else "Scaled"
    name += " data - learning rate: {} - ".format(alpha)
    if batchSize==n: strDescType = "Gradient"
    elif batchSize==1:  strDescType = "Stochastic"
    else: strDescType = "Mini-batch ({})".format(batchSize)
    name += strDescType + " descent - Stop: "
    if stopType == STOP_ITER: strStop = "{} iterations".format(thresh)
    elif stopType == STOP_COST: strStop = "costs change < {}".format(thresh)
    else: strStop = "gradient norm < {}".format(thresh)
    name += strStop
    print ("***{}\nTheta: {} - Iter: {} - Last cost: {:03.2f} - Duration: {:03.2f}s".format(
        name, theta, iter, costs[-1], dur))
    fig, ax = plt.subplots(figsize=(12,4))
    ax.plot(np.arange(len(costs)), costs, 'r')
    ax.set_xlabel('Iterations')
    ax.set_ylabel('Cost')
    ax.set_title(name.upper() + ' - Error vs. Iteration')
    return theta

Different stopping strategies

Stopping after a fixed number of iterations

# batch gradient descent over all n samples
n = 100
runExpe(orig_data, theta, n, STOP_ITER, thresh=5000, alpha=0.000001)

Output

***Original data - learning rate: 1e-06 - Gradient descent - Stop: 5000 iterations
Theta: [[-0.00027127  0.00705232  0.00376711]] - Iter: 5000 - Last cost: 0.63 - Duration: 1.18s
Out[29]:
array([[-0.00027127,  0.00705232,  0.00376711]])
(Plot: cost vs. iteration)

Stopping by change in cost

Set the threshold to 1e-6; this takes roughly 110,000 iterations.

runExpe(orig_data, theta, n, STOP_COST, thresh=0.000001, alpha=0.001)

Output

***Original data - learning rate: 0.001 - Gradient descent - Stop: costs change < 1e-06
Theta: [[-5.13364014  0.04771429  0.04072397]] - Iter: 109901 - Last cost: 0.38 - Duration: 24.47s
Out[27]:
array([[-5.13364014,  0.04771429,  0.04072397]])
(Plot: cost vs. iteration)

Stopping by gradient norm

Set the threshold to 0.05; this takes roughly 40,000 iterations.

runExpe(orig_data, theta, n, STOP_GRAD, thresh=0.05, alpha=0.001)

Output

***Original data - learning rate: 0.001 - Gradient descent - Stop: gradient norm < 0.05
Theta: [[-2.37033409  0.02721692  0.01899456]] - Iter: 40045 - Last cost: 0.49 - Duration: 10.79s
Out[28]:
array([[-2.37033409,  0.02721692,  0.01899456]])
(Plot: cost vs. iteration)

Comparing different gradient-descent variants

Stochastic descent

runExpe(orig_data, theta, 1, STOP_ITER, thresh=5000, alpha=0.001)

Output

***Original data - learning rate: 0.001 - Stochastic descent - Stop: 5000 iterations
Theta: [[-0.38504802  0.09357723 -0.01034717]] - Iter: 5000 - Last cost: 1.59 - Duration: 0.42s
Out[30]:
array([[-0.38504802,  0.09357723, -0.01034717]])
(Plot: cost vs. iteration)

That blows up a bit... very unstable, and it does not converge. How do we fix it? Lower alpha and increase the number of iterations:

runExpe(orig_data, theta, 1, STOP_ITER, thresh=15000, alpha=0.000002)

Output

***Original data - learning rate: 2e-06 - Stochastic descent - Stop: 15000 iterations
Theta: [[-0.00202012  0.01009114  0.00103943]] - Iter: 15000 - Last cost: 0.63 - Duration: 1.10s
Out[31]:
array([[-0.00202012,  0.01009114,  0.00103943]])
Fast, but unstable; it needs a very small learning rate.

Mini-batch descent

runExpe(orig_data, theta, 16, STOP_ITER, thresh=15000, alpha=0.001)

Output

***Original data - learning rate: 0.001 - Mini-batch (16) descent - Stop: 15000 iterations
Theta: [[-1.0352224   0.01668297  0.0124234 ]] - Iter: 15000 - Last cost: 0.57 - Duration: 1.44s
Out[33]:
array([[-1.0352224 ,  0.01668297,  0.0124234 ]])
The cost still fluctuates quite a bit, so let's try standardizing the data: subtract each column's mean and divide by its standard deviation, so that every feature ends up centered around 0 with unit variance.

Use sklearn to do the standardization (subtract the mean, then divide by the standard deviation):

from sklearn import preprocessing as pp

scaled_data = orig_data.copy()
scaled_data[:, 1:3] = pp.scale(orig_data[:, 1:3])
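As a sanity check (not in the original notebook), the same standardization can be done by hand; pp.scale centers each column and divides by its population standard deviation, so the two results should agree:

manual = (orig_data[:, 1:3] - orig_data[:, 1:3].mean(axis=0)) / orig_data[:, 1:3].std(axis=0)
np.allclose(manual, scaled_data[:, 1:3])   # expected: True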

runExpe(scaled_data, theta, n, STOP_ITER, thresh=5000, alpha=0.001)

Output

***Scaled data - learning rate: 0.001 - Gradient descent - Stop: 5000 iterations
Theta: [[ 0.3080807   0.86494967  0.77367651]] - Iter: 5000 - Last cost: 0.38 - Duration: 1.13s
Out[34]:
array([[ 0.3080807 ,  0.86494967,  0.77367651]])
Much better! With the original data the cost only got down to about 0.61, while here we reach 0.38!

So preprocessing the data really matters.

Input

runExpe(scaled_data, theta, n, STOP_GRAD, thresh=0.02, alpha=0.001)

Output

***Scaled data - learning rate: 0.001 - Gradient descent - Stop: gradient norm < 0.02
Theta: [[ 1.071  2.63   2.411]] - Iter: 59422 - Last cost: 0.22 - Duration: 12.00s
Out[25]:
array([[ 1.071,  2.63 ,  2.411]])
More iterations bring the cost down even further!
theta = runExpe(scaled_data, theta, 1, STOP_GRAD, thresh=0.002/5, alpha=0.001)

Output

***Scaled data - learning rate: 0.001 - Stochastic descent - Stop: gradient norm < 0.0004
Theta: [[ 1.14848169  2.79268789  2.5667383 ]] - Iter: 72637 - Last cost: 0.22 - Duration: 7.05s
Stochastic gradient descent is faster per update, but it needs many more iterations, so (mini-)batches are the better choice!
runExpe(scaled_data, theta, 16, STOP_GRAD, thresh=0.002*2, alpha=0.001)

Output

***Scaled data - learning rate: 0.001 - Mini-batch (16) descent - Stop: gradient norm < 0.004
Theta: [[ 1.17096801  2.83171736  2.61095087]] - Iter: 3940 - Last cost: 0.21 - Duration: 0.50s
Out[36]:
array([[ 1.17096801,  2.83171736,  2.61095087]])
(Plot: cost vs. iteration)

Accuracy

# set the decision threshold at 0.5: probabilities of 0.5 or above are predicted as positive
def predict(X, theta):
    return [1 if x >= 0.5 else 0 for x in model(X, theta)]
scaled_X = scaled_data[:, :3]
y = scaled_data[:, 3]
predictions = predict(scaled_X, theta)
correct = [1 if ((a == 1 and b == 1) or (a == 0 and b == 0)) else 0 for (a, b) in zip(predictions, y)]
accuracy = sum(correct) * 100 // len(correct)  # percentage of correct predictions
print ('accuracy = {0}%'.format(accuracy))

Output

accuracy = 89%
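The same accuracy can be computed in one vectorized expression (a sketch, not part of the original notebook):

preds = (model(scaled_X, theta) >= 0.5).astype(int).ravel()
print('accuracy = {:.0f}%'.format((preds == y).mean() * 100))   # accuracy = 89%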
