Gradient Descent for Logistic Regression

Author: azorazz | Published 2019-10-03 22:34

    Modules:

    • Linear regression function
    • Sigmoid function
    • Binary logistic regression
    • Cost (loss) function
    • Gradient computation
    1. Linear regression function

    f(x) = wx + b

    In code, the linear combination is fed straight into the sigmoid (defined in the next step), so model() returns the predicted probability:

    def model(X, theta):
        # X carries a leading column of ones, so the bias b is theta[0]
        return sigmoid(np.dot(X, theta.T))
    
    2. Sigmoid function

    g(z) = \frac{1}{1+e^{-z}}

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))
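
    As a quick illustration (assuming numpy is imported as np), the sigmoid squashes any real input into (0, 1), with g(0) = 0.5:

    print(sigmoid(np.array([-10.0, 0.0, 10.0])))  # ≈ [4.54e-05, 0.5, 0.99995]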
    
    3. Binary logistic regression

    For each sample:
    Probability that the predicted class is 1:
    h_{1}(x) = \frac{1}{1+e^{-(wx+b)}}
    Probability that the predicted class is 0:
    h_{0}(x) = 1-h_{1}(x) = \frac{e^{-(wx+b)}}{1+e^{-(wx+b)}}
    Both cases combined into one expression (y \in \{0,1\}), writing h_{\theta} = h_{1}:
    P(y|x;\theta) = h_{\theta}(x)^{y}\,(1-h_{\theta}(x))^{1-y}
    Multiplying over all samples gives the likelihood:
    L(\theta)=\prod_{i=1}^{m}P(y_{i}|x_{i};\theta) = \prod_{i=1}^{m}h_{\theta}(x_{i})^{y_{i}}(1-h_{\theta}(x_{i}))^{1-y_{i}}
    Log-likelihood:
    l(\theta) = \log L(\theta) = \sum_{i=1}^{m}\left(y_{i}\log h_{\theta}(x_{i})+(1-y_{i})\log(1-h_{\theta}(x_{i}))\right)
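
    A quick numeric check of the log-likelihood, as a minimal sketch on hypothetical toy arrays (X_toy, y_toy, theta_toy are illustrative names, not part of the original post):

    import numpy as np

    X_toy = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.3]])  # first column is the bias term
    y_toy = np.array([1.0, 0.0, 1.0])
    theta_toy = np.array([0.1, 0.2])

    h = 1 / (1 + np.exp(-X_toy.dot(theta_toy)))               # h_theta(x_i) for each sample
    log_lik = np.sum(y_toy * np.log(h) + (1 - y_toy) * np.log(1 - h))
    print(log_lik)  # maximizing this is equivalent to minimizing the cost defined next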

    4. Cost (loss) function

    The loss is simply the negative log-likelihood, averaged over the samples:
    1) the log-likelihood measures, per sample, the probability that the prediction matches the true label;
    2) the loss measures, per sample, how far the prediction is from the true label.
    J(\theta) = -\frac{1}{m}l(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left(y_{i}\log h_{\theta}(x_{i})+(1-y_{i})\log(1-h_{\theta}(x_{i}))\right)

    def cost(X, y, theta):
        left = np.multiply(-y, np.log(model(X, theta)))
        right = np.multiply(1 - y, np.log(1 - model(X, theta)))
        return np.sum(left - right) / (len(X))
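
    One practical caveat (not in the original post): when the sigmoid saturates, model(X, theta) can return exactly 0 or 1 and np.log then yields -inf. A common guard, shown here as a hypothetical variant of cost(), is to clip the probabilities:

    def cost_clipped(X, y, theta, eps=1e-12):
        # clip predicted probabilities away from 0 and 1 before taking logs
        p = np.clip(model(X, theta), eps, 1 - eps)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)) / len(X)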
    
    5. Gradient computation

    \frac{\partial J}{\partial \theta_j}=-\frac{1}{m}\sum_{i=1}^{m}(y_{i} - h_{\theta}(x_{i}))x_{ij}

    def gradient(X, y, theta):
        grad = np.zeros(theta.shape)
        error = model(X, theta) - y            # h_theta(x_i) - y_i for every sample
        for j in range(len(theta)):            # for each parameter theta_j
            term = np.multiply(error, X[:, j])
            grad[j] = np.sum(term) / len(X)

        return grad
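
    A small sanity check, as a minimal sketch: compare the analytic gradient with a central finite-difference approximation of the cost (the toy arrays are hypothetical, as in the sketch above):

    X_chk = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.3]])
    y_chk = np.array([1.0, 0.0, 1.0])
    theta_chk = np.array([0.1, 0.2])

    eps = 1e-6
    num_grad = np.zeros_like(theta_chk)
    for j in range(len(theta_chk)):
        step = np.zeros_like(theta_chk)
        step[j] = eps
        # central difference of the cost with respect to theta_j
        num_grad[j] = (cost(X_chk, y_chk, theta_chk + step) -
                       cost(X_chk, y_chk, theta_chk - step)) / (2 * eps)
    print(num_grad, gradient(X_chk, y_chk, theta_chk))  # the two should agree closely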
    

    Summary:

    • When the number of samples is large, full-batch gradient descent is slow.
    • A small learning rate makes the cost decrease more stably.
    • Preprocessing (standardizing) the data also makes the cost decrease more stably; a sketch follows the complete code below.
    • More iterations drive the cost down further.
    • For large datasets, mini-batch gradient descent not only solves the speed problem, it also leaves room for more iterations.

    Complete code:

    import time
    import numpy as np
    import pandas as pd
    
    pdData = pd.read_csv('LogiReg_data.txt', header=None, names=['Exam 1', 'Exam 2', 'Admitted'])
    ## data preprocessing
    pdData.insert(0, 'Ones', 1)  # prepend a column of ones so each feature row is (1, x1, x2, ..., xn)
    data = pdData.values
    X = data[:, :-1]
    y = data[:, -1]
    theta = np.zeros(X.shape[1])
    
    STOP_ITER = 0
    STOP_COST = 1
    STOP_GRAD = 2
    
    
    def stopCriterion(type, value, threshold):
        # three different stopping strategies
        if type == STOP_ITER:
            return value > threshold
        elif type == STOP_COST:
            return abs(value[-1] - value[-2]) < threshold
        elif type == STOP_GRAD:
            return np.linalg.norm(value) < threshold
    
    
    def shuffleData(data):
        np.random.shuffle(data)
        X = data[:, :-1]
        y = data[:, -1]
        return X, y
    
    
    def sigmoid(x):
        return 1 / (1 + np.exp(-x))
    
    
    def model(X, theta):
        return sigmoid(np.dot(X, theta.T))  
    
    
    def cost(X, y, theta):
        left = np.multiply(-y, np.log(model(X, theta)))
        right = np.multiply(1 - y, np.log(1 - model(X, theta)))
        return np.sum(left - right) / len(X)
    
    
    def gradient(X, y, theta):
        grad = np.zeros(theta.shape)
        error = model(X, theta) - y
        for j in range(len(theta)):
            term = np.multiply(error, X[:, j])
            grad[j] = np.sum(term) / len(X)
    
        return grad
    
    
    def descent(data, theta, batchSize, stopType, thresh, alpha):
        # 1 < batchSize < m: mini-batch gradient descent
        # batchSize = m:     full-batch gradient descent
        # batchSize = 1:     stochastic gradient descent
        init_time = time.time()
        m, n = data.shape
        i = 0
        k = 0
        X, y = shuffleData(data)
        grad = np.zeros(theta.shape)
        costs = [cost(X, y, theta)]
    
        while True:
            grad = gradient(X[k:k + batchSize], y[k:k + batchSize], theta)
            k += batchSize
            if k >= m:  # all samples consumed: reset the pointer and reshuffle
                k = 0
                X, y = shuffleData(data)
            theta = theta - alpha * grad
            costs.append(cost(X, y, theta))  # record the new cost
            i += 1
    
            if stopType == STOP_ITER:
                value = i
            elif stopType == STOP_COST:
                value = costs
            elif stopType == STOP_GRAD:
                value = grad
            if stopCriterion(stopType, value, thresh): break
    
        return theta, i - 1, costs, grad, time.time() - init_time
    
    
    def runExpe(data, theta, batchSize, stopType, thresh, alpha):
        theta, iter, costs, grad, dur = descent(data, theta, batchSize, stopType, thresh, alpha)
        name = "Original" if (data[:, 1] > 2).sum() > 1 else "Scaled"
        name += " data - learning rate: {} - ".format(alpha)
        if batchSize == n:
            strDescType = "Gradient"
        elif batchSize == 1:
            strDescType = "Stochastic"
        else:
            strDescType = "Mini-batch ({})".format(batchSize)
        name += strDescType + " descent - Stop: "
        if stopType == STOP_ITER:
            strStop = "{} iterations".format(thresh)
        elif stopType == STOP_COST:
            strStop = "costs change < {}".format(thresh)
        else:
            strStop = "gradient norm < {}".format(thresh)
        name += strStop
        print("***{}\nTheta: {} - Iter: {} - Last cost: {:03.2f} - Duration: {:03.2f}s".format(
            name, theta, iter, costs[-1], dur))
        return theta
    
    n = 100  # number of samples; runExpe compares batchSize against this to label full-batch runs
    
    runExpe(data, theta, n, STOP_ITER, thresh=5000, alpha=0.000001)
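
    As noted in the summary, standardizing the features usually makes the cost curve smoother and tolerates a larger learning rate. A minimal sketch using plain numpy (the scaled copy, the alpha=0.001 choice, and the alternative stopping-criterion calls are illustrative assumptions, not part of the original post):

    scaled = data.copy()
    # z-score standardize the two exam-score columns (column 0 is the bias column of ones)
    scaled[:, 1:3] = (scaled[:, 1:3] - scaled[:, 1:3].mean(axis=0)) / scaled[:, 1:3].std(axis=0)
    runExpe(scaled, np.zeros(X.shape[1]), n, STOP_ITER, thresh=5000, alpha=0.001)
    # the other stopping strategies can be exercised the same way, e.g.:
    # runExpe(scaled, np.zeros(X.shape[1]), n, STOP_COST, thresh=1e-6, alpha=0.001)
    # runExpe(scaled, np.zeros(X.shape[1]), 16, STOP_GRAD, thresh=0.05, alpha=0.001)

    Finally, to use the fitted parameters, a small prediction helper can report training accuracy; predict() and the 0.5 threshold are common conventions added here for illustration, not part of the original post:

    def predict(X, theta, threshold=0.5):
        # class 1 when the predicted probability reaches the threshold
        return (model(X, theta) >= threshold).astype(int)

    fitted_theta = runExpe(data, theta, n, STOP_ITER, thresh=5000, alpha=0.000001)
    accuracy = (predict(X, fitted_theta) == y).mean()
    print('Training accuracy: {:.2%}'.format(accuracy))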
    
