Logistic Regression

Author: 线断木偶人 | Published 2020-03-23 14:28

    1. The Sigmoid Function
    y = \frac{1}{1+e^{-z}}

    The sigmoid function has an S-shaped graph, and its output always lies in the open interval (0, 1).


    import numpy as np

    # sigmoid function
    def sigmoid(inX):
        return 1.0 / (1 + np.exp(-inX))
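
    A quick sanity check (a small illustration): sigmoid(0) is exactly 0.5, and the output saturates toward 0 and 1 for large negative and positive inputs.

    print(sigmoid(0))                       # 0.5
    print(sigmoid(np.array([-10, 0, 10])))  # ≈ [4.54e-05  0.5  0.99995]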
    

    2. Using the Sigmoid Function for Binary Classification
    To build a logistic regression classifier, multiply each feature by a regression coefficient w, sum all the products to get z, and feed z into the sigmoid function above, which outputs a value in (0, 1). The classification rule: if the sigmoid output y > 0.5 (equivalently, z > 0), predict class 1; otherwise predict class 0.
    2.1 Input sample: X = (x_0, x_1, x_2, \dots, x_n)
    2.2 Map the known input sample to z: x \rightarrow z
    with the corresponding regression coefficients W = (w_0, w_1, w_2, \dots, w_n).
    Multiply each feature value by its coefficient and sum:
    z = w_0x_0 + w_1x_1 + w_2x_2 + \dots + w_nx_n
    (W is the unknown we need to solve for.)
    2.3 Substitute z into the sigmoid function, as sketched below.
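
    Putting steps 2.1–2.3 together (a minimal sketch; the weight values below are placeholders, not learned coefficients):

    import numpy as np

    def classify(x, w):
        """Predict class 1 or 0 for feature vector x given coefficients w."""
        z = np.dot(x, w)                # z = w_0*x_0 + w_1*x_1 + ... + w_n*x_n
        y = 1.0 / (1 + np.exp(-z))      # sigmoid maps z into (0, 1)
        return 1 if y > 0.5 else 0

    print(classify(np.array([1.0, 2.0, -1.5]), np.array([0.5, -0.2, 0.3])))   # prints 0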

    3. How Do We Find W?
    We want the optimal regression coefficients W = (w_0, w_1, w_2, \dots, w_n).
    3.1 Gradient Ascent
    To find the maximum of a function, the best approach is to search along the direction of the function's gradient.

    The gradient of a function f(x, y) is written:
    \nabla f(x,y) = \begin{pmatrix} \frac{\partial f(x,y)}{\partial x} \\ \frac{\partial f(x,y)}{\partial y} \end{pmatrix}

    This gradient means that to ascend, f moves by \frac{\partial f(x,y)}{\partial x} along the x direction and by \frac{\partial f(x,y)}{\partial y} along the y direction.
    The function f(x, y) must be defined and differentiable at the point where the gradient is evaluated.
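
    A concrete example: for f(x, y) = x^2 + y^2 the gradient is
    \nabla f(x,y) = \begin{pmatrix} 2x \\ 2y \end{pmatrix}
    so at the point (1, 2) the direction of steepest ascent is (2, 4).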

    The gradient ascent update rule:
    w := w + \alpha \nabla_w f(w)
    where \alpha is the step size (learning rate).
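
    For logistic regression, the function being maximized is the log-likelihood of the training labels; a standard result (stated here without derivation) is that its gradient takes a very simple form:
    \nabla_w \ell(w) = X^T (y - \sigma(Xw))
    where X is the data matrix, y the label vector, and \sigma the sigmoid. This is exactly why the code below computes error = labelMat - h and updates the weights with dataMatrix.transpose() * error.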

    4. Implementation in Code

    import numpy as np

    def loadDataSet():
        dataMat = []
        labelMat = []
        with open('testSet.txt') as fr:                         # one sample per line: x1 x2 label
            for line in fr.readlines():
                lineArr = line.strip().split()
                dataMat.append([float(lineArr[0]), float(lineArr[1])])
                labelMat.append(int(lineArr[2]))
        return dataMat, labelMat

    """
    Function: sigmoid function
    """
    def sigmoid(inX):
        return 1.0 / (1 + np.exp(-inX))


    """
    Function: gradient ascent
    """
    def gradAscent(dataMatIn, classLabels):
        dataMatrix = np.mat(dataMatIn)                          # convert to a NumPy matrix
        labelMat = np.mat(classLabels).transpose()              # convert to a NumPy matrix and transpose to a column vector
        m, n = np.shape(dataMatrix)                             # matrix dimensions: m samples, n features
        alpha = 0.001                                           # step size (learning rate), controls the size of each update
        maxCycles = 500                                         # maximum number of iterations
        weights = np.ones((n, 1))
        for k in range(maxCycles):
            h = sigmoid(dataMatrix * weights)                   # matrix multiplication: predictions for all samples
            error = labelMat - h                                # y - h, the gradient direction (see Section 3)
            weights = weights + alpha * dataMatrix.transpose() * error
        return weights.getA()                                   # return as a plain ndarray

    if __name__ == "__main__":
        A, B = loadDataSet()
        t = gradAscent(A, B)
        print(t)

    Result:
    [[ 0.08108752]
    [-0.1233496 ]]
    These are the learned values of w1 and w2. Note that this version of loadDataSet uses only the two raw features, so there is no intercept term w0; the next section fixes this by prepending a constant feature x0 = 1.0 to every sample.
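
    With the learned weights, classifying a new sample is just Section 2 again (a small sketch; the feature values are made up):

    x = np.array([0.5, -1.0])              # hypothetical sample (x1, x2)
    z = float(np.dot(x, t.ravel()))        # t is the weight array returned by gradAscent
    print(1 if sigmoid(z) > 0.5 else 0)    # prints 1 for the weights above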

    5. Plotting the Decision Boundary

    import numpy as np
    import matplotlib.pyplot as plt

    def loadDataSet():
        dataMat = []
        labelMat = []
        with open('testSet.txt') as fr:
            for line in fr.readlines():
                lineArr = line.strip().split()
                dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])   # prepend constant x0 = 1.0 for the intercept
                labelMat.append(int(lineArr[2]))
        return dataMat, labelMat

    """
    Function: sigmoid function
    """
    def sigmoid(inX):
        return 1.0 / (1 + np.exp(-inX))


    """
    Function: gradient ascent (identical to Section 4)
    """
    def gradAscent(dataMatIn, classLabels):
        dataMatrix = np.mat(dataMatIn)                          # convert to a NumPy matrix
        labelMat = np.mat(classLabels).transpose()              # convert to a NumPy matrix and transpose to a column vector
        m, n = np.shape(dataMatrix)                             # matrix dimensions
        alpha = 0.001                                           # step size (learning rate)
        maxCycles = 500                                         # maximum number of iterations
        weights = np.ones((n, 1))
        for k in range(maxCycles):
            h = sigmoid(dataMatrix * weights)                   # matrix multiplication
            error = labelMat - h
            weights = weights + alpha * dataMatrix.transpose() * error
        return weights.getA()

    """
    Function: plot the decision boundary
    """
    def plotBestFit(weights):
        dataMat, labelMat = loadDataSet()
        dataArr = np.array(dataMat)
        n = np.shape(dataMat)[0]
        xcord1 = []
        ycord1 = []
        xcord2 = []
        ycord2 = []
        for i in range(n):                                      # split samples by class for plotting
            if int(labelMat[i]) == 1:
                xcord1.append(dataArr[i, 1])
                ycord1.append(dataArr[i, 2])
            else:
                xcord2.append(dataArr[i, 1])
                ycord2.append(dataArr[i, 2])
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.scatter(xcord1, ycord1, s=20, c='red', marker='s', alpha=.5)
        ax.scatter(xcord2, ycord2, s=20, c='green', alpha=.5)
        x = np.arange(-3.0, 3.0, 0.1)
        # the boundary is where w0 + w1*x1 + w2*x2 = 0, so x2 = (-w0 - w1*x1) / w2
        y = (-weights[0] - weights[1] * x) / weights[2]
        ax.plot(x, y)
        plt.title('BestFit')                                    # plot title
        plt.xlabel('X1')
        plt.ylabel('X2')                                        # axis labels
        plt.show()

    if __name__ == "__main__":
        A, B = loadDataSet()
        weights = gradAscent(A, B)
        print(weights)
        plotBestFit(weights)

    The result is quite good: the fitted line separates the two classes cleanly.
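
    As a side note, np.mat and .getA() are legacy NumPy interfaces; here is a minimal equivalent sketch of the same gradient ascent using plain ndarrays and the @ operator:

    def gradAscentArr(dataMatIn, classLabels, alpha=0.001, maxCycles=500):
        X = np.asarray(dataMatIn, dtype=float)                    # (m, n) data matrix
        y = np.asarray(classLabels, dtype=float).reshape(-1, 1)   # (m, 1) label column
        weights = np.ones((X.shape[1], 1))
        for _ in range(maxCycles):
            h = sigmoid(X @ weights)                              # predictions, shape (m, 1)
            weights += alpha * X.T @ (y - h)                      # one gradient ascent step
        return weights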

