
Logistic Regression

Author: dingtom | Published 2019-10-23 17:37

Logistic regression is similar to linear regression, but it is used when the dependent variable is not numeric (for example, a yes/no response). Although it is called regression, it is classification built on top of regression: it separates the dependent variable into two classes.

As noted above, logistic regression predicts a binary outcome. For example, if a credit card company builds a model to decide whether to approve a customer's credit card application, it will predict whether the customer will default. First, a linear regression is fit to the relationship between the variables, with the classification threshold set at 0.5. The logistic function is then applied to the regression output, giving the probabilities of the two classes; this function models the log-odds of the event occurring versus not occurring. Finally, the observation is assigned to whichever class has the higher probability.
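A minimal sketch of those two steps (linear score, then sigmoid and threshold) in plain NumPy; the weights w, b and the sample x are made-up values for illustration only:

import numpy as np

def sigmoid(z):
    # squash a linear score into a probability in (0, 1)
    return 1 / (1 + np.exp(-z))

# hypothetical weights, bias, and a single two-feature sample
w, b = np.array([2.0, -0.5]), 0.1
x = np.array([0.8, 1.2])

p = sigmoid(np.dot(w, x) + b)    # probability of the positive class
label = 1 if p > 0.5 else 0      # classify with the 0.5 threshold
print(p, label)                  # ~0.75, so the sample is assigned class 1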

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from sklearn.datasets import make_classification  # samples_generator was removed in newer sklearn
# n_features: total number of features = n_informative + n_redundant + n_repeated
# n_informative: number of informative features
# n_redundant: redundant features, random linear combinations of the informative ones
# n_repeated: repeated features, drawn at random from the informative and redundant ones
# n_classes: number of classes
# n_clusters_per_class: number of clusters each class consists of

X, Y = make_classification(n_samples=100, n_features=2, n_redundant=0, n_informative=2, random_state=1, n_clusters_per_class=2)
Y = Y.reshape((-1, 1))
# print(X.shape, Y.shape, Y[:10])  # (100, 2) (100, 1)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)

# Encode the class labels numerically. Note: the labels from make_classification
# are already 0/1, and this one-hot encoded Y is not used by the classifier below,
# so the step is illustrative only.
from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder(sparse_output=False)  # use sparse=False on sklearn < 1.2
Y = onehotencoder.fit_transform(Y)

#### Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

### Step 2 | Logistic regression model
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, y_train.ravel())  # ravel: sklearn expects a 1-D label array

y_pred = classifier.predict(X_test)

for i, j in zip(y_pred, y_test):
    print(i, j)
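Rather than eyeballing the printed pairs, the fit can be scored with sklearn's built-in metrics; a short sketch:

from sklearn.metrics import accuracy_score, confusion_matrix
# ravel() flattens the (n, 1) label column into the 1-D array the metrics expect
print('accuracy:', accuracy_score(y_test.ravel(), y_pred))
print('confusion matrix:\n', confusion_matrix(y_test.ravel(), y_pred))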

NumPy implementation
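The from-scratch version below implements the standard binary cross-entropy setup. With m training samples, feature matrix X, labels y, and predictions ŷ, the quantities computed in the logistic method are:

$$\hat{y} = \sigma(XW + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$

$$J(W, b) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$$

$$\frac{\partial J}{\partial W} = \frac{1}{m} X^{\top} (\hat{y} - y), \qquad \frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (\hat{y}_i - y_i)$$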

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification

class logistic_regression():    
    def __init__(self):        
        pass
    def create_data(self):
        X, labels = make_classification(n_samples=100, n_features=2, n_redundant=0, n_informative=2, random_state=1, n_clusters_per_class=2)
        labels = labels.reshape((-1, 1))
        offset = int(X.shape[0] * 0.9)
        X_train, y_train = X[:offset], labels[:offset]
        X_test, y_test = X[offset:], labels[offset:]        
        return X_train, y_train, X_test, y_test    
        
    def plot_logistic(self, X_train, y_train, params):
        n = X_train.shape[0]
        xcord1 = []
        ycord1 = []
        xcord2 = []
        ycord2 = []        
        for i in range(n):            
            if y_train[i] == 1:
                xcord1.append(X_train[i][0])
                ycord1.append(X_train[i][1])            
            else:
                xcord2.append(X_train[i][0])
                ycord2.append(X_train[i][1])
        fig = plt.figure()
        ax = fig.add_subplot(111)
        ax.scatter(xcord1, ycord1, s=32, c='red')
        ax.scatter(xcord2, ycord2, s=32, c='green')
        # decision boundary: the line where W[0]*x1 + W[1]*x2 + b = 0,
        # i.e. x2 = (-b - W[0]*x1) / W[1]
        x = np.arange(-1.5, 3, 0.1)
        y = (-params['b'] - params['W'][0] * x) / params['W'][1]
        ax.plot(x, y)
        plt.xlabel('X1')
        plt.ylabel('X2')
        plt.show()
    # sigmoid function
    def sigmoid(self, x):
        z = 1 / (1 + np.exp(-x))        
        return z    
    # initialize model parameters (weights to zeros, bias to 0)
    def initialize_params(self, dims):
        W = np.zeros((dims, 1))
        b = 0
        return W, b    
    # core of the logistic regression model: forward pass, loss, and parameter gradients
    def logistic(self, X, y, W, b):
        num_train = X.shape[0]
        num_feature = X.shape[1]

        pre = self.sigmoid(np.dot(X, W) + b)
        cost = -1 / num_train * np.sum(y * np.log(pre) + (1 - y) * np.log(1 - pre))

        dW = np.dot(X.T, (pre - y)) / num_train
        db = np.sum(pre - y) / num_train
        cost = np.squeeze(cost)        
        return pre, cost, dW, db    
    # gradient-descent training loop
    def logistic_train(self, X, y, learning_rate, epochs):
        # initialize the model parameters
        W, b = self.initialize_params(X.shape[1])
        cost_list = []
        # iterative training
        for i in range(epochs):
            # forward pass, loss, and parameter gradients for this epoch
            pre, cost, dW, db = self.logistic(X, y, W, b)
            # gradient-descent parameter update
            W = W - learning_rate * dW
            b = b - learning_rate * db
            # record the loss every 100 epochs
            if i % 100 == 0:
                cost_list.append(cost)
                print('epoch %d cost %f' % (i, cost))
        # save the parameters
        params = {
            'W': W,
            'b': b
        }
        # save the gradients
        grads = {
            'dW': dW,
            'db': db
        }
        return cost_list, params, grads    
    # predict on new data: threshold the predicted probability at 0.5
    def predict(self, X, params):
        y_prediction = self.sigmoid(np.dot(X, params['W']) + params['b'])        
        for i in range(len(y_prediction)):            
            if y_prediction[i] > 0.5:
                y_prediction[i] = 1
            else:
                y_prediction[i] = 0

        return y_prediction    
    def accuracy(self, y_test, y_pred):
        # fraction of predictions that match the true labels
        correct_count = 0
        for i in range(len(y_test)):
            if y_test[i] == y_pred[i]:
                correct_count += 1

        accuracy_score = correct_count / len(y_test)
        return accuracy_score
        
    
if __name__ == "__main__":
    model = logistic_regression()
    X_train, y_train, X_test, y_test = model.create_data()
    print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
    cost_list, params, grads = model.logistic_train(X_train, y_train, 0.01, 1000)
    print(params)
    y_train_pred = model.predict(X_train, params)
    accuracy_score_train = model.accuracy(y_train, y_train_pred)
    print('train accuracy is:', accuracy_score_train)
    y_test_pred = model.predict(X_test, params)
    accuracy_score_test = model.accuracy(y_test, y_test_pred)
    print('test accuracy is:', accuracy_score_test)
    model.plot_logistic(X_train, y_train, params)

Running the script produces:

(90, 2) (90, 1) (10, 2) (10, 1)
epoch 0 cost 0.693147
epoch 100 cost 0.521480
epoch 200 cost 0.416359
epoch 300 cost 0.347951
epoch 400 cost 0.300680
epoch 500 cost 0.266327
epoch 600 cost 0.240328
epoch 700 cost 0.220002
epoch 800 cost 0.203687
epoch 900 cost 0.190306
{'W': array([[ 2.04608084],
[-0.03964634]]), 'b': 0.12335926234285087}
train accuracy is: 0.9666666666666667
test accuracy is: 1.0
