Machine Learning Day 5 | Implementing Logistic Regression

Author: raphah | Published: 2018-08-12 20:57

    Machine Learning Day 5: implementing logistic regression, building on yesterday's material.

    Dataset URL

    https://www.xiehaoo.com/media/record/pinke/2018/08/Social_Network_Ads.csv

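    This dataset contains information about users of a social network: user ID, gender, age, and estimated salary. A car company has just launched a new luxury SUV, and we want to predict which users will buy it. The last column records whether each user purchased the SUV. We will build a model that predicts the purchase from two variables, age and estimated salary, so our feature matrix consists of those two columns. The goal is to find how a user's age and estimated salary relate to the decision to buy the SUV.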

    
          User ID  Gender  Age  EstimatedSalary  Purchased
    0    15624510    Male   19            19000          0
    1    15810944    Male   35            20000          0
    2    15668575  Female   26            43000          0
    3    15603246  Female   27            57000          0
    4    15804002    Male   19            76000          0
    5    15728773    Male   27            58000          0
    6    15598044  Female   27            84000          0
    7    15694829  Female   32           150000          1
    8    15600575    Male   25            33000          0
    9    15727311  Female   35            65000          0
    10   15570769  Female   26            80000          0
    11   15606274  Female   26            52000          0
    12   15746139    Male   20            86000          0
    13   15704987    Male   32            18000          0
    14   15628972    Male   18            82000          0
    15   15697686    Male   29            80000          0
    16   15733883    Male   47            25000          1
    17   15617482    Male   45            26000          1
    18   15704583    Male   46            28000          1
    19   15621083  Female   48            29000          1
    20   15649487    Male   45            22000          1
    21   15736760  Female   47            49000          1
    22   15714658    Male   48            41000          1
    23   15599081  Female   45            22000          1
    24   15705113    Male   46            23000          1
    25   15631159    Male   47            20000          1
    26   15792818    Male   49            28000          1
    27   15633531  Female   47            30000          1
    28   15744529    Male   29            43000          0
    29   15669656    Male   31            18000          0
    ..        ...     ...  ...              ...        ...
    370  15611430  Female   60            46000          1
    371  15774744    Male   60            83000          1
    372  15629885  Female   39            73000          0
    373  15708791    Male   59           130000          1
    374  15793890  Female   37            80000          0
    375  15646091  Female   46            32000          1
    376  15596984  Female   46            74000          0
    377  15800215  Female   42            53000          0
    378  15577806    Male   41            87000          1
    379  15749381  Female   58            23000          1
    380  15683758    Male   42            64000          0
    381  15670615    Male   48            33000          1
    382  15715622  Female   44           139000          1
    383  15707634    Male   49            28000          1
    384  15806901  Female   57            33000          1
    385  15775335    Male   56            60000          1
    386  15724150  Female   49            39000          1
    387  15627220    Male   39            71000          0
    388  15672330    Male   47            34000          1
    389  15668521  Female   48            35000          1
    390  15807837    Male   48            33000          1
    391  15592570    Male   47            23000          1
    392  15748589  Female   45            45000          1
    393  15635893    Male   60            42000          1
    394  15757632  Female   39            59000          0
    395  15691863  Female   46            41000          1
    396  15706071    Male   51            23000          1
    397  15654296  Female   50            20000          1
    398  15755018    Male   36            33000          0
    399  15594041  Female   49            36000          1

    [400 rows x 5 columns]
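
    The column selection used in this post (Age and EstimatedSalary as features, Purchased as the label) can be tried on a small sample before downloading the full file. The sketch below is only an illustration: the DataFrame `sample` is typed in by hand from the first five rows of the table, and the name `sample` is not part of the original code.

```python
import pandas as pd

# First five rows of the printed table, entered by hand for illustration.
sample = pd.DataFrame({
    "User ID": [15624510, 15810944, 15668575, 15603246, 15804002],
    "Gender": ["Male", "Male", "Female", "Female", "Male"],
    "Age": [19, 35, 26, 27, 19],
    "EstimatedSalary": [19000, 20000, 43000, 57000, 76000],
    "Purchased": [0, 0, 0, 0, 0],
})

# Columns 2 and 3 (Age, EstimatedSalary) form the feature matrix;
# column 4 (Purchased) is the label vector.
X = sample.iloc[:, [2, 3]].values
y = sample.iloc[:, 4].values

print(X.shape)  # (5, 2)
print(y.shape)  # (5,)
```

    Selecting by position with `iloc` works as long as the column order matches the table; selecting by name, for example `sample[["Age", "EstimatedSalary"]]`, is a more robust alternative if the file layout might change.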
    
    

    Full code

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    # sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import confusion_matrix
    
    # Adjust this path to wherever the CSV is stored on your machine
    dataset = pd.read_csv('/Users/xiehao/Desktop/100-Days-Of-ML-Code-master/datasets/Social_Network_Ads.csv')
    # Data preprocessing: features are Age and EstimatedSalary, label is Purchased
    X = dataset.iloc[:, [2, 3]].values
    Y = dataset.iloc[:, 4].values
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
    # Feature scaling (fit on the training set only, then apply to the test set)
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
    # Fit logistic regression to the training set
    classifier = LogisticRegression()
    classifier.fit(X_train, y_train)
    # Predict the test set results
    y_pred = classifier.predict(X_test)
    # Build the confusion matrix
    cm = confusion_matrix(y_test, y_pred)
    

    Step 1: Data preprocessing
    The usual routine.

    # Load the dataset
    dataset = pd.read_csv('Social_Network_Ads.csv')
    X = dataset.iloc[:, [2, 3]].values
    Y = dataset.iloc[:, 4].values
    # Split into training and test sets: 25% test, 75% training (a 1:3 ratio)
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)
    # Feature scaling
    sc = StandardScaler()
    X_train = sc.fit_transform(X_train)
    X_test = sc.transform(X_test)
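
    To make the feature scaling step concrete, here is a minimal sketch of what `StandardScaler` computes. The toy arrays are hypothetical (age, salary) values chosen for illustration, not the real train/test split, and the `_scaled` variable names are introduced here only to keep the raw and scaled data apart.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy values (age, salary); not the real split from the dataset.
X_train = np.array([[19.0, 19000.0], [35.0, 20000.0],
                    [26.0, 43000.0], [27.0, 57000.0]])
X_test = np.array([[32.0, 150000.0], [25.0, 33000.0]])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)  # learns mean and std from the training set
X_test_scaled = sc.transform(X_test)        # reuses the training statistics

# Manual equivalent: z = (x - mean) / std, using the population std (ddof=0).
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)
manual_test = (X_test - mean) / std

print(np.allclose(X_test_scaled, manual_test))
print(X_train_scaled.mean(axis=0).round(6), X_train_scaled.std(axis=0).round(6))
```

    The test set is transformed with the mean and standard deviation learned from the training set, which is why the code calls `fit_transform` on `X_train` but only `transform` on `X_test`.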
    

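    Step 2: The logistic regression model
    The library for this step is scikit-learn's linear model module (sklearn.linear_model). It is called linear because logistic regression is a linear classifier: in our two-dimensional feature space, the two classes of users (buyers and non-buyers) are separated by a straight line. We import the LogisticRegression class, then create an object of that class, which serves as the classifier fitted to our training set.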

    # Create a LogisticRegression object and call its fit method on the training set
    classifier = LogisticRegression()
    classifier.fit(X_train, y_train)
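
    The "straight line" claim can be checked directly from a fitted model. The sketch below uses synthetic two-feature data (two Gaussian blobs generated with a fixed seed), not the SUV dataset, and the names `X_demo`, `y_demo`, and `clf` are made up for this illustration. It recomputes the predicted probability as a sigmoid of the linear score w·x + b and confirms that the predicted class depends only on the sign of that score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic, roughly standardized data: one blob per class.
X0 = rng.normal(loc=-1.0, scale=1.0, size=(100, 2))
X1 = rng.normal(loc=1.0, scale=1.0, size=(100, 2))
X_demo = np.vstack([X0, X1])
y_demo = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression()
clf.fit(X_demo, y_demo)

w = clf.coef_[0]       # one weight per feature
b = clf.intercept_[0]  # bias term

# Linear score and its sigmoid, which is the predicted P(class 1).
z = X_demo @ w + b
p_manual = 1.0 / (1.0 + np.exp(-z))

# The model predicts class 1 exactly when z > 0,
# so the decision boundary is the line w[0]*x1 + w[1]*x2 + b = 0.
pred_manual = (z > 0).astype(int)

print(np.allclose(p_manual, clf.predict_proba(X_demo)[:, 1]))
print(np.array_equal(pred_manual, clf.predict(X_demo)))
```

    The same inspection should apply to the `classifier` object in this post: `classifier.coef_` and `classifier.intercept_` hold the line's parameters in the scaled feature space.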
    

    Step 3: Predicting the test set results

    y_pred = classifier.predict(X_test)
    >>> print(y_pred)
    [0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0
     0 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0
     0 0 1 0 1 1 1 1 0 0 1 1 0 1 0 0 0 1 0 0 0 0 0 0 1 1]
    
    

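    Step 4: Evaluating the predictions
    We have predicted the test set. Now we evaluate whether the logistic regression model learned the relationship correctly. We do this with a confusion matrix, which counts the model's correct and incorrect predictions for each class.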

    cm = confusion_matrix(y_test, y_pred)
    >>> print(cm)
    [[65  3]
     [ 8 24]]
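
    The confusion matrix can be turned into summary metrics. The sketch below hard-codes the matrix printed above and relies on scikit-learn's convention that rows are true labels and columns are predicted labels; the metric variable names are chosen here for illustration.

```python
import numpy as np

# Confusion matrix from the output above: rows = true labels, columns = predictions.
cm = np.array([[65, 3],
               [8, 24]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()  # share of all predictions that are correct
precision = tp / (tp + fp)       # share of predicted buyers who really bought
recall = tp / (tp + fn)          # share of real buyers the model found

print(f"accuracy={accuracy:.2f}, precision={precision:.3f}, recall={recall:.3f}")
# accuracy=0.89, precision=0.889, recall=0.750
```

    So 89 of the 100 test users are classified correctly (65 non-buyers and 24 buyers), while 3 non-buyers are predicted as buyers and 8 buyers are missed.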
    
    

    Thanks to the original author, Avik-Jain, and to zhyongquan for the Chinese translation.
