Machine Learning 2020: Decision Trees (3)


Author: zidea | Published 2020-03-19 21:17

    CART

    Loss = \min_{j,s} \left[ \min_{c_1} \sum_{x^{(i)} \in R_1(j,s)} L(y^{(i)}, c_1) + \min_{c_2} \sum_{x^{(i)} \in R_2(j,s)} L(y^{(i)}, c_2) \right] \tag{1}

    • c_1 is the mean of y over region R_1 (and c_2 over R_2)
    • j selects one dimension of x
    • L is a distance (loss) function
    • s is the chosen split position along that dimension

    \hat{y}_m = \frac{1}{|R_m|} \sum_{x^{(i)} \in R_m} y^{(i)}
    On dimension j of the variable x we pick a point s that splits the data into two regions R_1 and R_2; \hat{y}_m denotes the mean of y^{(i)} over region R_m. We choose j and s so that objective (1) is minimized. The distance between \hat{y}_m and y^{(i)} can be the absolute error or the squared error, e.g.

    \sum_i (y^{(i)} - \hat{y})^2

    Loss = \min_{j,s} \left[ \min_{c_1} \sum_{x^{(i)} \in R_1(j,s)} (y^{(i)} - c_1)^2 + \min_{c_2} \sum_{x^{(i)} \in R_2(j,s)} (y^{(i)} - c_2)^2 \right]

    \begin{cases} R_1(j,s) = \{ x | x_j \le s \} \\ R_2(j,s) = \{ x | x_j > s \} \end{cases}

    c_m = \frac{1}{N_m} \sum_{x^{(i)} \in R_m(j,s)} y^{(i)}, \quad m = 1,2
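The search over j and s that objective (1) describes can be sketched for a single feature as follows. This is a sketch, not code from the original post, and `best_split_1d` is a hypothetical helper name:

```python
import numpy as np

def best_split_1d(x, y):
    """Exhaustive CART split search on one 1-D feature.

    Tries every midpoint s between consecutive sorted x values,
    sets c_1, c_2 to the region means, and returns the (s, loss)
    minimizing the squared-error objective (1).
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]   # keep (x, y) pairs together
    best_s, best_loss = None, np.inf
    for k in range(1, len(x_sorted)):
        s = (x_sorted[k - 1] + x_sorted[k]) / 2   # candidate split point
        y1, y2 = y_sorted[:k], y_sorted[k:]       # R_1: x <= s, R_2: x > s
        c1, c2 = y1.mean(), y2.mean()             # region means c_1, c_2
        loss = ((y1 - c1) ** 2).sum() + ((y2 - c2) ** 2).sum()
        if loss < best_loss:
            best_s, best_loss = s, loss
    return best_s, best_loss
```

For a multi-dimensional x, the same scan would run once per dimension j and keep the overall minimum.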

    For example, if the data is y = 0, 1, \dots, 9 and the split puts only the first point in R_1:

    \begin{cases} R_1 = \{ 0 \} & c_1 = 0\\ R_2 = \{ 1,2,\dots,9 \} & c_2 = 5 \end{cases}
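As a quick numeric check of this case (assuming the region contents shown, i.e. y values 0 through 9):

```python
import numpy as np

y = np.arange(10)          # y = 0, 1, ..., 9
y1, y2 = y[:1], y[1:]      # R_1 = {0}, R_2 = {1, ..., 9}
c1, c2 = y1.mean(), y2.mean()
loss = ((y1 - c1) ** 2).sum() + ((y2 - c2) ** 2).sum()
print(c1, c2, loss)        # prints: 0.0 5.0 60.0
```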

    import numpy as np
    import matplotlib.pyplot as plt
    %matplotlib inline
    
    np.random.seed(0)
    X = np.random.normal(0, 1, 10)                 # 10 one-dimensional samples
    y = X * 2 + 3 + np.random.normal(0, 0.5, 10)   # noisy linear target
    
    plt.scatter(X,y)
    plt.axvline(x=0,color='r',linestyle='dashed')
    
    
    [Figure output_13_1.png: scatter of (X, y) with the candidate split x = 0 as a dashed red line]
    # try a split at s = 0
    X_copy = X.copy()
    X_copy = X_copy[X_copy < 0]
    len = X_copy.size       # number of samples with x < 0 (note: shadows built-in len)
    X_R_1, X_R_2 = np.split(np.sort(X), [len])
    
    X_R_1,X_R_2
    
    (array([-0.97727788, -0.15135721, -0.10321885]),
     array([0.40015721, 0.4105985 , 0.95008842, 0.97873798, 1.76405235,
            1.86755799, 2.2408932 ]))
    
    # sorting y separately from X is only safe here because y is (nearly)
    # monotone in X; in general keep pairs together via y[np.argsort(X)]
    y_R_1, y_R_2 = np.split(np.sort(y), [len])
    y_R_1,y_R_2
    
    
    (array([1.2122814 , 2.59470645, 2.95009615]),
     array([3.39414913, 4.52745117, 5.33799483, 5.64721637, 6.60012648,
            6.9570476 , 7.54262391]))
    
    c_1 = 1/len * np.sum(y_R_1)          # mean of y over R_1
    c_2 = 1/(10 - len) * np.sum(y_R_2)   # mean of y over R_2
    c_1
    
    2.2523613342314497
    
    plt.scatter(X,y)
    plt.axvline(x=0,color='r',linestyle='dashed')
    plt.axhline(y=c_1,color='b',linestyle='dashed')
    plt.axhline(y=c_2,color='b',linestyle='dashed')
    plt.grid()
    
    [Figure output_17_0.png: scatter of (X, y) with split x = 0 and the region means c_1, c_2 as dashed horizontal lines]
    y_R_2 - c_2
    
    array([-2.32108079, -1.18777876, -0.3772351 , -0.06801356,  0.88489655,
            1.24181767,  1.82739398])
    

    Loss(0) = \sum_{x^{(i)} \in R_1} (y^{(i)} - c_1)^2 + \sum_{x^{(i)} \in R_2} (y^{(i)} - c_2)^2

    L = np.sum((y_R_1 - c_1 )**2) + np.sum((y_R_2 - c_2)**2)
    L
    
    14.29548867988851
    
    plt.scatter(X,y)
    plt.axvline(x=0.55,color='r',linestyle='dashed')
    
    
    [Figure output_21_1.png: scatter of (X, y) with the candidate split x = 0.55]
    # repeat with a split at s = 0.55
    X_copy_1 = X.copy()
    X_copy_1 = X_copy_1[X_copy_1 < 0.55]
    len = X_copy_1.size     # 5 samples fall in R_1
    len
    
    5
    
    X_R_1,X_R_2 = np.split(np.sort(X),[len])
    y_R_1,y_R_2 = np.split(np.sort(y),[len])
    c_1 = 1/len * np.sum(y_R_1)
    c_2 = 1/(10 - len) * np.sum(y_R_2)
    
    plt.scatter(X,y)
    plt.axvline(x=0.55,color='r',linestyle='dashed')
    plt.axhline(y=c_1,color='b',linestyle='dashed')
    plt.axhline(y=c_2,color='b',linestyle='dashed')
    plt.grid()
    
    [Figure output_23_0.png: scatter of (X, y) with split x = 0.55 and the region means c_1, c_2]
    L = np.sum((y_R_1 - c_1 )**2) + np.sum((y_R_2 - c_2)**2)
    
    L
    
    9.179537778528713
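Rather than trying split points by hand, we can scan every candidate split of this dataset. The loop below is a sketch, not code from the original post; it confirms that any split separating the five smallest x values, including x = 0.55, attains the minimum loss:

```python
import numpy as np

np.random.seed(0)
X = np.random.normal(0, 1, 10)
y = X * 2 + 3 + np.random.normal(0, 0.5, 10)

order = np.argsort(X)
y_s = y[order]                       # y reordered to match sorted X

losses = []
for k in range(1, 10):               # k points go to R_1
    y1, y2 = y_s[:k], y_s[k:]
    loss = ((y1 - y1.mean()) ** 2).sum() + ((y2 - y2.mean()) ** 2).sum()
    losses.append(loss)

best_k = int(np.argmin(losses)) + 1
print(best_k, min(losses))           # best partition puts 5 points in R_1, loss ≈ 9.1795
```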
    
    
    


Original link: https://www.haomeiwen.com/subject/zsjryhtx.html