ML: KNN Notes

Author: ckawyh | Published: 2016-07-17 12:56

    Written in a Jupyter notebook

    %matplotlib qt
    import numpy as np
    from sklearn import metrics
    from sklearn.neighbors import KNeighborsClassifier
    
    1. Read the tab-separated txt data; the last column is the label
    data = []
    labels = []
    with open('data\\datingTestSet.txt') as f:
        for line in f:
            tokens = line.strip().split('\t')
            data.append([float(tk) for tk in tokens[:-1]])
            labels.append(tokens[-1])
    

    data[1:10]
    np.unique(labels)
    array(['didntLike', 'largeDoses', 'smallDoses'],
    dtype='|S10')
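
The parsing loop above can be wrapped in a small helper. This is only a sketch: `parse_tsv` is a hypothetical name, and the sample rows are made up to mimic the layout (three numeric columns followed by a label), not taken from datingTestSet.txt. It also skips blank lines, which the original loop would turn into an empty feature row with an empty label.

```python
import io


def parse_tsv(lines):
    """Split tab-separated rows into float features and a string label (last column)."""
    data, labels = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # A blank line would otherwise append [] to data and '' to labels.
            continue
        tokens = line.split('\t')
        data.append([float(tk) for tk in tokens[:-1]])
        labels.append(tokens[-1])
    return data, labels


# Made-up rows in the assumed layout, including one blank line.
sample = io.StringIO(
    "1000.0\t5.5\t0.5\tlargeDoses\n"
    "\n"
    "2000.0\t1.5\t1.0\tdidntLike\n"
)
data, labels = parse_tsv(sample)
print(data)
print(labels)
```

Because the helper accepts any iterable of lines, an open file object can be passed in place of the `StringIO` sample.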

    2. Convert the string labels to numeric labels
    x = np.array(data)
    labels = np.array(labels)
    y = np.zeros(labels.shape)
    y[labels=='didntLike'] = 1
    y[labels=='smallDoses'] = 2
    y[labels=='largeDoses'] = 3
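
With the boolean-mask approach, any label that is not one of the three expected strings silently keeps the initial value 0. A possible alternative, sketched here on made-up labels, is an explicit dictionary lookup that raises `KeyError` on an unexpected label. The name `label_to_id` is an assumption for illustration; the 1/2/3 codes match the ones used above.

```python
import numpy as np

# Made-up labels, only to exercise the mapping.
labels = np.array(['largeDoses', 'didntLike', 'smallDoses', 'didntLike'])

# Same numeric codes as in the notes: didntLike=1, smallDoses=2, largeDoses=3.
label_to_id = {'didntLike': 1, 'smallDoses': 2, 'largeDoses': 3}

# A label missing from the dictionary raises KeyError instead of becoming 0.
y = np.array([label_to_id[name] for name in labels])
print(y)
```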
    
    3. Before normalization (raw features)
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(x, y)
    print(model)
    # Note: the model is scored here on the same data it was fitted on
    expected = y
    predicted = model.predict(x)
    print(metrics.classification_report(expected, predicted, target_names=['didntLike', 'smallDoses', 'largeDoses']))
    print(metrics.confusion_matrix(expected, predicted))
    

    Result:

    KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
    metric_params=None, n_jobs=1, n_neighbors=3, p=2,
    weights='uniform')
                 precision    recall  f1-score   support

      didntLike       0.89      0.85      0.87       342
     smallDoses       0.93      0.98      0.96       331
     largeDoses       0.82      0.83      0.82       327

    avg / total       0.88      0.88      0.88      1000

    [[289   0  53]
     [  1 325   5]
     [ 33  24 270]]
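
The notes do not say why the raw features score lower. One plausible contributor is that Euclidean distance (the default `minkowski` metric with `p=2`) is dominated by whichever feature has the largest numeric range. The sketch below uses three hypothetical points and hypothetical per-feature ranges (`lo`, `hi`), none of which come from the dataset, to show the nearest neighbor changing once the features are rescaled.

```python
import numpy as np


def euclidean(u, v):
    """Plain Euclidean distance between two 1-D arrays."""
    return float(np.sqrt(np.sum((u - v) ** 2)))


# Hypothetical points: feature 0 is on a much larger scale than features 1 and 2.
a = np.array([40000.0, 8.0, 0.9])
b = np.array([40100.0, 1.0, 0.1])   # near a on feature 0, far on features 1 and 2
c = np.array([41000.0, 8.0, 0.9])   # identical to a on features 1 and 2

# On raw values, b looks closer to a than c does.
print(euclidean(a, b), euclidean(a, c))

# Assumed feature ranges, used only to rescale this toy example to [0, 1].
lo = np.array([0.0, 0.0, 0.0])
hi = np.array([100000.0, 20.0, 2.0])
sa, sb, sc = [(p - lo) / (hi - lo) for p in (a, b, c)]

# After rescaling, c is the closer point.
print(euclidean(sa, sb), euclidean(sa, sc))
```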

    4. Normalize the data to the [0, 1] range
    from sklearn import preprocessing
    min_max_scaler = preprocessing.MinMaxScaler()
    X_train_minmax = min_max_scaler.fit_transform(x)
    X_train_minmax
    array([[ 0.44832535,  0.39805139,  0.56233353],
           [ 0.15873259,  0.34195467,  0.98724416],
           [ 0.28542943,  0.06892523,  0.47449629],
           ..., 
           [ 0.29115949,  0.50910294,  0.51079493],
           [ 0.52711097,  0.43665451,  0.4290048 ],
           [ 0.47940793,  0.3768091 ,  0.78571804]])
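
With its default `feature_range=(0, 1)`, `MinMaxScaler` rescales each column as (x - min) / (max - min). The sketch below checks that against a manual NumPy computation on a small made-up array; the variable names are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up feature matrix: 3 samples, 2 features.
x = np.array([[10.0, 1.0],
              [20.0, 3.0],
              [40.0, 2.0]])

# Column-wise min-max scaling done by hand.
col_min = x.min(axis=0)
col_max = x.max(axis=0)
manual = (x - col_min) / (col_max - col_min)

# The same transformation through scikit-learn.
scaled = MinMaxScaler().fit_transform(x)

print(manual)
print(np.allclose(manual, scaled))
```

The manual version divides by zero when a column is constant; `MinMaxScaler` handles that case internally, which is one reason to prefer it.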
    
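    5. Split into training and test data
    # sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split now lives in sklearn.model_selection
    from sklearn.model_selection import train_test_split
    # Hold out 20% of the samples for testing
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)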
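
The notes fit the scaler on the full dataset before splitting. A common variant, sketched here on synthetic data rather than the dating dataset, splits first and puts the scaler in a pipeline so that the min and max are learned from the training part only. The `random_state=0` and `stratify=y` arguments are assumptions added for repeatability and balanced classes; they are not in the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: 300 samples, 3 features on different scales, labels 1..3.
rng = np.random.RandomState(0)
x = rng.rand(300, 3) * np.array([50000.0, 20.0, 2.0])
y = rng.randint(1, 4, size=300)

# Fixed seed for a repeatable split; stratify keeps class proportions similar.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0, stratify=y)

# The scaler is fitted inside the pipeline, on x_train only.
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(x_train, y_train)

print(x_train.shape, x_test.shape)
print(model.score(x_test, y_test))
```

The labels here are random, so the score itself carries no meaning; the point is the order of operations.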
    
    6. Results after normalization
    n_neighbors = 3: the K in K-nearest neighbors is set to 3
    # Re-split using the normalized features, then fit on the training part only
    x_train, x_test, y_train, y_test = train_test_split(X_train_minmax, y, test_size=0.2)
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(x_train, y_train)
    print(model)
    expected = y_test
    predicted = model.predict(x_test)
    print(metrics.classification_report(expected, predicted, target_names=['didntLike', 'smallDoses', 'largeDoses']))
    print(metrics.confusion_matrix(expected, predicted))
    

    Result:

    KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
    metric_params=None, n_jobs=1, n_neighbors=3, p=2,
    weights='uniform')
                 precision    recall  f1-score   support

      didntLike       0.97      1.00      0.99        68
     smallDoses       0.93      1.00      0.96        51
     largeDoses       1.00      0.93      0.96        81

    avg / total       0.97      0.97      0.97       200

    [[68  0  0]
     [ 0 51  0]
     [ 2  4 75]]
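
To make the role of `n_neighbors=3` concrete, here is a minimal from-scratch sketch of the idea: find the 3 training points nearest to a query and take a majority vote over their labels. `knn_predict` is a hypothetical helper on made-up points, not scikit-learn's implementation, which uses optimized neighbor search and may break ties differently.

```python
from collections import Counter

import numpy as np


def knn_predict(x_train, y_train, query, k=3):
    """Label one query point by majority vote among its k nearest training points."""
    dists = np.sqrt(((x_train - query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]            # indices of the k smallest distances
    votes = Counter(int(y_train[i]) for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up, already normalized 2-D points with labels 1 and 3.
x_train = np.array([[0.10, 0.10], [0.20, 0.10], [0.90, 0.90],
                    [0.80, 0.90], [0.15, 0.20]])
y_train = np.array([1, 1, 3, 3, 1])

print(knn_predict(x_train, y_train, np.array([0.12, 0.12])))
print(knn_predict(x_train, y_train, np.array([0.85, 0.80])))
```

For the second query the three nearest points carry labels 3, 3 and 1, so the vote goes to 3 even though one neighbor disagrees.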

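    Summary:
    After normalization the results differ substantially from those before: the average f1-score rises from 0.88 to 0.97. Note that the first run was scored on the data it was fitted on, while the second was scored on a held-out 20% test split, so the two figures are not strictly like-for-like.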
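
In the notes, the raw-feature model is scored on its own training data and the normalized model on a held-out split. One way to put both on equal footing is 5-fold cross-validation with and without scaling, sketched below. The data is synthetic and deliberately built so that the only informative feature has a small numeric range, so it illustrates the mechanism and does not reproduce the numbers above.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data: one large-scale noise feature, one small-scale informative
# feature, one small-scale noise feature. All values are assumptions.
rng = np.random.RandomState(0)
n = 600
noise_big = rng.uniform(0, 50000, size=n)
signal = rng.uniform(0, 3, size=n)
noise_small = rng.uniform(0, 1, size=n)
x = np.column_stack([noise_big, signal, noise_small])
y = np.floor(signal).astype(int) + 1      # classes 1, 2, 3 from the signal

raw = KNeighborsClassifier(n_neighbors=3)
scaled = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=3))

# Same folds, same metric (accuracy) for both models.
raw_acc = cross_val_score(raw, x, y, cv=5).mean()
scaled_acc = cross_val_score(scaled, x, y, cv=5).mean()
print(raw_acc, scaled_acc)
```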

Article link: https://www.haomeiwen.com/subject/ylaujttx.html