KNN实现

作者: yxwithu | 来源:发表于2017-11-26 15:27 被阅读0次

KNN近邻算法总结
Spark --基于DataFrame API实现KNN算法
KNN 代码
利用Python进行数字识别
KNN的python实现
KNN实现
Spark -- 基于RDD实现 KNN
K近邻(KNN)算法详解及Python实现
机器学习之初识KNN算法——针对泰坦尼克号生存记录建模的两种方法
2020-08-10--KNN01

线性查找最近邻方式：

def arrayKNN(inX, X_train, y_train, k):
    n_samples = X_train.shape[0]
    
    dist_mat = (np.tile(inX, (n_samples, 1)) - X_train) ** 2  #距离矩阵
    dist_sum = np.sum(dist_mat, axis = 1)  #距离之和
    sorted_dist_idx = dist_sum.argsort()   #按数值排序后的原index所在位置
    class_cnt_dict = {}
    for i in range(k):
        vote_class = y_train[sorted_dist_idx[i]]
        class_cnt_dict[vote_class] = class_cnt_dict.get(vote_class, 0) + 1
    sorted_class_cnt =  [(k, class_cnt_dict[k]) for k in sorted(class_cnt_dict, key=class_cnt_dict.get, reverse=True)]  #取其中一个
    return sorted_class_cnt[0][0]

kd tree方式

学习及实现了下面的实现代码
https://github.com/stefankoegl/kdtree/blob/master/kdtree.py
http://blog.csdn.net/winter_evening/article/details/70196080

说一下过程：

构建一个KD数，KD树是对k维空间的一个划分，其每个节点对应于k维空间划分中的一个超矩形区域（不明白为什么需要仔细详细看一下kd树原理）。
用这里的C++代码做一下说明

void innerGetClosest(NODE* pNode, PT point, PT& res, int& nMinDis)  
{  
    if (NULL == pNode)  
        return;  
    int nCurDis = abs(point.x - pNode->pt.x) + abs(point.y - pNode->pt.y);  
    if (nMinDis < 0 || nCurDis < nMinDis)  
    {  
        nMinDis = nCurDis;  
        res = pNode->pt;  
    }  
    if (pNode->splitX && point.x <= pNode->pt.x || !pNode->splitX && point.y <= pNode->pt.y)  
        innerGetClosest(pNode->pLft, point, res, nMinDis);  
    else  
        innerGetClosest(pNode->pRgt, point, res, nMinDis);  
    int rang = pNode->splitX ? abs(point.x - pNode->pt.x) : abs(point.y - pNode->pt.y);  
    if (rang > nMinDis)  
        return;  
    NODE* pGoInto = pNode->pLft;  
    if (pNode->splitX && point.x > pNode->pt.x || !pNode->splitX && point.y > pNode->pt.y)  
        pGoInto = pNode->pRgt;  
    innerGetClosest(pGoInto, point, res, nMinDis);  
}

首先计算当前节点（根节点）的距离
如果划分到左孩子，到左孩子处递归，划分到右孩子到右孩子处递归
亲兄弟节点对应的区域与最近子节点的范围有交叉，有则到亲兄弟节点递归，否则返回