Data Mining: Instance-Based Classifiers - Nearest Neighbor (KNN)

Author: Cache_wood | Published 2022-04-13 00:25


Instance-Based Classifiers

Examples:

  • Rote-learner

    Memorizes the entire training set and classifies a record only if its attributes match one of the training examples exactly.

  • Nearest neighbor

    Uses the k "closest" points (nearest neighbors) to perform classification.

Nearest Neighbor Classifiers

Requires three things

  • The set of stored records.
  • Distance Metric to compute distance between records.
  • The value of k, the number of nearest neighbors to retrieve.

To classify an unknown record

  • Compute its distance to the training records.
  • Identify the k nearest neighbors.
  • Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote).
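The three steps above can be sketched in plain Python (the function name `knn_classify` and the toy data are illustrative, not from the original notes):

```python
from collections import Counter
import math

def knn_classify(query, records, labels, k=3):
    """Classify `query` by majority vote among its k nearest training records."""
    # 1. Compute the Euclidean distance from the query to every training record.
    dists = [(math.dist(query, r), y) for r, y in zip(records, labels)]
    # 2. Identify the k nearest neighbors.
    neighbors = sorted(dists, key=lambda t: t[0])[:k]
    # 3. Take the majority vote of their class labels.
    votes = Counter(y for _, y in neighbors)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters.
X = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
y = ["A", "A", "B", "B"]
print(knn_classify((1.1, 0.9), X, y, k=3))  # prints A
```

Note that nothing is precomputed: all distance work happens at classification time, which is what makes the method "lazy".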

Definition of Nearest Neighbor

The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.

Nearest Neighbor Classification

Compute the distance between two points:

Euclidean distance: d(p,q) = \sqrt{\sum_i (p_i-q_i)^2}

Determine the class from nearest neighbor list

  • Take the majority vote of class labels among the k nearest neighbors.
  • Weight the vote according to distance, e.g., w = 1/d^2.
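Distance-weighted voting can be sketched as follows (a hypothetical helper; the small epsilon guarding against a zero distance is an implementation choice, not part of the original notes):

```python
from collections import defaultdict

def weighted_vote(neighbors):
    """neighbors: list of (distance, label) pairs for the k nearest points.
    Each neighbor votes with weight w = 1/d^2, so closer points count more."""
    scores = defaultdict(float)
    for d, label in neighbors:
        scores[label] += 1.0 / (d ** 2 + 1e-12)  # epsilon guards against d == 0
    return max(scores, key=scores.get)

# One very close "A" neighbor outvotes two distant "B" neighbors.
print(weighted_vote([(0.1, "A"), (2.0, "B"), (2.5, "B")]))  # prints A
```

With a plain majority vote the same neighbor list would return "B"; weighting by 1/d^2 lets proximity override headcount.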

Choosing the value of k:

  • If k is too small, sensitive to noise points.
  • If k is too large, neighborhood may include points from other classes.

Scaling issues

  • Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes.

Problem with Euclidean measure:

  • High dimensional data: curse of dimensionality
  • Can produce counter-intuitive results.
  • Solution: Normalize the vectors to unit length
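The two remedies above, attribute scaling and unit-length normalization, can be sketched as follows (min-max scaling is one common choice; the helper names are illustrative):

```python
import math

def min_max_scale(records):
    """Rescale each attribute to [0, 1] so no single attribute
    dominates the distance computation."""
    cols = list(zip(*records))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(r, lo, hi)) for r in records]

def unit_length(vec):
    """Normalize a vector to unit Euclidean length, one fix for the
    counter-intuitive behavior of Euclidean distance in high dimensions."""
    norm = math.sqrt(sum(v * v for v in vec))
    return tuple(v / norm for v in vec) if norm else vec

# Income in dollars would swamp age in the raw Euclidean distance.
data = [(25, 30_000.0), (45, 90_000.0), (35, 60_000.0)]
print(min_max_scale(data))      # both attributes mapped into [0, 1]
print(unit_length((3.0, 4.0)))  # (0.6, 0.8)
```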

k-NN classifiers are lazy learners

  • They do not build a model explicitly, unlike eager learners such as decision tree induction and rule-based systems.
  • Classifying unknown records is relatively expensive.

Example: PEBLS

PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg)

  • Works with both continuous and nominal features.

    For nominal features, the distance between two values V_1 and V_2 is derived from their class distributions (the Modified Value Difference Metric), where n_{ji} is the number of training records with value V_j that belong to class i:

    d(V_1,V_2) = \sum_i \left|\frac{n_{1i}}{n_1}-\frac{n_{2i}}{n_2}\right|

  • Each record is assigned a weight factor.

  • Number of nearest neighbors: k = 1.

Distance between record X and record Y:
\Delta (X,Y) = w_Xw_Y\sum_{i=1}^d d(X_i,Y_i)^2
where w_X = (number of times X is used for prediction) / (number of times X predicts correctly).

w_X \approx 1 if X makes accurate predictions most of the time.

w_X > 1 if X is not reliable for making predictions.
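Under these definitions, the nominal-value distance and the record weight might be sketched as follows (toy data and helper names are assumptions for illustration, not from PEBLS itself):

```python
from collections import Counter

def mvdm(values, labels, v1, v2):
    """Nominal-feature distance d(V1,V2) = sum_i |n1i/n1 - n2i/n2|,
    where n_ji counts training records with value Vj in class i."""
    c1 = Counter(y for v, y in zip(values, labels) if v == v1)
    c2 = Counter(y for v, y in zip(values, labels) if v == v2)
    n1, n2 = sum(c1.values()), sum(c2.values())
    return sum(abs(c1[c] / n1 - c2[c] / n2) for c in set(labels))

def record_weight(times_used, times_correct):
    """w_X = (times X used for prediction) / (times X predicted correctly):
    close to 1 for reliable records, greater than 1 for unreliable ones."""
    return times_used / times_correct

# Toy nominal attribute (marital status) with class labels.
status = ["Single", "Married", "Single", "Married", "Divorced", "Single"]
cls    = ["Yes",    "No",      "Yes",    "No",      "Yes",      "No"]
print(mvdm(status, cls, "Single", "Married"))   # unlike class mixes -> large distance
print(mvdm(status, cls, "Single", "Divorced"))  # similar class mixes -> small distance
print(record_weight(10, 5))                     # unreliable record: w_X = 2.0
```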
