美文网首页
K-medoid聚类

K-medoid聚类

作者: 微雨旧时歌丶 | 来源:发表于2018-08-16 08:40 被阅读0次

Note there are variants of the K-means algorithm that can work with non-Euclideance distance metrics (such as Levenshtein distance). K-medoids (aka PAM), for instance, can be applied to data with an arbitrary distance metric.

For example, using Pycluster's implementation of k-medoids, and nltk's implementation of Levenshtein distance,

import nltk.metrics.distance as distance
import Pycluster as PC

words = ['apple', 'Doppler', 'applaud', 'append', 'barker', 
         'baker', 'bismark', 'park', 'stake', 'steak', 'teak', 'sleek']

dist = [distance.edit_distance(words[i], words[j]) 
        for i in range(1, len(words))
        for j in range(0, i)]

labels, error, nfound = PC.kmedoids(dist, nclusters=3)
cluster = dict()
for word, label in zip(words, labels):
    cluster.setdefault(label, []).append(word)
for label, grp in cluster.items():
    print(grp)

结果

['apple', 'Doppler', 'applaud', 'append']
['stake', 'steak', 'teak', 'sleek']
['barker', 'baker', 'bismark', 'park']

相关文章

网友评论

      本文标题:K-medoid聚类

      本文链接:https://www.haomeiwen.com/subject/jgfnbftx.html