Machine Learning Study Notes--The DBSCAN Algorithm


Author: 松爱家的小秦 | Published 2017-12-09 15:22

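DBSCAN is a density-based clustering algorithm. Unlike partitioning and hierarchical clustering methods, it defines a cluster as a maximal set of density-connected points. This lets it turn regions of sufficiently high density into clusters and discover clusters of arbitrary shape in spatial data that contains noise.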
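To make that definition concrete, here is a minimal from-scratch sketch in Python with NumPy. It is an illustration of the idea, not scikit-learn's implementation: a point is a core point if at least `min_samples` points (itself included) lie within distance `eps`; a cluster grows from a core point by adding every point in the `eps`-neighborhood of each core point already in the cluster, so it ends up as a maximal set of density-connected points; points never reached stay labeled as noise (`-1`). The function name `dbscan` and the toy data are hypothetical, and the O(n²) distance matrix only suits small inputs.

```python
import numpy as np


def dbscan(X, eps, min_samples):
    """Minimal DBSCAN sketch. Returns (labels, is_core); label -1 means noise."""
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory, fine for toy data only)
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # eps-neighborhood of every point, including the point itself
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    # Core point: at least min_samples points within eps
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, -1)  # every point starts out as noise
    cluster_id = 0
    for i in range(n):
        # Only an unassigned core point can start a new cluster
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join the cluster but do not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id
                    stack.append(q)
        cluster_id += 1
    return labels, is_core


# Two tight groups plus one isolated point
X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
              [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],
              [10.0, 0.0]])
labels, core = dbscan(X, eps=0.2, min_samples=3)
print(labels)  # [ 0  0  0  1  1  1 -1]
```

The two groups become clusters 0 and 1, and the isolated point (10, 0) has no core point within `eps`, so it remains noise. Using `-1` for noise mirrors the `labels_` attribute used in the scikit-learn code below.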

    # Prints the module docstring (None here, since this script has none)
    print(__doc__)

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn import metrics
    from sklearn.cluster import DBSCAN
    # sklearn.datasets.samples_generator was removed in scikit-learn 0.24;
    # import make_blobs directly from sklearn.datasets instead
    from sklearn.datasets import make_blobs
    from sklearn.preprocessing import StandardScaler


    def show_dbscan():
        # Generate 750 sample points around three centers
        centers = [[1, 1], [-1, -1], [1, -1]]
        X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4,
                                    random_state=0)
        # Standardize the features so that eps is measured on a common scale
        X = StandardScaler().fit_transform(X)

        db = DBSCAN(eps=0.3, min_samples=10).fit(X)
        # Main DBSCAN() parameters:
        #   eps: maximum distance between two samples for one to be considered
        #        to be in the neighborhood of the other
        #   min_samples: number of samples in a neighborhood (the point itself
        #        included) for a point to count as a core point
        #   algorithm: neighbor search method: 'auto', 'ball_tree', 'kd_tree' or 'brute'
        #   leaf_size: leaf size passed to BallTree or KDTree
        #   n_jobs: number of parallel jobs
        core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
        core_samples_mask[db.core_sample_indices_] = True
        labels = db.labels_

        # Number of clusters in labels, ignoring noise (label -1) if present.
        n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)

        print('Estimated number of clusters: %d' % n_clusters_)
        print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
        print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
        print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
        print("Adjusted Rand Index: %0.3f"
              % metrics.adjusted_rand_score(labels_true, labels))
        print("Adjusted Mutual Information: %0.3f"
              % metrics.adjusted_mutual_info_score(labels_true, labels))
        print("Silhouette Coefficient: %0.3f"
              % metrics.silhouette_score(X, labels))

        # Plot the result; black is reserved for noise.
        unique_labels = set(labels)
        colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))
        for k, col in zip(unique_labels, colors):
            if k == -1:
                # Black used for noise.
                col = [0, 0, 0, 1]

            class_member_mask = (labels == k)

            # Core samples: large markers
            xy = X[class_member_mask & core_samples_mask]
            plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
                     markeredgecolor='k', markersize=14)

            # Non-core samples (border points and noise): small markers
            xy = X[class_member_mask & ~core_samples_mask]
            plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
                     markeredgecolor='k', markersize=6)

        plt.title('Estimated number of clusters: %d' % n_clusters_)
        plt.show()


    if __name__ == '__main__':
        print("Hello World!")
        show_dbscan()

Output:

    None
    Hello World!
    Estimated number of clusters: 3
    Homogeneity: 0.953
    Completeness: 0.883
    V-measure: 0.917
    Adjusted Rand Index: 0.952
    Adjusted Mutual Information: 0.883
    Silhouette Coefficient: 0.626
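The `eps` and `min_samples` parameters described in the code comments set how strict the density requirement is. The sketch below reruns DBSCAN on the same standardized blobs for a few parameter settings and reports the cluster count, the number of noise points and the V-measure for each. It assumes a recent scikit-learn; the `summarize` helper and the parameter grid are hypothetical choices for illustration, not part of the original notes.

```python
import numpy as np
from sklearn import metrics
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Same data as in show_dbscan(): three blobs, standardized
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4,
                            random_state=0)
X = StandardScaler().fit_transform(X)


def summarize(eps, min_samples):
    """Hypothetical helper: fit DBSCAN and return (n_clusters, n_noise, V-measure)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = int(np.sum(labels == -1))  # noise points carry label -1
    return n_clusters, n_noise, metrics.v_measure_score(labels_true, labels)


# Illustrative grid: smaller eps, the original setting, min_samples=1, larger eps
for eps, min_samples in [(0.1, 10), (0.3, 10), (0.3, 1), (1.0, 10)]:
    n_clusters, n_noise, v = summarize(eps, min_samples)
    print(f"eps={eps}, min_samples={min_samples}: "
          f"clusters={n_clusters}, noise={n_noise}, V-measure={v:.3f}")
```

One case is predictable without running anything: with `min_samples=1` every point is a core point (it lies in its own neighborhood), so no sample is labeled noise. Making `eps` smaller or raising `min_samples` can only turn more points into noise, while a larger `eps` generally merges nearby blobs.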


Article link: https://www.haomeiwen.com/subject/zbauixtx.html