13. Clustering

Author: 玄语梨落 | Published 2020-08-21 08:10

    Clustering

    Unsupervised learning introduction

    K-means algorithm

    Input:

    • K (number of clusters)
    • Training set \{x^{(1)},x^{(2)},...,x^{(m)}\}\quad x^{(i)}\in R^{n}

    K-means algorithm

    Randomly initialize K cluster centroids \mu_1,\mu_2,...,\mu_K \in R^{n}

    Repeat {
    for i = 1 to m
    c^{(i)} := index (from 1 to K) of cluster centroid closest to x^{(i)}
    for k = 1 to K
    \mu_k := average (mean) of points assigned to cluster k
    }
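The two alternating steps above (cluster assignment, then centroid update) can be sketched in NumPy. This is a minimal illustration, not a production implementation; the function name `kmeans` and its parameters are my own choices, not from the original notes.

```python
import numpy as np

def kmeans(X, K, n_iters=10, seed=0):
    """Minimal K-means sketch. X has shape (m, n); returns (c, mu)."""
    rng = np.random.default_rng(seed)
    # Randomly initialize K cluster centroids as K distinct training examples
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        # Cluster assignment step: c[i] = index of centroid closest to x^(i)
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = dists.argmin(axis=1)
        # Move centroid step: mu[k] = mean of points assigned to cluster k
        for k in range(K):
            if np.any(c == k):  # leave a centroid unchanged if its cluster is empty
                mu[k] = X[c == k].mean(axis=0)
    return c, mu
```

A fixed iteration count stands in for a proper convergence check (stop when assignments no longer change), which the notes do not specify.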

    K-means for non-separated clusters

    Optimization Objective

    K-means optimization objective

    c^{(i)} = index of cluster (1,2,...,K) to which example x^{(i)} is currently assigned
    \mu_k = cluster centroid k (\mu_k\in R^{n})
    \mu_{c^{(i)}} = cluster centroid of the cluster to which example x^{(i)} has been assigned

    Distortion:
    J(c^{(1)},...,c^{(m)},\mu_1,...,\mu_K) = \frac{1}{m}\sum_{i=1}^m\|x^{(i)}-\mu_{c^{(i)}}\|^2
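The distortion is just the mean squared distance from each example to its assigned centroid, which is a one-liner in NumPy. A minimal sketch, assuming `X` is an `(m, n)` float array, `c` the assignment indices, and `mu` the `(K, n)` centroid array; the name `distortion` is illustrative.

```python
import numpy as np

def distortion(X, c, mu):
    # J = (1/m) * sum_i ||x^(i) - mu_{c^(i)}||^2
    return np.mean(np.sum((X - mu[c]) ** 2, axis=1))
```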

    Random Initialization

    • K < m
    • Randomly pick K training examples.
    • Set \mu_1,...,\mu_K equal to these K examples.

    Run random initialization many times:

    For i=1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get c^{(1)},...,c^{(m)},\mu_1,...,\mu_K
    Compute cost function (distortion)
    J(c^{(1)},...,c^{(m)},\mu_1,...,\mu_K)
    }

    Pick clustering that gave lowest cost J.
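The multiple-restart loop above can be sketched as follows. This is a self-contained illustration under my own assumptions (NumPy, a fixed iteration count, distinct seeds standing in for random restarts); `kmeans_once` and `kmeans_best_of` are hypothetical names, not from the notes.

```python
import numpy as np

def kmeans_once(X, K, seed, n_iters=10):
    """One K-means run from one random initialization; returns (J, c, mu)."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
        for k in range(K):
            if np.any(c == k):
                mu[k] = X[c == k].mean(axis=0)
    # Distortion J for this run
    J = np.mean(np.sum((X - mu[c]) ** 2, axis=1))
    return J, c, mu

def kmeans_best_of(X, K, n_init=100):
    # Run K-means n_init times; keep the clustering with the lowest distortion J
    return min((kmeans_once(X, K, seed=s) for s in range(n_init)), key=lambda r: r[0])
```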

    This matters most for roughly K = 2 to 10; when K is very large, repeating the random initialization many times tends to help less.

    Choosing the number of clusters

    • By hand (inspect a plot of the data set).
    • Elbow method: compute the cost J for K = 1, 2, 3, ..., plot J against K, and choose the K at the "elbow" of the curve.
    • If K-means is run for a later, downstream purpose, evaluate how well each K serves that purpose and choose the K that serves it best.
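The elbow method described above amounts to sweeping K and recording the distortion J for each value. A minimal sketch, assuming NumPy and reusing the same K-means steps as before; `elbow_curve` is an illustrative name and the loop needs len(X) ≥ max_K so each run can draw K distinct initial centroids.

```python
import numpy as np

def elbow_curve(X, max_K=10, n_iters=10, seed=0):
    """Distortion J for K = 1..max_K; plot K vs. J and pick the K at the bend."""
    rng = np.random.default_rng(seed)
    Js = []
    for K in range(1, max_K + 1):
        mu = X[rng.choice(len(X), size=K, replace=False)]
        for _ in range(n_iters):
            c = np.linalg.norm(X[:, None] - mu[None], axis=2).argmin(axis=1)
            for k in range(K):
                if np.any(c == k):
                    mu[k] = X[c == k].mean(axis=0)
        Js.append(np.mean(np.sum((X - mu[c]) ** 2, axis=1)))
    return Js
```

Note the curve is not always monotone with a single random initialization per K; in practice each K would use the best of several restarts, and often no clear elbow appears, which is why the notes also suggest choosing K by the downstream purpose.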
