
Cluster Analysis with the Iris Dataset

Author: 不连续小姐 | Published 2018-12-19 23:18

    Data Science Day 19:

    In Supervised Learning, we specify the possible category labels and train models to recognize their patterns. However, *what if we don't have existing labeled data to learn from?*


    When we instead model the data in order to discover how it clusters, based on certain attributes, we are doing Unsupervised Learning.

    Cluster analysis is one of these unsupervised techniques: rather than learning by example, it learns by observation.

    There are three general types of clustering methods: Partitioning, Hierarchical, and Density-based.

    1. Partitioning: n objects are grouped into k ≤ n disjoint clusters.
    Partitioning methods are based on a distance measure; they apply iterative relocation until some distance-based error metric is minimized (a short sketch of this metric follows the list).

    2. Hierarchical: clusters are either combined (agglomerative) or split (divisive) in a stepwise fashion, based on some measure (distance, density, or continuity).

    Agglomerative clustering starts with each point in its own cluster and combines clusters step by step; divisive clustering starts with all the data in one cluster and divides it up.

    3. Density-based: clusters are grown from regions where points are densely packed, and density is used as the measure of cluster "goodness".
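
    To make the distance-based error metric from item 1 concrete, here is a minimal sketch that fits K-Means on the Iris measurements for several values of k and prints scikit-learn's inertia_, the within-cluster sum of squared distances that iterative relocation keeps reducing:

    #Minimal sketch: within-cluster sum of squares (inertia) for several k
    from sklearn import datasets
    from sklearn.cluster import KMeans

    x = datasets.load_iris().data

    for k in range(1, 7):
        km = KMeans(n_clusters=k, random_state=0).fit(x)
        #inertia_ = sum of squared distances of samples to their nearest center
        print(k, round(km.inertia_, 2))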

    Examples with the Iris Dataset

    1. **Partitioning: K-Means (k = 3)**


    #Imports
    import matplotlib.pyplot as plt
    from mpl_toolkits.mplot3d import Axes3D  #registers the 3D projection
    from sklearn import datasets
    from sklearn.cluster import KMeans
    
    #Iris dataset
    iris = datasets.load_iris()
    x = iris.data
    y = iris.target
    
    #Fit K-Means with 3 clusters and keep the cluster labels for plotting
    kmeans = KMeans(n_clusters=3, random_state=0).fit(x)
    labels = kmeans.labels_
    
    #Plotting: 3D scatter of three of the four features, colored by cluster
    fig = plt.figure(1, figsize=(7, 7))
    ax = fig.add_subplot(111, projection="3d")
    ax.view_init(elev=48, azim=134)
    ax.scatter(x[:, 3], x[:, 0], x[:, 2],
               c=labels.astype(float), edgecolor="k", s=50)
    ax.set_xlabel("Petal width")
    ax.set_ylabel("Sepal length")
    ax.set_zlabel("Petal length")
    plt.title("Iris Clustering K Means=3", fontsize=14)
    plt.show()
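
    Since the true species labels are still available in y, one quick sanity check is to compare the three K-Means clusters against them with scikit-learn's adjusted Rand index:

    #Sketch: agreement between the K-Means clusters and the true species
    from sklearn.metrics import adjusted_rand_score

    print(adjusted_rand_score(y, labels))  #1.0 = perfect match, ~0.0 = random labeling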
    
    2. **Hierarchical**
    
    #Hierarchical clustering (Ward linkage) and dendrogram
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram
    
    hier = linkage(x, "ward")
    max_d = 7.08  #distance at which to cut the tree
    plt.figure(figsize=(25, 10))
    plt.title('Iris Hierarchical Clustering Dendrogram')
    plt.xlabel('Species')
    plt.ylabel('distance')
    dendrogram(
        hier,
        truncate_mode='lastp',  #only show the last p merged clusters
        p=50,
        leaf_rotation=90.,
        leaf_font_size=8.,
    )
    plt.axhline(y=max_d, c='k')  #horizontal line marking the cut distance
    plt.show()
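
    The horizontal line at max_d = 7.08 marks where the tree is cut; as a small follow-up sketch, scipy's fcluster can turn that same cut into flat cluster labels:

    #Sketch: cut the linkage tree at max_d to obtain flat cluster labels
    from scipy.cluster.hierarchy import fcluster

    flat_labels = fcluster(hier, t=max_d, criterion='distance')
    print(sorted(set(flat_labels)))  #cluster ids produced by the cut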
    
    3. **Density-based: DBSCAN**
    
    #Density-based clustering with DBSCAN, visualized in 2D via PCA
    import matplotlib.pyplot as plt
    from sklearn.cluster import DBSCAN
    from sklearn.decomposition import PCA
    
    dbscan = DBSCAN()
    dbscan.fit(x)
    
    #Project the 4D data onto 2 principal components for plotting
    pca = PCA(n_components=2).fit(x)
    pca_2d = pca.transform(x)
    
    #Plot each point according to its DBSCAN label (-1 means noise)
    for i in range(0, pca_2d.shape[0]):
        if dbscan.labels_[i] == 0:
            c1 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='r', marker='+')
        elif dbscan.labels_[i] == 1:
            c2 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='g', marker='o')
        elif dbscan.labels_[i] == -1:
            c3 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='b', marker='*')
    
    plt.legend([c1, c2, c3], ['Cluster 1', 'Cluster 2', 'Noise'])
    plt.title('DBSCAN finds 2 clusters and Noise')
    plt.show()
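
    DBSCAN's outcome depends heavily on its eps (neighborhood radius) and min_samples parameters; the call above uses scikit-learn's defaults (eps=0.5, min_samples=5). A minimal sketch for exploring other settings:

    #Sketch: how the cluster count and noise count change with eps
    import numpy as np
    from sklearn.cluster import DBSCAN

    for eps in (0.3, 0.5, 0.8, 1.0):
        lbls = DBSCAN(eps=eps, min_samples=5).fit(x).labels_
        n_clusters = len(set(lbls)) - (1 if -1 in lbls else 0)  #exclude the noise label
        n_noise = int(np.sum(lbls == -1))
        print(eps, n_clusters, n_noise)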
    

    Many thanks to Dr. Rumbaugh's clustering analysis notes!

    Happy studying! 😊
