ML - hw4

作者: 谢小帅 | 来源:发表于2019-01-21 10:14 被阅读26次

1. Spectral Clustering

(a) Spectral Clustering on synthesis data

(a.1) change threshold

neighbor = 20, threshold = 0.1

  • Almost all the points are labeled as blue Cluster 2. (red < bule)

neighbor = 20, threshold = 0.2

  • Enlarge the threshold to 0.2, the inner circle is labeled as Cluster 1, the outer ring is labeled more red Cluster 1 than blue Cluster 2. (red > bule)
  • We should reduce the threshold.
  • And the result is quite like the k-means result.

neighbor = 20, threshold = 0.15

  • A better result.
  • The threshold is quite depending on experience. If we can't visualize the data directly, we may get some bad results.

Conclusions:

  • If we can't decide the best threshold, we can use "fully connected" to set edge weights so that each weight is larger than 0. Gaussian Kernel Function, like RBF, is usually used to decide the edge weights.
(a.2) compare with K-means
  • K-means is based on the assumption that each cluster is spherical in vector space so that it can deal with data cluster distributed in spherical but can't handle the data from the problem a.
  • Spectral Clustering is based on the graph theory, it can handle the data from the problem a because the data cluster shape in that clustering algorithm is not limited.
  • Both Spectral Clustering and K-means need to decide K first.
  • Spectral Clustering transforms original data into a new low-dimension vector space by fusing original data features, which makes it more stable than K-means.
  • If the graph built by Spectral Clustering is not sparse, or the weight matrix is not sparse, this algorithm will be quite slow.

(b) Spectral Clustering on real-world data

  • when k = 3
Algorithm accuracy mutual info
K-means 0.572 -0.365
Spectral Clustering 0.816 0.371
  • when k = 5 (the largest k constructW supports)
Algorithm accuracy mutual info
K-means 0.535 0.017
Spectral Clustering 0.671 0.661

2. Principal Component Analysis

(a) hack CAPTCHA system

Below are the 5 original and rotated CAPTCHA samples.

  • Using point coordinates as features is very clever.
  • The 1st largest eigenvector is vertical to the long side of the text part because this projecting direction has the largest variance.
  • As the principal components are orthogonal, the 2nd largest eigenvector is exactly parallel to the long side. This character can be used to calculate the rotated angle.

(b) apply PCA on face image

(b.i) Eigenface
  • 40 largest eigenface
  • 40 smallest eigenface (just have a look)
(b.ii) Dimensionality reduction test error
  • KNN, k = 1
  • KNN, k = 3
  • KNN, k = 5

Conclusion:

  • The larger k, the larger test error. That's because of the character of KNN, k = 1 is an overfitting condition, more precise on train data. Maybe the train data and test data are quite similar.
(b.iii) Visualized face

Conclusion:

  • When k is smaller, the dimension reduces more, the result image tends to be smoother. In contrast, the larger k means more detailed infomation is reserved, so the result image tends to be sharper.
(b.iv) Apply LDA on this dataset
  • KNN, k = 1
  • KNN, k = 5
  • KNN, k = 10

Conclusion:

  • LDA has smaller test error than PCA.
  • LDA is supervised learning, the LDA function uses training features and ground truth as parameters, paying more attention to the classification labels.
  • Though both LDA and PCA project data to low dimension, PCA seeks the largest variance while LDA seeks the largest distance among different class data.
  • After dimension reduction, PCA feature dimension can equal the original feature dimension, while LDA feature dimension can only equal the class number of original data.
  • LDA can suffer overfitting too. But it's more precise with supervision.

相关文章

  • ML - hw4

    1. Spectral Clustering (a) Spectral Clustering on synthes...

  • hw4

    深度用户人群是有健身减脂需求的人群,所以主要提供专业的饮食搭配建议,健身训练建议,结合打卡监督等服务让用户产生更多...

  • 链表/super-sub class/stack heap in

    写在开头 今天完成了HW4。本次作业主要考查了: 链表的概念,及其基本操作; 父类(superclass)和子类(...

  • FIS HW4

    1. 一个剩余 N 年到期,每年付息 m 次的债券,票面价值为 F,票面利率为 CR,YTM 为 y,其价格可由以...

  • FTS HW4

    1. 假设 是 AR(1) 过程: , (a) 证明 (b) 求出这个序列用 和 表示的自相关函数 2. 考...

  • 呼吸SU:M 37° FLAELESS完美无瑕水乳霜套装

    套盒内含:水150ml+乳130ml+面霜20ml+魔法精华 30ml 无瑕水 20ml 无瑕乳 20ml 无瑕精...

  • 1月14日作息

    01.05 02.36(20ml) 06.00(30ml) 09.35(30ml) 12.05(25ml) 10....

  • 65 5.1记录拉两次

    吸奶 3:00 130ml 8:30 75ml 11:30 80ml 14:30 90ml 16:55 40ml ...

  • 60 4.26记录

    吸奶 4:00 120ml 9:48 85ml 12:02 50ml 14:47 80ml 17:51 85ml...

  • 50 4.23记录

    吸奶 1:30 70ml 8:25 95ml 10:45 100ml 13:57 80ml 16:30 90ml ...

网友评论

      本文标题:ML - hw4

      本文链接:https://www.haomeiwen.com/subject/ovkidqtx.html