1. Spectral Clustering
(a) Spectral Clustering on synthetic data
(a.1) change threshold
neighbor = 20, threshold = 0.1

- Almost all the points are assigned to the blue Cluster 2 (red < blue).
neighbor = 20, threshold = 0.2

- Enlarging the threshold to 0.2, the inner circle is labeled as Cluster 1, and the outer ring contains more red Cluster 1 points than blue Cluster 2 points (red > blue).
- We should reduce the threshold.
- This result looks much like the k-means result.
neighbor = 20, threshold = 0.15

- A better result.
- The right threshold depends largely on experience. If we cannot visualize the data directly, we may get poor results.
Conclusions:
- If we can't decide on the best threshold, we can instead use a "fully connected" graph, so that every edge weight is larger than 0. The Gaussian (RBF) kernel is usually used to set the edge weights in that case.
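As a minimal numpy sketch of the two graph-construction choices discussed above (the helper names and the sigma value are my own, not from the assignment code):

```python
import numpy as np

def epsilon_affinity(X, threshold):
    """Threshold graph: weight 1 iff pairwise distance <= threshold."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = (D <= threshold).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def rbf_affinity(X, sigma=1.0):
    """Fully connected graph: Gaussian (RBF) kernel, every weight > 0."""
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-D2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
W_eps = epsilon_affinity(X, 0.15)   # may leave some points isolated
W_rbf = rbf_affinity(X, sigma=1.0)  # every off-diagonal weight is positive
```

The fully connected graph removes the hard threshold choice, at the cost of a dense weight matrix; the kernel width sigma still has to be tuned, but small changes degrade the result more gracefully than a hard cutoff.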

(a.2) compare with K-means

- K-means assumes each cluster is spherical in the vector space, so it can handle spherically distributed clusters but fails on the data from problem (a).
- Spectral Clustering is based on graph theory; it can handle the data from problem (a) because it does not restrict the shape of the clusters.
- Both Spectral Clustering and K-means require choosing K in advance.
- Spectral Clustering transforms the original data into a new low-dimensional vector space by fusing the original features, which makes it more stable than K-means.
- If the graph built by Spectral Clustering is not sparse, i.e. the weight matrix is dense, the algorithm will be quite slow.
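The contrast can be sketched with scikit-learn on a two-circles toy set similar to the data in problem (a) (the dataset parameters and metric choice here are my own assumptions):

```python
from sklearn.datasets import make_circles
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import adjusted_rand_score

# Concentric circles: non-spherical clusters that break K-means
X, y = make_circles(n_samples=300, factor=0.5, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0).fit_predict(X)

ari_km = adjusted_rand_score(y, km_labels)  # near 0: splits circles in half
ari_sc = adjusted_rand_score(y, sc_labels)  # near 1: separates the rings
```

K-means cuts the plane into two half-spaces, mixing the rings, while the k-nearest-neighbor graph lets Spectral Clustering follow the ring shapes.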
(b) Spectral Clustering on real-world data
- when k = 3
| Algorithm | Accuracy | Mutual info |
|---|---|---|
| K-means | 0.572 | -0.365 |
| Spectral Clustering | 0.816 | 0.371 |

- when k = 5 (the largest k that constructW supports)
| Algorithm | Accuracy | Mutual info |
|---|---|---|
| K-means | 0.535 | 0.017 |
| Spectral Clustering | 0.671 | 0.661 |
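For reference, a common way to compute these two metrics: clustering accuracy needs a best matching between predicted and true labels (Hungarian algorithm), and mutual information is available in scikit-learn. The helper name below is my own, not part of the assignment code:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(y_true, y_pred):
    """Accuracy under the best one-to-one matching of cluster labels."""
    labels_t, labels_p = np.unique(y_true), np.unique(y_pred)
    # Confusion matrix: rows = predicted clusters, cols = true classes
    C = np.zeros((len(labels_p), len(labels_t)), dtype=int)
    for i, p in enumerate(labels_p):
        for j, t in enumerate(labels_t):
            C[i, j] = np.sum((y_pred == p) & (y_true == t))
    row, col = linear_sum_assignment(-C)  # maximize matched counts
    return C[row, col].sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([1, 1, 0, 0, 2, 2])  # same partition, permuted labels
acc = clustering_accuracy(y_true, y_pred)            # 1.0
nmi = normalized_mutual_info_score(y_true, y_pred)   # 1.0
```

Both measures are invariant to label permutation, so a cluster named "2" matching class "0" still counts as correct.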

2. Principal Component Analysis
(a) hack CAPTCHA system
Below are the 5 original and rotated CAPTCHA samples.





- Using the coordinates of the text points as features is a clever choice.
- The eigenvector with the largest eigenvalue is parallel to the long side of the text part, because that projection direction has the largest variance.
- As the principal components are orthogonal, the eigenvector with the second-largest eigenvalue is perpendicular to the long side. This property can be used to calculate the rotation angle.
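A small numpy sketch of recovering the rotation angle from the principal direction of point coordinates; the synthetic "text strip" below stands in for the real CAPTCHA pixels, and the 30-degree rotation is my own test value:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "text strip": long in x, short in y, then rotated by 30 degrees
pts = np.column_stack([rng.uniform(-10, 10, 500), rng.uniform(-1, 1, 500)])
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
pts = pts @ R.T

# PCA on the 2-D coordinates: eigenvector with the largest eigenvalue
cov = np.cov(pts, rowvar=False)
vals, vecs = np.linalg.eigh(cov)
v1 = vecs[:, np.argmax(vals)]                 # parallel to the long side
angle = np.degrees(np.arctan2(v1[1], v1[0]))  # ~30 (mod 180)
```

The direction is only defined up to a sign, so the recovered angle is determined modulo 180 degrees; rotating back by `-angle` straightens the text either way.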
(b) apply PCA on face image
(b.i) Eigenface
- The 40 eigenfaces with the largest eigenvalues

- The 40 eigenfaces with the smallest eigenvalues (just for a look)

(b.ii) Dimensionality reduction test error
- KNN, k = 1

- KNN, k = 3

- KNN, k = 5

Conclusion:
- The larger k is, the larger the test error. This follows from the nature of KNN: k = 1 tends to overfit and is more precise on the training data, and it works well here probably because the training and test data are quite similar.
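A sketch of this PCA + KNN pipeline; since the face data from the assignment isn't included here, scikit-learn's `load_digits` stands in, and the split and component count are my own choices:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: the assignment's face images aren't available here
X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

pca = PCA(n_components=40).fit(Xtr)  # keep the 40 largest components
errors = {}
for k in (1, 3, 5):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(pca.transform(Xtr), ytr)
    errors[k] = 1.0 - knn.score(pca.transform(Xte), yte)
```

Note that PCA is fit on the training split only, then applied to both splits, so no test information leaks into the projection.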
(b.iii) Visualized face

Conclusion:
- When k is smaller, the dimensionality is reduced more and the reconstructed image tends to be smoother. In contrast, a larger k means more detailed information is retained, so the reconstructed image tends to be sharper.
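The smooth-versus-sharp observation corresponds to the reconstruction error shrinking as k grows. A sketch on a stand-in dataset (`load_digits` again; the k values are my own choices):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)

recon_err = {}
for k in (5, 20, 40):
    pca = PCA(n_components=k).fit(X)
    # Project down to k components, then map back to pixel space
    X_rec = pca.inverse_transform(pca.transform(X))
    recon_err[k] = float(np.mean((X - X_rec) ** 2))
# Fewer components -> larger error (smoother, blurrier reconstruction)
```

The discarded components carry the high-frequency detail, which is why aggressive reduction blurs the faces.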
(b.iv) Apply LDA on this dataset
- KNN, k = 1

- KNN, k = 5

- KNN, k = 10

Conclusion:
- LDA has a smaller test error than PCA.
- LDA is supervised: the LDA function takes both the training features and the ground-truth labels as parameters, so it pays more attention to the class labels.
- Though both LDA and PCA project data to a lower dimension, PCA seeks the direction of largest variance while LDA seeks the largest separation between the different classes.
- After dimensionality reduction, the PCA feature dimension can be as large as the original feature dimension, while the LDA feature dimension is at most the number of classes minus one.
- LDA can suffer from overfitting too, but with supervision it is more precise.
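The comparison can be sketched with scikit-learn on the same stand-in dataset (`load_digits`, 10 classes, so LDA allows at most 9 components; the split and k are my own choices):

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_digits(return_X_y=True)  # 10 classes -> LDA dim <= 9
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

pca = PCA(n_components=9).fit(Xtr)                          # unsupervised
lda = LinearDiscriminantAnalysis(n_components=9).fit(Xtr, ytr)  # supervised

knn = KNeighborsClassifier(n_neighbors=5)
acc_pca = knn.fit(pca.transform(Xtr), ytr).score(pca.transform(Xte), yte)
acc_lda = knn.fit(lda.transform(Xtr), ytr).score(lda.transform(Xte), yte)
```

Both are linear projections to 9 dimensions, but LDA uses the labels to choose directions that separate the classes rather than directions of raw variance.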