Machine Learning Notes, Week 08

Author: 我的名字叫清阳 | Published 2016-03-15 06:41

    Note: Week 7 was an exam week; I folded the exam material into the Week 06 notes, so there are no separate Week 7 notes. No lecture content is missing.

    Week 08 tasks:

    • Lectures: unsupervised learning: clustering and expectation maximization.
    • Reading: Chapter 6 of Mitchell, and the supplementary notes on Expectation-Maximization and clustering.

    • Supervised learning: use labeled training data to learn how to assign labels to new instances. Can be seen as function approximation.
    • Unsupervised learning: make sense of unlabeled data by building a description of it; when a new instance arrives, describe it in terms of that learned description.

    Basic Clustering Problem

    • definition: given a set of objects X and an inter-object distance d(x, y) = d(y, x), a clustering is a partition P_D that puts x and y into the same cluster exactly when P_D(x) = P_D(y)
    • extreme cases: if P_D(x) = 1 for every x, there is only one cluster; if P_D(x) = x, every x is its own cluster (each cluster has exactly one item).
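    One way to write this a bit more precisely (my notation, a light formalization of the lecture's definition):

```latex
% Clustering as a partition function P_D : X -> labels,
% induced by a symmetric distance d(x, y) = d(y, x):
\[
  x \text{ and } y \text{ share a cluster} \iff P_D(x) = P_D(y)
\]
% Extreme cases:
%   P_D(x) = 1 for all x  =>  one cluster containing everything
%   P_D(x) = x            =>  every point is its own singleton cluster
```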
    Quiz 1: Single Linkage Clustering
    • Single Linkage Clustering (SLC): start by treating each object as its own cluster, then repeatedly find the two closest clusters and merge them, where inter-cluster distance is the distance between the two closest points across the clusters. The distance function can be customized (as in kNN). Repeating the merge n−k times yields k clusters; see the sketch after this list.
    • k is prior knowledge
    • Quiz: which two points should be linked together next?
    (figure: SLC)
    • the clustering process can be represented as a hierarchical tree structure
    • the inter-cluster distance function could also be the mean, median...
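    A minimal SLC sketch in plain Python (my own illustration, not the course's code):

```python
# Single Linkage Clustering: start with every point as its own cluster,
# then repeatedly merge the two clusters whose closest members are nearest.
from itertools import combinations

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage(points, k, dist=euclidean):
    clusters = [[p] for p in points]           # each point starts as a cluster
    while len(clusters) > k:                   # n - k merges yield k clusters
        # single-link distance: the closest pair across two clusters
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: min(dist(a, b)
                                      for a in clusters[ij[0]]
                                      for b in clusters[ij[1]]))
        clusters[i].extend(clusters.pop(j))    # merge cluster j into cluster i
    return clusters

print(single_linkage([(0, 0), (0, 1), (5, 5), (5, 6), (9, 9)], k=2))
```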
    Quiz 2: Running time of SLC
    • For SLC, each merge requires looking at every pair of points to find the smallest distance, which is an O(n²) operation. We repeat this about n times, so the total running time is O(n³).
    Quiz 3: Problem of SLC
    • SLC cannot cluster this example correctly

    K-means

    K-means clustering
    • randomly initialize k centers
    • each center claims its closest points, partitioning the data
    • recompute each center by averaging its claimed points
    • repeat the above until convergence (a sketch follows after this list)
    • Question: does it always converge?
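    A minimal K-means sketch in plain Python (my own illustration, not the course's code):

```python
# K-means: assign each point to its nearest center, then move each center
# to the mean of its assigned points; repeat until the partition stops changing.
import random

def kmeans(points, k, max_iter=100):
    centers = random.sample(points, k)        # randomly initialize k centers
    assignment = None
    for _ in range(max_iter):
        # each center "claims" the points nearest to it
        new_assignment = [min(range(k),
                              key=lambda c: sum((p - q) ** 2
                                                for p, q in zip(pt, centers[c])))
                          for pt in points]
        if new_assignment == assignment:      # converged: partition is stable
            break
        assignment = new_assignment
        # recompute each center as the average of its claimed points
        for c in range(k):
            members = [pt for pt, a in zip(points, assignment) if a == c]
            if members:                       # keep old center if cluster is empty
                centers[c] = tuple(sum(dim) / len(members)
                                   for dim in zip(*members))
    return centers, assignment

centers, labels = kmeans([(0, 0), (0, 1), (5, 5), (5, 6)], k=2)
```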

    K Means in Euclidean Space proof

    K Means in Euclidean Space 1
    • Terms: partition, cluster, and center
    Quiz 4: K-means as Optimization

    Not sure how to understand the definitions here.
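    My best reading: K-means can be framed as minimizing a squared-error score over partitions and centers (my notation, not necessarily the lecture's exact formulation):

```latex
% K-means as optimization: find a partition P and centers mu minimizing
\[
  E(P, \mu) \;=\; \sum_{x \in X} \bigl\lVert x - \mu_{P(x)} \bigr\rVert^{2}
\]
% The assignment step minimizes E over P with mu held fixed;
% the re-centering step minimizes E over mu with P held fixed.
% Neither step can increase E, which drives the convergence argument below.
```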

    K-means
    • the error never increases during either the partition (assignment) step or the re-centering step of the clustering process
    • given that the space of possible partitions is finite, K-means converges in finite time.
    Quiz 5: K-means Properties
    • it's possible for K-means to get stuck at a local optimum, but this can be mitigated by random restarts (like random-restart hill climbing in the optimization lectures); see the wrapper sketch below.
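    A restart wrapper around the kmeans sketch above (my own illustration):

```python
# Random restart: run K-means several times and keep the lowest-error run,
# reducing the chance of settling in a bad local optimum.
# Assumes the kmeans() function from the sketch above is in scope.
def kmeans_restarts(points, k, restarts=10):
    def error(centers, assignment):
        # total squared distance from each point to its assigned center
        return sum(sum((p - q) ** 2 for p, q in zip(pt, centers[a]))
                   for pt, a in zip(points, assignment))
    runs = [kmeans(points, k) for _ in range(restarts)]
    return min(runs, key=lambda run: error(*run))
```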

    Soft Clustering

    Quiz 6:
    • neither K-means nor SLC can assign point d to a definitive cluster
    • Solution: lean on probability :)
    Soft Clustering
    • here, ML stands for maximum likelihood.
    Maximum Likelihood Gaussian
    • hidden variables are used to keep track of which Gaussian distribution each x was generated from
    Expectation Maximization
    • soft clustering alternates between an expectation step and a maximization step; the probability that point x_i belongs to cluster j lies between 0 and 1 (formulas for both steps follow after this list).
    • soft clustering reduces to K-means if the probabilities are pushed to exactly 0 and 1.
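    For the Gaussian case, the two steps can be written as follows (assuming spherical Gaussians with equal variance and uniform priors, which is how I read the lecture):

```latex
% E-step: expected membership of point x_i in Gaussian j
\[
  E[z_{ij}] \;=\; \frac{P(x = x_i \mid \mu = \mu_j)}
                       {\sum_{l=1}^{k} P(x = x_i \mid \mu = \mu_l)}
\]
% M-step: re-estimate each mean as a membership-weighted average
\[
  \mu_j \;=\; \frac{\sum_i E[z_{ij}]\, x_i}{\sum_i E[z_{ij}]}
\]
```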
    EM example
    • given k, randomly generate k centers
    • in each iteration, recompute the centers and update the probability that each point belongs to each cluster, until convergence (see the sketch after this list).
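    A minimal 1-D sketch of this loop (my own illustration: unit-variance Gaussians, estimating means only; real implementations also estimate variances and mixing weights):

```python
# EM for a 1-D mixture of k unit-variance Gaussians.
import math, random

def em_1d(xs, k, iters=50):
    mus = random.sample(xs, k)                 # random initial centers
    for _ in range(iters):
        # E-step: soft membership of each point in each Gaussian
        # (the 1/sqrt(2*pi) normalizing constant cancels in the ratio)
        resp = []
        for x in xs:
            likes = [math.exp(-0.5 * (x - mu) ** 2) for mu in mus]
            total = sum(likes)
            resp.append([l / total for l in likes])
        # M-step: each mean becomes a membership-weighted average
        mus = [sum(r[j] * x for r, x in zip(resp, xs)) /
               sum(r[j] for r in resp)
               for j in range(k)]
    return mus

print(em_1d([0.0, 0.2, 0.1, 4.9, 5.1, 5.0], k=2))
```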
    Properties of EM
    • unlike K-means, EM is not guaranteed to converge in finite time, because there are infinitely many possible configurations (though the likelihood never decreases and it does not diverge)
    • the local-maxima problem can be mitigated by random restarts
    • the E and M steps can be defined differently for different problems

    Clustering Properties

    Clustering Properties Quiz 7: Impossibility Theorem
    • Kleinberg's impossibility theorem: no clustering scheme can have all three properties (richness, scale invariance, and consistency) at once.
    Recap
    2016-03-02 first draft, SLC
    2016-03-14 K-means
    2016-03-15 EM
    
