K-Means and PCA

Author: JaiUnChat | Published 2017-02-18 01:13

    1 K-means Clustering

    We start with a 2D dataset to get an intuitive feel for how the algorithm works.

    1.1 Implementing K-means

    Training set: {x(1), ..., x(m)}, where x(i) ∈ R^n.
    The overall K-means loop is already provided. Note that this snippet is the driver loop, not kMeansInitCentroids.m itself (which it calls and which we implement in Section 1.3); computeMeans corresponds to the computeCentroids.m written in Section 1.1.2.

    % Initialize centroids
    centroids = kMeansInitCentroids(X, K);
    for iter = 1:iterations
        % Cluster assignment step: Assign each data point to the
        % closest centroid. idx(i) corresponds to c(i), the index
        % of the centroid assigned to example i
        idx = findClosestCentroids(X, centroids);
        % Move centroid step: Compute means based on centroid
        % assignments
        centroids = computeMeans(X, idx, K);
    end
    

    Our task is to implement each step of this algorithm ourselves.

    1.1.1 Finding the closest centroids

    For each training example, apply the following assignment:
    c(i) := j that minimizes ||x(i) − μj||².
    Here c(i) is the index of the centroid closest to the i-th example x(i), and μj is the value (position) of the j-th centroid. Note that c(i) corresponds to idx(i) in the code.

    findClosestCentroids.m

    function idx = findClosestCentroids(X, centroids)
    %FINDCLOSESTCENTROIDS computes the centroid memberships for every example
    %   idx = FINDCLOSESTCENTROIDS (X, centroids) returns the closest centroids
    %   in idx for a dataset X where each row is a single example. idx = m x 1 
    %   vector of centroid assignments (i.e. each entry in range [1..K])
    %
    
    % Set K
    K = size(centroids, 1);
    
    % You need to return the following variables correctly.
    idx = zeros(size(X,1), 1);
    
    % ====================== YOUR CODE HERE ======================
    % Instructions: Go over every example, find its closest centroid, and store
    %               the index inside idx at the appropriate location.
    %               Concretely, idx(i) should contain the index of the centroid
    %               closest to example i. Hence, it should be a value in the 
    %               range 1..K
    %
    % Note: You can use a for-loop over the examples to compute this.
    %
                     
    for i = 1:size(X,1)
        % Subtract example i from every centroid; Octave broadcasts the
        % 1 x n row across the K x n centroid matrix
        A = X(i,:) - centroids;
        % diag(A*A') holds the squared distance to each centroid
        % (sum(A.^2, 2) computes the same K x 1 vector more cheaply)
        [v, index] = min(diag(A*A'));
        idx(i) = index;
    end
    
    
    
    
    % =============================================================
    
    end
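
    As an aside, the loop can be replaced by a fully vectorized sketch of the same computation, using the expansion ||x − μ||² = ||x||² − 2x·μ + ||μ||² (it relies on Octave's automatic broadcasting):

    % m x K matrix of squared distances from every example to every centroid
    D = sum(X.^2, 2) - 2 * X * centroids' + sum(centroids.^2, 2)';
    [~, idx] = min(D, [], 2);    % column index of the closest centroid per row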
    

    1.1.2 Computing centroid means

    The second phase of the algorithm recomputes the means. For each centroid k:
    μk := (1/|Ck|) Σ_{i ∈ Ck} x(i)
    where Ck is the set of examples assigned to centroid k. Concretely, if examples x(3) and x(5) are both closest to centroid k = 2, the update is μ2 = (1/2)(x(3) + x(5)) (the subscript of μ is the index k of the centroid).

    computeCentroids.m

    function centroids = computeCentroids(X, idx, K)
    %COMPUTECENTROIDS returns the new centroids by computing the means of the 
    %data points assigned to each centroid.
    %   centroids = COMPUTECENTROIDS(X, idx, K) returns the new centroids by 
    %   computing the means of the data points assigned to each centroid. It is
    %   given a dataset X where each row is a single data point, a vector
    %   idx of centroid assignments (i.e. each entry in range [1..K]) for each
    %   example, and K, the number of centroids. You should return a matrix
    %   centroids, where each row of centroids is the mean of the data points
    %   assigned to it.
    %
    
    % Useful variables
    [m n] = size(X);
    
    % You need to return the following variables correctly.
    centroids = zeros(K, n);
    
    
    % ====================== YOUR CODE HERE ======================
    % Instructions: Go over every centroid and compute mean of all points that
    %               belong to it. Concretely, the row vector centroids(i, :)
    %               should contain the mean of the data points assigned to
    %               centroid i.
    %
    % Note: You can use a for-loop over the centroids to compute this.
    %
    % counter accumulates, for each centroid, the sum of the assigned
    % points (columns 1..n) and how many were assigned (column n+1)
    counter = zeros(K, n+1);
    for i = 1:m
        counter(idx(i),:) += [X(i,:) 1];   % += is Octave syntax
    end

    % Divide each centroid's running sum by its count; the K x 1 count
    % column broadcasts across the K x n sum columns
    centroids = counter(:,1:n) ./ counter(:,n+1);
    
    
    
    %   =============================================================
    
    
    end
    

    1.3 Random initialization

    Drawing the centroids from a random permutation of the example indices guarantees that the same example is never selected twice. The snippet below is the body of the function; a sketch of the complete function follows it.

    kMeansInitCentroids.m

    % Initialize the centroids to be random examples
    % Randomly reorder the indices of examples
    randidx = randperm(size(X, 1));
    % Take the first K examples as centroids
    centroids = X(randidx(1:K), :); 
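
    Wrapped as the complete function, a minimal sketch:

    function centroids = kMeansInitCentroids(X, K)
    %KMEANSINITCENTROIDS Initialize K centroids as K distinct random examples
    % Randomly reorder the indices of examples
    randidx = randperm(size(X, 1));
    % Take the first K examples as centroids
    centroids = X(randidx(1:K), :);
    end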
    

    1.4 Image compression with K-means

    1.4.1 K-means on pixels

    % Load 128x128 color image (bird_small.png)
    A = imread('bird_small.png');
    % You will need to have installed the image package to use
    % imread. If you do not have the image package installed, you
    % should instead change the following line to
    %
    % load('bird_small.mat'); % Loads the image into the variable A
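
    Putting the pieces together, a minimal sketch of the compression pipeline built from the functions implemented above (K = 16 and 10 iterations are illustrative choices):

    % Scale pixel values to [0, 1]
    A = double(imread('bird_small.png')) / 255;
    img_size = size(A);
    % Each row of X holds one pixel's (R, G, B) values
    X = reshape(A, img_size(1) * img_size(2), 3);

    K = 16; iterations = 10;
    centroids = kMeansInitCentroids(X, K);
    for iter = 1:iterations
        idx = findClosestCentroids(X, centroids);
        centroids = computeCentroids(X, idx, K);
    end

    % Final assignment, then replace every pixel by its centroid's color
    idx = findClosestCentroids(X, centroids);
    X_rec = centroids(idx, :);
    A_rec = reshape(X_rec, img_size(1), img_size(2), 3);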
    

    2 Principal Component Analysis

    2.2 Implementing PCA

    Before running PCA, it is best to first mean-normalize the data (and optionally scale each feature), as sketched below.
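
    A minimal sketch of that normalization step (the exercise ships this logic as featureNormalize.m; the variable names here are illustrative):

    mu = mean(X);               % 1 x n row of per-feature means
    X_norm = X - mu;            % subtract the mean (Octave broadcasts the row)
    sigma = std(X_norm);        % per-feature standard deviations
    X_norm = X_norm ./ sigma;   % optional: scale features to comparable ranges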

    The covariance matrix is computed as:
    Σ = (1/m) X'X
    where X is the m × n data matrix (one example per row) and X' is its transpose.

    Compute the decomposition with [U, S, V] = svd(Sigma).
    The columns of U are the principal components, and S is a diagonal matrix of the corresponding singular values.

    pca.m

    function [U, S] = pca(X)
    %PCA Run principal component analysis on the dataset X
    %   [U, S] = pca(X) computes eigenvectors of the covariance matrix of X
    %   Returns the eigenvectors U, the eigenvalues (on diagonal) in S
    %
    
    % Useful values
    [m, n] = size(X);
    
    % You need to return the following variables correctly.
    U = zeros(n);
    S = zeros(n);
    
    % ====================== YOUR CODE HERE ======================
    % Instructions: You should first compute the covariance matrix. Then, you
    %               should use the "svd" function to compute the eigenvectors
    %               and eigenvalues of the covariance matrix. 
    %
    % Note: When computing the covariance matrix, remember to divide by m (the
    %       number of examples).
    %
    
    
    % Covariance matrix of the (already normalized) data
    sigma = (1/m) .* (X' * X);
    % Columns of U are the principal components; diag(S) holds the
    % corresponding singular values (V is unused here)
    [U, S, V] = svd(sigma);
    
    
    
    
    %   =========================================================================
    
    end
    

    2.3.1 Projecting the data onto the principal components

    projectData.m

    function Z = projectData(X, U, K)
    %PROJECTDATA Computes the reduced data representation when projecting only 
    %on to the top k eigenvectors
    %   Z = projectData(X, U, K) computes the projection of 
    %   the normalized inputs X into the reduced dimensional space spanned by
    %   the first K columns of U. It returns the projected examples in Z.
    %
    
    % You need to return the following variables correctly.
    Z = zeros(size(X, 1), K);
    
    % ====================== YOUR CODE HERE ======================
    % Instructions: Compute the projection of the data using only the top K 
    %               eigenvectors in U (first K columns). 
    %               For the i-th example X(i,:), the projection on to the k-th 
    %               eigenvector is given as follows:
    %                    x = X(i, :)';
    %                    projection_k = x' * U(:, k);
    %
    
    
    % Project onto the first K principal components. Note U(:, 1:K):
    % using U(:, K) would project onto only the K-th component.
    Z = X * U(:, 1:K);
    
    
    % =============================================================
    
    end
    

    2.3.2 Recovering an approximation of the data

    recoverData.m

    function X_rec = recoverData(Z, U, K)
    %RECOVERDATA Recovers an approximation of the original data when using the 
    %projected data
    %   X_rec = RECOVERDATA(Z, U, K) recovers an approximation the 
    %   original data that has been reduced to K dimensions. It returns the
    %   approximate reconstruction in X_rec.
    %
    
    % You need to return the following variables correctly.
    X_rec = zeros(size(Z, 1), size(U, 1));
    
    % ====================== YOUR CODE HERE ======================
    % Instructions: Compute the approximation of the data by projecting back
    %               onto the original space using the top K eigenvectors in U.
    %
    %               For the i-th example Z(i,:), the (approximate)
    %               recovered data for dimension j is given as follows:
    %                    v = Z(i, :)';
    %                    recovered_j = v' * U(j, 1:K)';
    %
    %               Notice that U(j, 1:K) is a row vector.
    %               
    % Multiply by the transpose of the first K components to map Z back
    % into the original n-dimensional space
    X_rec = Z * U(:, 1:K)';
    
    
    % =============================================================
    
    end
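
    As a usage sketch, project the normalized 2D example data down to one dimension and reconstruct it (X_norm is the normalized data from Section 2.2):

    [U, S] = pca(X_norm);             % principal components of the data
    K = 1;
    Z = projectData(X_norm, U, K);    % m x 1 projected coordinates
    X_rec = recoverData(Z, U, K);     % m x n approximate reconstruction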
    

    2.4 Face image dataset

    2.4.1 PCA on faces

    In the example dataset, we run PCA on face images.

    The resulting eigenface images are consistent with the mathematical analysis and bring out the dominant facial features.
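
    A sketch of that experiment (it assumes the exercise's ex7faces.mat data file and its displayData helper; adapt the names if yours differ):

    load('ex7faces.mat');        % loads X; each row is a flattened 32x32 face
    mu = mean(X);
    X_norm = X - mu;             % mean-normalize before PCA
    [U, S] = pca(X_norm);
    displayData(U(:, 1:36)');    % visualize the first 36 eigenfaces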
