编程作业(七)

作者: SmallRookie | 来源:发表于2017-10-07 19:43 被阅读94次

    K均值算法与主成分分析算法

    K均值分析算法

    在本部分练习中,你将实现K均值算法并将该算法用于图像压缩。最初,你通过使用2D数据集帮助你理解K均值算法。在此之后,你将使用K均值算法以减少颜色数量的方式压缩图像。

    任务一 实现K均值算法

    K均值算法是一种自动将相似数据聚类在一起的方法,其基本代码如下:

    % Initialize centroids
    centroids = kMeansInitCentroids(X, K);
    for iter = 1:iterations
        % Cluster assignment step: Assign each data point to the
        % closest centroid. idx(i) corresponds to cˆ(i), the index
        % of the centroid assigned to example i
        idx = findClosestCentroids(X, centroids);
    
        % Move centroid step: Compute means based on centroid
        % assignments
        centroids = computeMeans(X, idx, K);
    end
    

    上述代码中,循环分为了两步:1)将样本数据x(i)聚类至最近的聚类中心;2)计算每一聚类的均值再将样本数据重新分配。

    Step 1:寻找最近的聚类中心

    根据上述公式,我们在findClosestCentroids.m文件中添加如下代码:

    for i = 1 : length(idx)
    
        T = [];
        for j = 1 : K
            T = [T; X(i, :)];
        end
        [t, idx(i)] = min(sum((T - centroids) .^ 2, 2));  % sum(A, 2)表示对矩阵A按列求和
    end
    

    Step 2:计算每一聚类的均值

    根据上述公式,我们在computeCentroids.m文件中添加如下代码:

    for k = 1 : K
    
        T = find(idx == k);
        centroids(k, :) = sum(X(T, :)) / size(T, 1);   % size(A, 1)表示求矩阵A的行数
    end
    

    最后,我们运行该任务部分代码,其中ex7.m文件中的代码如下:

    %% ================= Part 1: Find Closest Centroids ====================
    %  To help you implement K-Means, we have divided the learning algorithm 
    %  into two functions -- findClosestCentroids and computeCentroids. In this
    %  part, you should complete the code in the findClosestCentroids function. 
    %
    fprintf('Finding closest centroids.\n\n');
    
    % Load an example dataset that we will be using
    load('ex7data2.mat');
    
    % Select an initial set of centroids
    K = 3; % 3 Centroids
    initial_centroids = [3 3; 6 2; 8 5];
    
    % Find the closest centroids for the examples using the
    % initial_centroids
    idx = [];
    idx = findClosestCentroids(X, initial_centroids);
    
    fprintf('Closest centroids for the first 3 examples: \n')
    fprintf(' %d', idx(1:3));
    fprintf('\n(the closest centroids should be 1, 3, 2 respectively)\n');
    
    fprintf('Program paused. Press enter to continue.\n');
    pause;
    

    运行结果为:

    任务二 随机初始化(该部分任务不需提交)

    在实际应用中,我们更为推荐对训练集进行随机初始化操作。我们在kMeansInitCentroids.m文件中添加如下代码:

    % Initialize the centroids to be random examples
    
    % Randomly reorder the indices of examples
    randidx = randperm(size(X, 1));
    % Take the first K examples as centroids
    centroids = X(randidx(1:K), :);
    

    上述代码通过使用randperm()函数随机排列了样本数据的索引,然后其基于索引的随机排列选择最初的K个样本数据。这种方式避免了出现两次随机选择相同样本数据的问题。

    任务三 使用K均值算法对图像压缩(该部分任务不需提交)

    128*128的原始图片

    在24位彩色图像中,每个像素表示为红色、绿色和蓝色强度值的三个8位无符号整数(其取值范围为0~255)。我们的图像包括数千种颜色,在这部分练习中,你需将颜色的数量减少至16种。通过这种方式,可以有效地压缩图像。在该方式下,你只需存储16个所选颜色地RGB值,并且对于图像中地每个像素,你只需存储该位置地颜色索引即可。

    在本练习中,你将使用K均值算法来选择用于表示压缩图像的16中颜色。具体来说,你应将原始图像中的每个像素视为样本数据,通过使用K均值算法找出最佳的16种颜色。

    在Octave或 Matlab中,我们使用如下代码来导入图像:

    % Load 128x128 color image (bird small.png)
    A = imread('bird small.png');
    
    % You will need to have installed the image package to used
    % imread. If you do not have the image package installed, you
    % should instead change the following line to
    %
    % load('bird small.mat'); % Loads the image into the variable A
    

    这创建了一个三维矩阵A,其前两个索引标识像素位置,最后一个索引表示红色、绿色或蓝色。例如:A(50, 33, 3)表示行50和列33处像素的蓝色强度。

    该部分代码如下:

    %% ============= Part 4: K-Means Clustering on Pixels ===============
    %  In this exercise, you will use K-Means to compress an image. To do this,
    %  you will first run K-Means on the colors of the pixels in the image and
    %  then you will map each pixel onto its closest centroid.
    %  
    %  You should now complete the code in kMeansInitCentroids.m
    %
    
    fprintf('\nRunning K-Means clustering on pixels from an image.\n\n');
    
    %  Load an image of a bird
    A = double(imread('bird_small.png'));
    
    % If imread does not work for you, you can try instead
    %   load ('bird_small.mat');
    
    A = A / 255; % Divide by 255 so that all values are in the range 0 - 1
    
    % Size of the image
    img_size = size(A);
    
    % Reshape the image into an Nx3 matrix where N = number of pixels.
    % Each row will contain the Red, Green and Blue pixel values
    % This gives us our dataset matrix X that we will use K-Means on.
    X = reshape(A, img_size(1) * img_size(2), 3);
    
    % Run your K-Means algorithm on this data
    % You should try different values of K and max_iters here
    K = 16; 
    max_iters = 10;
    
    % When using K-Means, it is important the initialize the centroids
    % randomly. 
    % You should complete the code in kMeansInitCentroids.m before proceeding
    initial_centroids = kMeansInitCentroids(X, K);
    
    % Run K-Means
    [centroids, idx] = runkMeans(X, initial_centroids, max_iters);
    
    fprintf('Program paused. Press enter to continue.\n');
    pause;
    
    
    %% ================= Part 5: Image Compression ======================
    %  In this part of the exercise, you will use the clusters of K-Means to
    %  compress an image. To do this, we first find the closest clusters for
    %  each example. After that, we 
    
    fprintf('\nApplying K-Means to compress an image.\n\n');
    
    % Find closest cluster members
    idx = findClosestCentroids(X, centroids);
    
    % Essentially, now we have represented the image X as in terms of the
    % indices in idx. 
    
    % We can now recover the image from the indices (idx) by mapping each pixel
    % (specified by its index in idx) to the centroid value
    X_recovered = centroids(idx,:);
    
    % Reshape the recovered image into proper dimensions
    X_recovered = reshape(X_recovered, img_size(1), img_size(2), 3);
    
    % Display the original image 
    subplot(1, 2, 1);
    imagesc(A); 
    title('Original');
    
    % Display compressed image side by side
    subplot(1, 2, 2);
    imagesc(X_recovered)
    title(sprintf('Compressed, with %d colors.', K));
    
    
    fprintf('Program paused. Press enter to continue.\n');
    pause;
    

    运行结果为:

    主成分分析算法

    在部分练习中,你将使用主成分分析算法实现降维。最初,你将在2D数据集上了解PCA算法,然后在一个大数据集上使用PCA算法。

    任务一 可视化数据集

    该部分代码如下:

    %% ================== Part 1: Load Example Dataset  ===================
    %  We start this exercise by using a small dataset that is easily to
    %  visualize
    %
    fprintf('Visualizing example dataset for PCA.\n\n');
    
    %  The following command loads the dataset. You should now have the 
    %  variable X in your environment
    load ('ex7data1.mat');
    
    %  Visualize the example dataset
    plot(X(:, 1), X(:, 2), 'bo');
    axis([0.5 6.5 2 8]); axis square;
    
    fprintf('Program paused. Press enter to continue.\n');
    pause;
    

    运行结果为:

    任务二 实现PCA算法

    PCA算法由两部分组成:首先计算数据的协方差矩阵,然后使用Octave或Matlab的svd()函数来计算特征向量U1, ···, Un

    不过在使用PCA算法之前,我们应当对训练集进行均值归一化操作,其具体实现代码可在featureNormalize.m文件中查看。


    求取协方差矩阵的数学公式为:

    svd()函数为:[U, S, V] = svd(Sigma)


    我们在pca.m文件中添加如下代码:

    Sigma = X' * X / m;
    [U, S, V] = svd(Sigma);
    

    该部分在ex7_pac.m文件的代码如下:

    %% =============== Part 2: Principal Component Analysis ===============
    %  You should now implement PCA, a dimension reduction technique. You
    %  should complete the code in pca.m
    %
    fprintf('\nRunning PCA on example dataset.\n\n');
    
    %  Before running PCA, it is important to first normalize X
    [X_norm, mu, sigma] = featureNormalize(X);
    
    %  Run PCA
    [U, S] = pca(X_norm);
    
    %  Compute mu, the mean of the each feature
    
    %  Draw the eigenvectors centered at mean of data. These lines show the
    %  directions of maximum variations in the dataset.
    hold on;
    drawLine(mu, mu + 1.5 * S(1,1) * U(:,1)', '-k', 'LineWidth', 2);
    drawLine(mu, mu + 1.5 * S(2,2) * U(:,2)', '-k', 'LineWidth', 2);
    hold off;
    
    fprintf('Top eigenvector: \n');
    fprintf(' U(:,1) = %f %f \n', U(1,1), U(2,1));
    fprintf('\n(you should expect to see -0.707107 -0.707107)\n');
    
    fprintf('Program paused. Press enter to continue.\n');
    pause;
    

    运行结果为:

    任务三 使用PCA算法降维

    这部分你需要将数据集X中的每个数据投影到主成分矩阵U中的前K个分量上。projectData.m文件中的代码如下:

    U_reduce = U(:, 1 : K);
    Z = X * U_reduce;
    

    任务四 重建数据的近似值

    recoverData.m添加的代码如下:

    U_reduce = U(:, 1 : K);
    X_rec = Z * U_reduce';
    

    任务三、四部分的运行结果如下:

    本次编程作业需要提交的部分到此结束。文档中的后续任务,此处不再叙述。请自行查阅和动手实践,谢谢!

    相关文章

      网友评论

        本文标题:编程作业(七)

        本文链接:https://www.haomeiwen.com/subject/qzffyxtx.html