Openclassroom Machine Learning

Author: 赖子啊 | Published 2019-07-12 11:55

    1. Background

    Recently, while working through the UFLDL Tutorial, I came across this advice: "It is best to have some background in machine learning (specifically, familiarity with supervised learning, logistic regression, and the idea of gradient descent). If you are not familiar with these, we suggest you first go to the Machine Learning course and complete sections II, III, and IV (up to logistic regression)." So I went over to that course and watched all of the videos. This is its home page:

    Machine Learning
    There are 9 sections in total; each one has an exercise to be done in Matlab/Octave, and the first 6 sections have videos. Here is the course structure:
    |——I. INTRODUCTION
       |——Welcome
       |——What is Machine Learning?
       |——Supervised Learning Introduction
       |——Unsupervised Learning Introduction
       |——Installing Octave
    |——II. LINEAR REGRESSION I
       |——Supervised Learning Introduction
       |——Model Representation
       |——Cost Function
       |——Gradient Descent
       |——Gradient Descent for Linear Regression
       |——Vectorized Implementation
       |——Exercise 2
    |——III. LINEAR REGRESSION II
       |——Feature Scaling
       |——Learning Rate
       |——Features and Polynomial Regression
       |——Normal Equations
       |——Exercise 3
    |——IV. LOGISTIC REGRESSION
       |——Classification
       |——Model
       |——Optimization Objective I
       |——Optimization Objective II
       |——Gradient Descent
       |——Newton's Method I
       |——Newton's Method II
       |——Gradient Descent vs Newton's Method
       |——Exercise 4
    |——V. REGULARIZATION
       |——The Problem Of Overfitting
       |——Optimization Objective
       |——Common Variations
       |——Regularized Linear Regression
       |——Regularized Logistic Regression
       |——Exercise 5
    |——VI. NAIVE BAYES
       |——Generative Learning Algorithms
       |——Text Classification
       |——Exercise 6
    |——VII. 
       |——Exercise 7
    |——VIII. 
       |——Exercise 8
    |——IX. 
       |——Exercise 9
    

    However, since the site is entirely in English and the videos have no subtitles, you do need some English ability; that said, in Andrew Ng's lectures you can mostly follow what is being taught just from what he writes on screen, and it is all very accessible and vivid. The one drawback is that the site is fairly old (2010-2012) and is apparently no longer updated or maintained. These days, people who want to learn machine learning go to the Machine Learning course on Coursera, or to Andrew Ng's machine learning course on NetEase Cloud Classroom with Chinese and English subtitles; the two correspond to each other. Anyway, enough said: I studied from the course above, not aiming to learn machine learning systematically, just to pick up the introductory topics.

    2. Main content

    Here are my notes in PDF form (extraction code: n8zn). If you're interested, you can read them alongside the videos and consult them while doing the exercises; they include some of my theoretical derivations.

    This course requires the following background knowledge:

    Linear algebra (matrix operations)

    Calculus (differentiation)

    Probability theory (probability distributions)

    Matlab programming

    I coded Exercises 2 - 5 myself and got them all working; here is my code. The exercises are quite good: besides hints on the theory, they also come with reference solutions, so if your results differ from theirs you can debug until they match exactly. It's best to watch the videos and attempt the code yourself first, before looking at the reference code.

    2.1 myex2.m

    clear all
    x = load('ex2x.dat');
    y = load('ex2y.dat');
    figure
    plot(x,y,'o');
    xlabel('Age in years'),ylabel('Height in meters');
    m=length(y);
    x=[ones(m,1),x];
    alpha = 0.07;
    theta=[0;0];
    theta_record = theta;
    %%%%%%%%%%%%%%%%%%%%%%%%%
    % First attempt: slightly buggy, because the already-updated theta(1) is used when updating theta(2); the updates should be simultaneous
    % for iteration = 1:1500
    %     for j=1:length(theta)
    %         temp = 0;
    %         for i=1:m
    %             temp = temp + (theta(1)*x(i,1)+theta(2)*x(i,2)-y(i))*x(i,j);
    %         end
    %         theta(j) = theta(j) - alpha/m*temp;
    %     end
    %     theta_record = [theta_record,theta];
    %     
    % end
    %%%%%%%%%%%%%%%%%%%%%%%%
    for iteration = 1:1500
        temp = zeros(length(theta),1);
        for j=1:length(theta)
            
            for i=1:m
                temp(j) = temp(j) + (theta(1)*x(i,1)+theta(2)*x(i,2)-y(i))*x(i,j);
            end
        end
        for n=1:length(theta)
            theta(n) = theta(n) - alpha/m*temp(n);
        end
        theta_record = [theta_record,theta];
    end
    hold on
    plot(x(:,2),x*theta,'-');
    legend('Training data','Linear Regression')
    theta
    %%%Prediction%%%%
    age1 = [1,3.5];
    age2 = [1,7];
    height_predict_1 = age1 * theta;
    height_predict_2 = age2 * theta;
    fprintf('For a boy of age 3.5, the predicted height is %f\n',height_predict_1);
    fprintf('For a boy of age 7.0, the predicted height is %f\n',height_predict_2);
    
    theta0_vals = linspace(-3, 3, 100);
    theta1_vals = linspace(-1, 1, 100);
    J_vals = zeros(length(theta0_vals), length(theta1_vals));   % initialize Jvals to 100x100 matrix of 0's
    for i = 1:length(theta0_vals)
          for j = 1:length(theta1_vals)
          t = [theta0_vals(i); theta1_vals(j)];
          J_vals(i,j) = (0.5/m).*(sum((x*t-y).^2));
        end
    end
    
    % Plot the surface plot
    % Because of the way meshgrids work in the surf command, we need to 
    % transpose J_vals before calling surf, or else the axes will be flipped
    J_vals = J_vals';
    figure;
    surf(theta0_vals, theta1_vals, J_vals)
    xlabel('\theta_0'); ylabel('\theta_1')
    % to make the approach to the global optimum more apparent
    figure;
    % Plot the cost function with 15 contours spaced logarithmically
    % between 0.01 and 100
    contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 2, 15));
    xlabel('\theta_0'); ylabel('\theta_1');
    

    Output:

    theta =

    0.7502
    0.0639

    For a boy of age 3.5, the predicted height is 0.973742
    For a boy of age 7.0, the predicted height is 1.197334

    ex2_result.png
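
    By the way, the double loop above can be collapsed into a single vectorized update; this is the style used in myex3.m below. A minimal sketch, assuming x, y, m, and alpha are set up exactly as in myex2.m:

    % Vectorized batch gradient descent for linear regression
    theta = [0; 0];
    for iteration = 1:1500
        grad = (1/m) .* x' * (x * theta - y); % gradient of the cost J
        theta = theta - alpha .* grad;        % simultaneous update of all components
    end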

    2.2 myex3.m

    I didn't write this one well; when I started I didn't have a feel for it yet. You could rewrite it with a loop over the learning rates (see the sketch at the end of this subsection). In the later exercises I stuck to the principle of keeping things concise.

    clear all;close all; clc
    x = load('ex3x.dat');
    y = load('ex3y.dat');
    m = length(y);
    x = [ones(m,1),x];xx=x;yy=y;
    % The features of x are on different scales, so normalize them
    sigma = std(x);
    mu = mean(x);
    x(:,2) = (x(:,2) - mu(2))./ sigma(2);
    x(:,3) = (x(:,3) - mu(3))./ sigma(3);
    
    MAX_ITR = 100;
    %% alpha = 0.01 
    theta = zeros(size(x(1,:)))'; % initialize fitting parameters
    alpha = 0.01;   %% My initial learning rate %%
    J = zeros(MAX_ITR, 1); 
    %% alpha = 0.03 
    theta1 = zeros(size(x(1,:)))';
    alpha1 = 0.03;   
    J1 = zeros(MAX_ITR, 1); 
    %% alpha = 0.1 
    theta2 = zeros(size(x(1,:)))';
    alpha2 = 0.1;   
    J2 = zeros(MAX_ITR, 1); 
    %% alpha = 0.3 
    theta3 = zeros(size(x(1,:)))';
    alpha3 = 0.3;   
    J3 = zeros(MAX_ITR, 1); 
    %% alpha = 1 
    theta4 = zeros(size(x(1,:)))';
    alpha4 = 1;   
    J4 = zeros(MAX_ITR, 1); 
    %% alpha = 1.3 
    theta5 = zeros(size(x(1,:)))';
    alpha5 = 1.3;   
    J5 = zeros(MAX_ITR, 1); 
    %% alpha = 1.4 
    theta6 = zeros(size(x(1,:)))';
    alpha6 = 1.4;   
    J6 = zeros(MAX_ITR, 1); 
    
    
    for num_iterations = 1:MAX_ITR
        %% alpha = 0.01 
        J(num_iterations) = (0.5/m).*(x * theta - y)'*(x * theta - y);%% Calculate my cost function  %%
        grad = (1/m).* x' * ((x * theta) - y);
        theta = theta - alpha .* grad; %% Result of gradient descent update %%
        %% alpha = 0.03 
        J1(num_iterations) = (0.5/m).*(x * theta1 - y)'*(x * theta1 - y);
        grad1 = (1/m).* x' * ((x * theta1) - y);
        theta1 = theta1 - alpha1 .* grad1; 
        %% alpha = 0.1 
        J2(num_iterations) = (0.5/m).*(x * theta2 - y)'*(x * theta2 - y);
        grad2 = (1/m).* x' * ((x * theta2) - y);
        theta2 = theta2 - alpha2 .* grad2; 
        %% alpha = 0.3 
        J3(num_iterations) = (0.5/m).*(x * theta3 - y)'*(x * theta3 - y);
        grad3 = (1/m).* x' * ((x * theta3) - y);
        theta3 = theta3 - alpha3 .* grad3; 
        %% alpha = 1 
        J4(num_iterations) = (0.5/m).*(x * theta4 - y)'*(x * theta4 - y);
        grad4 = (1/m).* x' * ((x * theta4) - y);
        theta4 = theta4 - alpha4 .* grad4; 
        %% alpha = 1.3
        J5(num_iterations) = (0.5/m).*(x * theta5 - y)'*(x * theta5 - y);
        grad5 = (1/m).* x' * ((x * theta5) - y);
        theta5 = theta5 - alpha5 .* grad5;
        %% alpha = 1.4 
        J6(num_iterations) = (0.5/m).*(x * theta6 - y)'*(x * theta6 - y);
        grad6 = (1/m).* x' * ((x * theta6) - y);
        theta6 = theta6 - alpha6 .* grad6;
    end
    fprintf('Finally, gradient descent with alpha = %.1f, after %d iterations, gives:\n',alpha4,MAX_ITR);
    final_theta = theta4
    pdc_obj = [1,1650,3];
    pdc_obj(2) = (pdc_obj(2) - mu(2))./ sigma(2);
    pdc_obj(3) = (pdc_obj(3) - mu(3))./ sigma(3);
    prediction = pdc_obj * final_theta
    % Using normal equations to calculate theta:
    fprintf('Finally,using normal equations,get:\n');
    NE_theta = (xx'*xx) \ (xx'*yy); % backslash is more accurate and efficient than inv
    NE_theta
    prediction = [1,1650,3] * NE_theta
    % now plot J
    % technically, the first J starts at the zero-eth iteration
    % but Matlab/Octave doesn't have a zero index
    figure;
    plot(0:49, J(1:50), 'b-','LineWidth',2); %% alpha = 0.01
    xlabel('Number of iterations');
    ylabel('Cost J');
    hold on;
    plot(0:49, J1(1:50), 'r-','LineWidth',2); %% alpha = 0.03
    plot(0:49, J2(1:50), 'g-','LineWidth',2); %% alpha = 0.1
    plot(0:49, J3(1:50), 'k-','LineWidth',2); %% alpha = 0.3
    plot(0:49, J4(1:50), 'b--','LineWidth',2); %% alpha = 1
    plot(0:49, J5(1:50), 'r--','LineWidth',2); %% alpha = 1.3
    legend('0.01', '0.03','0.1','0.3','1','1.3');
    hold off;
    figure;
    plot(0:49, J6(1:50), 'r--','LineWidth',2); %% alpha = 1.4
    xlabel('Number of iterations');
    ylabel('Cost J');
    legend('1.4');
    

    Output:

    Not pasted here; it's more accurate to inspect the values in the workspace.

    ex3_result.png
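
    As mentioned at the start of this subsection, the seven near-identical blocks above can be collapsed into a loop over the learning rates. A minimal sketch, assuming x, y, m, and MAX_ITR are set up as in myex3.m:

    alphas = [0.01, 0.03, 0.1, 0.3, 1, 1.3, 1.4];
    J_all = zeros(MAX_ITR, length(alphas));         % one cost column per learning rate
    thetas = zeros(length(x(1,:)), length(alphas)); % one parameter column per learning rate
    for k = 1:length(alphas)
        theta = zeros(length(x(1,:)), 1);
        for num_iterations = 1:MAX_ITR
            J_all(num_iterations, k) = (0.5/m) .* (x*theta - y)' * (x*theta - y);
            grad = (1/m) .* x' * (x*theta - y);
            theta = theta - alphas(k) .* grad;
        end
        thetas(:, k) = theta;
    end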

    2.3 myex4.m

    clear all; close all; clc
    % in this code, almost everything uses a vectorized implementation
    x = load('ex4x.dat'); 
    y = load('ex4y.dat');
    
    m = length(y);
    
    % Add intercept term to x
    x = [ones(m, 1), x];
    
    % find returns the indices of the
    % rows meeting the specified condition
    pos = find(y == 1); neg = find(y == 0);
    
    % Assume the features are in the 2nd and 3rd
    % columns of x
    plot(x(pos, 2), x(pos,3), '+'); hold on
    plot(x(neg, 2), x(neg, 3), 'ro');
    xlabel('Exam 1 score');
    ylabel('Exam 2 score');
    
    
    % To define sigmoid function through an inline expression:
    g = inline('1.0 ./ (1.0 + exp(-z))'); 
    % Usage: to find the value of the sigmoid
    % evaluated at 2, call g(2); z can also be a vector.
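    % Note: inline is deprecated in newer Matlab releases; the anonymous
    % function g = @(z) 1.0 ./ (1.0 + exp(-z)); would be the modern equivalent.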
    
    MAX_ITR = 7;
    theta = zeros(size(x(1,:)))'; % initialize fitting parameters
    J = zeros(MAX_ITR, 1);
    
    for num_iterations = 1:MAX_ITR
        % calculate cost J, vectorized implementation
        G = g(x * theta);
        G1 = 1 - G;
        S = log(G);
        V = log(G1);
        J(num_iterations) = (-1.0/m) .* (y' * S + (1 - y)' * V); % logistic regression cost function J
        
        % update theta
        grad_J = (1/m) .* x' * (G - y); % gradient of J
        H = 0; % initialize the Hessian matrix
        for i = 1:m
            H = H + (1/m) .* G(i) * G1(i) .* (x(i,:)' * x(i,:));
        end
        theta = theta - H \ grad_J; % Newton's method update (backslash instead of inv for stability)
    end
    theta
    pro = 1-g([1,20,80] * theta);
    fprintf('the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will not be admitted is: %.3f\n',pro);
    % plot the decision boundary line
    plot(x(:,2),-((theta(1)*x(:,1)+theta(2)*x(:,2))/(theta(3))));
    xlim([10,70]);ylim([40,100]);
    legend('Admitted','Not admitted','Decision boundary');
    hold off;
    figure;
    plot(0:MAX_ITR-1,J,'b--');hold on;
    plot(0:MAX_ITR-1,J,'r*');
    xlabel('Iteration');ylabel('J');
    

    Output:

    theta =

    -16.3787
    0.1483
    0.1589
    the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will not be admitted is: 0.668

    ex4_result.png
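
    A further possible simplification: the per-sample Hessian loop in myex4.m can itself be vectorized. A minimal sketch, assuming x, G, G1, and m as defined in the code above (diag builds an m-by-m matrix here, so this trades memory for speed):

    % Vectorized Hessian of the logistic regression cost:
    % H = (1/m) * sum over i of G(i)*G1(i) * x(i,:)' * x(i,:)
    H = (1/m) .* x' * diag(G .* G1) * x;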

    2.4 myex5

    2.4.1 myex5_1.m

    clear all; close all; clc
    
    % Regularized linear regression
    % Using Normal Equations
    x = load('ex5Linx.dat'); 
    y = load('ex5Liny.dat');
    x_nointercept = x;
    m = length(y);
    
    figure
    plot(x_nointercept,y,'ro','MarkerFaceColor','r');hold on;
    
    % Add intercept term to x
    x = [ones(m, 1), x, x.^2, x.^3, x.^4, x.^5];
    n = length(x(1,:));
    lambda = [0,1,10];
    theta_normal = zeros(n,length(lambda));
    norm_theta = zeros(1,length(lambda));
    for i=1:length(lambda)
        theta_normal(:,i) = (x' * x + lambda(i) * diag([0,ones(1,n-1)])) \ (x' * y);
        norm_theta(i) = norm(theta_normal(:,i));
    end
    theta_normal
    norm_theta
    x_test = linspace(-1,1,50)';
    x_test = [ones(length(x_test), 1), x_test, x_test.^2, x_test.^3, x_test.^4, x_test.^5];
    for i=1:length(lambda)
        plot(x_test(:,2),x_test * theta_normal(:,i),'--');
    end
    hold off;
    legend('Training data','5th order fit,\lambda=0','5th order fit,\lambda=1','5th order fit,\lambda=10');
    

    Output:

    theta_normal =

        0.4725    0.3976    0.5205
        0.6814   -0.4207   -0.1825
       -1.3801    0.1296    0.0606
       -5.9777   -0.3975   -0.1482
        2.4417    0.1753    0.0743
        4.7371   -0.3394   -0.1280

    norm_theta =

        8.1687    0.8098    0.5931

    ex5_1_result.png
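
    For reference, the closed-form solution computed inside the loop above is the regularized normal equation, with the intercept term left unpenalized (hence the 0 in the diag):

    \theta = \left(X^T X + \lambda \, \mathrm{diag}(0, 1, \ldots, 1)\right)^{-1} X^T y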

    2.4.2 myex5_2.m

    clear all; close all; clc
    
    % Regularized logistic regression
    % Using Newton's Method
    x = load('ex5Logx.dat'); 
    y = load('ex5Logy.dat');
    m = length(y);
    x_expand = map_feature(x(:,1),x(:,2));
    
    % Find the indices for the 2 classes
    pos = find(y); neg = find(y == 0);
    
    % plot(x(pos, 1), x(pos, 2), 'k+','LineWidth',1.2)
    % hold on
    % plot(x(neg, 1), x(neg, 2), 'ko','MarkerFaceColor','y')
    % xlabel('u');ylabel('v');legend('y=1','y=0');hold off;
    
    % Newton's Method Iterations
    g = inline('1.0 ./ (1.0 + exp(-z))'); 
    MAX_ITR = 15;
    lambda = [0,1,10];
    J = zeros(MAX_ITR, length(lambda));
    norm_theta = zeros(1, length(lambda)); % preallocate
    
    for choose_lambda = 1:length(lambda)
        theta = zeros(size(x_expand(1,:)))'; % re-initialize fitting parameters for each lambda
        for num_iterations = 1:MAX_ITR
            % Calculate cost J, vectorized implementation
            G = g(x_expand * theta);
            G1 = 1 - G;
            S = log(G);
            V = log(G1);
            % Regularized logistic regression cost function J
            % Add the regularization term
            J(num_iterations,choose_lambda) = (-1.0/m) .* (y' * S + (1 - y)' * V) + (lambda(choose_lambda)/(2*m)).* (theta(2:end)' * theta(2:end)); 
    
            % Update theta
            grad_J_before = (1/m) .* x_expand' * (G - y); % J gradient
            extra_theta = [0;(lambda(choose_lambda)/m) .* theta(2:end)];
            grad_J = grad_J_before + extra_theta;
            H = 0; % Hessian matrix initial
            for i = 1:m
                H = H + (1/m) .* G(i) * G1(i) .* (x_expand(i,:)' * x_expand(i,:));
            end
            H = H + (lambda(choose_lambda)/m) .* diag([0,ones(1,length(theta)-1)]);
            theta = theta - H \ grad_J; % use Newton's Method to update theta
    
        end
        norm_theta(choose_lambda) = norm(theta);
        % Plot decision boundary 
        % Define the ranges of the grid
        u = linspace(-1, 1.5, 200);
        v = linspace(-1, 1.5, 200);
    
        % Initialize space for the values to be plotted
        z = zeros(length(u), length(v));
        % Evaluate z = theta*x over the grid
        for i = 1:length(u)
            for j = 1:length(v)
                % Notice the order of j, i here!
                z(j,i) = map_feature(u(i), v(j))*theta;
            end
        end
    
        % Because of the way that contour plotting works
        % in Matlab, we need to transpose z, or
        % else the axis orientation will be flipped!
        z = z';
        % Plot z = 0 by specifying the range [0, 0]
        figure
        plot(x(pos, 1), x(pos, 2), 'k+','LineWidth',1.2)
        hold on
        plot(x(neg, 1), x(neg, 2), 'ko','MarkerFaceColor','y')
        xlabel('u');ylabel('v');
        contour(u,v,z, [0, 0], 'LineWidth', 2)
        legend('y = 1', 'y = 0', 'Decision boundary');
        hold off;
        title(sprintf('\\lambda = %g', lambda(choose_lambda)), 'FontSize', 14);
    end
    J
    norm_theta
    fprintf('For detailed values, check the workspace!\n')
    

    Output:


    ex5_2_result.png
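
    For reference, each pass through the inner loop above performs one Newton step on the regularized cost, again with the intercept term left unpenalized:

    \nabla J = \frac{1}{m} X^T \big(g(X\theta) - y\big) + \frac{\lambda}{m} \begin{bmatrix} 0 \\ \theta_{2:n} \end{bmatrix}

    H = \frac{1}{m} \sum_{i=1}^{m} g_i (1 - g_i) \, x_i x_i^T + \frac{\lambda}{m} \, \mathrm{diag}(0, 1, \ldots, 1)

    \theta := \theta - H^{-1} \nabla J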

    Theory I still need to study further includes (this will probably be somewhat difficult, and I'm afraid of not finding suitable material; if anyone has resources, feel free to recommend them):

    How logistic regression is used in practice
    Normal Equations (with and without regularization)
    The theory behind the multivariate Newton's method (with and without regularization)
    

    --- You can have a look at this tutorial, which offers some explanation of the first two questions (updated 2019/7/21) ---

    If you have questions, you can also reach me on QQ.
