1. Background
While working through the UFLDL Tutorial recently, it says: "It is best to have some machine learning background (specifically, familiarity with supervised learning, logistic regression, and the idea of gradient descent). If you are not familiar with these, we suggest you first go through the Machine Learning course and complete sections II, III, and IV (up to logistic regression)." So I went over to that course and watched the videos. This is the course home page:
Machine Learning. There are 9 sections in total, each with an exercise to be programmed in Matlab/Octave, and the first 6 sections have videos. The course structure is as follows:
|——I. INTRODUCTION
    |——Welcome
    |——What is Machine Learning?
    |——Supervised Learning Introduction
    |——Unsupervised Learning Introduction
    |——Installing Octave
|——II. LINEAR REGRESSION I
    |——Supervised Learning Introduction
    |——Model Representation
    |——Cost Function
    |——Gradient Descent
    |——Gradient Descent for Linear Regression
    |——Vectorized Implementation
    |——Exercise 2
|——III. LINEAR REGRESSION II
    |——Feature Scaling
    |——Learning Rate
    |——Features and Polynomial Regression
    |——Normal Equations
    |——Exercise 3
|——IV. LOGISTIC REGRESSION
    |——Classification
    |——Model
    |——Optimization Objective I
    |——Optimization Objective II
    |——Gradient Descent
    |——Newton's Method I
    |——Newton's Method II
    |——Gradient Descent vs Newton's Method
    |——Exercise 4
|——V. REGULARIZATION
    |——The Problem Of Overfitting
    |——Optimization Objective
    |——Common Variations
    |——Regularized Linear Regression
    |——Regularized Logistic Regression
    |——Exercise 5
|——VI. NAIVE BAYES
    |——Generative Learning Algorithms
    |——Text Classification
    |——Exercise 6
|——VII.
    |——Exercise 7
|——VIII.
    |——Exercise 8
|——IX.
    |——Exercise 9
However, the site is entirely in English and the videos have no subtitles, so a bit of English is needed. Still, in Andrew Ng's lectures you can mostly follow what is being taught just from what he writes on screen; it is very accessible and vividly explained. The only drawback is that the site is quite old (2010-2012) and is apparently no longer updated or maintained. If you want to learn machine learning today, most people go to Coursera's Machine Learning course, or the NetEase Cloud Classroom version of Andrew Ng's machine learning course with Chinese and English subtitles; the two correspond to each other. Anyway, I followed the course above; I was not trying to study machine learning systematically, just to pick up the first few topics.
2. Main Content
Here are my notes in PDF form (extraction code: n8zn). If you are interested, you can read them alongside the videos, and they may also help with the exercises; they include some of my theoretical derivations.
This course assumes the following background:
Linear algebra (matrix operations)
Calculus (differentiation)
Probability theory (probability distributions)
Matlab programming
I coded and passed Exercises 2 - 5 myself; here is my code. The exercises are quite good: besides the theory hints they also come with reference solutions, so if your results differ from theirs you can debug until they match exactly. It is best to watch the videos and try coding it yourself before looking at the reference code.
2.1 myex2.m
clear all
x = load('ex2x.dat');
y = load('ex2y.dat');
figure
plot(x,y,'o');
xlabel('Age in years'),ylabel('Height in meters');
m=length(y);
x=[ones(m,1),x];
alpha = 0.07;
theta=[0;0];
theta_record = theta;
%%%%%%%%%%%%%%%%%%%%%%%%%
% This was my first attempt. It has a subtle bug: theta(1) is updated first,
% and that already-updated value is then used when updating theta(2), so the
% two parameters are not updated simultaneously.
% for iteration = 1:1500
%     for j=1:length(theta)
%         temp = 0;
%         for i=1:m
%             temp = temp + (theta(1)*x(i,1)+theta(2)*x(i,2)-y(i))*x(i,j);
%         end
%         theta(j) = theta(j) - alpha/m*temp;
%     end
%     theta_record = [theta_record,theta];
% end
%%%%%%%%%%%%%%%%%%%%%%%%
for iteration = 1:1500
    temp = zeros(length(theta),1);
    for j=1:length(theta)
        for i=1:m
            temp(j) = temp(j) + (theta(1)*x(i,1)+theta(2)*x(i,2)-y(i))*x(i,j);
        end
    end
    for n=1:length(theta)
        theta(n) = theta(n) - alpha/m*temp(n);
    end
    theta_record = [theta_record,theta];
end
hold on
plot(x(:,2),x*theta,'-');
legend('Training data','Linear Regression')
theta
%%%Prediction%%%%
age1 = [1,3.5];
age2 = [1,7];
height_predict_1 = age1 * theta;
height_predict_2 = age2 * theta;
fprintf('The boy of age 3.5,his height is %f\n',height_predict_1);
fprintf('The boy of age 7.0,his height is %f\n',height_predict_2);
theta0_vals = linspace(-3, 3, 100);
theta1_vals = linspace(-1, 1, 100);
J_vals = zeros(length(theta0_vals), length(theta1_vals)); % initialize Jvals to 100x100 matrix of 0's
for i = 1:length(theta0_vals)
    for j = 1:length(theta1_vals)
        t = [theta0_vals(i); theta1_vals(j)];
        J_vals(i,j) = (0.5/m).*(sum((x*t-y).^2));
    end
end
% Plot the surface plot
% Because of the way meshgrids work in the surf command, we need to
% transpose J_vals before calling surf, or else the axes will be flipped
J_vals = J_vals';
figure;
surf(theta0_vals, theta1_vals, J_vals)
xlabel('\theta_0'); ylabel('\theta_1')
% to make the approach to the global optimum more apparent
figure;
% Plot the cost function with 15 contours spaced logarithmically
% between 0.01 and 100
contour(theta0_vals, theta1_vals, J_vals, logspace(-2, 2, 15));
xlabel('\theta_0'); ylabel('\theta_1');
Output:
[Figure: ex2_result.png]
theta =
    0.7502
    0.0639
The boy of age 3.5,his height is 0.973742
The boy of age 7.0,his height is 1.197334
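As an aside, the two inner loops above can be collapsed into a single vectorized update, which is what the Vectorized Implementation lecture covers. A minimal sketch, assuming x (m-by-2 with the intercept column already added), y, m and alpha are defined exactly as in myex2.m:

% Vectorized batch gradient descent (sketch; same data and learning rate as myex2.m)
theta = zeros(2,1);
for iteration = 1:1500
    grad  = (1/m) .* x' * (x * theta - y);  % gradient of the least-squares cost
    theta = theta - alpha .* grad;          % all parameters updated simultaneously
end
theta

It is the same update written in matrix form, so it should reproduce the same theta as above.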
2.2 myex3.m
This one is not written well; I had no feel for it when I started. You could rewrite it with a loop (see the sketch after the results below); in the later exercises I did stick to the principle of keeping the code concise.
clear all;close all; clc
x = load('ex3x.dat');
y = load('ex3y.dat');
m = length(y);
x = [ones(m,1),x];xx=x;yy=y; % keep un-normalized copies for the normal equation later
% The features in x are on very different scales, so normalize them
sigma = std(x);
mu = mean(x);
x(:,2) = (x(:,2) - mu(2))./ sigma(2);
x(:,3) = (x(:,3) - mu(3))./ sigma(3);
MAX_ITR = 100;
%% alpha = 0.01
theta = zeros(size(x(1,:)))'; % initialize fitting parameters
alpha = 0.01; %% My initial learning rate %%
J = zeros(MAX_ITR, 1);
%% alpha = 0.03
theta1 = zeros(size(x(1,:)))';
alpha1 = 0.03;
J1 = zeros(MAX_ITR, 1);
%% alpha = 0.1
theta2 = zeros(size(x(1,:)))';
alpha2 = 0.1;
J2 = zeros(MAX_ITR, 1);
%% alpha = 0.3
theta3 = zeros(size(x(1,:)))';
alpha3 = 0.3;
J3 = zeros(MAX_ITR, 1);
%% alpha = 1
theta4 = zeros(size(x(1,:)))';
alpha4 = 1;
J4 = zeros(MAX_ITR, 1);
%% alpha = 1.3
theta5 = zeros(size(x(1,:)))';
alpha5 = 1.3;
J5 = zeros(MAX_ITR, 1);
%% alpha = 1.4
theta6 = zeros(size(x(1,:)))';
alpha6 = 1.4;
J6 = zeros(MAX_ITR, 1);
for num_iterations = 1:MAX_ITR
    %% alpha = 0.01
    J(num_iterations) = (0.5/m).*(x * theta - y)'*(x * theta - y);%% Calculate my cost function %%
    grad = (1/m).* x' * ((x * theta) - y);
    theta = theta - alpha .* grad; %% Result of gradient descent update %%
    %% alpha = 0.03
    J1(num_iterations) = (0.5/m).*(x * theta1 - y)'*(x * theta1 - y);
    grad1 = (1/m).* x' * ((x * theta1) - y);
    theta1 = theta1 - alpha1 .* grad1;
    %% alpha = 0.1
    J2(num_iterations) = (0.5/m).*(x * theta2 - y)'*(x * theta2 - y);
    grad2 = (1/m).* x' * ((x * theta2) - y);
    theta2 = theta2 - alpha2 .* grad2;
    %% alpha = 0.3
    J3(num_iterations) = (0.5/m).*(x * theta3 - y)'*(x * theta3 - y);
    grad3 = (1/m).* x' * ((x * theta3) - y);
    theta3 = theta3 - alpha3 .* grad3;
    %% alpha = 1
    J4(num_iterations) = (0.5/m).*(x * theta4 - y)'*(x * theta4 - y);
    grad4 = (1/m).* x' * ((x * theta4) - y);
    theta4 = theta4 - alpha4 .* grad4;
    %% alpha = 1.3
    J5(num_iterations) = (0.5/m).*(x * theta5 - y)'*(x * theta5 - y);
    grad5 = (1/m).* x' * ((x * theta5) - y);
    theta5 = theta5 - alpha5 .* grad5;
    %% alpha = 1.4
    J6(num_iterations) = (0.5/m).*(x * theta6 - y)'*(x * theta6 - y);
    grad6 = (1/m).* x' * ((x * theta6) - y);
    theta6 = theta6 - alpha6 .* grad6;
end
fprintf('Finally,gradient descent with alpha= %.1f,after %d iterations,get:\n',alpha4,MAX_ITR);
final_theta = theta4
pdc_obj = [1,1650,3];
pdc_obj(2) = (pdc_obj(2) - mu(2))./ sigma(2);
pdc_obj(3) = (pdc_obj(3) - mu(3))./ sigma(3);
prediction = pdc_obj * final_theta
% Using normal equations to calculate theta:
fprintf('Finally,using normal equations,get:\n');
NE_theta = inv(xx'*xx)*(xx')*yy;
NE_theta
prediction = [1,1650,3] * NE_theta
% now plot J
% technically, the first J starts at the zero-eth iteration
% but Matlab/Octave doesn't have a zero index
figure;
plot(0:49, J(1:50), 'b-','LineWidth',2); %% alpha = 0.01
xlabel('Number of iterations');
ylabel('Cost J');
hold on;
plot(0:49, J1(1:50), 'r-','LineWidth',2); %% alpha = 0.03
plot(0:49, J2(1:50), 'g-','LineWidth',2); %% alpha = 0.1
plot(0:49, J3(1:50), 'k-','LineWidth',2); %% alpha = 0.3
plot(0:49, J4(1:50), 'b--','LineWidth',2); %% alpha = 1
plot(0:49, J5(1:50), 'r--','LineWidth',2); %% alpha = 1.3
legend('0.01', '0.03','0.1','0.3','1','1.3');
hold off;
figure;
plot(0:49, J6(1:50), 'r--','LineWidth',2); %% alpha = 1.4
xlabel('Number of iterations');
ylabel('Cost J');
legend('1.4');
Output:
[Figure: ex3_result.png] The exact numbers are not reproduced here; run the script and check the workspace.
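As mentioned at the top of this section, the seven near-identical copies above can be collapsed into a single loop over the learning rates. A rough sketch, assuming x, y and m are the normalized data prepared in myex3.m:

% Sketch: run gradient descent once per learning rate instead of copy-pasting
alphas  = [0.01, 0.03, 0.1, 0.3, 1, 1.3, 1.4];
MAX_ITR = 100;
J_all     = zeros(MAX_ITR, length(alphas));    % cost history, one column per alpha
theta_all = zeros(size(x,2), length(alphas));  % final theta, one column per alpha
for k = 1:length(alphas)
    theta = zeros(size(x,2), 1);
    for it = 1:MAX_ITR
        J_all(it,k) = (0.5/m) .* (x*theta - y)' * (x*theta - y);  % cost before the update
        grad  = (1/m) .* x' * (x*theta - y);
        theta = theta - alphas(k) .* grad;
    end
    theta_all(:,k) = theta;
end

The cost curves can then be plotted with one loop over the columns of J_all instead of six separate plot calls.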
2.3 myex4
clear all; close all; clc
% in this code, almost use vectorized implement
x = load('ex4x.dat');
y = load('ex4y.dat');
m = length(y);
% Add intercept term to x
x = [ones(m, 1), x];
% find returns the indices of the
% rows meeting the specified condition
pos = find(y == 1); neg = find(y == 0);
% Assume the features are in the 2nd and 3rd
% columns of x
plot(x(pos, 2), x(pos,3), '+'); hold on
plot(x(neg, 2), x(neg, 3), 'ro');
xlabel('Exam 1 score');
ylabel('Exam 2 score');
% To define sigmoid function through an inline expression:
g = inline('1.0 ./ (1.0 + exp(-z))');
% Usage: To find the value of the sigmoid
% evaluated at 2, call g(2),z can be a vector.
MAX_ITR = 7;
theta = zeros(size(x(1,:)))'; % initialize fitting parameters
J = zeros(MAX_ITR, 1);
for num_iterations = 1:MAX_ITR
    % calculate cost J, vectorized implementation
    G = g(x * theta);
    G1 = 1 - G;
    S = log(G);
    V = log(G1);
    J(num_iterations) = (-1.0/m) .* (y' * S + (1 - y)' * V); % logistic regression cost function J
    % update theta
    grad_J = (1/m) .* x' * (G - y); % J gradient
    H = 0; % Hessian matrix initial value
    for i = 1:m
        H = H + (1/m) .* G(i) * G1(i) .* (x(i,:)' * x(i,:));
    end
    theta = theta - inv(H) * grad_J; % use Newton's Method to update theta
end
theta
pro = 1-g([1,20,80] * theta);
fprintf('the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will not be admitted is:%.3f\n',pro);
% plot the decision boundary line
plot(x(:,2),-((theta(1)*x(:,1)+theta(2)*x(:,2))/(theta(3))));
xlim([10,70]);ylim([40,100]);
legend('Admitted','Not admitted','Decision boundary');
hold off;
figure;
plot(0:MAX_ITR-1,J,'b--');hold on;
plot(0:MAX_ITR-1,J,'r*');
xlabel('Iteration');ylabel('J');
Output:
[Figure: ex4_result.png]
theta =
  -16.3787
    0.1483
    0.1589
the probability that a student with a score of 20 on Exam 1 and a score of 80 on Exam 2 will not be admitted is:0.668
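Incidentally, the per-sample loop that builds the Hessian above can also be vectorized. A minimal sketch, assuming x, theta and the sigmoid g are defined as in myex4:

% Vectorized Hessian of the logistic regression cost (sketch)
G = g(x * theta);
H = (1/m) .* x' * diag(G .* (1 - G)) * x;  % same matrix the i-loop accumulates

For this small data set that is purely a convenience; for large m the diag(...) matrix becomes m-by-m, so the memory-friendly form would be (x .* (G .* (1 - G)))' * x ./ m (implicit expansion, or bsxfun in older Matlab).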
2.4
2.4.1 myex5_1.m
clear all; close all; clc
% Regularization linear regression
% Using Normal Equations
x = load('ex5Linx.dat');
y = load('ex5Liny.dat');
x_nointercept = x;
m = length(y);
figure
plot(x_nointercept,y,'ro','MarkerFaceColor','r');hold on;
% Add intercept term to x
x = [ones(m, 1), x, x.^2, x.^3, x.^4, x.^5];
n = length(x(1,:));
lambda = [0,1,10];
theta_normal = zeros(n,length(lambda));
norm_theta = zeros(1,length(lambda));
for i=1:length(lambda)
    theta_normal(:,i) = (x' * x + lambda(i) * diag([0,ones(1,n-1)]))\x' * y;
    norm_theta(i) = norm(theta_normal(:,i));
end
theta_normal
norm_theta
x_test = linspace(-1,1,50)';
x_test = [ones(length(x_test), 1), x_test, x_test.^2, x_test.^3, x_test.^4, x_test.^5];
for i=1:length(lambda)
    plot(x_test(:,2),x_test * theta_normal(:,i),'--');
end
hold off;
legend('Training data','5th order fit,\lambda=0','5th order fit,\lambda=1','5th order fit,\lambda=10');
Output:
[Figure: ex5_1_result.png]
theta_normal =
    0.4725    0.3976    0.5205
    0.6814   -0.4207   -0.1825
   -1.3801    0.1296    0.0606
   -5.9777   -0.3975   -0.1482
    2.4417    0.1753    0.0743
    4.7371   -0.3394   -0.1280
norm_theta =
    8.1687    0.8098    0.5931
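For reference, the closed-form solution that the loop above implements is the regularized normal equation (my reading of the exercise; the first diagonal entry is 0 so that the intercept term is not penalized):

$$\theta = \left( X^\top X + \lambda \, \mathrm{diag}(0,1,\dots,1) \right)^{-1} X^\top y$$

Setting $\lambda = 0$ recovers the ordinary normal equation $\theta = (X^\top X)^{-1} X^\top y$ used at the end of myex3.m.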
2.4.2 myex5_2.m
clear all; close all; clc
% Regularization logistic regression
% Using Newton's Method
x = load('ex5Logx.dat');
y = load('ex5Logy.dat');
m = length(y);
x_expand = map_feature(x(:,1),x(:,2));
% Find the indices for the 2 classes
pos = find(y); neg = find(y == 0);
% plot(x(pos, 1), x(pos, 2), 'k+','LineWidth',1.2)
% hold on
% plot(x(neg, 1), x(neg, 2), 'ko','MarkerFaceColor','y')
% xlabel('u');ylabel('v');legend('y=1','y=0');hold off;
% Newton's Method Iterations
g = inline('1.0 ./ (1.0 + exp(-z))');
MAX_ITR = 15;
theta = zeros(size(x_expand(1,:)))'; % initialize fitting parameters
lambda = [0,1,10];
J = zeros(MAX_ITR, length(lambda));
for choose_lambda = 1:length(lambda)
    theta = zeros(size(x_expand(1,:)))'; % reset theta for each lambda (otherwise the previous run's result is reused as the starting point)
    for num_iterations = 1:MAX_ITR
        % Calculate cost J, vectorized implementation
        G = g(x_expand * theta);
        G1 = 1 - G;
        S = log(G);
        V = log(G1);
        % Regularized logistic regression cost function J
        % Add the regularization term
        J(num_iterations,choose_lambda) = (-1.0/m) .* (y' * S + (1 - y)' * V) + (lambda(choose_lambda)/(2*m)).* (theta(2:end)' * theta(2:end));
        % Update theta
        grad_J_before = (1/m) .* x_expand' * (G - y); % J gradient (without regularization)
        extra_theta = [0;(lambda(choose_lambda)/m) .* theta(2:end)]; % regularization term (intercept not penalized)
        grad_J = grad_J_before + extra_theta;
        H = 0; % Hessian matrix initial value
        for i = 1:m
            H = H + (1/m) .* G(i) * G1(i) .* (x_expand(i,:)' * x_expand(i,:));
        end
        H = H + (lambda(choose_lambda)/m) .* diag([0,ones(1,length(theta)-1)]);
        theta = theta - H \ grad_J; % use Newton's Method to update theta
    end
    norm_theta(choose_lambda) = norm(theta);
    % Plot decision boundary
    % Define the ranges of the grid
    u = linspace(-1, 1.5, 200);
    v = linspace(-1, 1.5, 200);
    % Initialize space for the values to be plotted
    z = zeros(length(u), length(v));
    % Evaluate z = theta*x over the grid
    for i = 1:length(u)
        for j = 1:length(v)
            % Notice the order of j, i here!
            z(j,i) = map_feature(u(i), v(j))*theta;
        end
    end
    % Because of the way that contour plotting works
    % in Matlab, we need to transpose z, or
    % else the axis orientation will be flipped!
    z = z';
    % Plot z = 0 by specifying the range [0, 0]
    figure
    plot(x(pos, 1), x(pos, 2), 'k+','LineWidth',1.2)
    hold on
    plot(x(neg, 1), x(neg, 2), 'ko','MarkerFaceColor','y')
    xlabel('u');ylabel('v');
    contour(u,v,z, [0, 0], 'LineWidth', 2)
    legend('y = 1', 'y = 0', 'Decision boundary');
    hold off;
    title(sprintf('\\lambda = %g', lambda(choose_lambda)), 'FontSize', 14);
end
J
norm_theta
fprintf('Want to see detailed value,go to workspace!\n')
Output:
[Figure: ex5_2_result.png]
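For reference, each Newton step in the loop above implements the following update (my understanding of the lectures, written with $h_\theta(x) = g(\theta^\top x)$ and the convention that the intercept $\theta_0$ is not regularized):

$$\nabla J(\theta) = \frac{1}{m} X^\top \bigl(h_\theta(X) - y\bigr) + \frac{\lambda}{m}\begin{bmatrix} 0 \\ \theta_1 \\ \vdots \\ \theta_n \end{bmatrix}$$

$$H = \frac{1}{m}\sum_{i=1}^{m} h_\theta(x^{(i)})\bigl(1 - h_\theta(x^{(i)})\bigr)\, x^{(i)} (x^{(i)})^\top + \frac{\lambda}{m}\,\mathrm{diag}(0,1,\dots,1)$$

$$\theta := \theta - H^{-1} \nabla J(\theta)$$

Setting $\lambda = 0$ gives the plain Newton's method used in myex4.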
The theory I still need to study further includes the following (probably somewhat difficult; I am mainly worried about not finding suitable material, so feel free to recommend resources):
How to use logistic regression
Normal Equations (with and without regularization)
The theoretical justification of the multivariate Newton's method (with and without regularization)
--- You can have a look at the tutorial here, which gives some explanation of the first two points (updated 2019/7/21) -----
If you have any questions, you can also reach me on QQ.