
introduction to machine learning

Author: 张亿锋 | Published 2018-10-18 00:20

Course Requirements and Grading

Lab (30%)

  • Python

  • Synthetic data

  • 2 deliverables, distributed via Moodle


Theory exercises (0/20)

  • Close to the end of the course (early December)

Final exam (70%)

  • Theory questions (judgement-oriented)

  • Simulate running algorithms by hand


Meeting hours

  • Office: 104B, 68-72 Gower Street

  • Meeting hours: Tuesday, 14:00-15:00


Prerequisites:

Linear Algebra; Calculus; Probability; Programming


Machine Learning

data -> model -> prediction


Least squares model

least squares solution for linear regression

D: problem dimension, e.g. 1D, 2D (can be visualized)

N​: training set size

Training set: input-output pairs S=\{\boldsymbol x_{i},y_{i}\},\ i=1,\dots,N, where \boldsymbol x_{i}=\{x_{i1},\dots,x_{iD}\}^{T}\in \mathbb{R}^{D} and y_{i}\in \mathbb{R} (the input is in general a vector \boldsymbol x)

\boldsymbol w: weight vector, \boldsymbol w=\{w_{1},\dots,w_{D}\}^{T}\in \mathbb{R}^{D}

\epsilon_{i}​: noise

other notation:

X=\{\boldsymbol x_{1}, \boldsymbol x_{2}, \dots, \boldsymbol x_{N}\}^{T}=\{\boldsymbol x_{1}^{T}; \boldsymbol x_{2}^{T}; \dots; \boldsymbol x_{N}^{T}\}

Remark: ";" denotes vertical stacking (each \boldsymbol x_{i}^{T} is a row of X), so X\in\mathbb{R}^{N\times D}

\boldsymbol y=\{y_{1},\dots,y_{N}\}^{T}​

\boldsymbol \epsilon=\{\epsilon_{1},\dots,\epsilon_{N} \}^{T}​


Linear regression model

\boldsymbol y=X\boldsymbol w+\boldsymbol\epsilon \quad or \quad \boldsymbol y^{T}=\boldsymbol w^{T}X^{T}+\boldsymbol \epsilon^{T}

that is y_{i}=\boldsymbol x_{i}^{T} \boldsymbol w +\epsilon_{i},\ \ \ or \ \ \ y_{i}=\boldsymbol w^{T}\boldsymbol x_{i}+\epsilon_{i},i=1,\dots,N

Loss function: L(\boldsymbol w)=\sum_{i=1}^{N}(y_{i}-\boldsymbol w^{T}\boldsymbol x_{i})^{2}

goal: \min\limits_{w} \ L(w)

Least squares solution for linear regression: \boldsymbol w^{*}=(X^{T}X)^{-1} X^{T}\boldsymbol y
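A minimal sketch of this closed-form solution on synthetic data (the dimensions, true weights, and noise level below are made-up examples, not course values):

```python
import numpy as np

# Synthetic data: rows of X are x_i^T, y = X w_true + eps (w_true is an assumed example).
rng = np.random.default_rng(0)
N, D = 30, 2
w_true = np.array([1.5, -2.0])
X = rng.normal(size=(N, D))
y = X @ w_true + 0.1 * rng.normal(size=N)

# Least squares solution w* = (X^T X)^{-1} X^T y.
# Solving the normal equations directly is preferred over forming the inverse.
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(w_star)  # should be close to w_true
```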


Generalized linear regression model

\boldsymbol x \rightarrow \boldsymbol \phi(\boldsymbol x)=[ \phi_{1}(\boldsymbol x),\dots,\phi_{M}(\boldsymbol x) ]^{T}, where each \phi_{i}(\boldsymbol x),\ i=1,\dots,M, can be any function of \boldsymbol x, not just x_{i} (if \phi_{i}(\boldsymbol x)=x_{i} and M=D, this reduces to the plain linear regression model)

If D=1 and \phi_{i}(x)=x^{i-1},\ i=1,\dots,M, then this is polynomial fitting of degree M-1

If the highest order of the \phi_{i}(\boldsymbol x) is 2, then it is second-order polynomial fitting

Set \Phi=[\boldsymbol\phi(\boldsymbol x_{1})^{T};\boldsymbol\phi(\boldsymbol x_{2})^{T};\dots;\boldsymbol\phi(\boldsymbol x_{N})^{T}];
then the model is:
\boldsymbol y=\Phi\boldsymbol w+\boldsymbol\epsilon
Least squares solution for generalized linear regression: \boldsymbol w^{*}=(\Phi^{T}\Phi)^{-1} \Phi^{T}\boldsymbol y
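As a sketch of the generalized model, here is 1-D polynomial fitting with the basis \phi_{i}(x)=x^{i-1} (the data-generating function and the degree are assumptions for illustration):

```python
import numpy as np

# 1D polynomial fitting as generalized linear regression with phi_i(x) = x^{i-1}, i = 1..M.
rng = np.random.default_rng(1)
N, M = 30, 4                                      # 30 points, polynomial of degree M-1 = 3
x = rng.uniform(-1, 1, size=N)
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=N)  # assumed synthetic targets

# Design matrix Phi: row i is phi(x_i)^T = [1, x_i, x_i^2, x_i^3].
Phi = np.vander(x, M, increasing=True)

# Least squares solution w* = (Phi^T Phi)^{-1} Phi^T y.
w_star = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Predictions at new inputs use the same basis expansion.
x_new = np.linspace(-1, 1, 5)
print(np.vander(x_new, M, increasing=True) @ w_star)
```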


Approximations

If N>D (e.g. 30 points, 2 dimensions): overdetermined system (more equations than unknowns)
If N<D (e.g. 30 points, 3000 dimensions): underdetermined system (more unknowns than equations, so the model can overfit; see the sketch below)
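A quick sketch of the underdetermined case with the numbers from the example above: when N < D, least squares can drive the training error to zero even on pure-noise targets, which is the overfitting regime.

```python
import numpy as np

# Underdetermined system: far more unknowns (D) than equations (N).
rng = np.random.default_rng(2)
N, D = 30, 3000
X = rng.normal(size=(N, D))
y = rng.normal(size=N)                 # targets are pure noise

# X^T X is singular here, so the closed form above does not apply;
# the pseudoinverse gives the minimum-norm least squares solution.
w_star = np.linalg.pinv(X) @ y
print(np.abs(X @ w_star - y).max())    # ~0: perfect training fit, nothing real was learned
```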


How to control complexity (regularized linear regression)

1. Use a vector norm (L2, L1, Lp) to measure the residual vector and the weight vector.
Remark: different norms on the weight vector give different regularized linear regressions; here we use the L2 norm.

2. Rewrite the loss function: L(\boldsymbol{w})=||\boldsymbol{y}-X\boldsymbol{w}||^{2}+\lambda||\boldsymbol{w}||^{2}
This is ridge regression, a.k.a. L2-regularized linear regression.
Remark: \lambda is a "hyperparameter"; select it with cross-validation (run cross-validation for different values of \lambda and pick the value that minimizes the cross-validation error).

Cross-validation: the least glorious but most effective of all methods (as the lecturer put it).

3. Least squares solution for ridge regression: \boldsymbol w^{*}=(X^{T}X+\lambda\boldsymbol{I})^{-1} X^{T}\boldsymbol y
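A minimal sketch of the ridge solution together with cross-validation over \lambda as described above (the data, number of folds, and candidate \lambda values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 30, 10
X = rng.normal(size=(N, D))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=N)   # assumed synthetic data

def ridge_fit(X, y, lam):
    """Ridge solution w* = (X^T X + lambda I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, K=5):
    """Mean squared validation error over K folds for a given lambda."""
    folds = np.array_split(np.arange(len(y)), K)
    errs = []
    for val in folds:
        train = np.setdiff1d(np.arange(len(y)), val)
        w = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[val] - X[val] @ w) ** 2))
    return np.mean(errs)

# Pick the lambda that minimizes the cross-validation error, then refit on all data.
lambdas = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
best_lam = min(lambdas, key=lambda lam: cv_error(X, y, lam))
w_star = ridge_fit(X, y, best_lam)
print(best_lam, w_star[:3])
```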
