
introduction to machine learning

Author: 张亿锋 | Published 2018-10-18 00:20

Course Requirements and Grading

Lab (30%)

  • Python

  • Synthetic data

  • 2 deliverables, distributed via Moodle


Theory exercises (0/20)

  • Close to the end of the course (early December)

Final exam (70%)

  • Theory questions (judgement-oriented)

  • Simulate running algorithms by hand


Meeting hours

  • Office: 104B, 68-72 Gower Street

  • Meeting hours: Tuesday, 14:00-15:00


Prerequisites:

Linear Algebra; Calculus; Probability; Programming


Machine Learning

data -> model -> prediction


Least squares model

least squares solution for linear regression

D: problem dimension, e.g. 1D, 2D (can be visualized)

N​: training set size

Training set: input-output pairs S=\{\boldsymbol x_{i},y_{i}\},\ i=1,\dots,N, where \boldsymbol x_{i}=\{x_{i1},\dots,x_{iD}\}^{T}\in \mathbb{R}^{D} and y_{i}\in \mathbb{R} (the input is in general a vector \boldsymbol x)

\boldsymbol w: weight vector, \boldsymbol w=\{w_{1},\dots,w_{D}\}^{T}\in \mathbb{R}^{D}

\epsilon_{i}​: noise

other notation:

X=\{\boldsymbol x_{1}, \boldsymbol x_{2}, \dots, \boldsymbol x_{N}\}^{T}=\{\boldsymbol x_{1}^{T}; \boldsymbol x_{2}^{T}; \dots; \boldsymbol x_{N}^{T}\}

Remark: ";" denotes vertical stacking (each \boldsymbol x_{i}^{T} is a row of X), so X\in\mathbb{R}^{N\times D}

\boldsymbol y=\{y_{1},\dots,y_{N}\}^{T}​

\boldsymbol \epsilon=\{\epsilon_{1},\dots,\epsilon_{N} \}^{T}​


Linear regression model

\boldsymbol y=X\boldsymbol w+\boldsymbol\epsilon \quad or \quad \boldsymbol y^{T}=\boldsymbol w^{T}X^{T}+\boldsymbol \epsilon^{T}

that is y_{i}=\boldsymbol x_{i}^{T} \boldsymbol w +\epsilon_{i},\ \ \ or \ \ \ y_{i}=\boldsymbol w^{T}\boldsymbol x_{i}+\epsilon_{i},i=1,\dots,N

Loss function: L(\boldsymbol w)=\sum_{i=1}^{N}(y_{i}-\boldsymbol w^{T}\boldsymbol x_{i})^{2}

goal: \min\limits_{w} \ L(w)

Least squares solution for linear regression: \boldsymbol w^{*}=(X^{T}X)^{-1} X^{T}\boldsymbol y
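A minimal sketch of this closed-form solution on synthetic data (the dimensions, true weights, and noise level below are made-up examples, not course values):

```python
import numpy as np

# Synthetic data: rows of X are x_i^T, y = X w_true + eps (w_true is an assumed example).
rng = np.random.default_rng(0)
N, D = 30, 2
w_true = np.array([1.5, -2.0])
X = rng.normal(size=(N, D))
y = X @ w_true + 0.1 * rng.normal(size=N)

# Least squares solution w* = (X^T X)^{-1} X^T y.
# Solving the normal equations directly is preferred over forming the inverse.
w_star = np.linalg.solve(X.T @ X, X.T @ y)
print(w_star)  # should be close to w_true
```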


Generalized linear regression model

\boldsymbol x \rightarrow \boldsymbol \phi(\boldsymbol x)=[ \phi_{1}(\boldsymbol x),\dots,\phi_{M}(\boldsymbol x) ]^{T}, where each \phi_{i}(\boldsymbol x),\ i=1,\dots,M, can be any function of \boldsymbol x, not just x_{i} (if \phi_{i}(\boldsymbol x)=x_{i} and M=D, this reduces to the plain linear regression model)

If D=1 and \phi_{i}(x)=x^{i-1},\ i=1,\dots,M, then this is polynomial fitting of degree M-1

If the highest order of the \phi_{i}(\boldsymbol x) is 2, then it is second-order polynomial fitting

Set \Phi=[\boldsymbol\phi(\boldsymbol x_{1})^{T};\boldsymbol\phi(\boldsymbol x_{2})^{T};\dots;\boldsymbol\phi(\boldsymbol x_{N})^{T}];
then the model is:
\boldsymbol y=\Phi\boldsymbol w+\boldsymbol\epsilon
Least squares solution for generalized linear regression: \boldsymbol w^{*}=(\Phi^{T}\Phi)^{-1} \Phi^{T}\boldsymbol y
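As a sketch of the generalized model, here is 1-D polynomial fitting with the basis \phi_{i}(x)=x^{i-1} (the data-generating function and the degree are assumptions for illustration):

```python
import numpy as np

# 1D polynomial fitting as generalized linear regression with phi_i(x) = x^{i-1}, i = 1..M.
rng = np.random.default_rng(1)
N, M = 30, 4                                      # 30 points, polynomial of degree M-1 = 3
x = rng.uniform(-1, 1, size=N)
y = np.sin(np.pi * x) + 0.1 * rng.normal(size=N)  # assumed synthetic targets

# Design matrix Phi: row i is phi(x_i)^T = [1, x_i, x_i^2, x_i^3].
Phi = np.vander(x, M, increasing=True)

# Least squares solution w* = (Phi^T Phi)^{-1} Phi^T y.
w_star = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Predictions at new inputs use the same basis expansion.
x_new = np.linspace(-1, 1, 5)
print(np.vander(x_new, M, increasing=True) @ w_star)
```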


Approximations

If N>D (e.g. 30 points, 2 dimensions): overdetermined system (more equations than unknowns)
If N<D (e.g. 30 points, 3000 dimensions): underdetermined system (more unknowns than equations, so the model can overfit; see the sketch below)
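A quick sketch of the underdetermined case with the numbers from the example above: when N < D, least squares can drive the training error to zero even on pure-noise targets, which is the overfitting regime.

```python
import numpy as np

# Underdetermined system: far more unknowns (D) than equations (N).
rng = np.random.default_rng(2)
N, D = 30, 3000
X = rng.normal(size=(N, D))
y = rng.normal(size=N)                 # targets are pure noise

# X^T X is singular here, so the closed form above does not apply;
# the pseudoinverse gives the minimum-norm least squares solution.
w_star = np.linalg.pinv(X) @ y
print(np.abs(X @ w_star - y).max())    # ~0: perfect training fit, nothing real was learned
```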


How to control complexity (regularized linear regression)

1. Use a vector norm (L2, L1, Lp) to measure the residual vector and the weight vector.
Remark: different norms on the weight vector give different regularized linear regressions; here we use the L2 norm.

2. Rewrite the loss function: L(\boldsymbol{w})=||\boldsymbol{y}-X\boldsymbol{w}||^{2}+\lambda||\boldsymbol{w}||^{2}
This is ridge regression, a.k.a. L2-regularized linear regression.
Remark: \lambda is a "hyperparameter"; select it with cross-validation (run cross-validation for different values of \lambda and pick the value that minimizes the cross-validation error).

Cross-validation: the least glorious but most effective of all methods (as the lecturer put it).

3. Least squares solution for ridge regression: \boldsymbol w^{*}=(X^{T}X+\lambda\boldsymbol{I})^{-1} X^{T}\boldsymbol y
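A minimal sketch of the ridge solution together with cross-validation over \lambda as described above (the data, number of folds, and candidate \lambda values are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 30, 10
X = rng.normal(size=(N, D))
y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=N)   # assumed synthetic data

def ridge_fit(X, y, lam):
    """Ridge solution w* = (X^T X + lambda I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_error(X, y, lam, K=5):
    """Mean squared validation error over K folds for a given lambda."""
    folds = np.array_split(np.arange(len(y)), K)
    errs = []
    for val in folds:
        train = np.setdiff1d(np.arange(len(y)), val)
        w = ridge_fit(X[train], y[train], lam)
        errs.append(np.mean((y[val] - X[val] @ w) ** 2))
    return np.mean(errs)

# Pick the lambda that minimizes the cross-validation error, then refit on all data.
lambdas = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
best_lam = min(lambdas, key=lambda lam: cv_error(X, y, lam))
w_star = ridge_fit(X, y, best_lam)
print(best_lam, w_star[:3])
```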
