14. Dimensionality Reduction

Author: 玄语梨落 | Published 2020-09-04 21:48

Dimensionality Reduction

Motivation I: Data compression

Reduce data from 2D to 1D:

Reduce data from 3D to 2D:

Motivation II: Data Visualization

Principal Component Analysis (PCA) problem formulation

Reduce from 2D to 1D: find a direction (a vector u^{(1)}\in R^n) onto which to project the data so as to minimize the projection error.

Reduce from nD to kD: find k vectors u^{(1)},u^{(2)},...,u^{(k)} onto which to project the data, so as to minimize the projection error.

PCA is not linear regression: linear regression minimizes the vertical distances from the points to the line (errors in predicting y), whereas PCA minimizes the orthogonal projection distances and treats all features symmetrically, with no variable y to predict.

Principal Component Analysis algorithm

Data preprocessing

Training set: x^{(1)},x^{(2)},...,x^{(m)}
Preprocessing (feature scaling/mean normalization):
\mu_j=\frac{1}{m}\sum_{i=1}^mx_j^{(i)}
Replace each x_j^{(i)} with x_j^{(i)}-\mu_j.
If different features are on different scales (e.g., x_1 = size of house, x_2 = number of bedrooms), scale the features to have a comparable range of values (e.g., also divide by the standard deviation s_j).
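As an illustration, a minimal Octave sketch of this preprocessing step, assuming X is an m x n matrix with one training example per row:

mu = mean(X);                    % 1 x n row vector of feature means
sigma = std(X);                  % 1 x n row vector of feature standard deviations
X_norm = (X - mu) ./ sigma;      % mean-normalize and scale each feature (uses broadcasting)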

Principal Component Analysis (PCA) algorithm

Reduce data from nD to kD
Compute "covariance matrix":
\Sigma=\frac{1}{m}\sum_{i=1}^m(x^{(i)})(x^{(i)})^T
Compute "eigenvectors" of matrix \Sigma:
[U,S,V] = svd(Sigma)

U will be an n\times n matrix; take its first k columns to obtain U_{reduce} (an n\times k matrix). Each example x is then projected to a k-dimensional vector
z=U_{reduce}^Tx
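Putting the steps together as an Octave sketch (X_norm is the preprocessed m x n matrix from the previous step, and k is assumed to have been chosen):

m = size(X_norm, 1);                  % number of training examples
Sigma = (1/m) * (X_norm' * X_norm);   % n x n covariance matrix (vectorized form of the sum above)
[U, S, V] = svd(Sigma);               % columns of U are the eigenvectors of Sigma
U_reduce = U(:, 1:k);                 % keep the first k columns
Z = X_norm * U_reduce;                % m x k matrix; row i is z^{(i)} = U_reduce' * x^{(i)}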

Choosing the number of principal components

Choosing k

Average squared projection error: \frac{1}{m}\sum_{i=1}^m||x^{(i)}-x_{approx}^{(i)}||^2
Total variation in the data: \frac{1}{m}\sum_{i=1}^m||x^{(i)}||^2
Typically, choose k to be smallest value so that
\frac{\frac{1}{m}\sum_{i=1}^m||x^{(i)}-x_{approx}^{(i)}||^2}{\frac{1}{m}\sum_{i=1}^m||x^{(i)}||^2}\le 0.01 \qquad (1\%)
"99% of variance is retained"

[U,S,V]=svd(Sigma)
For a given k, check whether

1-\frac{\sum_{i=1}^kS_{ii}}{\sum_{i=1}^nS_{ii}}\le 0.01
or
\frac{\sum_{i=1}^kS_{ii}}{\sum_{i=1}^nS_{ii}}\ge 0.99
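Calling svd once and scanning the diagonal of S avoids recomputing the projection for every candidate k; a sketch:

[U, S, V] = svd(Sigma);
s = diag(S);                          % the values S_11, ..., S_nn
for k = 1:length(s)
  if sum(s(1:k)) / sum(s) >= 0.99     % at least 99% of variance retained
    break;                            % smallest such k found
  end
end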

Reconstruction from compressed representation

z=U_{reduce}^Tx
x_{approx}=U_{reduce}\cdot z
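In Octave, using U_reduce and Z from before:

x_approx = U_reduce * z;     % one example: an n x 1 vector lying in the k-dimensional subspace
X_approx = Z * U_reduce';    % all examples at once: an m x n matrix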

Advice for applying PCA

Supervised learning speedup

  1. Extract inputs: take the unlabeled dataset x^{(1)},...,x^{(m)}
  2. PCA: map it to z^{(1)},...,z^{(m)} and use this as the new training set

Mapping x^{(i)}\to z^{(i)} should be defined by running PCA only on the training set. This mapping can be applied as well to the examples x_{cv}^{(i)} and x_{test}^{(i)} in the cross validation and test sets.
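A sketch of this pipeline in Octave (trainModel and predict are hypothetical placeholders for whatever supervised learner is being used):

% Fit the PCA mapping on the training inputs only
m = size(X_train, 1);
mu = mean(X_train);
X_norm = X_train - mu;
Sigma = (1/m) * (X_norm' * X_norm);
[U, S, V] = svd(Sigma);
U_reduce = U(:, 1:k);

% Apply the SAME mu and U_reduce to training, cross-validation and test inputs
Z_train = (X_train - mu) * U_reduce;
Z_cv    = (X_cv   - mu) * U_reduce;
Z_test  = (X_test - mu) * U_reduce;

model = trainModel(Z_train, y_train);   % hypothetical: train on the lower-dimensional inputs
pred  = predict(model, Z_test);         % hypothetical: evaluate on the mapped test set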

Applications of PCA

  • Compression
    • Reduce memory/disk needed to store data
    • Speed up learning algorithm
  • Visualization

Bad use of PCA: to prevent overfitting. Using k features instead of n might seem to reduce overfitting, but PCA throws away information without looking at the labels y; use regularization instead.

PCA is also sometimes inserted into the design of an ML system by default, where it isn't needed.

Before using PCA, first try running whatever you want to do with the original/raw data; bring in PCA only if that doesn't work (e.g., the algorithm is too slow, or the memory/disk requirements are too large).
