14. Dimensionality Reduction

Author: 玄语梨落 | Published 2020-09-04 21:48

Dimensionality Reduction

Motivation I: Data compression

Reduce data from 2D to 1D:

Reduce data from 3D to 2D:

Motivation II: Data Visualization

Principal Component Analysis (PCA) problem formulation

Reduce from 2D to 1D: find a direction (a vector u^{(1)}\in R^n) onto which to project the data so as to minimize the projection error.

Reduce from nD to kD: find k vectors u^{(1)},u^{(2)},...,u^{(k)} onto which to project the data, so as to minimize the projection error.

PCA is not linear regression: linear regression minimizes the vertical distances from the points to the line (errors in predicting y), whereas PCA minimizes the orthogonal projection distances and treats all features symmetrically, with no variable y to predict.

Principal Component Analysis algorithm

Data preprocessing

Training set: x^{(1)},x^{(2)},...,x^{(m)}
Preprocessing (feature scaling/mean normalization):
\mu_j=\frac{1}{m}\sum_{i=1}^mx_j^{(i)}
Replace each x_j^{(i)} with x_j^{(i)}-\mu_j.
If different features are on different scales (e.g., x_1 = size of house, x_2 = number of bedrooms), scale the features to have a comparable range of values (e.g., also divide by the standard deviation s_j).
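As an illustration, a minimal Octave sketch of this preprocessing step, assuming X is an m x n matrix with one training example per row:

mu = mean(X);                    % 1 x n row vector of feature means
sigma = std(X);                  % 1 x n row vector of feature standard deviations
X_norm = (X - mu) ./ sigma;      % mean-normalize and scale each feature (uses broadcasting)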

Principal Component Analysis (PCA) algorithm

Reduce data from nD to kD
Compute "covariance matrix":
\Sigma=\frac{1}{m}\sum_{i=1}^m(x^{(i)})(x^{(i)})^T
Compute "eigenvectors" of matrix \Sigma:
[U,S,V] = svd(Sigma)

U will be an n\times n matrix; take its first k columns to obtain U_{reduce} (an n\times k matrix). Each example x is then projected to a k-dimensional vector
z=U_{reduce}^Tx
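Putting the steps together as an Octave sketch (X_norm is the preprocessed m x n matrix from the previous step, and k is assumed to have been chosen):

m = size(X_norm, 1);                  % number of training examples
Sigma = (1/m) * (X_norm' * X_norm);   % n x n covariance matrix (vectorized form of the sum above)
[U, S, V] = svd(Sigma);               % columns of U are the eigenvectors of Sigma
U_reduce = U(:, 1:k);                 % keep the first k columns
Z = X_norm * U_reduce;                % m x k matrix; row i is z^{(i)} = U_reduce' * x^{(i)}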

Choosing the number of principal components

Choosing k

Average squared projection error: \frac{1}{m}\sum_{i=1}^m||x^{(i)}-x_{approx}^{(i)}||^2
Total variation in the data: \frac{1}{m}\sum_{i=1}^m||x^{(i)}||^2
Typically, choose k to be smallest value so that
\frac{\frac{1}{m}\sum_{i=1}^m||x^{(i)}-x_{approx}^{(i)}||^2}{\frac{1}{m}\sum_{i=1}^m||x^{(i)}||^2}\le 0.01 \qquad (1\%)
"99% of variance is retained"

[U,S,V]=svd(Sigma)
For a given k, check whether

1-\frac{\sum_{i=1}^kS_{ii}}{\sum_{i=1}^nS_{ii}}\le 0.01
or
\frac{\sum_{i=1}^kS_{ii}}{\sum_{i=1}^nS_{ii}}\ge 0.99
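Calling svd once and scanning the diagonal of S avoids recomputing the projection for every candidate k; a sketch:

[U, S, V] = svd(Sigma);
s = diag(S);                          % the values S_11, ..., S_nn
for k = 1:length(s)
  if sum(s(1:k)) / sum(s) >= 0.99     % at least 99% of variance retained
    break;                            % smallest such k found
  end
end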

Reconstruction from compressed representation

z=U_{reduce}^Tx
x_{approx}=U_{reduce}\cdot z
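In Octave, using U_reduce and Z from before:

x_approx = U_reduce * z;     % one example: an n x 1 vector lying in the k-dimensional subspace
X_approx = Z * U_reduce';    % all examples at once: an m x n matrix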

Advice for applying PCA

Supervised learning speedup

  1. Extract inputs: take the unlabeled dataset x^{(1)},...,x^{(m)}
  2. PCA: map it to z^{(1)},...,z^{(m)} and use this as the new training set

Mapping x^{(i)}\to z^{(i)} should be defined by running PCA only on the training set. This mapping can be applied as well to the examples x_{cv}^{(i)} and x_{test}^{(i)} in the cross validation and test sets.
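A sketch of this pipeline in Octave (trainModel and predict are hypothetical placeholders for whatever supervised learner is being used):

% Fit the PCA mapping on the training inputs only
m = size(X_train, 1);
mu = mean(X_train);
X_norm = X_train - mu;
Sigma = (1/m) * (X_norm' * X_norm);
[U, S, V] = svd(Sigma);
U_reduce = U(:, 1:k);

% Apply the SAME mu and U_reduce to training, cross-validation and test inputs
Z_train = (X_train - mu) * U_reduce;
Z_cv    = (X_cv   - mu) * U_reduce;
Z_test  = (X_test - mu) * U_reduce;

model = trainModel(Z_train, y_train);   % hypothetical: train on the lower-dimensional inputs
pred  = predict(model, Z_test);         % hypothetical: evaluate on the mapped test set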

Applications of PCA

  • Compression
    • Reduce memory/disk needed to store data
    • Speed up learning algorithm
  • Visualization

Bad use of PCA: to prevent overfitting. Using k features instead of n might seem to reduce overfitting, but PCA throws away information without looking at the labels y; use regularization instead.

PCA is also sometimes inserted into the design of an ML system by default, where it isn't needed.

Before using PCA, first try running whatever you want to do with the original/raw data; bring in PCA only if that doesn't work (e.g., the algorithm is too slow, or the memory/disk requirements are too large).
