
Figures are cited from the following article, which is recommended reading: A step by step explanation of Principal Component Analysis
PCA (Principal Component Analysis) is a dimensionality-reduction method.
It reduces the number of variables in a data set by using one or more components to represent the original data.
Principal components are constructed as linear combinations of the initial variables.
Geometrically speaking, principal components are new axes along which the projections of the data points are most spread out.
The more spread out the projections are, the more variance the axis carries and the more information it keeps, so PCA can reduce the dimensionality while preserving as much information as possible.
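The geometric picture can be checked numerically. The sketch below is only an illustration on made-up 2-D data (the data, the 1-degree angle grid, and the helper name projected_variance are assumptions, not part of the cited article): it projects the centered points onto many candidate axes and compares the most spread out one with the leading eigenvector of the covariance matrix.

```python
import numpy as np

# Hypothetical toy data: two correlated variables, 500 samples.
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = np.column_stack([x, 0.5 * x + 0.3 * rng.normal(size=500)])
Xc = X - X.mean(axis=0)  # center the data

def projected_variance(Xc, angle):
    """Sample variance of the data projected onto the unit vector at `angle`."""
    direction = np.array([np.cos(angle), np.sin(angle)])
    return (Xc @ direction).var(ddof=1)

# Try candidate axes in 1-degree steps and keep the most spread out one.
angles = np.linspace(0.0, np.pi, 181)
variances = np.array([projected_variance(Xc, a) for a in angles])
best_angle = angles[np.argmax(variances)]
best_direction = np.array([np.cos(best_angle), np.sin(best_angle)])

# Leading eigenvector of the covariance matrix (eigh sorts eigenvalues ascending).
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(Xc, rowvar=False))
pc1 = eigenvectors[:, -1]

# The two directions should agree up to sign and grid resolution.
print(abs(best_direction @ pc1))
```

A brute-force search like this is only practical in two dimensions; the eigenvector computation in Step 3 finds the same axis directly.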
Step 1: Standardization
This step transforms all the variables to the same scale, because PCA is quite sensitive to the variances of the initial variables: variables with larger ranges would otherwise dominate those with smaller ranges.
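A minimal sketch of standardization, assuming the common z-score form (subtract each variable's mean, divide by its standard deviation). The toy data and the function name standardize are hypothetical.

```python
import numpy as np

# Hypothetical toy data: 5 samples, 2 variables on very different scales.
X = np.array([
    [1.0, 200.0],
    [2.0, 240.0],
    [3.0, 310.0],
    [4.0, 390.0],
    [5.0, 460.0],
])

def standardize(X):
    """Center each column to mean 0 and scale it to standard deviation 1."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)  # sample standard deviation (assumed convention)
    return (X - mean) / std

Z = standardize(X)
print(Z.mean(axis=0))         # approximately 0 for every column
print(Z.std(axis=0, ddof=1))  # approximately 1 for every column
```

After this step both columns contribute on an equal footing, regardless of their original units.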
Step 2: Compute the Covariance Matrix
This matrix shows how the variables vary with respect to each other; highly correlated variables carry redundant information.
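As an illustration, the covariance matrix of standardized data can be computed as below. The data are made up so that the second variable nearly duplicates the first, and the helper name covariance_matrix is an assumption. Because the variables are standardized, the entries coincide with correlations.

```python
import numpy as np

# Hypothetical data: variable 1 is almost a copy of variable 0, variable 2 is independent.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x + 0.1 * rng.normal(size=200), rng.normal(size=200)])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardized as in Step 1

def covariance_matrix(Z):
    """Sample covariance matrix; rows are samples, columns are variables."""
    n = Z.shape[0]
    Zc = Z - Z.mean(axis=0)
    return Zc.T @ Zc / (n - 1)

C = covariance_matrix(Z)
print(np.round(C, 2))  # the entry C[0, 1] close to 1 flags redundant variables
```

The result matches np.cov(Z, rowvar=False); the manual version is written out only to show what the matrix contains.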
Step 3: Compute the eigenvectors and eigenvalues of the covariance matrix
The eigenvectors of the covariance matrix are the directions of the principal components, since these are the directions of greatest variance, and each eigenvalue is the amount of variance carried by the corresponding principal component.
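A small sketch of this step, using a hypothetical 2x2 covariance matrix (two standardized variables with covariance 0.8). np.linalg.eigh is used because a covariance matrix is symmetric; it returns eigenvalues in ascending order, so they are re-sorted here.

```python
import numpy as np

# Hypothetical covariance matrix of two standardized, correlated variables.
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])

eigenvalues, eigenvectors = np.linalg.eigh(C)

# Reorder so the component with the largest variance comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]  # column i is the direction of PC(i+1)

print(eigenvalues)         # approximately [1.8, 0.2]
print(eigenvectors[:, 0])  # parallel to [1, 1] / sqrt(2); the sign is arbitrary
```

Here PC1 carries 1.8 of the total variance of 2.0, that is 90%, and points along the diagonal where both variables increase together.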
Step 4: Keep p components
Rank the eigenvalues from highest to lowest. For example, PC1 may carry 95% of the variance and PC2 the remaining 5%. We can keep all components or discard the less significant ones.
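Putting the ranking and selection together, the sketch below computes the share of variance carried by each component and projects the data onto the first p of them. The function name pca_project and the toy data are assumptions for illustration, and the projection step (multiplying the standardized data by the kept eigenvectors) is the usual way the reduced data are obtained rather than something spelled out above.

```python
import numpy as np

def pca_project(Z, p):
    """Project standardized data Z onto its first p principal components."""
    C = np.cov(Z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(C)
    order = np.argsort(eigenvalues)[::-1]        # highest eigenvalue first
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    explained = eigenvalues / eigenvalues.sum()  # share of variance per component
    W = eigenvectors[:, :p]                      # keep the p leading directions
    return Z @ W, explained

# Hypothetical data: two strongly correlated variables.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
X = np.column_stack([x, 2 * x + 0.2 * rng.normal(size=300)])
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

scores, explained = pca_project(Z, p=1)
print(explained)     # PC1 carries almost all of the variance
print(scores.shape)  # (300, 1): two variables reduced to one component
```

How many components to keep is a judgment call; a common rule of thumb is to keep enough components to reach a chosen cumulative share of the variance.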