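Suppose you have a dataset in which every record has M attributes, and you want to understand how those M attributes relate to one another across the dataset. This is what the covariance matrix is for.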
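Before computing a covariance matrix, make sure you understand what it represents. It describes the relationships between attributes: the entry in row i, column j is the covariance between attribute i and attribute j. With M attributes it is always an M×M matrix, so its size depends only on the number of attributes, not on how many records the dataset contains. blog.sciencenet.cn/blog-455004-805926.html explains this clearly.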
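For reference, this is the standard definition of the sample covariance matrix, which divides by n - 1 just as the code does. Here n is the number of records, x_{ki} is the value of attribute i in record k, and i, j run from 1 to M:

$$
C_{ij} = \frac{1}{n-1}\sum_{k=1}^{n}\left(x_{ki}-\bar{x}_i\right)\left(x_{kj}-\bar{x}_j\right),
\qquad
\bar{x}_i = \frac{1}{n}\sum_{k=1}^{n} x_{ki}
$$

Because C_{ij} = C_{ji}, the matrix is symmetric. Its diagonal entries C_{ii} are the sample variances of the individual attributes.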
The Python code is as follows:
class PCA:
    def avg(self, data):
        """Return the mean of each column (attribute) of data."""
        avgData = [0] * len(data[0])
        for i in range(len(data)):
            for t in range(len(data[i])):
                avgData[t] += data[i][t]
        for i in range(len(avgData)):
            avgData[i] = float(avgData[i]) / len(data)
        return avgData

    def getCovMatrix(self, data, avg):
        """Build the M x M covariance matrix, where M is the number of attributes."""
        m = len(data[0])
        covData = [[0 for _ in range(m)] for _ in range(m)]
        for i in range(m):
            for t in range(m):
                covData[i][t] = self.getCov(data, i, t, avg)
        return covData

    def getCov(self, data, col1, col2, avg):
        """Sample covariance of columns col1 and col2 (divides by n - 1)."""
        cov = 0.0
        for i in range(len(data)):
            cov += (data[i][col1] - avg[col1]) * (data[i][col2] - avg[col2])
        return cov / (len(data) - 1)

data = [[-1,-1,1],[-2,-1,4],[-3,-2,-2],[1,1,1],[2,1,2],[3,2,1],[1,2,4]]
example = PCA()
avgdata = example.avg(data)
print(example.getCovMatrix(data, avgdata))
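As a cross-check, the same matrix can be written compactly as C = Xcᵀ Xc / (n - 1), where Xc is the data with each column's mean subtracted. The sketch below is a pure-Python version of that formula. The function name `cov_matrix` is a hypothetical helper and does not come from the original post. Its result should agree with `getCovMatrix`, and its diagonal should match the standard library's sample variance, `statistics.variance`.

```python
import statistics

def cov_matrix(data):
    """Sample covariance matrix (divides by n - 1) of a list of rows.

    Hypothetical helper: centers each column, then forms Xc^T Xc / (n - 1).
    """
    n = len(data)
    columns = list(zip(*data))                  # transpose: one tuple per attribute
    means = [sum(col) / n for col in columns]
    centered = [[v - mu for v in col] for col, mu in zip(columns, means)]
    # Entry (i, j) is the dot product of centered columns i and j, over n - 1.
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centered]
            for ci in centered]

data = [[-1,-1,1],[-2,-1,4],[-3,-2,-2],[1,1,1],[2,1,2],[3,2,1],[1,2,4]]
C = cov_matrix(data)
m = len(C)

# The matrix is M x M (here 3 x 3) regardless of how many rows the data has.
assert m == len(data[0])
# Symmetric: cov(i, j) == cov(j, i).
assert all(abs(C[i][j] - C[j][i]) < 1e-12 for i in range(m) for j in range(m))
# Diagonal entries are the per-attribute sample variances.
for i, col in enumerate(zip(*data)):
    assert abs(C[i][i] - statistics.variance(col)) < 1e-12

for row in C:
    print([round(v, 4) for v in row])
```

For this sample data, the rounded rows printed are [4.8095, 3.2857, 1.2381], [3.2857, 2.5714, 1.4762] and [1.2381, 1.4762, 4.2857]. Rounding is only for display.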