Machine Learning in Action: Simplifying Data with SVD

By mov觉得高数好难 | Published 2017-08-28 17:35

    Using SVD (Singular Value Decomposition), we can represent the original dataset with a much smaller one. Doing so effectively removes noise and redundant information.

    Singular value decomposition
    Pros: simplifies data, removes noise, can improve an algorithm's results
    Cons: the transformed data can be hard to interpret
    Works with: numeric values

    One of the earliest applications of SVD was information retrieval; methods that use SVD this way are called Latent Semantic Indexing (LSI) or Latent Semantic Analysis (LSA).
    Another application of SVD is recommendation systems: SVD builds a topic space from the data, and similarities are then computed within that space.
    SVD is one kind of matrix factorization, which is the process of decomposing a data matrix into multiple independent parts.
    NumPy has a linear algebra toolbox called linalg.

    In [94]: from numpy import *
    
    In [95]: U, Sigma,VT = linalg.svd([[1,1],[7,7]])
    
    In [96]: U
    Out[96]: 
    array([[-0.14142136, -0.98994949],
           [-0.98994949,  0.14142136]])
    
    In [97]: Sigma
    Out[97]: array([  1.00000000e+01,   2.82797782e-16])
    
    In [98]: VT
    Out[98]: 
    array([[-0.70710678, -0.70710678],
           [ 0.70710678, -0.70710678]])
    

    Sigma is returned as the row vector array([10., 0.]) rather than as [[10, 0], [0, 0]]. Returning only the diagonal saves space.
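    The full decomposition can be recovered by placing the returned vector back on a diagonal. A quick sketch of my own (not from the book) verifying that U * Sigma * VT reproduces the original matrix:

```python
import numpy as np

A = np.array([[1., 1.],
              [7., 7.]])
U, sigma, VT = np.linalg.svd(A)

# Put the returned row vector back on the diagonal of a 2x2 matrix
Sigma = np.diag(sigma)

# U * Sigma * VT should equal A up to floating-point rounding
reconstructed = U @ Sigma @ VT
print(np.allclose(reconstructed, A))  # True
```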

    Create a new file, svdRec.py:

    def loadExData():
        return[[1, 1, 1, 0, 0],
               [2, 2, 2, 0, 0],
               [1, 1, 1, 0, 0],
               [5, 5, 5, 0, 0],
               [1, 1, 0, 0, 0],
               [0, 0, 0, 3, 3],
               [0, 0, 0, 1, 1]]
    

    Next, apply SVD to this matrix:

    In [17]: import svdRec
        ...: Data = svdRec.loadExData()
        ...: U, Sigma,VT = linalg.svd(Data)
        ...: Sigma
        ...: 
    Out[17]: 
    array([  9.71302333e+00,   4.47213595e+00,   8.10664981e-01,
             1.62982155e-15,   8.33719667e-17])
    

    The last two values are so small that we can discard them.
    Let's try to reconstruct the original matrix. First build a 3x3 matrix Sig3; then we only need the first three columns of U and the first three rows of VT:

    In [18]: Sig3 = mat([[Sigma[0],0,0],[0,Sigma[1],0],[0,0,Sigma[2]]])
        ...: U[:,:3]*Sig3*VT[:3,:]
        ...: 
    Out[18]: 
    matrix([[  1.00000000e+00,   1.00000000e+00,   1.00000000e+00,
              -7.70210327e-33,  -7.70210327e-33],
            [  2.00000000e+00,   2.00000000e+00,   2.00000000e+00,
              -4.60081159e-17,  -4.60081159e-17],
            [  1.00000000e+00,   1.00000000e+00,   1.00000000e+00,
              -1.23532915e-17,  -1.23532915e-17],
            ..., 
            [  1.00000000e+00,   1.00000000e+00,   4.53492652e-16,
              -5.59432048e-34,  -5.59432048e-34],
            [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
               3.00000000e+00,   3.00000000e+00],
            [  0.00000000e+00,   0.00000000e+00,   0.00000000e+00,
               1.00000000e+00,   1.00000000e+00]])
    

    There are many heuristics for deciding how many singular values to keep. One typical approach is to retain 90% of the energy in the matrix. To measure the total energy, sum the squares of all the singular values; then add up squared singular values, in order, until the running sum reaches 90% of that total. Another heuristic: when a matrix has tens of thousands of singular values, simply keep the first 2,000 to 3,000.
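    The 90%-energy rule above can be written as a small helper. This is my own sketch, not code from the book:

```python
import numpy as np

def numSingularValues(sigma, energy=0.9):
    """Smallest k such that the first k squared singular values
    reach the given fraction of the total energy."""
    sig2 = np.asarray(sigma, dtype=float) ** 2
    cumulative = np.cumsum(sig2)
    # first index whose cumulative energy reaches the threshold, plus one
    return int(np.searchsorted(cumulative, energy * sig2.sum())) + 1

# Squared values are 100, 30, 1; the first two carry 130/131 > 90%
print(numSingularValues([10.0, np.sqrt(30.0), 1.0]))  # 2
```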
    Now let's look at similarity measures (the matrix above contained an error; it is corrected later in the text):

    from numpy import *
    from numpy import linalg as la
        
    def ecludSim(inA,inB):  # similarity based on Euclidean distance
        return 1.0/(1.0 + la.norm(inA - inB))

    def pearsSim(inA,inB):  # Pearson correlation
        if len(inA) < 3 : return 1.0  # too few points to correlate; treat the vectors as perfectly correlated
        return 0.5+0.5*corrcoef(inA, inB, rowvar = 0)[0][1]

    def cosSim(inA,inB):  # cosine similarity
        num = float(inA.T*inB)
        denom = la.norm(inA)*la.norm(inB)
        return 0.5+0.5*(num/denom)
    

    Let's try these functions:

    In [9]: import svdRec
       ...: myMat = mat(svdRec.loadExData())
       ...: svdRec.ecludSim(myMat[:,0],myMat[:,4])
       ...: 
    Out[9]: 0.13367660240019172
    
    In [10]: svdRec.ecludSim(myMat[:,0],myMat[:,0])
    Out[10]: 1.0
    
    In [11]: svdRec.cosSim(myMat[:,0],myMat[:,4])
    Out[11]: 0.54724555912615336
    
    In [12]: svdRec.cosSim(myMat[:,0],myMat[:,0])
    Out[12]: 0.99999999999999989
    
    In [13]: svdRec.pearsSim(myMat[:,0],myMat[:,4])
    Out[13]: 0.23768619407595815
    
    In [14]: svdRec.pearsSim(myMat[:,0],myMat[:,0])
    Out[14]: 1.0
    

    Using column vectors here implies that we are computing item-based similarity. Whether to use user-based or item-based similarity depends on the number of users versus the number of items.
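    To make the row/column distinction concrete, here is a quick sketch of my own using the cosSim defined above: item-based similarity compares columns, while a user-based approach would compare rows.

```python
import numpy as np
from numpy import linalg as la

def cosSim(inA, inB):  # cosine similarity mapped into [0, 1], as above
    num = float(inA.T * inB)
    return 0.5 + 0.5 * (num / (la.norm(inA) * la.norm(inB)))

data = np.mat([[1, 1, 1, 0, 0],
               [2, 2, 2, 0, 0],
               [0, 0, 0, 3, 3]])

# Item-based: columns 0 and 1 have identical rating patterns
print(round(cosSim(data[:, 0], data[:, 1]), 6))      # 1.0
# User-based: rows 0 and 1 rate the same items (transposed to columns)
print(round(cosSim(data[0, :].T, data[1, :].T), 6))  # 1.0
```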
    How do we evaluate a recommendation engine? One approach is to hide some known ratings, predict them, and then measure the difference between the predictions and the true values.
    A common metric for this is the root mean squared error (RMSE): compute the mean of the squared errors, then take its square root.
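    The evaluation just described can be sketched in a few lines (my own illustration, not from the book): hide known ratings, predict them, and score the predictions with RMSE.

```python
import numpy as np

def rmse(actual, predicted):
    """Square the errors, average them, then take the square root."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((actual - predicted) ** 2))

# Every held-out 5-star rating predicted as a 4 gives an RMSE of 1.0
print(rmse([5, 5, 5], [4, 4, 4]))  # 1.0
```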
    Next, let's build an item-similarity-based recommendation engine:

    def standEst(dataMat, user, simMeas, item):  # estimate the user's rating for item under the given similarity measure
        n = shape(dataMat)[1]  # number of items
        simTotal = 0.0; ratSimTotal = 0.0
        for j in range(n):
            userRating = dataMat[user,j]
            if userRating == 0: continue  # user hasn't rated item j; skip it
            overLap = nonzero(logical_and(dataMat[:,item].A>0,dataMat[:,j].A>0))[0]  # users who have rated both items
            if len(overLap) == 0: similarity = 0
            else: similarity = simMeas(dataMat[overLap,item],dataMat[overLap,j])
            print('the %d and %d similarity is: %f' % (item, j, similarity))
            simTotal += similarity
            ratSimTotal += similarity * userRating
        if simTotal == 0: return 0
        else: return ratSimTotal/simTotal
        
    def recommend(dataMat, user, N=3, simMeas=cosSim, estMethod=standEst):
        unratedItems = nonzero(dataMat[user,:].A==0)[1]  # items this user hasn't rated
        if len(unratedItems) == 0: return 'you rated everything'
        itemScores = []
        for item in unratedItems:
            estimatedScore = estMethod(dataMat, user, simMeas, item)
            itemScores.append((item, estimatedScore))
        # sorted with key=lambda jj: jj[1] orders the (item, score) tuples by score;
        # reverse=True sorts in descending order; [:N] keeps the top N
        return sorted(itemScores, key=lambda jj: jj[1], reverse=True)[:N]
    

    Let's see how it works in practice. First, modify the earlier matrix slightly:

    def loadExData():
        return[[0, 0, 0, 2, 2],
               [0, 0, 0, 3, 3],
               [0, 0, 0, 1, 1],
               [1, 1, 1, 0, 0],
               [2, 2, 2, 0, 0],
               [5, 5, 5, 0, 0],
               [1, 1, 1, 0, 0]]
    
    In [20]: import svdRec
        ...: myMat = mat(svdRec.loadExData())
        ...: myMat[0,1]=myMat[0,0]=myMat[1,0]=myMat[2,0]=4
        ...: myMat[3,3]=2
        ...: 
    
    In [21]: myMat
    Out[21]: 
    matrix([[4, 4, 0, 2, 2],
            [4, 0, 0, 3, 3],
            [4, 0, 0, 1, 1],
            ..., 
            [2, 2, 2, 0, 0],
            [5, 5, 5, 0, 0],
            [1, 1, 1, 0, 0]])
    

    Let's try a few recommendations:

    In [22]: svdRec.recommend(myMat,2)
    the 1 and 0 similarity is: 1.000000
    the 1 and 3 similarity is: 0.928746
    the 1 and 4 similarity is: 1.000000
    the 2 and 0 similarity is: 1.000000
    the 2 and 3 similarity is: 1.000000
    the 2 and 4 similarity is: 0.000000
    Out[22]: [(2, 2.5), (1, 2.0243290220056256)]
    

    Now with the other similarity measures:

    In [24]: svdRec.recommend(myMat,2,simMeas=svdRec.ecludSim)
    the 1 and 0 similarity is: 1.000000
    the 1 and 3 similarity is: 0.309017
    the 1 and 4 similarity is: 0.333333
    the 2 and 0 similarity is: 1.000000
    the 2 and 3 similarity is: 0.500000
    the 2 and 4 similarity is: 0.000000
    Out[24]: [(2, 3.0), (1, 2.8266504712098603)]
    
    In [25]: svdRec.recommend(myMat,2,simMeas=svdRec.pearsSim)
    the 1 and 0 similarity is: 1.000000
    the 1 and 3 similarity is: 1.000000
    the 1 and 4 similarity is: 1.000000
    the 2 and 0 similarity is: 1.000000
    the 2 and 3 similarity is: 1.000000
    the 2 and 4 similarity is: 0.000000
    Out[25]: [(2, 2.5), (1, 2.0)]
    

    Real datasets are much sparser than the myMat matrix we used to demonstrate recommend(). Let's load a new matrix:

    def loadExData2():
        return[[2, 0, 0, 4, 4, 0, 0, 0, 0, 0, 0],
               [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5],
               [0, 0, 0, 0, 0, 0, 0, 1, 0, 4, 0],
               [3, 3, 4, 0, 3, 0, 0, 2, 2, 0, 0],
               [5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0],
               [0, 0, 0, 0, 0, 0, 5, 0, 0, 5, 0],
               [4, 0, 4, 0, 0, 0, 0, 0, 0, 0, 5],
               [0, 0, 0, 0, 0, 4, 0, 0, 0, 0, 4],
               [0, 0, 0, 0, 0, 0, 5, 0, 0, 5, 0],
               [0, 0, 0, 3, 0, 0, 0, 0, 4, 5, 0],
               [1, 1, 2, 1, 1, 2, 1, 0, 4, 5, 0]]
    

    Let's compute this matrix's SVD to find out how many dimensions it actually needs:

    In [39]: import svdRec
        ...: from numpy import linalg as la
        ...: U, Sigma, VT = la.svd(mat(svdRec.loadExData2()))
        ...: Sigma
        ...: 
    Out[39]: 
    array([  1.34342819e+01,   1.18190832e+01,   8.20176076e+00, ...,
             2.08702082e+00,   7.08715931e-01,   1.90990329e-16])
    
    

    Next, let's see how many singular values it takes to reach 90% of the total energy. First square the values in Sigma:

    In [40]: Sig2=Sigma**2
    
    In [41]: Sig2
    Out[41]: 
    array([  1.80479931e+02,   1.39690727e+02,   6.72688795e+01, ...,
             4.35565591e+00,   5.02278271e-01,   3.64773057e-32])
    
    In [42]: sum(Sig2)
    Out[42]: 497.0
    
    In [43]: sum(Sig2)*0.9
    Out[43]: 447.30000000000001
    
    In [44]: sum(Sig2[:2])
    Out[44]: 320.17065834028847
    
    In [45]: sum(Sig2[:3])
    Out[45]: 387.43953785565782
    
    In [46]: sum(Sig2[:4])
    Out[46]: 434.62441339532074
    
    In [47]: sum(Sig2[:5])
    Out[47]: 462.61518152879415
    

    So the first five singular values capture more than 90% of the energy, and the 11-dimensional matrix can be reduced to a 5-dimensional one. (Note that the svdEst() code below keeps only four singular values, which falls slightly short of 90% on this data.)
    Now let's build a similarity function that works in the transformed space, using the SVD to map all the dishes into a low-dimensional space:

    def svdEst(dataMat, user, simMeas, item):
        n = shape(dataMat)[1]
        simTotal = 0.0; ratSimTotal = 0.0
        U,Sigma,VT = la.svd(dataMat)
        Sig4 = mat(eye(4)*Sigma[:4])  # diagonal matrix of the first four singular values
        xformedItems = dataMat.T * U[:,:4] * Sig4.I  # map items into the low-dimensional space
        for j in range(n):
            userRating = dataMat[user,j]
            if userRating == 0 or j==item: continue
            similarity = simMeas(xformedItems[item,:].T, xformedItems[j,:].T)
            print('the %d and %d similarity is: %f' % (item, j, similarity))
            simTotal += similarity
            ratSimTotal += similarity * userRating
        if simTotal == 0: return 0
        else: return ratSimTotal/simTotal
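    The key step in svdEst is the projection dataMat.T * U[:,:4] * Sig4.I. A quick shape check of my own, using the modified 7x5 matrix from earlier, confirms that each item (column) becomes one 4-dimensional row:

```python
import numpy as np
from numpy import linalg as la

# The modified 7x5 ratings matrix from earlier in the text
dataMat = np.mat([[4, 4, 0, 2, 2],
                  [4, 0, 0, 3, 3],
                  [4, 0, 0, 1, 1],
                  [1, 1, 1, 2, 0],
                  [2, 2, 2, 0, 0],
                  [5, 5, 5, 0, 0],
                  [1, 1, 1, 0, 0]])

U, Sigma, VT = la.svd(dataMat)
Sig4 = np.mat(np.eye(4) * Sigma[:4])

# (5 x 7) * (7 x 4) * (4 x 4): one 4-dimensional row per item
xformedItems = dataMat.T * U[:, :4] * Sig4.I
print(xformedItems.shape)  # (5, 4)
```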
    

    Then let's check the results:

    In [61]: myMat = mat(svdRec.loadExData2())
    
    In [62]: svdRec.recommend(myMat, 1, estMethod=svdRec.svdEst)
    the 0 and 10 similarity is: 0.584526
    the 1 and 10 similarity is: 0.342595
    the 2 and 10 similarity is: 0.553617
    the 3 and 10 similarity is: 0.509334
    the 4 and 10 similarity is: 0.478823
    the 5 and 10 similarity is: 0.842470
    the 6 and 10 similarity is: 0.512666
    the 7 and 10 similarity is: 0.320211
    the 8 and 10 similarity is: 0.456105
    the 9 and 10 similarity is: 0.489873
    Out[62]: [(8, 5.0000000000000009), (0, 5.0), (1, 5.0)]
    

    And with another similarity measure:

    In [63]: svdRec.recommend(myMat, 1, estMethod=svdRec.svdEst, simMeas=svdRec.pearsSim)
    the 0 and 10 similarity is: 0.602364
    the 1 and 10 similarity is: 0.303884
    the 2 and 10 similarity is: 0.513270
    the 3 and 10 similarity is: 0.787267
    the 4 and 10 similarity is: 0.667888
    the 5 and 10 similarity is: 0.833890
    the 6 and 10 similarity is: 0.560256
    the 7 and 10 similarity is: 0.371606
    the 8 and 10 similarity is: 0.520289
    the 9 and 10 similarity is: 0.604393
    Out[63]: [(0, 5.0), (1, 5.0), (2, 5.0)]
    

    In large systems, the SVD runs perhaps once a day, or even less often, and it runs offline. The cold-start problem (how to make good recommendations when data is scarce) is also very hard to handle.
    Next, let's use SVD for image compression. Add the following code to svdRec.py:

    def printMat(inMat, thresh=0.8):  # thresh: threshold for printing 1 vs 0
        for i in range(32):
            for k in range(32):
                if float(inMat[i,k]) > thresh:
                    print(1, end=' ')
                else: print(0, end=' ')
            print('')

    def imgCompress(numSV=3, thresh=0.8):
        myl = []
        for line in open('0_5.txt').readlines():
            newRow = []
            for i in range(32):
                newRow.append(int(line[i]))
            myl.append(newRow)
        myMat = mat(myl)
        print("****original matrix******")
        printMat(myMat, thresh)
        U,Sigma,VT = la.svd(myMat)
        # build a numSV x numSV all-zero matrix and fill its diagonal for reconstruction
        SigRecon = mat(zeros((numSV, numSV)))
        for k in range(numSV):
            SigRecon[k,k] = Sigma[k]
        reconMat = U[:,:numSV]*SigRecon*VT[:numSV,:]
        print("****reconstructed matrix using %d singular values******" % numSV)
        printMat(reconMat, thresh)

    Let's see the actual result:

    In [81]: svdRec.imgCompress(2)
    ****original matrix******
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 
    0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 
    0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 
    0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 
    0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 
    0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 
    0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 
    ****reconstructed matrix using 2 singular values******
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
    

    As you can see, only two singular values are needed for a fairly accurate reconstruction of the image. The total number of values stored is 64 + 64 + 2 = 130 (two 32-element columns of U, two 32-element rows of VT, and two singular values), a high compression ratio compared with the original 1024.
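    The arithmetic behind 130 can be wrapped in a tiny helper (my own sketch): a rank-k approximation of an m x n matrix stores k columns of U, k rows of VT, and k singular values.

```python
def svdStorage(m, n, k):
    """Numbers stored by a rank-k SVD approximation of an m x n matrix:
    k columns of U, k rows of VT, and k singular values."""
    return m * k + n * k + k

print(svdStorage(32, 32, 2))  # 130, versus 32 * 32 = 1024 originally
```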
    On large datasets, computing the SVD and producing recommendations can be a hard engineering problem. Performing the SVD and the similarity computations offline is one way to reduce redundant work and the time needed to serve recommendations. The next chapter introduces tools for machine learning on big datasets.
