Implementing Collaborative Filtering with MapReduce (Python)

Author: kang_james | Published: 2019-06-15 13:41

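    Implementing collaborative filtering in Python

    Implementing collaborative filtering on MapReduce takes three jobs, described in order below. Each job reads the output of the previous one.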
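
Taken together, the three jobs appear to compute the cosine similarity between item rating vectors. The source does not name the measure, so the notation below is introduced here for illustration only: $s_{u,i}$ is the score user $u$ gave item $i$, and $\hat{s}_{u,i}$ is that score after the first job's normalization.

```latex
\hat{s}_{u,i} = \frac{s_{u,i}}{\sqrt{\sum_{v} s_{v,i}^{2}}}
\qquad
\mathrm{sim}(a, b)
  = \sum_{u} \hat{s}_{u,a}\,\hat{s}_{u,b}
  = \frac{\sum_{u} s_{u,a}\, s_{u,b}}
         {\sqrt{\sum_{u} s_{u,a}^{2}}\;\sqrt{\sum_{u} s_{u,b}^{2}}}
```

The first job produces $\hat{s}_{u,i}$, the second produces the individual products $\hat{s}_{u,a}\,\hat{s}_{u,b}$ for each user, and the third sums them per item pair. The sum over $u$ in the numerator only has contributions from users who rated both items.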

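    First MapReduce job: turn the UI (user-item) matrix into a normalized UI matrix.

    Map phase: emit the item i as the key, so that partitioning and sorting happen on the item. Records whose keys hash to the same value are sent to the same partition, so every score for a given item reaches the same reducer.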

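    #!/usr/local/bin/python
    import sys
    
    # Input: one "user,item,score" record per line.
    for line in sys.stdin:
        u, i, s = line.strip().split(',')
        # Emit the item first so that it becomes the shuffle key.
        print("%s\t%s\t%s" % (i, u, s))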
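
The partitioning step described above can be sketched locally. This is an illustration rather than what the framework runs: `partition_for` and `shuffle` are hypothetical helpers, and `zlib.crc32` stands in for whatever hash function the cluster's partitioner actually uses.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    # Stand-in for a hash partitioner: a stable hash of the key,
    # taken modulo the number of partitions (assumed, not Hadoop's own hash).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def shuffle(records, num_partitions):
    """Group (key, value) records by partition and sort each partition by key."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return {p: sorted(rows) for p, rows in partitions.items()}


# Mapper output for three ratings: the item is the key.
mapped = [("i1", ("u1", 3.0)), ("i2", ("u1", 4.0)), ("i1", ("u2", 5.0))]
shuffled = shuffle(mapped, num_partitions=2)
```

Because the partition depends only on the key, both `i1` records end up in the same partition and sit next to each other after sorting, which is what the reducer below relies on.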
    

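    Reduce phase: for each item i, collect the scores given by all users who rated it, then normalize them by dividing each score by the square root of the sum of squared scores for that item.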

    #!/usr/local/bin/python
    import sys
    import math
    
    def emit_normalized(item, user_score_list):
        # Divide each score by the L2 norm of the item's score vector.
        norm = math.sqrt(sum(s * s for _, s in user_score_list))
        if norm == 0:
            return  # all scores are zero (or no input): nothing to normalize
        for u, s in user_score_list:
            print("%s\t%s\t%s" % (u, item, s / norm))
    
    cur_item = None
    user_score_list = []
    
    for line in sys.stdin:
        item, user, score = line.strip().split('\t')
        if cur_item is None:
            cur_item = item
        if item != cur_item:
            # Key changed: flush the scores collected for the previous item.
            emit_normalized(cur_item, user_score_list)
            user_score_list = []
            cur_item = item
        user_score_list.append((user, float(score)))
    
    # Flush the last item.
    emit_normalized(cur_item, user_score_list)
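
As a small worked example of the normalization, the sketch below applies the same arithmetic to the scores of a single item. The function name `normalize_scores` and the sample data are made up for illustration.

```python
import math


def normalize_scores(user_scores):
    """user_scores: {user: score} for ONE item. Returns L2-normalized scores."""
    norm = math.sqrt(sum(s * s for s in user_scores.values()))
    if norm == 0:
        return {}  # nothing to normalize when every score is zero
    return {user: s / norm for user, s in user_scores.items()}


# Two users rated the item 3 and 4; the norm is sqrt(9 + 16) = 5.
normalized = normalize_scores({"u1": 3.0, "u2": 4.0})
# normalized is approximately {"u1": 0.6, "u2": 0.8}
```

After this step the squared scores of each item sum to 1, so the later dot products between items need no further division.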
    
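    Second MapReduce job: build the per-user contributions to the II (item-item) matrix.

    Map phase: to build the II matrix, records must be keyed by user u, so the mapper emits (u, i, s).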

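    #!/usr/local/bin/python
    import sys
    
    # Identity mapper: the first job already emits "user\titem\tscore",
    # so the user is the first field and becomes the shuffle key.
    for line in sys.stdin:
        u, i, s = line.strip().split('\t')
        print("%s\t%s\t%s" % (u, i, s))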
    

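    Reduce phase: for each user, take every pair of items that user rated and multiply their normalized scores. These products are the per-user contributions to the II matrix.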

    #!/usr/local/bin/python
    
    import sys
    
    def emit_pairs(item_score_list):
        # Emit the score product for every item pair, in both directions.
        for i in range(len(item_score_list) - 1):
            for j in range(i + 1, len(item_score_list)):
                item_a, score_a = item_score_list[i]
                item_b, score_b = item_score_list[j]
                print("%s\t%s\t%s" % (item_a, item_b, score_a * score_b))
                print("%s\t%s\t%s" % (item_b, item_a, score_a * score_b))
    
    cur_user = None
    item_score_list = []
    for line in sys.stdin:
        user, item, score = line.strip().split('\t')
        if cur_user is None:
            cur_user = user
        if cur_user != user:
            # Key changed: flush the pairs for the previous user.
            emit_pairs(item_score_list)
            item_score_list = []
            cur_user = user
        item_score_list.append((item, float(score)))
    
    # Flush the last user.
    emit_pairs(item_score_list)
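
To make the pairing step concrete, here is a minimal sketch for a single user. `pair_products` is a hypothetical helper that returns rows instead of printing them, and it uses `itertools.combinations` in place of the nested index loops.

```python
from itertools import combinations


def pair_products(item_scores):
    """item_scores: list of (item, normalized_score) for ONE user."""
    rows = []
    for (item_a, score_a), (item_b, score_b) in combinations(item_scores, 2):
        product = score_a * score_b
        # Emit both directions so the II matrix comes out symmetric.
        rows.append((item_a, item_b, product))
        rows.append((item_b, item_a, product))
    return rows


rows = pair_products([("i1", 0.6), ("i2", 0.8), ("i3", 1.0)])
```

A user with n rated items produces n * (n - 1) rows, so users with very long rating histories dominate the output size of this job.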
    
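    Third MapReduce job: sum the contributions for each item pair.

    Map phase: join each item pair into a single key of the form item_a_item_b, so that all contributions for the same pair are grouped together.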

    #!/usr/local/bin/python
    import sys
    
    for line in sys.stdin:
        item_a, item_b, s = line.strip().split('\t')
        # Join the two item ids into one key; this assumes ids contain no "_".
        print("%s\t%s" % (item_a + "_" + item_b, s))
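
The composite key only works if it can be split back into the same two item ids. The sketch below shows one way to guard that assumption; `make_pair_key` and `split_pair_key` are hypothetical helpers, not part of the original scripts.

```python
SEPARATOR = "_"  # same separator the mapper uses


def make_pair_key(item_a, item_b):
    # Refuse ids that contain the separator, since the key could not be
    # split back unambiguously in the reducer.
    if SEPARATOR in item_a or SEPARATOR in item_b:
        raise ValueError("item id contains %r: %r, %r" % (SEPARATOR, item_a, item_b))
    return item_a + SEPARATOR + item_b


def split_pair_key(key):
    item_a, item_b = key.split(SEPARATOR)
    return item_a, item_b


key = make_pair_key("i1", "i2")
```

If the real item ids can contain underscores, picking a separator that never occurs in them is likely simpler than escaping.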
    

    Reduce phase: aggregate the records that share a key and sum their score values. The total is the similarity between the two items.

    #!/usr/local/bin/python
    import sys
    
    cur_ii_pair = None
    score = 0.0
    for line in sys.stdin:
        ii_pair, s = line.strip().split('\t')
        if cur_ii_pair is None:
            cur_ii_pair = ii_pair
        if cur_ii_pair != ii_pair:
            # Key changed: emit the accumulated similarity for the previous pair.
            item_a, item_b = cur_ii_pair.split('_')
            print("%s\t%s\t%s" % (item_a, item_b, score))
            cur_ii_pair = ii_pair
            score = 0.0
        score += float(s)
    
    # Emit the last pair (skipped when the input is empty).
    if cur_ii_pair is not None:
        item_a, item_b = cur_ii_pair.split('_')
        print("%s\t%s\t%s" % (item_a, item_b, score))
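
For checking the three jobs on a small dataset, the whole pipeline can be mirrored in memory. This is a sketch under assumptions, not the scripts themselves: `item_similarities` and the sample ratings are invented for illustration, and each user is assumed to rate an item at most once.

```python
import math
from collections import defaultdict
from itertools import permutations


def item_similarities(ratings):
    """ratings: iterable of (user, item, score). Returns {(item_a, item_b): sim}."""
    # Job 1: group by item and divide each score by the item's L2 norm.
    by_item = defaultdict(list)
    for user, item, score in ratings:
        by_item[item].append((user, float(score)))
    by_user = defaultdict(list)
    for item, user_scores in by_item.items():
        norm = math.sqrt(sum(s * s for _, s in user_scores))
        if norm == 0:
            continue  # skip items whose scores are all zero
        for user, s in user_scores:
            by_user[user].append((item, s / norm))
    # Jobs 2 and 3: per user, multiply the scores of every ordered item pair,
    # then sum the products per pair.
    sims = defaultdict(float)
    for item_scores in by_user.values():
        for (item_a, score_a), (item_b, score_b) in permutations(item_scores, 2):
            sims[(item_a, item_b)] += score_a * score_b
    return dict(sims)


ratings = [
    ("u1", "i1", 3), ("u1", "i2", 4),
    ("u2", "i1", 4), ("u2", "i2", 3), ("u2", "i3", 5),
]
sims = item_similarities(ratings)
# sims[("i1", "i2")] is approximately 0.96: (3*4 + 4*3) / (5 * 5)
```

Comparing this output with the output of the third reducer on the same input is a quick way to catch key or separator mistakes before running on a cluster.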
    


          Article link: https://www.haomeiwen.com/subject/ugqbfctx.html