Implementing a FunkSVD Movie Recommender with PySpark

Author: 老周算法 | Published 2017-07-12 17:04 · 665 reads
    from pyspark import SparkConf, SparkContext 
    from pyspark.mllib.recommendation import ALS, Rating
    # Build a dict mapping movie id -> movie name (from the ml-100k u.item file)
    def movie_dict(file):
        movies = {}
        with open(file) as f:
            for line in f:
                arr = line.split('|')
                movie_id = int(arr[0])
                movie_name = str(arr[1])
                movies[movie_id] = movie_name
        return movies
    # Parse one ratings line into a Rating tuple
    def get_rating(line):
        #arr = line.split('\t')   # use '\t' for the ml-100k u.data format
        arr = line.split('::')    # ml-1m format: UserID::MovieID::Rating::Timestamp
        user_id = int(arr[0])
        movie_id = int(arr[1])
        user_rating = float(arr[2])
        return Rating(user_id, movie_id, user_rating)
    # Stop any leftover SparkContext first, otherwise creating a new one raises
    # "ValueError: Cannot run multiple SparkContexts at once"
    try:
        sc.stop()
    except Exception:
        pass
    conf = SparkConf().setMaster('local').setAppName('MovieRec').set("spark.executor.memory", "520m")
    sc = SparkContext(conf=conf)
    # Load the data
    #movies = movie_dict('/home/joe/.surprise_data/ml-100k/ml-100k/u.item')
    #sc.broadcast(movies)
    #data = sc.textFile('/home/joe/.surprise_data/ml-100k/ml-100k/u.data')
    data = sc.textFile('/home/joe/.surprise_data/ml-1m/ml-1m/ratings.dat')
    # Convert each line into a (user, product, rating) Rating tuple
    ratings = data.map(get_rating)
      
    # Train the model (FunkSVD-style low-rank matrix factorization, fitted with ALS)
    rank = 10  
    iterations = 5  
    model = ALS.train(ratings, rank, iterations)  
      
    # Recommend for a given user id
    userid = 10
    user_ratings = ratings.filter(lambda x: x[0] == userid)  # this user's existing ratings (not used below)

    # Top 10 movies by predicted score
    rec_movies = model.recommendProducts(userid, 10)
    print('\n################################\n')
    print('recommend movies for userid %d:' % userid)
    for i in rec_movies:
        print(i)
    # With the movies dict loaded, names can be printed instead of ids:
    #for item in rec_movies:
    #    print('name:' + movies[item[1]] + ' ==> score: %.2f' % item[2])
    print('\n################################\n')
    sc.stop() 
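Under the hood, the trained model holds one latent factor vector per user and one per movie, and a predicted rating is simply their dot product. A minimal sketch with made-up factor values (in PySpark these would come from `model.userFeatures()` and `model.productFeatures()`; the vectors below are illustrative, not real output):

```python
# Toy illustration: a predicted rating is the dot product of latent vectors.
# The factor values here are invented for the example; a trained model would
# supply them via model.userFeatures() / model.productFeatures().
user_vec = [0.8, 1.2, -0.3]    # one user's latent factors (rank = 3 here)
movie_vec = [1.0, 0.5, 0.2]    # one movie's latent factors

predicted = sum(u * v for u, v in zip(user_vec, movie_vec))
print(round(predicted, 2))  # -> 1.34
```

`recommendProducts` ranks all movies for the user by exactly this score and returns the top N.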
    

    Reference: http://strayly.iteye.com/blog/2341855
    Following that post as-is raises: ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=PySparkShell, master=local[*]) created by <module> at /usr/local/lib/python2.7/dist-packages/IPython/utils/py3compat.py:288
    The fix (stopping the existing SparkContext before creating a new one) follows this post:
    http://blog.csdn.net/dream_an/article/details/51935530
    The movies dict is optional; the recommender works on ids alone.
    Matrix factorization for recommendation with Spark: http://www.cnblogs.com/pinard/p/6364932.html
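A note on the "FunkSVD" in the title: Funk's original method fits the same low-rank model with stochastic gradient descent rather than ALS. A minimal single-update sketch in plain Python (learning rate, regularization, and factor values are arbitrary illustrative choices):

```python
# One SGD step of FunkSVD: nudge both latent vectors along the gradient of
# the squared error (r - p.q)^2 with L2 regularization.
lr, reg = 0.01, 0.05           # learning rate and regularization (illustrative)
p = [0.1, 0.2]                 # user latent vector (rank = 2)
q = [0.3, 0.4]                 # item latent vector
r = 4.0                        # observed rating

pred = sum(pi * qi for pi, qi in zip(p, q))
err = r - pred
p_new = [pi + lr * (err * qi - reg * pi) for pi, qi in zip(p, q)]
q_new = [qi + lr * (err * pi - reg * qi) for pi, qi in zip(p, q)]

new_pred = sum(pi * qi for pi, qi in zip(p_new, q_new))
print(abs(r - new_pred) < abs(r - pred))  # True: the error shrinks
```

ALS instead fixes one side and solves the other in closed form, which parallelizes well and is why Spark MLlib exposes ALS rather than SGD.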


      Original link: https://www.haomeiwen.com/subject/ztibhxtx.html