Machine Learning Foundations: A Case Study Approach, Week 5

Author: Think123DO | Published: 2017-04-20 00:42

    # Load the data
    import graphlab
    song_data = graphlab.SFrame("song_data.gl/")
    # Inspect the structure of the data
    song_data.head()
    

The data is structured as follows:

[Figure: Paste_Image.png]

The data consists of the columns user_id, song_id, listen_count, title, artist, and song.

1. Which of the artists below have had the most unique users listening to their songs? (Kanye West, Foo Fighters, Taylor Swift, Lady GaGa)

    print(song_data[song_data['artist'] == 'Kanye West'])
    

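This selects every row whose artist is Kanye West and gives the following data:

[Figure: Paste_Image.png]

Next, count the users (user_id). The unique() function returns the distinct values of a column, here the distinct user IDs.

    print(song_data[song_data['artist'] == 'Kanye West']['user_id'].unique())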
    

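This lists every distinct user who listened to Kanye West. The output is as follows:

[Figure: Paste_Image.png]

Taking len() of that result gives the number of distinct users:

    print(len(song_data[song_data['artist'] == 'Kanye West']['user_id'].unique()))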
    

Output: 2522

Repeat the same steps for the other three artists.

    print(len(song_data[song_data['artist'] == 'Foo Fighters']['user_id'].unique()))
    print(len(song_data[song_data['artist'] == 'Taylor Swift']['user_id'].unique()))
    print(len(song_data[song_data['artist'] == 'Lady GaGa']['user_id'].unique()))
    

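Output: 2055, 3246, 2928

Compared with the 2522 unique users for Kanye West, Taylor Swift has the most unique listeners of the four (3246).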
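The four lookups above all follow the same filter-then-count pattern. As a minimal sketch of what that pattern computes, the plain-Python version below works on a small made-up list of (user_id, artist) pairs instead of the real SFrame. The function name `unique_listeners` and the sample rows are assumptions for illustration only and are not part of GraphLab Create or of song_data.

```python
def unique_listeners(rows, artist):
    """Count the distinct user_ids among rows whose artist matches."""
    # A set keeps each user_id once, which mirrors unique() followed by len().
    return len({user_id for user_id, row_artist in rows if row_artist == artist})


# Hypothetical toy data: (user_id, artist) pairs, not taken from song_data.
toy_rows = [
    ("u1", "Kanye West"),
    ("u1", "Kanye West"),    # the same user listening to a second song
    ("u2", "Kanye West"),
    ("u2", "Taylor Swift"),
    ("u3", "Taylor Swift"),
    ("u4", "Taylor Swift"),
]

artists = ("Kanye West", "Taylor Swift", "Lady GaGa")
counts = {name: unique_listeners(toy_rows, name) for name in artists}
print(counts)
```

On the real data, the same idea would presumably amount to wrapping the len(...unique()) expression from the text in a loop over the artist names, which avoids retyping the filter four times.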

2. Which of the artists below is the most popular artist, the one with the highest total listen_count, in the data set?

3. Which of the artists below is the least popular artist, the one with the smallest total listen_count, in the data set?

Both questions can be answered with groupby(key_columns, operations, *args), which groups the rows by the key columns and applies an aggregation to each group.
i. key_columns: the column to group by, in our case 'artist'.
ii. operations: the aggregation to apply to each group, in our case the sum of 'listen_count'.

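    # Sum listen_count per artist, then sort from most to least played.
    data = song_data.groupby(key_columns='artist',
                             operations={'total_count': graphlab.aggregate.SUM('listen_count')})
    data = data.sort('total_count', ascending=False)
    print(data[0])   # most popular artist
    print(data[-1])  # least popular artist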
    

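The output is as follows:

    {'total_count': 43218, 'artist': 'Kings Of Leon'}
    {'total_count': 14, 'artist': 'William Tabbert'}

So Kings Of Leon has the highest total listen_count (43218) and William Tabbert the lowest (14).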
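To make the group-sum-sort step concrete, here is a minimal plain-Python sketch of the same aggregation on made-up data. It is not how GraphLab Create implements groupby; the function name `total_listens_by_artist`, the artist labels, and the counts are all hypothetical and chosen only to illustrate the logic.

```python
from collections import defaultdict


def total_listens_by_artist(rows):
    """Group rows by artist, sum listen_count, and sort by total, largest first."""
    totals = defaultdict(int)
    for artist, listen_count in rows:
        totals[artist] += listen_count
    # Descending sort, analogous to sort('total_count', ascending=False).
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: (artist, listen_count) pairs, not taken from song_data.
toy_rows = [
    ("Artist A", 5),
    ("Artist B", 1),
    ("Artist A", 7),
    ("Artist C", 4),
    ("Artist B", 2),
]

ranked = total_listens_by_artist(toy_rows)
print(ranked[0])   # entry with the highest total
print(ranked[-1])  # entry with the lowest total
```

As in the GraphLab version, sorting once in descending order means the first and last entries answer the "most popular" and "least popular" questions together.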
