Writing a Crawler: Scraping the Hot Comments on NetEase Cloud Music (网易云音乐)


Author: 无罪的坏人 | Published: 2019-08-16 14:02

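    First, thanks to 【小甲鱼】 for the course 极客Python之效率革命. It is well taught, easy to follow, and suitable for beginners.

    If you are interested, you can visit https://fishc.com.cn/forum-319-1.html to support 小甲鱼. Thanks, everyone.
    To learn the requests library, see: https://fishc.com.cn/forum.php?mod=viewthread&tid=95893&extra=page%3D1%26filter%3Dtypeid%26typeid%3D701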

    1. First, analyze the page: locate the comment element in the browser

    [Screenshot: the hot comments section on the song page]

    Let's start by downloading the page source and taking a look:

    # -*- coding:UTF-8 -*-
    import requests
    
    def get_url(url):
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029."
                          "110 Safari/537.36 SE 2.X MetaSr 1.0"}
        res = requests.get(url, headers=headers)
        return res
    
    def main():
        url = input("Enter the song page URL: ")
        res = get_url(url)
    
        with open("res.txt", "w", encoding="utf-8") as file:
            file.write(res.text)
    
    if __name__ == "__main__":
        main()
    

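    The saved file does not contain the hot comments we are after, which suggests they are loaded by a separate request rather than being part of the initial HTML.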
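
    One way to confirm this without reading the whole file by eye is to search the saved text for a phrase you can see in a comment in the browser. The sketch below is only an illustration: `missing_snippets`, `sample_page`, and the sample strings are hypothetical names and data, not part of the original script.

```python
def missing_snippets(page_text, snippets):
    """Return the snippets that do not occur anywhere in page_text."""
    return [s for s in snippets if s not in page_text]


# Hypothetical stand-in for the contents of res.txt: a page shell whose
# comments would be filled in later by JavaScript.
sample_page = "<html><body><div id='comment-box'></div></body></html>"

# Hypothetical phrases: one copied from a comment seen in the browser,
# one that is part of the static markup.
wanted = ["Great song", "comment-box"]

print(missing_snippets(sample_page, wanted))  # ['Great song']
```

    In practice you would read `res.txt` into `page_text` and pass a few words from a comment you can see on the page.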

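    2. Slow down the browser's loading speed, stop the page load as soon as the hot comments appear, and find the request that returned the comments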

    [Screenshot: slowing down the browser's loading speed]
    Request URL:https://music.163.com/weapi/v1/resource/comments/R_SO_4_1356350562?csrf_token=643432a22c0bfd772c33e2726c942e48
    Request Method:POST
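    Now we can download this target resource by using requests to imitate the browser's request. The form data (params and encSecKey) is copied as-is from the request captured in the browser.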
    # -*- coding:UTF-8 -*-
    import requests
    
    def get_comments(url):
        name_id = url.split('=')[1]
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029."
                          "110 Safari/537.36 SE 2.X MetaSr 1.0",
            "Referer": "https://music.163.com/"  # the standard HTTP header name is "Referer", not "refer"
        }
        params = "jvRGxPQYIeDQiiYsS8qg51ryAhi9TwM0H3NGLu7B9re4EOw9/a7jHRW0P5jhupFbSamLsjHvSpivhbtFiTObUOR2mYA7nFh5KUxaXn3bYh8GXy9sGTbxLeFCuY0KoNAfwWICK0n9ZRPlBHQ1CGBiohOq8+FDDPVBJhbcYgOSPhpTiZ22Ea+/xoYuk7UHnXHty093tfxAXJU032N1uaksCQmMzHxafQ1OA0BroKvyEMA="
        encSecKey = "969f735e7bc94d2b6a6f8371dd89e27d16161ea019a7d2b31391c257452c358678e7ffc11c45712a7f1e47fb1bea81dcf0dbb6f6335045766c06ef1fcc3758987cd30a8674510a062bf626dc2aed8b24c25e7a92ecb1ea38ac514e937f69343923a669d9024ff7a65f8154a35f854de05b67a56dd46d7fa5c136b02c414ce0ea"
        data = {
            "params": params,
            "encSecKey": encSecKey
        }
        target_url = "https://music.163.com/weapi/v1/resource/comments/R_SO_4_{}?csrf_token=".format(name_id)  # build the API URL from the song ID so it works for any song
        res = requests.post(target_url, headers=headers, data=data)  # rebuild the POST request the way the browser sends it (check with F12)
        return res
    
    
    def main():
        url = input("Enter the song page URL: ")
        res = get_comments(url)
    
    
        with open("data.txt", "w", encoding="utf-8") as file:
            file.write(res.text)
    
    
    if __name__ == "__main__":
        main()
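
    Note that `url.split('=')[1]` only gives a clean ID when `id` is the sole parameter in the link. As a hedged alternative, the sketch below pulls the ID out with the standard library's `urllib.parse`. `extract_song_id` is a hypothetical helper name, and the second example URL is made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def extract_song_id(url):
    """Return the value of the `id` parameter from a song page URL."""
    parts = urlparse(url)
    query = parts.query
    # Links such as https://music.163.com/#/song?id=... keep the query
    # string inside the fragment, so fall back to it when needed.
    if not query and "?" in parts.fragment:
        query = parts.fragment.split("?", 1)[1]
    ids = parse_qs(query).get("id")
    if not ids:
        raise ValueError("no 'id' parameter found in: " + url)
    return ids[0]


print(extract_song_id("https://music.163.com/#/song?id=1356350562"))          # 1356350562
print(extract_song_id("https://music.163.com/song?id=1356350562&userid=42"))  # 1356350562
```

    With the second URL, `url.split('=')[1]` would return `1356350562&userid`, which would produce a wrong target URL.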
    

    3. Extract the data we want (save the response as a .json file and open it in Firefox to see where the data we need is located)

    [Screenshot: the JSON response opened directly in Firefox]

    Now we know where the data we want is located.
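
    To make the structure concrete, here is a minimal sketch that parses a small hand-written sample instead of a live response. The sample is hypothetical and contains only the fields this article uses (`hotComments`, `user.nickname`, `content`); the real response has many more keys. `format_hot_comments` is an illustrative helper name.

```python
import json

# Hypothetical sample with only the fields used in this article.
SAMPLE_RESPONSE = """
{
  "hotComments": [
    {"user": {"nickname": "alice"}, "content": "Great song"},
    {"user": {"nickname": "bob"}, "content": "On repeat all day"}
  ]
}
"""


def format_hot_comments(text):
    """Turn the JSON text into the nickname / content / separator layout."""
    payload = json.loads(text)
    pieces = []
    # .get() yields an empty list when the key is absent, for example if
    # the response is not the expected comment payload.
    for each in payload.get("hotComments", []):
        pieces.append(each["user"]["nickname"] + ":\n\n")
        pieces.append(each["content"] + "\n")
        pieces.append("-" * 50 + "\n")
    return "".join(pieces)


print(format_hot_comments(SAMPLE_RESPONSE))
```

    Returning a string instead of writing straight to a file makes the formatting easy to check before any network request is involved.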

    Here is the complete code:

    # -*- coding:UTF-8 -*-
    import requests
    import json
    
    def get_hot_comment(res):
        comment_json = json.loads(res.text)  # decode the JSON string into Python objects
        hot_comments = comment_json['hotComments']
        print(hot_comments)
        with open('hot_comment.txt', 'w', encoding='utf-8') as file:
            for each in hot_comments:
                file.write(each['user']['nickname'] + ':\n\n')
                file.write(each['content'] + '\n')
                file.write('-'*50 + '\n')
    
    def get_comments(url):
        name_id = url.split('=')[1]
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029."
                          "110 Safari/537.36 SE 2.X MetaSr 1.0",
            "Referer": "https://music.163.com/"  # the standard HTTP header name is "Referer", not "refer"
        }
        params = "jvRGxPQYIeDQiiYsS8qg51ryAhi9TwM0H3NGLu7B9re4EOw9/a7jHRW0P5jhupFbSamLsjHvSpivhbtFiTObUOR2mYA7nFh5KUxaXn3bYh8GXy9sGTbxLeFCuY0KoNAfwWICK0n9ZRPlBHQ1CGBiohOq8+FDDPVBJhbcYgOSPhpTiZ22Ea+/xoYuk7UHnXHty093tfxAXJU032N1uaksCQmMzHxafQ1OA0BroKvyEMA="
        encSecKey = "969f735e7bc94d2b6a6f8371dd89e27d16161ea019a7d2b31391c257452c358678e7ffc11c45712a7f1e47fb1bea81dcf0dbb6f6335045766c06ef1fcc3758987cd30a8674510a062bf626dc2aed8b24c25e7a92ecb1ea38ac514e937f69343923a669d9024ff7a65f8154a35f854de05b67a56dd46d7fa5c136b02c414ce0ea"
        data = {
            "params": params,
            "encSecKey": encSecKey
        }
        target_url = "https://music.163.com/weapi/v1/resource/comments/R_SO_4_{}?csrf_token=".format(name_id)
    
        res = requests.post(target_url, headers=headers, data=data)
        return res
    
    def main():
        url = input("Enter the song page URL: ")
        res = get_comments(url)
        get_hot_comment(res)
    
    
    if __name__ == "__main__":
        main()
    

    The result:

    [Screenshot: the resulting output]

    Pretty fun, isn't it?

