Scraping NetEase Cloud Music playlists with Python coroutines

Author: Cache_wood | Published 2022-01-07 10:17

Task analysis

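Starting from NetEase Cloud Music's paginated playlist index, the scraper first collects the URL of every playlist listed on each page. It then visits each playlist, records its details and, for playlists with more than 1,000,000 plays, takes the URLs of the first ten songs. Finally it opens each song page to extract the song title, artist and album. Coroutines (gevent greenlets) run the requests concurrently to speed up the whole process.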
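Each index page lists 35 playlists, and the `offset` query parameter advances by 35 from one page to the next. A minimal sketch of generating those page URLs with the standard library, assuming the same query parameters the scraper below hard-codes (`order`, `cat`, `limit`, `offset`); `page_urls` is a hypothetical helper, not part of the original code.

```python
from urllib.parse import urlencode

BASE = 'https://music.163.com/discover/playlist/'


def page_urls(category, total=1300, limit=35):
    """Yield one index-page URL per `limit` playlists (hypothetical helper)."""
    for offset in range(0, total, limit):
        # urlencode percent-encodes the UTF-8 bytes of the category name
        query = urlencode({'order': 'hot', 'cat': category,
                           'limit': limit, 'offset': offset})
        yield f'{BASE}?{query}'


# '说唱' (rap) is the category that appears percent-encoded in the original URL
urls = list(page_urls('说唱'))
print(len(urls))  # 38
print(urls[0])    # https://music.163.com/discover/playlist/?order=hot&cat=%E8%AF%B4%E5%94%B1&limit=35&offset=0
```

Building the query with `urlencode` keeps the category readable in the source instead of pasting a pre-encoded string.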

Core code

from gevent import monkey
monkey.patch_all()  # patch sockets/ssl first, so requests' blocking I/O yields to other greenlets

import csv
import time
from io import BytesIO
from urllib.parse import urljoin

import gevent
from gevent.pool import Pool
import requests
from bs4 import BeautifulSoup
from PIL import Image

BASE = 'https://music.163.com/'
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36'
}

urls = []          # playlist detail-page URLs, filled by producer1
analysis_url = []  # song URLs, filled by consumer1
pool = Pool(20)    # at most 20 requests in flight (an arbitrary limit to go easy on the site)


def fetch(url):
    """Download a page and return it parsed with BeautifulSoup."""
    response = requests.get(url, headers=HEADERS, verify=False)
    return BeautifulSoup(response.text, 'html.parser')


def coroutine1():  # run producer1 on every paginated index page
    jobs = []
    for n in range(0, 1300, 35):
        url = f'https://music.163.com/discover/playlist/?order=hot&cat=%E8%AF%B4%E5%94%B1&limit=35&offset={n}'
        print(url)
        jobs.append(pool.spawn(producer1, url))
    gevent.joinall(jobs)  # wait once, after all jobs are spawned; joining inside the loop serializes them


def coroutine2():  # run consumer1 on every playlist URL
    row = ['id', 'title', 'nickname', 'img', 'description', 'count',
           'number of song', 'number of adding list', 'share', 'comment']
    with open('data.csv', 'a', encoding='utf-8', newline='') as file:
        csv.writer(file).writerow(row)
    jobs = [pool.spawn(consumer1, url) for url in urls]
    gevent.joinall(jobs)


# Producer/consumer stages: producer1 collects playlist links, consumer1 processes them
def producer1(url):
    soup = fetch(url)
    for a in soup.select('.dec a'):  # tags linking to playlist detail pages
        urls.append(urljoin(BASE, a['href']))


def consumer1(url):  # scrape one playlist page and append its row to data.csv
    soup = fetch(url)
    creator = soup.select('.s-fc7')[0]           # the first .s-fc7 link is the creator's profile
    idd = creator['href'].split('=')[-1]         # creator's user id (not the playlist id)
    nickname = creator.get_text()                # creator's nickname
    img = soup.select('img')[0]['data-src']      # cover image URL

    # Save the cover; JPEG cannot store RGBA/palette images, so convert those first
    image = Image.open(BytesIO(requests.get(img).content))
    if image.mode not in ('RGB', 'L'):
        image = image.convert('RGB')
    image.save(url.split('=')[-1] + '.jpg')      # file named after the playlist id

    title = soup.select('title')[0].get_text()
    description = soup.select('p')[1].get_text()  # index-based, so it may pick up unrelated <p> text
    count = soup.select('strong')[0].get_text()   # play count
    song_number = soup.select('span span')[0].get_text()
    add_lis = soup.select('a i')[1].get_text()    # times added to a list
    share = soup.select('a i')[2].get_text()
    comment = soup.select('a i')[4].get_text()

    # Nothing here yields to gevent, so concurrent greenlets cannot interleave rows
    with open('data.csv', 'a', encoding='utf-8', newline='') as file:
        csv.writer(file).writerow([idd, title, nickname, img, description, count,
                                   song_number, add_lis, share, comment])

    if float(count) > 1000000:  # playlists with more than 1,000,000 plays
        for s in soup.select('li a')[:10]:  # first ten songs, reusing the parsed page
            analysis_url.append(urljoin(BASE, s['href']))


def consumer2(url):  # scrape one song page and append a line to song.txt
    soup = fetch(url)
    songname = soup.select('.f-ff2')[0].get_text()
    # Select by link target so a second artist is not mistaken for the album
    singer = soup.select('p a[href^="/artist"]')[0].get_text()
    album = soup.select('p a[href^="/album"]')[0].get_text()
    # A plain append is used rather than aiofiles + asyncio.run(): asyncio tracks one
    # running loop per OS thread, so asyncio.run() inside concurrently running
    # greenlets can fail with RuntimeError
    with open('song.txt', 'a', encoding='utf-8') as file:
        file.write(f'{songname},{singer},{album}\n')


def coroutine3():  # run consumer2 on every song URL
    with open('song.txt', 'a', encoding='utf-8') as file:
        file.write('songname,singer,album\n')
    jobs = [pool.spawn(consumer2, url) for url in analysis_url]
    gevent.joinall(jobs)


def main():
    start_time = time.time()
    coroutine1()
    coroutine2()
    coroutine3()
    print('time = %f' % (time.time() - start_time))


if __name__ == '__main__':
    main()
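The speed-up comes from starting every request before waiting on any of them, then joining once. A minimal sketch of the same spawn-all-then-join pattern using the standard library's asyncio in place of gevent; `fake_fetch` and `crawl` are hypothetical names, and `fake_fetch` sleeps instead of making a real HTTP request so the example runs offline.

```python
import asyncio
import time


async def fake_fetch(url, delay=0.1):
    """Hypothetical stand-in for an HTTP request: waits, then returns a result."""
    await asyncio.sleep(delay)
    return f'fetched {url}'


async def crawl(urls):
    # Schedule every request first, then wait once for all of them
    # (the asyncio counterpart of calling gevent.joinall after the spawn loop)
    return await asyncio.gather(*(fake_fetch(u) for u in urls))


start = time.perf_counter()
results = asyncio.run(crawl([f'page-{n}' for n in range(10)]))
elapsed = time.perf_counter() - start
print(results[0])         # fetched page-0
print(f'{elapsed:.2f}s')  # roughly 0.1s, since the ten waits overlap
```

Awaiting each `fake_fetch` one at a time inside the loop would take about one second instead, which is the same slowdown that calling `gevent.joinall` inside the spawn loop causes.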

Results

  • Scraped cover images:
    [Figure: downloaded playlist cover images]
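  • Scraped playlist information (data.csv). The id column holds the playlist creator's user id, which is why the three official playlists share 1463586082: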
id,title,nickname,img,description,count,number of song,number of adding list,share,comment

201586,[拒绝瞌睡] 跟着节拍抖腿的嘻哈音乐 - 歌单 - 网易云音乐,原创君,http://p3.music.126.net/AEjvfbLi56fDGcSXQnkxrg==/109951165529653802.jpg,"
【纯音乐】情绪氛围系列‖钢琴
",10140783,30,(79284),(545),(153)

1463586082,[一周欧美说唱] 别人放假Nas加班,圣诞献礼新专辑歌迷大福利 - 歌单 - 网易云音乐,云音乐官方歌单,http://p4.music.126.net/ZcM1CfTLL5gIafgbNzvHwQ==/109951166789455206.jpg,"
【纯音乐】情绪氛围系列‖钢琴
",4705263,50,(29956),(152),(91)

1463586082,[Z世代说唱] 在节奏中感受Z世代的嘻哈力量 - 歌单 - 网易云音乐,云音乐官方歌单,http://p3.music.126.net/XOgxHCqyT2HHLZdvLeiCYg==/109951166416650856.jpg,"介绍:
90前的说唱歌手大多早已登入封神殿堂,90后的说唱歌手多数也成为业内代表人,满载00后说唱歌手的东风列车也已经悉数候场,有的说唱歌手潜心学习,时刻准备着展现自己的作品,有的说唱歌手也早已登上评委席,载入说唱的史册。点击播放,进入Z世代的说唱世界,感受Z世代的嘻哈力量吧!
",476933,30,(2860),(28),(15)

1463586082,[打游戏听说唱] 说唱律动是你最佳游戏伴侣 - 歌单 - 网易云音乐,云音乐官方歌单,http://p4.music.126.net/9FOaQlzu-THnYxqYTnvv-Q==/109951166032530603.jpg,"
【纯音乐】情绪氛围系列‖钢琴
",3742997,30,(28315),(183),(46)
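  • Scraped song information (song.txt). For songs with more than one artist, the third column holds the second artist rather than the album (for example, A$AP Rocky on "Wave Gods"):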
songname,singer,album
Wave Gods,Nas,A$AP Rocky
Misfit Toys (from the series Arcane League of Legends),Pusha T,Mako
Peru,Fireboy DML,Ed Sheeran
Chrome Lips (feat. Freddie Gibbs),Le$,Freddie Gibbs
Dynasties & Dystopia (from the series Arcane League of Legends),Denzel Curry,GIZZLE
Stack House,Zona Man,Future
City Love,Pacman Da Gunman,Nipsey Hussle
WFM Slow,Realestk,WFM Slow
bluemoon,mixed matches,im a good sport
IDFK,Domo Genesis,IDFK
航行,蒋小呢,航行
差不多姑娘,G.E.M.邓紫棋,摩天动物园
We Can Do It,NINEONE#,We Can Do It
Don't Go,ZIV,倒叙爱情
敏感黑夜,YangYang,敏感黑夜
Pink Love,似水_,陈奕楠
∞,YKEY,无限
说散就散,艾福杰尼,黄旭
心都碎了 :(,YangYang,Lonely Night
她不爱的Rapstar,深水29,情绪制造 Made In Emotions
BORRAXXA,Feid,Manuel Turizo
Mi Gente,J. Balvin,Willy William
Despacito,Sam Tsui,Sam Tsui Covers Vol.6
Danza Kuduro 2019 (Diamont Dr),Lucenzo,Don Omar
Ola La,Kate Linn,Ola La
Danza Kuduro,Don Omar,Lucenzo
Ka Je X2,Adrian Gaxha,Ronela Hajati
Acapella,Mikolas Josef,Fito Blanko
SUBEME LA RADIO,Enrique Iglesias,Descemer Bueno
RITMO (Bad Boys For Life),Black Eyed Peas,J. Balvin
木头人,马思唯,黑马
明亮航线,明堂唱片,地磁卡
日落大道,地下因素Under Factor,爵士乐歌星
Man in the Mirror,小河River,Man in the Mirror
最佳情况,直火帮 Straight Fire Gang,Revenge Season REMIXES
没心没肺,李毅杰PISSY,快乐星球-母星
矛盾,OWEN欧阳子文,矛盾
Chinese shh!,KEY.L刘聪,Kafe.Hu
Dysonman,某幻君,Dysonman
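In data.csv the add-to-list, share and comment counts keep the parentheses from the page (for example `(79284)`), and a description can span several lines inside quotes. A minimal sketch of loading such rows with the standard `csv` module and turning the count columns into integers; the sample is the first row of the output above, and `parse_count` is a hypothetical helper.

```python
import csv
import io

# First row of data.csv from the output above (description is a multi-line quoted field)
SAMPLE = '''id,title,nickname,img,description,count,number of song,number of adding list,share,comment
201586,[拒绝瞌睡] 跟着节拍抖腿的嘻哈音乐 - 歌单 - 网易云音乐,原创君,http://p3.music.126.net/AEjvfbLi56fDGcSXQnkxrg==/109951165529653802.jpg,"
【纯音乐】情绪氛围系列‖钢琴
",10140783,30,(79284),(545),(153)
'''

COUNT_COLUMNS = ['count', 'number of song', 'number of adding list', 'share', 'comment']


def parse_count(text):
    """Strip whitespace and surrounding parentheses, e.g. '(79284)' -> 79284."""
    return int(text.strip().strip('()'))


rows = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    for column in COUNT_COLUMNS:
        row[column] = parse_count(row[column])
    row['description'] = row['description'].strip()  # drop the leading/trailing newlines
    rows.append(row)

print(rows[0]['number of adding list'], rows[0]['comment'])  # 79284 153
```

For the real file, open it with `open('data.csv', encoding='utf-8', newline='')` and pass the file object to `csv.DictReader` instead of the `StringIO` sample.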


    Article link: https://www.haomeiwen.com/subject/llrdqrtx.html