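The earlier version only scraped the songs in a single playlist. This post adds support for scraping multiple playlists. Clicking through several playlists shows that NetEase Cloud Music also identifies each playlist by an id value. The starting address (after clicking the playlist tab) is url=https://music.163.com/discover/playlist. Note: the # that the browser shows in the address must be left out. Once inside the playlist section, the page looks like this:
Searching in the Network panel shows that the ids of the playlists on the first page are in the response whose Name is playlist. Searching that response for each playlist's id shows that every id appears in three places. Which one should I parse? Look at what a playlist url looks like: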
url = https://music.163.com/playlist?id=2264645756
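From this we can see that every playlist is located by its id, so the id is what we need in order to build the url of each playlist: "https://music.163.com/playlist?id=<playlist id>". The browser's address bar shows it as "https://music.163.com/#/playlist?id=<playlist id>", but as noted earlier the # has to be dropped when requesting the page. Since I parse with bs4, the third occurrence is the obvious choice: it is the href of the playlist link, whose value already has the form /playlist?id=2264645756 (the part highlighted in the red box). I can extract that value directly and build the playlist url as "https://music.163.com" + the extracted value.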
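The URL handling described above can be sketched with the standard library alone. This is only an illustration, not the code used in this post: the helper names `drop_hash`, `playlist_url` and `playlist_id` are made up for the example. It also shows why the # has to go: everything after # is a URL fragment, which the browser keeps to itself and never sends to the server.

```python
from urllib.parse import urljoin, urlparse, parse_qs

BASE_URL = 'https://music.163.com'  # site root, as used in this post


def drop_hash(browser_url):
    """Turn the address-bar form (with '/#/') into a URL that can be requested.

    Hypothetical helper: the part after '#' is a fragment, so it is moved
    back into the path before the request is made.
    """
    parts = urlparse(browser_url)
    if parts.fragment.startswith('/'):
        return urljoin(f'{parts.scheme}://{parts.netloc}', parts.fragment)
    return browser_url


def playlist_url(href):
    """Join a relative href such as '/playlist?id=2264645756' onto the site root."""
    return urljoin(BASE_URL, href)


def playlist_id(href):
    """Return the id query parameter of the href, or None if it is missing."""
    values = parse_qs(urlparse(href).query).get('id')
    return values[0] if values else None


href = '/playlist?id=2264645756'
print(playlist_url(href))
print(playlist_id(href))
print(drop_hash('https://music.163.com/#/playlist?id=2264645756'))
```

Plain string concatenation, as used in the rest of this post, gives the same result for these hrefs; `urljoin` only becomes useful if an href ever comes back without the leading slash or as a full URL.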
The parsing code for this step:
def getplayList(html):  # parse out each playlist's href (which carries the id) and its name
    playlists = []
    soup = BeautifulSoup(html, 'html.parser')
    links = soup.select('.dec a')
    for link in links:
        playlist = []
        playlist.append(link['href'])   # the playlist href, e.g. /playlist?id=2264645756
        playlist.append(link['title'])  # the playlist name
        playlists.append(playlist)      # collect each [href, name] pair
    return playlists
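playlists has the following form (only part of it is shown). I do not actually use the playlist names yet, because I have not decided what to do with them.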
[['/playlist?id=2264645756', '想 一 个 人 在 黄 昏 后'], ['/playlist?id=2355333774', '你不是我的诗\xa0正如我不是你的梦'], ['/playlist?id=363692915', '「 Indie 」 那天午后打了个盹儿'], ['/playlist?id=2347578332', '西音东渐:日式西洋古典美学'], ['/playlist?id=2331853291', '爱是紫色的折叠梦境,曼妙又绮丽'], ['/playlist?id=2349865512', '【情话说唱】我的歌里写的是你'], ['/playlist?id=2353471182', '攒了一大堆好听的歌想和你一起听'], ['/playlist?id=2352321741', 'Bass Institute|Bass House'], ['/playlist?id=2298138241', '听了几个故事,正好讲给你玩'], ['/playlist?id=2311431519', '你的名字我的心事〈情歌说唱〉'], ['/playlist?id=2299157419', '活的像风 没有归宿 却也够酷'], ['/playlist?id=2302705693', '『古风』我自问酒不问仙 半世逍遥半世癫']]
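One possible use for the playlist names, offered only as a suggestion: turn each name into a folder name so that the songs of each playlist land in their own directory. The names above cannot be used directly, since one contains a non-breaking space (`\xa0`) and another contains `|`, which Windows does not allow in file names. The sketch below uses a hypothetical helper, `safe_name`, that is not part of the code in this post.

```python
import re


def safe_name(title, fallback='untitled'):
    """Make a playlist or song title usable as a file or folder name.

    Hypothetical helper: replaces non-breaking spaces, swaps characters
    that Windows forbids in file names for '_', and collapses whitespace.
    """
    cleaned = title.replace('\xa0', ' ')
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', cleaned)
    cleaned = re.sub(r'\s+', ' ', cleaned).strip(' .')
    return cleaned or fallback


playlists = [
    ['/playlist?id=2355333774', '你不是我的诗\xa0正如我不是你的梦'],
    ['/playlist?id=2352321741', 'Bass Institute|Bass House'],
]
for href, title in playlists:
    print(href, '->', safe_name(title))
```

The same cleaning would also help with the song and singer names used for the mp3 file names, since those can contain the same forbidden characters.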
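This gives the id of every playlist, so the playlist urls can now be built as follows (only the code for this feature is shown):
start_url_list = []
playurl = 'https://music.163.com/discover/playlist'  # url of the playlist listing (first page)
headers = {
    'User-Agent': 'Mozilla/5.0'
}
playhtml = getHtml(playurl, headers=headers)  # fetch the playlist listing page
playidlist = getplayList(playhtml)  # parse out each playlist's href and name
for u in playidlist:  # build each playlist url, used later to download the songs in it
    start_url_list.append('https://music.163.com' + u[0])
start_url_list then looks like this: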
['https://music.163.com/playlist?id=2264645756', 'https://music.163.com/playlist?id=2355333774', 'https://music.163.com/playlist?id=363692915', 'https://music.163.com/playlist?id=2347578332', 'https://music.163.com/playlist?id=2331853291', 'https://music.163.com/playlist?id=2349865512', 'https://music.163.com/playlist?id=2353471182', 'https://music.163.com/playlist?id=2352321741', 'https://music.163.com/playlist?id=2298138241', 'https://music.163.com/playlist?id=2311431519', 'https://music.163.com/playlist?id=2299157419', 'https://music.163.com/playlist?id=2302705693', 'https://music.163.com/playlist?id=2300523945', 'https://music.163.com/playlist?id=2301094981', 'https://music.163.com/playlist?id=2301267346', 'https://music.163.com/playlist?id=2297457355', 'https://music.163.com/playlist?id=2290267281', 'https://music.163.com/playlist?id=2291115145', 'https://music.163.com/playlist?id=2301310816', 'https://music.163.com/playlist?id=2290797610', 'https://music.163.com/playlist?id=2286380125', 'https://music.163.com/playlist?id=2283281232', 'https://music.163.com/playlist?id=2278767768', 'https://music.163.com/playlist?id=2277307819', 'https://music.163.com/playlist?id=2343741251', 'https://music.163.com/playlist?id=2274985772', 'https://music.163.com/playlist?id=2335662972', 'https://music.163.com/playlist?id=2274346473', 'https://music.163.com/playlist?id=2336165805', 'https://music.163.com/playlist?id=2274803562', 'https://music.163.com/playlist?id=2339316534', 'https://music.163.com/playlist?id=2272295927', 'https://music.163.com/playlist?id=2336073422', 'https://music.163.com/playlist?id=2286925070', 'https://music.163.com/playlist?id=2341435171']
Process finished with exit code 0
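In other words, each item is the url of one playlist. The next step is to visit each playlist url and scrape the songs in it, which means first parsing the id of every song out of the playlist page, as before. From there the steps are the same as in the first article, 爬取网易云部分音乐 (scraping some music from NetEase Cloud Music).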
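As a quick reminder of that chain, here is a minimal sketch of how one song link turns into the two urls the full code needs. The function names are made up for this example, the id is invented, and the href format `/song?id=...` is an assumption inferred from the `split('=')` call in the parser rather than something shown in this post.

```python
def song_id_from_href(href):
    """Keep whatever follows the last '=' in the href (same operation as htmlParser)."""
    return href.split('=')[-1]


def song_page_url(song_id):
    """Page that carries the song name and singer."""
    return 'https://music.163.com/song?id=' + song_id


def song_media_url(song_id):
    """Address the mp3 is downloaded from, as used in getMusic."""
    return 'http://music.163.com/song/media/outer/url?id=' + song_id + '.mp3'


href = '/song?id=123456'  # invented id, for illustration only
song_id = song_id_from_href(href)
print(song_page_url(song_id))
print(song_media_url(song_id))
```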
Full code:
import os

import requests
from bs4 import BeautifulSoup


def getHtml(url, headers):
    """Fetch a page and return its text, or '' if the request fails."""
    try:
        r = requests.get(url, headers=headers)
        r.raise_for_status()
        r.encoding = 'utf-8'
        return r.text
    except requests.RequestException:
        print('Request failed')
        return ''


def htmlParser(html):
    """Return the song ids listed on a playlist page."""
    try:
        id_list = []
        soup = BeautifulSoup(html, 'html.parser')
        li = soup.select('.f-hide li a')
        for i in li:
            id_list.append(i['href'].split('=')[-1])
        return id_list
    except Exception:
        print('Failed to get the ids')
        return []  # an empty list keeps the loops in main() working


def get_name_singer(html):
    """Return [song name, singer] parsed from a song page."""
    name_sig_list = []
    soup = BeautifulSoup(html, 'html.parser')
    name = soup.select('.f-ff2')
    singer = soup.select('p.des.s-fc4 span a')
    name_sig_list.append(name[0].text)
    name_sig_list.append(singer[0].text)
    return name_sig_list


def getMusic(lst, nslst):
    """Download every song id in lst, naming each file 'singer,name.mp3'."""
    os.makedirs('music', exist_ok=True)  # open() below fails if the folder is missing
    urls = []
    for song_id in lst:
        urls.append('http://music.163.com/song/media/outer/url?id=' + song_id + '.mp3')
    for i in range(len(urls)):
        try:
            r = requests.get(urls[i])
            with open('music/' + nslst[i][1].strip() + ',' + nslst[i][0].strip() + '.mp3', 'wb') as f:
                f.write(r.content)  # the with block closes the file
            print('Song {} downloaded successfully'.format(i + 1))
        except Exception:
            print('Song {} failed to download'.format(i + 1))


def getplayList(html):
    """Return a list of [href, name] pairs, one per playlist on the listing page."""
    playlists = []
    soup = BeautifulSoup(html, 'html.parser')
    links = soup.select('.dec a')
    for link in links:
        playlist = []
        playlist.append(link['href'])
        playlist.append(link['title'])
        playlists.append(playlist)
    return playlists


def main():
    all_ids = []
    urlls = []
    name_singer_list = []
    start_url_list = []
    # start_url = 'https://music.163.com/playlist?id=2153101541'
    playurl = 'https://music.163.com/discover/playlist'  # url of the playlist listing (first page)
    headers = {
        'User-Agent': 'Mozilla/5.0'
    }
    playhtml = getHtml(playurl, headers=headers)  # fetch the playlist listing page
    playidlist = getplayList(playhtml)  # parse out each playlist's href and name
    for u in playidlist:  # build each playlist url, used to download the songs in it
        start_url_list.append('https://music.163.com' + u[0])
    print(start_url_list)
    for url in start_url_list:
        html = getHtml(url, headers=headers)
        idlist = htmlParser(html)
        all_ids.extend(idlist)  # keep the ids of every playlist, in the same order as urlls
        for song_id in idlist:
            urlls.append('https://music.163.com/song?id=' + song_id)
    for url in urlls:
        html = getHtml(url, headers)
        name_singer_list.append(get_name_singer(html))
    # print(name_singer_list)
    getMusic(all_ids, name_singer_list)


if __name__ == '__main__':
    main()
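getMusic depends on its two lists lining up: the i-th song id must belong to the i-th [name, singer] pair. One way to make that pairing explicit is to zip the two lists into a download plan before any request is made. The sketch below is an alternative arrangement rather than what the code above does, and `build_download_plan` is a hypothetical helper; the url and file name patterns are the ones used in getMusic.

```python
def build_download_plan(song_ids, name_singer_list, folder='music'):
    """Pair each song id with its [name, singer] entry.

    Returns a list of (download url, target file path) tuples. zip() stops
    at the shorter list, so a length mismatch cannot raise an IndexError.
    """
    plan = []
    for song_id, (name, singer) in zip(song_ids, name_singer_list):
        url = 'http://music.163.com/song/media/outer/url?id=' + song_id + '.mp3'
        path = folder + '/' + singer.strip() + ',' + name.strip() + '.mp3'
        plan.append((url, path))
    return plan


# Invented ids and names, for illustration only.
ids = ['111', '222']
names = [['Song A ', ' Singer A'], ['Song B', 'Singer B']]
for url, path in build_download_plan(ids, names):
    print(url, '->', path)
```

Printing the plan before downloading also makes it easy to spot a song that has been paired with the wrong name.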
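Output:
F:\New_Anaconda\python.exe E:/Spider_Folder/网易云音乐下载.py
Song 1 downloaded successfully
Song 2 downloaded successfully
Song 3 downloaded successfully
Song 4 downloaded successfully
Song 5 downloaded successfully
Song 6 downloaded successfully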
Process finished with exit code -1
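Because I was on mobile data and the connection was painfully slow, I stopped the run after only a few songs had downloaded. Screenshot of the saved files:
Since the program did not run to completion, there may still be errors further on. I will fix them as they come up.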