Note 7: Parsing a Local Web Page and Scraping Its Data for Statistical Analysis

Author: Think4doing | Published 2017-01-01 00:02 | Read 58 times


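```python
from bs4 import BeautifulSoup

path = r'E:\index.html'

data_list = []

# Open in binary mode so BeautifulSoup detects the page encoding itself
# (text mode would use the OS locale codec, e.g. GBK on Chinese Windows)
with open(path, 'rb') as file:
    Soup = BeautifulSoup(file, 'lxml')  # the 'lxml' parser needs the lxml package

# Each selector walks down to the same product card, then picks one field from it
brices = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4.pull-right')  # prices ("brice" = price)
titles = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.caption > h4 > a')
levels = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p.pull-right')  # review-count text
counts = Soup.select('body > div > div > div.col-md-9 > div > div > div > div.ratings > p:nth-of-type(2)')  # paragraph holding the star icons
imgs = Soup.select('body > div > div > div.col-md-9 > div > div > div > img')

# print(counts)

# zip() pairs up the i-th price, title, reviews, stars and image of each card
for brice, title, level, count, img in zip(brices, titles, levels, counts, imgs):
    data = {
        'brice': brice.get_text(),
        'title': title.get_text(),
        'level': level.get_text(),
        # find_all(attrs={attribute: value}) matches the filled-star icons; their number is the rating
        'count': len(count.find_all(attrs={'class': 'glyphicon glyphicon-star'})),
        'img': img.get('src'),
    }
    data_list.append(data)

# Sort by star count, highest first, and print each product
for i in sorted(data_list, key=lambda x: x['count'], reverse=True):
    print('title {} --count is {} -- brice is {}'.format(i['title'], i['count'], i['brice']))
```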

Output:

```
title EarPod --count is 5 -- brice is $24.99
title New Pocket --count is 4 -- brice is $64.99
title New sunglasses --count is 4 -- brice is $74.99
title iphone gamepad --count is 4 -- brice is $94.99
title Best Bed --count is 4 -- brice is $214.5
title iWatch --count is 4 -- brice is $500
title Park tickets --count is 4 -- brice is $15.5
title Art Cup --count is 3 -- brice is $84.99
```
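The script above reads `E:\index.html`, which readers will not have. As a hedged sketch, the same selectors and star-counting logic can be tried on a small inline page instead. Everything in `SAMPLE_HTML` is hypothetical: it is modeled only on the nesting the selectors imply, and the class names `container`, `row`, `col-sm-4` and `thumbnail` are assumptions (the selectors just need plain `div`s at those levels). The helper name `parse_products` and the `CARD` prefix are also illustrative, and the stdlib `'html.parser'` is swapped in for `'lxml'` so no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for E:\index.html: two product cards nested the way the
# note's selectors expect (body > div > div > div.col-md-9 > div > div > div).
SAMPLE_HTML = """
<html><body>
<div class="container"><div class="row"><div class="col-md-9"><div class="row">
  <div class="col-sm-4"><div class="thumbnail">
    <img src="img/cup.png"/>
    <div class="caption">
      <h4 class="pull-right">$84.99</h4>
      <h4><a href="#">Art Cup</a></h4>
    </div>
    <div class="ratings">
      <p class="pull-right">18 reviews</p>
      <p>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star-empty"></span>
        <span class="glyphicon glyphicon-star-empty"></span>
      </p>
    </div>
  </div></div>
  <div class="col-sm-4"><div class="thumbnail">
    <img src="img/earpod.png"/>
    <div class="caption">
      <h4 class="pull-right">$24.99</h4>
      <h4><a href="#">EarPod</a></h4>
    </div>
    <div class="ratings">
      <p class="pull-right">65 reviews</p>
      <p>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
        <span class="glyphicon glyphicon-star"></span>
      </p>
    </div>
  </div></div>
</div></div></div></div>
</body></html>
"""

# Hypothetical shared path down to one product card; each field selector extends it.
CARD = 'body > div > div > div.col-md-9 > div > div > div'


def parse_products(html):
    """Return one dict per product card, using the note's selectors."""
    soup = BeautifulSoup(html, 'html.parser')  # stdlib parser instead of lxml
    brices = soup.select(CARD + ' > div.caption > h4.pull-right')
    titles = soup.select(CARD + ' > div.caption > h4 > a')
    levels = soup.select(CARD + ' > div.ratings > p.pull-right')
    counts = soup.select(CARD + ' > div.ratings > p:nth-of-type(2)')
    imgs = soup.select(CARD + ' > img')
    data_list = []
    for brice, title, level, count, img in zip(brices, titles, levels, counts, imgs):
        data_list.append({
            'brice': brice.get_text(),
            'title': title.get_text(),
            'level': level.get_text(),
            # Exact class-string match: "glyphicon-star-empty" spans are not counted
            'count': len(count.find_all(attrs={'class': 'glyphicon glyphicon-star'})),
            'img': img.get('src'),
        })
    return data_list


for i in sorted(parse_products(SAMPLE_HTML), key=lambda x: x['count'], reverse=True):
    print('title {} --count is {} -- brice is {}'.format(i['title'], i['count'], i['brice']))
```

Running it lists EarPod (5 stars) before Art Cup (3 stars). Because `sorted` is stable, products with the same star count keep their page order.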

Summary:

BeautifulSoup reference documentation (Chinese edition, `find_all` section):

https://www.crummy.com/software/BeautifulSoup/bs4/doc.zh/#find-all-tag
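The star-counting trick depends on how BeautifulSoup treats `class`: it is a multi-valued attribute, and a search string that contains a space is compared against the whole attribute value. So `'glyphicon glyphicon-star'` matches a filled star but not a span whose class is `glyphicon glyphicon-star-empty`. A minimal sketch on a hypothetical ratings fragment (three filled stars, two empty; the `-empty` class name is an assumption about how the page marks unfilled stars):

```python
from bs4 import BeautifulSoup

# Hypothetical ratings paragraph: three filled stars, two empty ones.
RATINGS = """
<p>
  <span class="glyphicon glyphicon-star"></span>
  <span class="glyphicon glyphicon-star"></span>
  <span class="glyphicon glyphicon-star"></span>
  <span class="glyphicon glyphicon-star-empty"></span>
  <span class="glyphicon glyphicon-star-empty"></span>
</p>
"""

p = BeautifulSoup(RATINGS, 'html.parser').p

# Exact string match on the whole class value: only filled stars qualify.
full = p.find_all(attrs={'class': 'glyphicon glyphicon-star'})
# Single-class match: every tag whose class list contains "glyphicon".
any_icon = p.find_all(class_='glyphicon')
# CSS selector alternative: matches the class token "glyphicon-star" only.
via_css = p.select('span.glyphicon-star')

print(len(full), len(any_icon), len(via_css))  # 3 5 3
```

The exact-string form is sensitive to class order and spacing, so `class="glyphicon-star glyphicon"` would not match it; the CSS selector `span.glyphicon-star` does not have that limitation.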
