webscraper: scraping reviews behind an Expand/Collapse toggle (element


Author: 风不千山 | Published 2018-03-12 01:34
Example page used here:

http://www.dianping.com/shop/93729095/review_all/p1

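While scraping shop reviews on Dianping (大众点评), I ran into an 【Expand/Collapse review】 button. The full review text sits behind it, so a selector of type Text cannot capture it directly, as the screenshots show: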


[Screenshot: the scraped review text is incomplete…]
[Screenshot: the same review after clicking Expand…]

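The approach I settled on is the Element click selector. Scraping several pages would normally mean nesting yet another selector for pagination. Fortunately, the review pages are numbered in a regular pattern (p1, p2, p3, …), so a page range in the start URL spares us that extra nesting.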
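Web Scraper's documentation describes start-URL ranges such as [1-3], [001-100] (zero-padded) and [0-100:10] (with a step). The small re-implementation below illustrates how a range like p[1-3] turns into concrete page URLs. It is a hypothetical sketch written for this note, not the extension's own code. It also assumes that only the first range in a URL needs expanding.

```python
import re

# Hypothetical re-implementation of the start-URL range syntax:
# [start-end] or [start-end:step], e.g. p[1-3], [001-100] or [0-100:10].
RANGE = re.compile(r"\[(\d+)-(\d+)(?::(\d+))?\]")


def expand_range_url(url: str) -> list[str]:
    """Expand the first range in url into the list of concrete URLs."""
    match = RANGE.search(url)
    if match is None:
        return [url]  # no range: the URL is used as-is
    start, end = int(match.group(1)), int(match.group(2))
    step = int(match.group(3)) if match.group(3) else 1
    width = len(match.group(1))  # "001" -> pad every number to 3 digits
    prefix, suffix = url[: match.start()], url[match.end():]
    return [prefix + str(i).zfill(width) + suffix for i in range(start, end + 1, step)]


for page_url in expand_range_url("http://www.dianping.com/shop/93729095/review_all/p[1-3]"):
    print(page_url)
```

For the example shop this yields the p1, p2 and p3 review pages. Scraping more pages only requires changing the upper bound of the range.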


[Figure: selector graph]
Sitemap to import:

{"_id":"zhankai2","startUrl":["http://www.dianping.com/shop/93729095/review_all/p[1-3]"],"selectors":[{"id":"111","type":"SelectorElementClick","selector":"div.main-review","parentSelectors":["_root"],"multiple":true,"delay":"3000","clickElementSelector":"div.more-words a.fold","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"333","type":"SelectorText","selector":"div.review-words","parentSelectors":["111"],"multiple":false,"regex":"","delay":0}]}
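To see how the selectors in this sitemap hang together, a short script can read the JSON and print its selector tree. This is a hypothetical helper, not part of Web Scraper. It relies only on the id, type and parentSelectors fields visible in the export, and it writes the start URL with the page-range syntax p[1-3].

```python
import json
from collections import defaultdict

# The sitemap above; the start URL is written with the range syntax p[1-3].
SITEMAP = json.loads("""{"_id":"zhankai2","startUrl":["http://www.dianping.com/shop/93729095/review_all/p[1-3]"],"selectors":[{"id":"111","type":"SelectorElementClick","selector":"div.main-review","parentSelectors":["_root"],"multiple":true,"delay":"3000","clickElementSelector":"div.more-words a.fold","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"333","type":"SelectorText","selector":"div.review-words","parentSelectors":["111"],"multiple":false,"regex":"","delay":0}]}""")


def selector_tree(sitemap: dict) -> dict[str, list[str]]:
    """Map each parent ("_root" or a selector id) to its child selector ids."""
    known = {"_root"} | {sel["id"] for sel in sitemap["selectors"]}
    tree: dict[str, list[str]] = defaultdict(list)
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            if parent not in known:
                raise ValueError(f"selector {sel['id']} has unknown parent {parent!r}")
            tree[parent].append(sel["id"])
    return dict(tree)


def print_tree(sitemap: dict, node: str = "_root", depth: int = 0) -> None:
    """Print the selectors under node, indenting two spaces per level."""
    types = {sel["id"]: sel["type"] for sel in sitemap["selectors"]}
    for child in selector_tree(sitemap).get(node, []):
        print("  " * depth + f"{child} ({types[child]})")
        print_tree(sitemap, child, depth + 1)


print_tree(SITEMAP)
```

The output shows the Text selector 333 as a child of the Element click selector 111. The text is therefore read inside each review element that the click selector returns.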

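Of course, the one inconvenience is that you still have to click "More reviews" on the shop's detail page to reach the review pages. Awkwardly, those review pages seem to be exactly where the scraper shines…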
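If you already know a shop's ID, one possible way around the manual click is to build the review URLs yourself. The helper below is hypothetical. It assumes every shop follows the /shop/<id>/review_all/p<N> pattern of the two example URLs in this post, and Dianping may change that pattern or block direct access. Because startUrl in the sitemap is a list, several shops can share one sitemap.

```python
import json


def review_start_urls(shop_ids: list[str], first_page: int, last_page: int) -> list[str]:
    """Build one range-style start URL per shop (hypothetical helper).

    Assumes each shop's reviews live at
    http://www.dianping.com/shop/<id>/review_all/p<N>.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    return [
        f"http://www.dianping.com/shop/{shop_id}/review_all/p[{first_page}-{last_page}]"
        for shop_id in shop_ids
    ]


# The two shop IDs that appear in this post.
start_urls = review_start_urls(["93729095", "96127598"], 1, 3)
print(json.dumps({"startUrl": start_urls}, indent=2))
```

The printed startUrl list can be pasted into the sitemap JSON in place of the original one.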

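However, later tests showed that on some shops the two kinds of reviews (those with an Expand button and those without) were not all selected, and the results contained duplicate rows. I therefore revised the sitemap: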
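If duplicates still slip through, one option is to remove them after export. The sketch below is a hypothetical post-processing step that uses only Python's csv module. It assumes a CSV export with a column named after the Text selector's id (333 in these sitemaps), and it treats two rows with the same review text as duplicates. The other column names and the sample rows are made up for illustration.

```python
import csv
import io

# Made-up export for illustration; column "333" is assumed to hold the review text.
SAMPLE_CSV = """web-scraper-order,web-scraper-start-url,333
1-1,http://www.dianping.com/shop/96127598/review_all/p1,review text A
1-2,http://www.dianping.com/shop/96127598/review_all/p1,review text A
1-3,http://www.dianping.com/shop/96127598/review_all/p1,review text B
"""


def dedupe_rows(csv_text: str, key_column: str) -> list[dict[str, str]]:
    """Keep the first row for each distinct (whitespace-trimmed) key_column value."""
    seen: set[str] = set()
    unique: list[dict[str, str]] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row[key_column].strip()
        if key in seen:
            continue  # same review text already kept: skip the duplicate
        seen.add(key)
        unique.append(row)
    return unique


for row in dedupe_rows(SAMPLE_CSV, "333"):
    print(row["web-scraper-order"], row["333"])
```

Keying on the text alone also merges two genuinely identical reviews from different users. Adding another scraped column, such as a username if your sitemap collects one, to the key avoids that.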


[Figure: selector graph, revised version]
Revised sitemap to import:

{"_id":"blue","startUrl":["http://www.dianping.com/shop/96127598/review_all/p[1-3]"],"selectors":[{"id":"111","type":"SelectorElementClick","selector":"div.reviews-items > ul > li","parentSelectors":["_root"],"multiple":true,"delay":"3000","clickElementSelector":"div.more-words a.fold","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"222","type":"SelectorElement","selector":"div.review-words:nth-of-type(n+3)","parentSelectors":["_root"],"multiple":true,"delay":"3000"},{"id":"333","type":"SelectorText","selector":"parent","parentSelectors":["222"],"multiple":false,"regex":"","delay":0}]}
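The revised sitemap relies on the CSS selector div.review-words:nth-of-type(n+3). In CSS, :nth-of-type(an+b) counts an element's position among siblings with the same tag name (here div), not among siblings with the same class. It matches positions a*n+b for n = 0, 1, 2, …, so n+3 keeps a div.review-words only if it is the third or later div among its siblings. The sketch below just evaluates that formula. It says nothing about Dianping's actual markup, and the function name is hypothetical.

```python
def nth_of_type_positions(a: int, b: int, sibling_count: int) -> list[int]:
    """1-based positions in 1..sibling_count matched by :nth-of-type(an+b).

    A position p matches when p == a*n + b for some integer n >= 0.
    """
    matches = []
    for p in range(1, sibling_count + 1):
        if a == 0:
            if p == b:  # plain :nth-of-type(b)
                matches.append(p)
        elif (p - b) % a == 0 and (p - b) // a >= 0:
            matches.append(p)  # n = (p - b) / a is a non-negative integer
    return matches


# Among six div siblings, n+3 skips the first two.
print(nth_of_type_positions(1, 3, 6))
```

This prints [3, 4, 5, 6].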

Article link: https://www.haomeiwen.com/subject/cjmgfftx.html