webscraper: how to scrape reviews behind an expand/collapse toggle (element

Author: 风不千山 | Published 2018-03-12 01:34
    Practice URL:

    http://www.dianping.com/shop/93729095/review_all/p1

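    While scraping shop reviews on Dianping (大众点评), I ran into an expand/collapse toggle on the reviews. The full review text cannot be captured directly with a selector of type Text, as shown: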


    Incomplete data...
    After clicking "expand"...

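    I considered using an Element click selector, and nesting it again for multi-page scraping. Fortunately, the review pages follow a regular pagination pattern (p1, p2, p3, ...), which spares the trouble of trying to nest selectors for pagination.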
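The regular pagination can be illustrated with a small Python sketch. It assumes the start URL is written in Web Scraper's range form, e.g. `p[1-3]`, and shows which concrete pages such a pattern stands for. The function name `expand_range_url` is hypothetical and only handles the first range in a URL; it is not part of Web Scraper itself.

```python
import re
from typing import List

# Matches a range such as [1-3], [001-100] or [0-100:10].
RANGE = re.compile(r"\[(\d+)-(\d+)(?::(\d+))?\]")


def expand_range_url(url: str) -> List[str]:
    """Expand the first [start-end] range in a URL into concrete URLs.

    Hypothetical helper for illustration; Web Scraper does this internally.
    """
    match = RANGE.search(url)
    if match is None:
        return [url]  # no range: the URL stands for itself
    first, last = match.group(1), match.group(2)
    step = int(match.group(3) or 1)
    # Keep zero padding when the start value is written like "001".
    width = len(first) if len(first) > 1 and first.startswith("0") else 0
    pages = range(int(first), int(last) + 1, step)
    return [
        url[: match.start()] + str(n).zfill(width) + url[match.end():]
        for n in pages
    ]


if __name__ == "__main__":
    pattern = "http://www.dianping.com/shop/93729095/review_all/p[1-3]"
    for page_url in expand_range_url(pattern):
        print(page_url)
```

Because every review page differs only in the trailing page number, a single range in the start URL covers all of them and no pagination selector is needed.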


    Selector graph
    Import

    {"_id":"zhankai2","startUrl":["http://www.dianping.com/shop/93729095/review_all/p[1-3]"],"selectors":[{"id":"111","type":"SelectorElementClick","selector":"div.main-review","parentSelectors":["_root"],"multiple":true,"delay":"3000","clickElementSelector":"div.more-words a.fold","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"333","type":"SelectorText","selector":"div.review-words","parentSelectors":["111"],"multiple":false,"regex":"","delay":0}]}
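To make the nesting in the sitemap above easier to read, here is a minimal Python sketch that prints the selector tree from the `parentSelectors` links. The `SITEMAP` dict is a trimmed copy of the import string (only the fields the sketch needs), and `selector_tree` is a hypothetical helper, not a Web Scraper API.

```python
# Trimmed copy of the sitemap: only id, type, selector and parentSelectors.
SITEMAP = {
    "_id": "zhankai2",
    "selectors": [
        {
            "id": "111",
            "type": "SelectorElementClick",
            "selector": "div.main-review",
            "parentSelectors": ["_root"],
        },
        {
            "id": "333",
            "type": "SelectorText",
            "selector": "div.review-words",
            "parentSelectors": ["111"],
        },
    ],
}


def selector_tree(sitemap):
    """Return the selector hierarchy as indented lines, starting at _root."""
    children = {}
    for sel in sitemap["selectors"]:
        for parent in sel["parentSelectors"]:
            children.setdefault(parent, []).append(sel)

    lines = []

    def walk(parent_id, depth):
        # Assumes the sitemap has no selector cycles.
        for sel in children.get(parent_id, []):
            indent = "  " * depth
            lines.append(f'{indent}{sel["id"]} ({sel["type"]}): {sel["selector"]}')
            walk(sel["id"], depth + 1)

    walk("_root", 0)
    return lines


if __name__ == "__main__":
    print("\n".join(selector_tree(SITEMAP)))
```

Read this way, selector 111 picks each review block and clicks its expand link, and selector 333 then reads the review text inside that block.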

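    Of course, the one inconvenience is that you have to click "More reviews" on the shop detail page to reach the review page (although the page you land on seems to be exactly where the scraper works best, which is a bit awkward)...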

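    However, later testing showed that for some shops the two kinds of reviews (those with an expand button and those with nothing at all) could not all be selected, and that the scraped results contained duplicates, so I revised the sitemap.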


    graph2
    Import2

    {"_id":"blue","startUrl":["http://www.dianping.com/shop/96127598/review_all/p[1-3]"],"selectors":[{"id":"111","type":"SelectorElementClick","selector":"div.reviews-items > ul > li","parentSelectors":["_root"],"multiple":true,"delay":"3000","clickElementSelector":"div.more-words a.fold","clickType":"clickOnce","discardInitialElements":false,"clickElementUniquenessType":"uniqueText"},{"id":"222","type":"SelectorElement","selector":"div.review-words:nth-of-type(n+3)","parentSelectors":["_root"],"multiple":true,"delay":"3000"},{"id":"333","type":"SelectorText","selector":"_parent_","parentSelectors":["222"],"multiple":false,"regex":"","delay":0}]}
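If duplicates still turn up in an export, they can also be dropped afterwards. The sketch below is a post-processing fallback, not the sitemap fix described in this post. It assumes the exported rows are available as a list of dicts and that the review text sits in a column named after the Text selector id, `333`; the helper name `drop_duplicate_rows` is hypothetical.

```python
def drop_duplicate_rows(rows, key="333"):
    """Keep the first row for each distinct review text, preserving order.

    `key` is the assumed column name holding the review text.
    """
    seen = set()
    unique = []
    for row in rows:
        # Compare on stripped text so stray whitespace does not hide a duplicate.
        text = (row.get(key) or "").strip()
        if text in seen:
            continue
        seen.add(text)
        unique.append(row)
    return unique


if __name__ == "__main__":
    sample = [
        {"333": "Great noodles"},
        {"333": "Great noodles "},
        {"333": "Slow service"},
    ]
    for row in drop_duplicate_rows(sample):
        print(row["333"])
```

Note that this also merges two genuinely different reviews that happen to have identical text, so it is best used as a sanity check on the export.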

    Article link: https://www.haomeiwen.com/subject/cjmgfftx.html