On Dianping's (大众点评) Anti-Scraping Techniques


Author: sexy_cyber | Published 2018-06-14 13:38 | Read 3007 times

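    Responses you get once banned:

    1. 403 Forbidden
    2. Status code 200, but with either:

    a. an empty response body, with nothing in it at all, or
    b. a static page saying, roughly, that the request failed. Its source shows the IP and UA of the failed request (perhaps a hint about what we did wrong). In practice, it turned out to be the UA that had been blocked.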

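    Anti-scraping measures:

    1. A burst of high-frequency requests leads to a page asking for a CAPTCHA.
    2. UA: Ubuntu UAs are blocked outright, and many other UAs fail as well. In testing with a UA pool found online, it took more than 20 switches in the worst case to get a valid response. My initial guess is that most UAs in the pool are too old for the site to accept: Chrome is already at version 65, yet the pool still contains version 12 UAs, which are presumably the ones being blocked.
    3. IP: once an IP has been flagged, things get noticeably harder from then on.
    4. The review API uses JS-encrypted parameters: reviews are loaded via AJAX, so you have to call the endpoint, and it expects several encrypted parameters. You either reverse-engineer them or find another route.
    5. Selenium with headless Chrome or PhantomJS is fingerprinted, and such requests get an empty response.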
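    The UA observations above suggest pruning the pool before rotating through it. The sketch below is a hypothetical helper, `usable_user_agents`, that assumes (as the author guessed, without confirmation) that Ubuntu UAs and old Chrome major versions are what get rejected. The cutoff `min_chrome=60` is an arbitrary illustrative value, not something measured against the site.

```python
import re

# Captures the Chrome major version, e.g. "Chrome/67.0.3396.62" -> "67"
CHROME_VERSION = re.compile(r"Chrome/(\d+)\.")


def usable_user_agents(pool, min_chrome=60):
    """Filter a UA pool using the article's observations.

    Assumptions (not confirmed by the site): any UA mentioning Ubuntu is
    blocked, and Chrome UAs older than `min_chrome` are blocked.
    UAs without a Chrome version are kept unchanged.
    """
    kept = []
    for ua in pool:
        if "Ubuntu" in ua:
            continue  # reported as blocked outright
        match = CHROME_VERSION.search(ua)
        if match and int(match.group(1)) < min_chrome:
            continue  # presumably too old for the site
        kept.append(ua)
    return kept
```

    Filtering once up front means the rotation only draws from UAs that have a chance of working, instead of burning 20+ requests on stale entries.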

    Note: the Referer header does not seem to be required.

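    Design approach:

    Check three things together: the status_code, the length of the response body, and whether the body contains the required field.
    If any check fails, switch the UA and the proxy immediately, and make sure every outgoing request uses a freshly randomized proxy and UA. Also lower the crawl rate by adding time.sleep().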
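    A minimal sketch of the three checks, plus a randomized pause between requests. `looks_valid` and `polite_pause` are hypothetical names; the marker string and the `min_length=100` threshold are assumptions to tune per page, not values taken from the site. UA and proxy switching is left out.

```python
import random
import time


def looks_valid(status_code, body, required_field, min_length=100):
    """Apply the three checks to one response; False means 'treat as banned'."""
    if status_code != 200:          # e.g. 403 Forbidden
        return False
    if len(body) < min_length:      # empty or near-empty 200 response
        return False
    return required_field in body   # a 200 failure page lacks the expected field


def polite_pause(base=2.0, jitter=1.0):
    """Sleep for `base` seconds plus a random extra of up to `jitter` seconds."""
    delay = base + random.uniform(0, jitter)
    time.sleep(delay)
    return delay
```

    For example, `looks_valid(resp.status_code, resp.text, 'reviewList')` on a requests response; a `False` result is the signal to switch UA and proxy before retrying.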

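    The big trick:

    To crawl Dianping reviews, call Dianping's API:

    Nothing in the payload needs to change; passing the shop id is enough. Testing also showed that the request only needs the three headers below. However, after more than 10 valid responses, the calls stopped working entirely; the analysis follows the code: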

    
    import json
    import time
    
    import requests
    
    
    def get_comments(id):
        url = 'http://m.dianping.com/isoapi/module'
        # id = 97116301
        payload = {
            "uuid": "6f86750a-3f6f-7976-c261-a95c2c932eee.{}".format(int(time.time())),
            "platform": 3,
            "partner": 150,
            "optimusCode": 10,
            "originUrl": "http://m.dianping.com/shop/{}/review_all".format(id),
            "pageEnName": "shopreviewlist",
            "moduleInfoList": [{
                "moduleName": "baseinfo",
                "query": {
                    "shopId": "{}".format(id),
                    "pageDomain": "m.dianping.com"
                },
                "config": {
                    "photo_link_ios": "https://m.dianping.com/shop/{shopId}/photos",
                    "hidePicLink": False,
                    "setLink": "",
                    "enableShopNameLink": True,
                    "hideMultipic": False,
                    "enablePhotoULink": False,
                    "shanhuiSourceType": "",
                    "hideShanhui": False,
                    "photo_utm": "ulink_shopphoto",
                    "enableHuiULink": False,
                    "photo_link_android": "https://m.dianping.com/shop/{shopId}/photos"
                }
            }, {
                "moduleName": "reviewlist",
                "query": {
                    "shopId": "{}".format(id),
                    "offset": 0,
                    "limit": 10,
                    "type": 1,
                    "keyword": "",
                    "hit": "",
                    "pageDomain": "m.dianping.com"
                }
            }, {
                "moduleName": "bottom-app",
                "query": {
                    "shopId": "{}".format(id),
                    "pageDomain": "m.dianping.com"
                },
                "config": {
                    "support_system": "all",
                    "bottomapp_link_ios": "https://link.dianping.com/universal-link?originalUrl=https%3A%2F%2Fevt.dianping.com%2Fsynthesislink%2F6166.html%3FshopId%3D@shopId@&schema=dianping%3A%2F%2Fshopinfo%3Fid%3D@shopId@%26utm%3D@utm@",
                    "bottomapp_utm": "ulink_reviewbutton",
                    "setSyntheticalLink": "",
                    "bottomapp_link_android": "https://evt.dianping.com/synthesislink/6166.html?shopId={shopId}",
                    "setDownloadLink": "",
                    "pos": "top"
                }
            }]
        }
    
        while True:
            headers = {
                'Host': 'm.dianping.com',
                'Content-Type': 'application/json',
                'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36',
            }
            content = requests.post(url, headers=headers, data=json.dumps(payload), timeout=10).json()
            print(content)
            if content['code'] == 200 and content['msg'] == 'success':
                try:
                    comments = content['data']['moduleInfoList']
                    reviewList = comments[0]['moduleData']['data']['reviewList']
                    lastTimeStr = reviewList[0]['lastTimeStr']
                    print(lastTimeStr)
                    return lastTimeStr
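                except (KeyError, IndexError, TypeError):
                    # No review list in the response
                    print('This shop has no reviews')
                    return ''
            else:
                print('Review API call failed, retrying in 3 seconds')
                time.sleep(3)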
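    The nested indexing above raises on any unexpected shape and assumes the review list lives in the first module of the response. A more defensive sketch, assuming the same `data → moduleInfoList → moduleData → data → reviewList` layout the code relies on, scans every module for a non-empty `reviewList`. `latest_review_time` is a hypothetical helper.

```python
def latest_review_time(content, key="lastTimeStr"):
    """Return `key` from the first review in an API response, or '' if none.

    Assumes the layout the code above relies on:
    content["data"]["moduleInfoList"][i]["moduleData"]["data"]["reviewList"].
    Instead of trusting index 0, it scans the modules in order and uses
    the first one that carries a non-empty reviewList.
    """
    if content.get("code") != 200 or content.get("msg") != "success":
        raise ValueError("API call failed: {!r}".format(content.get("msg")))
    modules = (content.get("data") or {}).get("moduleInfoList") or []
    for module in modules:
        data = (module.get("moduleData") or {}).get("data") or {}
        reviews = data.get("reviewList") or []
        if reviews:
            # The article treats the first entry as the latest review
            return reviews[0].get(key, "")
    return ""
```

    Raising on a failed call, rather than returning `''`, keeps "API refused us" distinct from "shop has no reviews", which the retry logic needs to tell apart.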
    

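    The uuid is not derived from the UA, so the UA can be switched freely.
    The uuid is in the response body of http://m.dianping.com/review/425742779 (the id), i.e. it is returned by the server.
    Further checks showed that the cookie must be sent. The same cookie is accepted with different payloads and different shopIds, so once the cookie is added, passing the id is enough to call the API flexibly. Cookies expire, however, so we need to analyze how the cookie is generated.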
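    Because the same cookie accepts different payloads and shopIds, the payload can be treated as a template and retargeted per shop or page. A sketch with a hypothetical `payload_for` helper; it assumes the template is shaped like the payload in the code above and that only `uuid`, `originUrl`, each module's `shopId`, and the `reviewlist` module's `offset`/`limit` need to change.

```python
import copy


def payload_for(template, shop_id, uuid, offset=0, limit=10):
    """Return a copy of `template` retargeted at another shop and page.

    Every module's query gets the new shopId; the "reviewlist" module
    additionally gets offset/limit. The template itself is not modified.
    """
    shop_id = str(shop_id)
    payload = copy.deepcopy(template)
    payload["uuid"] = uuid
    payload["originUrl"] = "http://m.dianping.com/shop/{}/review_all".format(shop_id)
    for module in payload["moduleInfoList"]:
        query = module.setdefault("query", {})
        query["shopId"] = shop_id
        if module.get("moduleName") == "reviewlist":
            query["offset"] = offset
            query["limit"] = limit
    return payload
```

    Deep-copying keeps the long `config` blocks verbatim while avoiding accidental mutation of the shared template between calls.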
    The page source contains this snippet:

       window.PAGE_INITIAL_STATE = {"_context":{"header":{"user-agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36","cookie":"s_ViewType=10; _lxsdk_cuid=163c8ec4668c8-022dadb3f4d532-19326952-13c680-163c8ec466ac8; … _hc.v=7615b9e1-a7bd-ea10-9fc2-71602832d6ab.1528084318; … _lxsdk_s=%7C%7C0"},"modules":[{"moduleName":"bottomapp", …},{"moduleName":"header", …},{"moduleName":"baseinfo", …}, …],"mSource":"default","pageInitData":{"reviewId":"425742779","pageDomain":"m.dianping.com","shopId":"97116301", …},"pageEnName":"singlereview"}, …
    

    This part is the cookie:

    "cookie":"s_ViewType=10; _lxsdk_cuid=163c8ec4668c8-022dadb3f4d532-19326952-13c680-163c8ec466ac8; _lxsdk=163c8ec4668c8-022dadb3f4d532-19326952-13c680-163c8ec466ac8; _hc.v=7615b9e1-a7bd-ea10-9fc2-71602832d6ab.1528084318; Hm_lvt_e6f449471d3527d58c46e24efb4c343e=1528369542; Hm_lpvt_e6f449471d3527d58c46e24efb4c343e=1528369542; _lx_utm=utm_source%3DBaidu%26utm_medium%3Dorganic; cy=207; cye=shantou; ua=dpuser_3971866024; ctu=3fe0ea23f9b2673c148e477b16eef3ee02f1cbf83dc900bc457dab8eb8e860ef; uamo=17621989923; _dp.ac.v=73ce4109-baea-456f-acf2-812478fecf64; cityid=1; logan_custom_report=; default_ab=shop%3AA%3A1%7Cshopreviewlist%3AA%3A1%7Csinglereview%3AA%3A1; logan_session_token=a1n39xc1myeus6i2fa5c; _lxsdk_s=%7C%7C0"

    Extract it directly with a regex and put it in the AJAX request headers.
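    A sketch of that extraction as a standalone function (`extract_cookie_and_uuid` is a hypothetical name). It assumes the page still embeds the cookie right before `"},"modules"`, as in the snippet above. The uuid pattern stops at the next `;`, because a greedy `_hc.v=(.+)` would capture the rest of the cookie string as well.

```python
import re

# The cookie string sits inside window.PAGE_INITIAL_STATE, right before "},"modules"
COOKIE_RE = re.compile(r'"cookie":"(.+?)"},"modules"')
# The _hc.v value runs up to the next ';' (the dot is escaped to match literally)
HC_V_RE = re.compile(r'_hc\.v=([^;]+)')


def extract_cookie_and_uuid(page_source):
    """Return (cookie, uuid) from the review page HTML, or raise ValueError."""
    cookie_match = COOKIE_RE.search(page_source)
    if cookie_match is None:
        raise ValueError("cookie not found in PAGE_INITIAL_STATE")
    cookie = cookie_match.group(1)
    uuid_match = HC_V_RE.search(cookie)
    if uuid_match is None:
        raise ValueError("_hc.v not found in cookie")
    return cookie, uuid_match.group(1)
```

    The cookie can then go into the AJAX headers as `headers['Cookie'] = cookie`, and the uuid into the payload's `uuid` field.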
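    It later turned out that the uuid parameter in the payload also comes from the cookie (it is the value of _hc.v). As for the cookie itself, simulating the three requests in order with a requests Session, which keeps the cookies between requests, is enough.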
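    Why a Session is enough: `requests.Session` stores cookies set by earlier responses in `session.cookies` and attaches the matching ones to later requests for the same host. The sketch below shows this offline by setting a cookie by hand (standing in for one set by a page load) and preparing, not sending, the API request. `prepared_api_request` is a hypothetical helper.

```python
import requests

API_URL = "http://m.dianping.com/isoapi/module"


def prepared_api_request(session, payload):
    """Prepare (but do not send) the API POST through `session`.

    Session.prepare_request merges the session's cookie jar into the
    request, so no hand-written Cookie header is needed.
    """
    request = requests.Request("POST", API_URL, json=payload,
                               headers={"Content-Type": "application/json"})
    return session.prepare_request(request)


# Offline demo: pretend an earlier page load set the _hc.v cookie
session = requests.Session()
session.cookies.set("_hc.v", "7615b9e1-a7bd-ea10-9fc2-71602832d6ab.1528084318",
                    domain="m.dianping.com", path="/")
prepared = prepared_api_request(session, {"uuid": "demo"})
print(prepared.headers["Cookie"])
```

    Sending it is then `session.send(prepared)`, or simply `session.post(...)` as in the class that follows.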
    Here is the code:

    """
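    Dianping review API
    
    Pass the shopid when creating a class instance.
    The GetComments class provides two methods:
    api_comments
        returns the review details as JSON
    get_lasttime
        returns the time of the latest review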
    """
    import re
    import requests
    from fake_useragent import UserAgent
    import json
    import time
    class GetComments:
        def __init__(self, id):
            self.first_url = 'http://m.dianping.com/shop/{}'.format(id)
            self.second_url = 'http://m.dianping.com/shop/{}/review_all'.format(id)
            self.id = id
    
        def second_page(self):
            num = 0
    
            while True:
                num += 1
                try:
                    ua = UserAgent()
                    headers = {
                        'User-Agent': ua.random
                    }
                    # A new Session per attempt keeps the cookies set by the two page loads
                    self.session = requests.Session()
                    self.session.get(self.first_url, headers=headers, timeout=10)
                    content = self.session.get(self.second_url, headers=headers, timeout=10).text
                    # The cookie string is embedded in window.PAGE_INITIAL_STATE
                    cookie = re.search(r'"cookie":"(.+?)"},"modules"', content).group(1)
                    # The uuid is the value of _hc.v, up to the next ';'
                    uuid = re.search(r'_hc\.v=([^;]+)', cookie).group(1)
                    print(uuid)
                    return {'cookie': cookie, 'uuid': uuid}
                except Exception:
                    # Network error, or cookie/uuid not found in the page
                    time.sleep(5)
                    print('Attempt {} to fetch the mobile review page failed'.format(num))
    
        # Fetch the review data (fetching all pages is still to be done)
        def api_comments(self):
            url = 'http://m.dianping.com/isoapi/module'
            id = self.id
            result = self.second_page()
            uuid = result['uuid']
            cookie = result['cookie']
            payload = {
                "uuid": uuid,
                "platform": 3,
                "partner": 150,
                "optimusCode": 10,
                "originUrl": "http://m.dianping.com/shop/{}/review_all".format(id),
                "pageEnName": "shopreviewlist",
                "moduleInfoList": [{
                    "moduleName": "baseinfo",
                    "query": {
                        "shopId": "{}".format(id),
                        "pageDomain": "m.dianping.com"
                    },
                    "config": {
                        "photo_link_ios": "https://m.dianping.com/shop/{shopId}/photos",
                        "hidePicLink": False,
                        "setLink": "",
                        "enableShopNameLink": True,
                        "hideMultipic": False,
                        "enablePhotoULink": False,
                        "shanhuiSourceType": "",
                        "hideShanhui": False,
                        "photo_utm": "ulink_shopphoto",
                        "enableHuiULink": False,
                        "photo_link_android": "https://m.dianping.com/shop/{shopId}/photos"
                    }
                }, {
                    "moduleName": "reviewlist",
                    "query": {
                        "shopId": "{}".format(id),
                        "offset": 0,
                        "limit": 10,
                        "type": 1,
                        "keyword": "",
                        "hit": "",
                        "pageDomain": "m.dianping.com"
                    }
                }, {
                    "moduleName": "bottom-app",
                    "query": {
                        "shopId": "{}".format(id),
                        "pageDomain": "m.dianping.com"
                    },
                    "config": {
                        "support_system": "all",
                        "bottomapp_link_ios": "https://link.dianping.com/universal-link?originalUrl=https%3A%2F%2Fevt.dianping.com%2Fsynthesislink%2F6166.html%3FshopId%3D@shopId@&schema=dianping%3A%2F%2Fshopinfo%3Fid%3D@shopId@%26utm%3D@utm@",
                        "bottomapp_utm": "ulink_reviewbutton",
                        "setSyntheticalLink": "",
                        "bottomapp_link_android": "https://evt.dianping.com/synthesislink/6166.html?shopId={shopId}",
                        "setDownloadLink": "",
                        "pos": "top"
                    }
                }]
            }
            while True:
                headers = {
                    'Host': 'm.dianping.com',
                    'Content-Type': 'application/json',
                    'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.62 Safari/537.36",
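                    # No explicit Cookie header needed: self.session already carries the cookies
                    # 'Cookie':cookie
                }
                data = json.dumps(payload)
                content = self.session.post(url, headers=headers, data=data, timeout=10).json()
                print(content)
                if content['code'] == 200 and content['msg'] == 'success':
                    try:
                        # Raises if the shop has no reviews (missing or empty reviewList)
                        comments = content['data']['moduleInfoList']
                        reviewList = comments[0]['moduleData']['data']['reviewList']
                        lastTimeStr = reviewList[0]['lastTimeStr']
                        return content
                    except (KeyError, IndexError, TypeError):
                        print('This shop has no reviews')
                        return ''
                else:
                    print('Review API call failed, retrying in 3 seconds')
                    time.sleep(3)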
    
        # Get the time of the latest review
        def get_lasttime(self):
            content = self.api_comments()
            if content:
                comments = content['data']['moduleInfoList']
                reviewList = comments[0]['moduleData']['data']['reviewList']
                lastTimeStr = reviewList[0]['lastTime']
                print(lastTimeStr)
                return lastTimeStr
    
    if __name__ == '__main__':
        getcomments = GetComments('97116301')
        getcomments.get_lasttime()
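    The class only requests the first page (`offset` 0, `limit` 10), and the author marks fetching everything as unfinished. Assuming, untested, that `offset` counts reviews and advances by `limit`, and that the total is known (for example from the shop's review count), the offsets to request could be generated like this. `page_offsets` is a hypothetical helper.

```python
def page_offsets(total, limit=10):
    """Offsets 0, limit, 2*limit, ... covering `total` reviews.

    Assumption: the API's "offset" counts reviews, not pages.
    """
    if total <= 0 or limit <= 0:
        return []
    return list(range(0, total, limit))
```

    Each offset would then go into the `reviewlist` query's `offset` field, one API call per page, with a pause between calls.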
    
    
    

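    Note

    Scrapy's FormRequest only sends form-encoded bodies, so it cannot send this JSON payload. Use a plain scrapy.Request with method='POST', body=json.dumps(payload) and a Content-Type: application/json header instead (newer Scrapy versions also provide JsonRequest for this).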


        Title: On Dianping's Anti-Scraping Techniques

        Link: https://www.haomeiwen.com/subject/jmqyeftx.html