Crawler: response.text comes out garbled


Author: 门前的那颗樱桃树 | Published: 2023-06-29 15:00
    • Printing response.text shows garbled text.
    • Printing response.encoding shows utf-8, so the character encoding itself looks correct.
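        When writing a crawler in Python, some sites run checks to block crawlers, so we add request headers to make the request look like a normal browser visit. For example, in a Scrapy project we set DEFAULT_REQUEST_HEADERS in the settings, and among those headers we set "Accept-Encoding" to "gzip, deflate, br". The site may then compress its response with Brotli (br). If the Brotli library is not installed in the Python interpreter the project runs on (for example, the interpreter configured in PyCharm), the body cannot be decompressed, and response.text comes out garbled.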
    DEFAULT_REQUEST_HEADERS = {
       "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
       # Advertising br here can cause garbled output unless the interpreter has Brotli installed: pip install Brotli
       "Accept-Encoding": "gzip, deflate, br",
       "Accept-Language": "zh-CN,zh;q=0.9",
       "Cache-Control": "max-age=0",
       "Cookie": "resolution=1080*1920; Hm_lvt_c826b0776d05b85d834c5936296dc1d5=1686822404; sensorsdata2015jssdkcross=%7B%22distinct_id%22%3A%22188228516787b0-0579a4d65a09f9-1c525634-2073600-188228516791df7%22%2C%22first_id%22%3A%22%22%2C%22props%22%3A%7B%22%24latest_traffic_source_type%22%3A%22%E7%9B%B4%E6%8E%A5%E6%B5%81%E9%87%8F%22%2C%22%24latest_search_keyword%22%3A%22%E6%9C%AA%E5%8F%96%E5%88%B0%E5%80%BC_%E7%9B%B4%E6%8E%A5%E6%89%93%E5%BC%80%22%2C%22%24latest_referrer%22%3A%22%22%7D%2C%22identities%22%3A%22eyIkaWRlbnRpdHlfY29va2llX2lkIjoiMTg4MjI4NTE2Nzg3YjAtMDU3OWE0ZDY1YTA5ZjktMWM1MjU2MzQtMjA3MzYwMC0xODgyMjg1MTY3OTFkZjcifQ%3D%3D%22%2C%22history_login_id%22%3A%7B%22name%22%3A%22%22%2C%22value%22%3A%22%22%7D%2C%22%24device_id%22%3A%22188228516787b0-0579a4d65a09f9-1c525634-2073600-188228516791df7%22%7D; kk_s_t=1687169079723",
       "If-None-Match": "27733-wHpibHGyRBeG+tUml+dq3EKDpIc",
       "Sec-Ch-Ua-Mobile": "?0",
       "Sec-Ch-Ua-Platform": "macOS",
       "Sec-Fetch-Dest": "document",
       "Sec-Fetch-Mode": "navigate",
       "Sec-Fetch-Site": "none",
       "Sec-Fetch-User": "?1",
       "Upgrade-Insecure-Requests": "1",
       "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36",
    }
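
    To see why the text is garbled, it can help to check the Content-Encoding response header and try decompressing the raw body by hand. The following is a minimal sketch, not Scrapy's own implementation: decode_body is a hypothetical helper name, and it assumes that the optional brotli module (the one installed by pip install Brotli) is what handles br.

```python
import gzip
import zlib


def decode_body(raw: bytes, content_encoding: str) -> bytes:
    """Decompress a raw HTTP body according to its Content-Encoding value.

    Hypothetical helper for illustration only.
    """
    enc = content_encoding.strip().lower()
    if enc in ("", "identity"):
        return raw  # body was sent uncompressed
    if enc == "gzip":
        return gzip.decompress(raw)
    if enc == "deflate":
        # Assumes the zlib-wrapped form of deflate.
        return zlib.decompress(raw)
    if enc == "br":
        try:
            import brotli  # optional dependency: pip install Brotli
        except ImportError as exc:
            raise RuntimeError(
                "Response is Brotli-compressed but no decoder is installed; "
                "run: pip install Brotli"
            ) from exc
        return brotli.decompress(raw)
    raise ValueError(f"Unsupported Content-Encoding: {content_encoding!r}")


# Small demo with gzip, which needs no extra packages.
compressed = gzip.compress("你好".encode("utf-8"))
print(decode_body(compressed, "gzip").decode("utf-8"))
```

    If the br branch raises here, the body reaching the crawler is still compressed, which matches the symptom described above: the charset is utf-8, yet the text is unreadable.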
    

    Summary (either fix works):

    1. Remove br from Accept-Encoding.
    2. Install the Brotli library (pip install Brotli).
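
    The two fixes can also be combined by advertising br only when a decoder is actually importable. This is a minimal sketch under stated assumptions: supported_accept_encoding is a hypothetical helper name, and the check looks for the brotli and brotlicffi modules on the assumption that one of them provides Brotli decoding in your environment.

```python
import importlib.util


def supported_accept_encoding() -> str:
    """Build an Accept-Encoding value listing only encodings we can decode.

    Hypothetical helper for illustration only.
    """
    # gzip and deflate are covered by Python's standard library.
    encodings = ["gzip", "deflate"]
    # Add br only if a Brotli decoder module can be found (assumption:
    # either the brotli or the brotlicffi package supplies it).
    has_brotli = any(
        importlib.util.find_spec(name) is not None
        for name in ("brotli", "brotlicffi")
    )
    if has_brotli:
        encodings.append("br")
    return ", ".join(encodings)


DEFAULT_REQUEST_HEADERS = {
    "Accept-Encoding": supported_accept_encoding(),
    "Accept-Language": "zh-CN,zh;q=0.9",
}
print(DEFAULT_REQUEST_HEADERS["Accept-Encoding"])
```

    With this approach the same settings behave sensibly on a machine with or without Brotli installed, at the cost of the header differing slightly from what a real browser sends.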


          Article link: https://www.haomeiwen.com/subject/xlrnydtx.html