美文网首页
python爬虫报错 'utf-8' codec can't d

python爬虫报错 'utf-8' codec can't d

作者: AoeKeller | 来源:发表于2018-08-16 10:25 被阅读0次

    不废话,用urllib.open,返回的信用是二进制文件,然后decode的时候,报错
    'utf-8' codec can't decode byte 0x8b
    检查了原网页确实是utf-8编码,
    我的代码如下

        def myDownLoad(self, url):
        webheader = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng, */*',
            'Accept-Language': 'zh-CN',
            'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 11_0 like Mac OS X) AppleWebKit/604.1.34 (KHTML, like Gecko) Version/11.0 Mobile/15A5341f Safari/604.1',
            'DNT': '1',
            'Accept-Encoding': 'gzip, deflate',
            'Connection': 'Keep-Alive',
            'Host': 'tuchong.com',
            'Cookie': 'PHPSESSID=rqp9t3p1p0n4qkr3kt0f24vd44; webp_enabled=1; _ga=GA1.2.518965677.1534150590; log_web_id=5010412077; email=576063964%40qq.com; token=234ef3d22ce69f5f; _gid=GA1.2.1159319752.1534333971; _gat=1',
            'Referer':'https://tuchong.com/1890400/'
        }
        try:
            context = ssl._create_unverified_context()
            request_data = request.Request(url, headers=webheader)
            response = urlopen(request_data, context=context).read()
            # print(response.decode('utf8'))
            return response.decode('utf8')
        except Exception as e:
            print(e)
            return False
    

    然后报错

     'utf-8' codec can't decode byte 0x8b
    

    后来才发现,是压缩的问题,去掉这个就好了

    'Accept-Encoding': 'gzip, deflate',

    相关文章

      网友评论

          本文标题:python爬虫报错 'utf-8' codec can't d

          本文链接:https://www.haomeiwen.com/subject/qajnbftx.html