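If you tried the response approach I described in "Python Crawler (2): The Requests Library", you may have found that it sometimes fails to return the page source (with Zhihu, which requires a login, it is certain to fail).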
Here is a simple example:
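import requests

url = 'https://www.zhihu.com'
# Pretend to be a desktop browser; many sites reject the default requests User-Agent.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36'}
response = requests.get(url, headers=headers)
# Uncomment if the text comes back garbled because the encoding was guessed wrong.
# response.encoding = 'utf-8'
print(response.text)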
If setting the User-Agent alone still does not return correctly decoded text, we can also add Referer and Cookie entries to the headers dictionary.
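As a rough sketch of that idea, the snippet below builds a headers dictionary that optionally carries Referer and Cookie alongside the User-Agent. The helper names `build_headers` and `fetch_html`, the `timeout` value, and the placeholder cookie string are all assumptions made for illustration; the real cookie would have to be copied from a logged-in browser session, and there is no guarantee that any particular site accepts it.

```python
def build_headers(user_agent, referer=None, cookie=None):
    """Return a headers dict; Referer and Cookie are added only when given.

    Hypothetical helper for illustration, not part of the requests library.
    """
    headers = {'User-Agent': user_agent}
    if referer:
        headers['Referer'] = referer
    if cookie:
        headers['Cookie'] = cookie
    return headers


def fetch_html(url, headers):
    """Fetch a page with the given headers and return its decoded text."""
    # Imported here so build_headers can be used without requests installed.
    import requests
    response = requests.get(url, headers=headers, timeout=10)
    return response.text


user_agent = ('Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/64.0.3282.119 Safari/537.36')

headers = build_headers(
    user_agent,
    referer='https://www.zhihu.com/',
    # Placeholder only: paste the Cookie value from your own browser session.
    cookie='PASTE_COOKIE_FROM_BROWSER',
)

# html = fetch_html('https://www.zhihu.com', headers)  # performs a real request
```

Keeping the header construction separate from the request makes it easy to try User-Agent alone first and add Referer or Cookie only when the site needs them.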