Crawler Study (2): The Python requests Library

Author: 罗汉堂主 | Published 2019-10-14 18:47


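Let's go straight to the code.

import requests

response = requests.get('http://www.baidu.com')

print(response.text)               # body as str, decoded with the encoding requests infers
print(response.content)            # raw body as bytes
print(response.status_code)        # HTTP status code
print(response.request.headers)    # request headers
print(response.headers)            # response headers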

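Using the status code

# Assert that the status code is 200
assert response.status_code == 200  # the assertion holds, so nothing happens and execution continues

# Asserting a status code of 300 instead fails with an AssertionError
assert response.status_code == 300
Traceback (most recent call last):
  File "E:/ProjectSaas/mitmproxy/spider/spider_practice.py", line 11, in <module>
    assert response.status_code == 300
AssertionError

So when a crawler handles a large number of requests, checking the status code, for example with an assertion, is a quick way to catch failed requests.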
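An assertion stops the whole script at the first bad response, which is rarely what a long crawl wants. A gentler check can be sketched with `Response.raise_for_status()` from requests, as below. The helper names `is_ok` and `fake_response` are hypothetical, and the responses are built by hand so the snippet runs without network access.

```python
import requests


def is_ok(response):
    """Return True if the response has no 4xx/5xx status, False otherwise."""
    try:
        # raise_for_status() raises requests.HTTPError for 4xx and 5xx codes
        response.raise_for_status()
    except requests.HTTPError:
        return False
    return True


def fake_response(status_code):
    """Build a bare Response by hand (illustration only, no request is sent)."""
    response = requests.Response()
    response.status_code = status_code
    return response


results = {code: is_ok(fake_response(code)) for code in (200, 404, 500)}
print(results)
```

In a real crawl, `is_ok` would be applied to the object returned by `requests.get(...)`, and failed URLs could be logged or retried instead of aborting the run.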

Sending a request with headers

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
}
response = requests.get('http://www.baidu.com', headers=headers)

Sending a request with parameters

start_url = 'http://www.baidu.com/s'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
}
params = {
    'wd': 'python'
}

response = requests.get(start_url, headers=headers, params=params)
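To see what `params` does to the URL, the request can be prepared without being sent. This is a small sketch using `requests.Request(...).prepare()`; the variable name `prepared` is only illustrative.

```python
import requests

start_url = 'http://www.baidu.com/s'
params = {'wd': 'python'}

# Prepare the request without sending it, to inspect the final URL
prepared = requests.Request('GET', start_url, params=params).prepare()
print(prepared.url)  # http://www.baidu.com/s?wd=python
```

requests URL-encodes the dict and appends it as the query string, so there is no need to build `?wd=python` by hand.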

Sending a POST request

# data is a dict of form fields to submit
response = requests.post(start_url, headers=headers, data=data)
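As a rough illustration of what a POST body looks like, the sketch below prepares two requests without sending them: one with `data=` (form-encoded) and one with `json=`. The URL and the field names are hypothetical and not taken from any real site.

```python
import json

import requests

# Hypothetical endpoint and form fields, for illustration only
post_url = 'http://example.com/login'
data = {'username': 'user', 'password': 'secret'}

# data= is sent as a URL-encoded form body
form_request = requests.Request('POST', post_url, data=data).prepare()
print(form_request.headers['Content-Type'])
print(form_request.body)

# json= is serialized to a JSON body instead
json_request = requests.Request('POST', post_url, json=data).prepare()
print(json_request.headers['Content-Type'])
print(json.loads(json_request.body))
```

Which of the two a site expects can usually be read off the request captured in the browser's developer tools.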

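Using proxies
Why should a crawler use a proxy?

So the server does not see every request as coming from the same client

So our real address is not exposed and cannot be traced back to us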

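How a proxy works

The client sends its request to the proxy, the proxy forwards it to the target server, and the response travels back the same way.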

proxies = {
    'http': 'http://ip:port',
    'https': 'https://ip:port'
}
response = requests.get(start_url, proxies=proxies)

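Things to note when using proxy IP addresses

First prepare a batch of IPs to form an IP pool

Pick a proxy IP at random

Check each IP's availability: judge its quality, for example with a timeout parameter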
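The three points above can be sketched roughly as follows. The proxy addresses are placeholders from a documentation-only IP range, and the helper names `pick_proxies` and `is_usable`, the test URL, and the 3-second timeout are all assumptions rather than anything prescribed by requests.

```python
import random

import requests

# Hypothetical proxy addresses (documentation-only IP range); replace with real ones
PROXY_POOL = [
    'http://203.0.113.10:8080',
    'http://203.0.113.11:8080',
    'http://203.0.113.12:8080',
]


def pick_proxies(pool):
    """Pick one proxy at random and return it in the dict form requests expects."""
    proxy = random.choice(pool)
    return {'http': proxy, 'https': proxy}


def is_usable(proxies, test_url='http://www.baidu.com', timeout=3):
    """Return True if a request through the proxy succeeds within the timeout."""
    try:
        response = requests.get(test_url, proxies=proxies, timeout=timeout)
    except requests.RequestException:
        # Covers connection errors, proxy errors, and timeouts
        return False
    return response.status_code == 200


chosen = pick_proxies(PROXY_POOL)
print(chosen)
```

Before a proxy is used for real requests, `is_usable(chosen)` could be called and failing entries dropped from the pool.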

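Simulating login with requests

1. Send requests that carry cookies, using the requests library's Session object

After a successful login, the cookie is required to access subsequent pages

First use the session to send the login request; the cookie is saved in the session

Then use the same session to request pages that are only accessible after login; the session automatically sends the cookie it saved at login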
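A minimal sketch of the session approach follows. The URLs and form field names are hypothetical, since every site defines its own login form. The second half runs offline: it stores a cookie on the session by hand, standing in for one saved at login, and shows that the session attaches it to a later request.

```python
import requests

# Hypothetical URLs and form field names; a real site defines its own
LOGIN_URL = 'http://example.com/login'
PROFILE_URL = 'http://example.com/profile'


def login_and_fetch(username, password):
    """Log in with a Session, then request a page that needs the login cookie."""
    session = requests.Session()
    # Cookies set by the login response are stored on the session
    session.post(LOGIN_URL, data={'username': username, 'password': password})
    # The same session sends those cookies automatically
    return session.get(PROFILE_URL)


# Offline illustration: a cookie stored on the session is attached to later requests
session = requests.Session()
session.cookies.set('sessionid', 'abc123')  # stands in for a cookie saved at login
prepared = session.prepare_request(requests.Request('GET', PROFILE_URL))
print(prepared.headers['Cookie'])
```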

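2. Skip the POST request: log in through the page, then take the cookie and use it directly

When the scraping does not need to be fully automated

Sites whose cookies take a long time to expire

When all the data can be fetched before the cookie expires, although this is more cumbersome

Used together with another program: the other program is dedicated to obtaining the cookie, and this program only requests pages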
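For the second approach, the cookie string copied from the browser has to be handed to requests somehow. One possible way is sketched below; the helper `parse_cookie_string`, the cookie values, and the URL are all hypothetical. The request is only prepared, not sent, so the snippet runs offline.

```python
import requests


def parse_cookie_string(raw):
    """Turn a 'name=value; name2=value2' Cookie header string into a dict."""
    cookies = {}
    for pair in raw.split(';'):
        if '=' not in pair:
            continue  # skip empty or malformed fragments
        name, value = pair.strip().split('=', 1)
        cookies[name] = value
    return cookies


# Hypothetical cookie string, as copied from the browser's developer tools
raw_cookie = 'sessionid=abc123; csrftoken=xyz789'
cookies = parse_cookie_string(raw_cookie)
print(cookies)

# Hypothetical page that requires login; cookies= attaches them to the request
prepared = requests.Request('GET', 'http://example.com/profile', cookies=cookies).prepare()
print(prepared.headers['Cookie'])
```

In a real script the same dict would be passed as `requests.get(url, headers=headers, cookies=cookies)`.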

Article link: https://www.haomeiwen.com/subject/izccmctx.html
