Notes on learning requests

Author: NewForMe | Published 2018-08-10 10:33


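Python's built-in urllib can send HTTP requests, but adding request headers, parameters and the like is somewhat inconvenient. The requests library fills that gap: compared with urllib, sending requests with requests is much simpler. Let's look at the basics of using requests.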

I. GET Requests

Let's start with an example:

import requests

r = requests.get('http://httpbin.org/get')
print(r.text)

Output:

{
  "args": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.10.0"
  }, 
  "origin": "122.4.215.33", 
  "url": "http://httpbin.org/get"
}

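We have successfully sent a GET request; the response includes the request headers, the URL, the client IP and other information.
To add query parameters or request headers, just build the corresponding dict and pass it to the method: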

import requests

params = {
    'name': 'germey',
    'age': 22
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
r = requests.get("http://httpbin.org/get", headers=headers, params=params)
print(r.text)
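To check what URL requests actually builds from params without making a network call, you can prepare the request yourself. The sketch below uses requests.Request(...).prepare(), which returns the PreparedRequest that would be sent; the parameter values are the same illustrative ones as above.

```python
import requests

params = {"name": "germey", "age": 22}

# Build the request without sending it, so the final URL can be inspected
prepared = requests.Request("GET", "http://httpbin.org/get", params=params).prepare()

# The dict has been URL-encoded into the query string
print(prepared.url)  # http://httpbin.org/get?name=germey&age=22
```

A list value such as {"tag": ["a", "b"]} would be encoded as repeated keys (tag=a&tag=b).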

II. Fetching Binary Data

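Images, audio and video files are all essentially binary data. We can view this wide variety of media only because each one has a specific storage format and a matching way to decode it, so to fetch such files we need their raw bytes.
Let's try it with GitHub's site icon (favicon):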

import requests

r = requests.get("https://github.com/favicon.ico")
print(r.text)
print(r.content)

Output:

[Screenshot: output of print(r.text) followed by print(r.content)]
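The first line is the result of r.text and the second is the result of r.content. Reading binary content through r.text produces garbled characters, so use r.content when fetching audio, video, images or other binary data, and r.text when fetching the HTML of a web page.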
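A common next step is to write r.content to disk. Below is a minimal sketch; the helper name save_content and the file name used later are illustrative, not part of requests. The file is opened in binary mode ("wb") because r.content is bytes, not str.

```python
import requests


def save_content(r: requests.Response, path: str) -> int:
    """Write the raw bytes of a response to path and return the number of bytes written."""
    r.raise_for_status()          # raise on 4xx/5xx instead of saving an error page
    with open(path, "wb") as f:   # binary mode: r.content is bytes
        f.write(r.content)
    return len(r.content)
```

Usage: r = requests.get("https://github.com/favicon.ico", timeout=10), then save_content(r, "favicon.ico"). For large files, passing stream=True to requests.get and writing the chunks from r.iter_content(chunk_size=8192) avoids holding the whole file in memory.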

III. POST Requests

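We have covered the basic GET request. The other very common request method is POST, which submits data to a URL, much like submitting a form.
Making a POST request with Requests is just as simple.
Let's start with an example: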

import requests

data = {'name': 'germey', 'age': '22'}
r = requests.post("http://httpbin.org/post", data=data)
print(r.text)

The output is:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "age": "22", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "18", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.10.0"
  }, 
  "json": null, 
  "origin": "182.33.248.131", 
  "url": "http://httpbin.org/post"
}

The request returned a result, and the form section of the response contains the data we submitted, which shows that the POST request was sent successfully.
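Besides data=, requests.post() also accepts a json= argument, which serializes a dict into a JSON body instead of a form. To compare the two without a network call, the sketch below prepares both requests and inspects the Content-Type header and the body that would be sent; the payload is illustrative.

```python
import json

import requests

URL = "http://httpbin.org/post"
payload = {"name": "germey", "age": 22}

# data=: the dict is form-encoded, the way an HTML <form> would submit it
form_req = requests.Request("POST", URL, data=payload).prepare()

# json=: the dict is serialized to JSON and Content-Type is set accordingly
json_req = requests.Request("POST", URL, json=payload).prepare()

print(form_req.headers["Content-Type"])  # application/x-www-form-urlencoded
print(form_req.body)                     # name=germey&age=22
print(json_req.headers["Content-Type"])  # application/json
print(json.loads(json_req.body))         # {'name': 'germey', 'age': 22}
```

If the json= version is actually sent to http://httpbin.org/post, the payload should show up under the json field of the response rather than under form.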

IV. Response

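After sending a request, what we get back is naturally a Response. In the examples above we used text and content to read the response body, but there are many more attributes and methods for other information, such as the status code, headers and cookies.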

import requests

r = requests.get('http://www.jianshu.com')
print(type(r.status_code), r.status_code)
print(type(r.headers), r.headers)
print(type(r.cookies), r.cookies)
print(type(r.url), r.url)
print(type(r.history), r.history)

Here we print the status_code attribute for the status code, headers for the response headers, cookies for the cookies, url for the URL, and history for the request history.
The output is:

<class 'int'> 200
<class 'requests.structures.CaseInsensitiveDict'> {'X-Runtime': '0.006363', 'Connection': 'keep-alive', 'Content-Type': 'text/html; charset=utf-8', 'X-Content-Type-Options': 'nosniff', 'Date': 'Sat, 27 Aug 2016 17:18:51 GMT', 'Server': 'nginx', 'X-Frame-Options': 'DENY', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding', 'ETag': 'W/"3abda885e0e123bfde06d9b61e696159"', 'X-XSS-Protection': '1; mode=block', 'X-Request-Id': 'a8a3c4d5-f660-422f-8df9-49719dd9b5d4', 'Transfer-Encoding': 'chunked', 'Set-Cookie': 'read_mode=day; path=/, default_font=font2; path=/, _session_id=xxx; path=/; HttpOnly', 'Cache-Control': 'max-age=0, private, must-revalidate'}
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[<Cookie _session_id=xxx for www.jianshu.com/>, <Cookie default_font=font2 for www.jianshu.com/>, <Cookie read_mode=day for www.jianshu.com/>]>
<class 'str'> http://www.jianshu.com/
<class 'list'> []

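The session_id value was too long and has been abbreviated here. Note that the headers and cookies attributes are of type CaseInsensitiveDict and RequestsCookieJar, respectively.
The status code is commonly used to check whether a request succeeded, and Requests also provides a built-in lookup object for status codes, requests.codes.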
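As a small illustration, requests.codes lets you compare against readable names instead of bare numbers. The describe_status helper below is hypothetical; only requests.codes comes from the library.

```python
import requests


def describe_status(status_code: int) -> str:
    """Turn a status code into a short label using requests.codes names."""
    if status_code == requests.codes.ok:          # 200
        return "success"
    if status_code == requests.codes.not_found:   # 404
        return "not found"
    if status_code >= 500:
        return "server error"
    return f"other ({status_code})"


print(requests.codes.ok, requests.codes.not_found)  # 200 404
print(describe_status(200))                         # success
```

With a real response you would call describe_status(r.status_code). Alternatively, r.raise_for_status() raises requests.exceptions.HTTPError for 4xx and 5xx status codes and does nothing otherwise.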

V. Advanced Usage

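1. Session objects
  A Session object lets you persist certain parameters across requests. It also persists cookies across all requests made from the same Session instance, and it uses urllib3's connection pooling. So if you send several requests to the same host, the underlying TCP connection is reused, which can give a significant performance boost.
  Let's persist some cookies across requests: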

import requests

s = requests.Session()

s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get("http://httpbin.org/cookies")

print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'

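Note, however, that method-level parameters are not persisted across requests, even when using a session. The example below sends the cookie only with the first request, not the second: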

import requests

s = requests.Session()

r = s.get('http://httpbin.org/cookies', cookies={'from-my': 'browser'})
print(r.text)
# '{"cookies": {"from-my": "browser"}}'

r = s.get('http://httpbin.org/cookies')
print(r.text)
# '{"cookies": {}}'
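In contrast to method-level arguments, values set on the session object itself are sent with every request. The sketch below sets a default header and a cookie on the session, then uses Session.prepare_request() to inspect what would be sent, without touching the network. The X-Test and X-Test2 header names are illustrative.

```python
import requests

s = requests.Session()
s.headers.update({"X-Test": "true"})         # session-level: sent with every request from s
s.cookies.set("sessioncookie", "123456789")  # stored in the session's cookie jar

# Method-level header for this one request only; it is merged with the session headers
req = requests.Request("GET", "http://httpbin.org/headers", headers={"X-Test2": "true"})
prepared = s.prepare_request(req)

print(prepared.headers["X-Test"], prepared.headers["X-Test2"])
print(prepared.headers["Cookie"])
s.close()
```

Both headers end up on the prepared request, along with a Cookie header built from the session's cookie jar.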

Sessions can also be used as context managers:

import requests

with requests.Session() as s:
    s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')

This ensures the session is closed as soon as the with block exits, even if an exception occurred.

2. Proxies
  If you need to use a proxy, you can configure it for an individual request by passing the proxies argument to any request method:

import requests

proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "http://10.10.1.10:1080",
}

requests.get("http://example.org", proxies=proxies)

You can also configure proxies through the HTTP_PROXY and HTTPS_PROXY environment variables:

$ export HTTP_PROXY="http://10.10.1.10:3128"
$ export HTTPS_PROXY="http://10.10.1.10:1080"

$ python
>>> import requests
>>> requests.get("http://example.org")

If your proxy requires HTTP Basic Auth, use the http://user:password@host/ syntax:

proxies = {
    "http": "http://user:pass@10.10.1.10:3128/",
}

To set a proxy for a specific scheme and host, use scheme://hostname as the key. It will match requests for that exact scheme and host.

proxies = {'http://10.20.1.128': 'http://10.10.1.10:5323'}

Note that proxy URLs must include the scheme.
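To see which entry of a proxies dict would be picked for a given URL, you can experiment with select_proxy from requests.utils. Note the assumption here: select_proxy is an internal helper that exists in current requests 2.x releases but is not part of the documented API, so treat this as a way to explore the matching rules rather than something to depend on. The addresses are the illustrative ones from above.

```python
from requests.utils import select_proxy

proxies = {
    "http://10.20.1.128": "http://10.10.1.10:5323",  # scheme://hostname key: this host only
    "http": "http://10.10.1.10:3128",                # scheme key: any other http:// URL
}

print(select_proxy("http://10.20.1.128/index.html", proxies))  # http://10.10.1.10:5323
print(select_proxy("http://example.org/", proxies))            # http://10.10.1.10:3128
print(select_proxy("https://example.org/", proxies))           # None (no https entry)
```

The host-specific key wins over the plain scheme key, and a URL whose scheme has no entry gets no proxy at all.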

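3. Timeouts
  Most requests to external servers should have a timeout attached, in case the server does not respond in a timely manner. By default, requests does not time out unless you set a timeout value explicitly. Without a timeout, your code may hang for minutes or more.
  The connect timeout is the number of seconds Requests will wait for your client to establish a connection to the remote machine (corresponding to the connect() call on the socket). It is good practice to set connect timeouts slightly larger than a multiple of 3, which is the default TCP packet retransmission window.
  Once your client has connected to the server and sent the HTTP request, the read timeout is the number of seconds the client will wait for the server to send a response. (Specifically, it is the number of seconds the client will wait between bytes sent from the server. In 99.9% of cases, this is the time before the server sends the first byte.)
  If you specify a single value for the timeout, like this: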

r = requests.get('https://github.com', timeout=5)

that value is applied to both the connect and the read timeouts. To set them separately, pass a tuple of (connect, read):

r = requests.get('https://github.com', timeout=(3.05, 27))

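If the remote server is very slow, you can tell Requests to wait forever for a response by passing None as the timeout value, and then go make yourself a coffee.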

r = requests.get('https://github.com', timeout=None)
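When a timeout does fire, requests raises an exception instead of returning a response: requests.exceptions.ConnectTimeout if the connection could not be established in time, and requests.exceptions.ReadTimeout if the server did not send data in time (both are subclasses of requests.exceptions.Timeout). The sketch below catches each one. To trigger a read timeout reproducibly without a slow remote server, it opens a local socket that accepts TCP connections but never replies; the helper name classify_timeout is hypothetical.

```python
import socket

import requests


def classify_timeout(url, timeout):
    """Return 'ok', 'connect timeout' or 'read timeout' for a GET to url."""
    try:
        requests.get(url, timeout=timeout)
        return "ok"
    except requests.exceptions.ConnectTimeout:
        return "connect timeout"
    except requests.exceptions.ReadTimeout:
        return "read timeout"


# A local listening socket: the OS completes the TCP handshake via the listen
# backlog, but no HTTP response is ever sent, so the read timeout is what fires.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# (connect, read): 3.05 s to connect, then only 0.5 s to wait for response data
result = classify_timeout(f"http://127.0.0.1:{port}/", timeout=(3.05, 0.5))
server.close()
print(result)
```

Catching requests.exceptions.Timeout instead would handle both cases with a single except clause.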

For more advanced usage of requests, see: http://docs.python-requests.org/zh_CN/latest/user/advanced.html
