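requests is a very handy HTTP library, but I used to only half understand it. This time I put together some initial notes, which deepened my understanding, and I hope they help other readers too.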
- Status codes
- GET requests
- request headers
- request body
- Inspecting the outgoing request
- Authentication
- SSL and HTTPS
- Session
- cookies
- Redirect history
- timeout and max-retries
Status codes
>>> response = requests.get('https://api.github.com')
>>> response.status_code
200
A Response object also has a truth value, but it is True for a whole range of status codes, not just 200:
if response:  # True unless the status code is a 4xx or 5xx error (Response.__bool__ returns response.ok)
    print('Success!')
else:
    print('An error has occurred.')
Alternatively, call response.raise_for_status(), which raises requests.exceptions.HTTPError for 4xx and 5xx responses:
# If the response was successful, no Exception will be raised
response.raise_for_status()
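The sketch below puts the truth-value rule and raise_for_status() side by side. It needs no network because it builds bare Response objects by hand. You would not normally do this; make_response and check are hypothetical helpers for illustration only.

```python
import requests
from requests.models import Response


def make_response(status_code):
    """Hypothetical helper: a bare Response built by hand, for illustration only."""
    resp = Response()
    resp.status_code = status_code
    resp.url = 'https://example.com/'
    return resp


def check(resp):
    """Return 'ok', or the HTTPError message raised by raise_for_status()."""
    try:
        resp.raise_for_status()
    except requests.exceptions.HTTPError as e:
        return f'error: {e}'
    return 'ok'


for code in (200, 302, 404, 500):
    resp = make_response(code)
    # bool(resp) is the same as resp.ok: False only for 4xx and 5xx codes
    print(code, bool(resp), check(resp))
```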
GET requests
response.content is of type bytes:
>>> response = requests.get('https://api.github.com')
>>> response.content
b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
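response.text is decoded to str automatically. Internally, requests guesses the encoding from the response headers or, failing that, with chardet.detect (newer releases may use charset_normalizer instead):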
>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
You can also set the encoding yourself:
>>> response.encoding = 'utf-8' # Optional: requests infers this internally
>>> response.text
'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
response.json() deserializes the body for you. It is a shortcut for json.loads(response.text):
>>> response.json()
{'current_user_url': 'https://api.github.com/user', 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}', 'authorizations_url': 'https://api.github.com/authorizations', 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}', 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}', 'emails_url': 'https://api.github.com/user/emails', 'emojis_url': 'https://api.github.com/emojis', 'events_url': 'https://api.github.com/events', 'feeds_url': 'https://api.github.com/feeds', 'followers_url': 'https://api.github.com/user/followers', 'following_url': 'https://api.github.com/user/following{/target}', 'gists_url': 'https://api.github.com/gists{/gist_id}', 'hub_url': 'https://api.github.com/hub', 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}', 'issues_url': 'https://api.github.com/issues', 'keys_url': 'https://api.github.com/user/keys', 'notifications_url': 'https://api.github.com/notifications', 'organization_repositories_url': 'https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}', 'organization_url': 'https://api.github.com/orgs/{org}', 'public_gists_url': 'https://api.github.com/gists/public', 'rate_limit_url': 'https://api.github.com/rate_limit', 'repository_url': 'https://api.github.com/repos/{owner}/{repo}', 'repository_search_url': 'https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}', 'current_user_repositories_url': 'https://api.github.com/user/repos{?type,page,per_page,sort}', 'starred_url': 'https://api.github.com/user/starred{/owner}{/repo}', 'starred_gists_url': 'https://api.github.com/gists/starred', 'team_url': 'https://api.github.com/teams', 'user_url': 'https://api.github.com/users/{user}', 'user_organizations_url': 'https://api.github.com/user/orgs', 'user_repositories_url': 
'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}', 'user_search_url': 'https://api.github.com/search/users?q={query}{&page,per_page,sort,order}'}
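The relationship between content, text, encoding and json() can be checked offline. The sketch below fills in the private _content attribute of a hand-built Response purely for illustration; requests normally sets it from the network, and make_json_response is a hypothetical helper. It shows that a wrong encoding garbles text but leaves the raw bytes untouched.

```python
import json

from requests.models import Response


def make_json_response(payload, encoding):
    """Illustration only: fake a response whose body is UTF-8 encoded JSON."""
    resp = Response()
    resp._content = json.dumps(payload, ensure_ascii=False).encode('utf-8')
    resp.encoding = encoding  # what requests would normally infer for us
    return resp


good = make_json_response({'greeting': '你好'}, 'utf-8')
bad = make_json_response({'greeting': '你好'}, 'ISO-8859-1')

print(type(good.content))                    # raw bytes
print(good.text)                             # decoded with good.encoding
print(bad.text)                              # mojibake: wrong encoding
print(good.json() == json.loads(good.text))  # json() is just a shortcut
print(good.content == bad.content)           # the raw bytes are identical
```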
The params argument of get can be passed as a dict:
response = requests.get(
'https://api.github.com/search/repositories',
params={'q': 'requests+language:python'},
)
print(response.request.url)
# Output
# https://api.github.com/search/repositories?q=requests%2Blanguage%3Apython
A list of tuples works the same way:
response = requests.get(
'https://api.github.com/search/repositories',
params=[('q', 'requests+language:python')],
)
print(response.request.url)
# Output
# https://api.github.com/search/repositories?q=requests%2Blanguage%3Apython
Both forms above are URL-encoded automatically. To send the query string as-is, pass bytes instead:
response = requests.get(
'https://api.github.com/search/repositories',
params=b'q=requests+language:python',
)
print(response.request.url)
# Output
# https://api.github.com/search/repositories?q=requests+language:python
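These encoding rules can be checked without sending anything. requests.Request(...).prepare() builds the same kind of PreparedRequest that requests.get would send. The last case is my own addition and assumes the server decodes '+' in the query string as a space, as form-encoded query strings usually do. Under that assumption, putting a real space in the dict value makes it arrive as '+' while ':' is still encoded. prepared_url is a hypothetical helper.

```python
import requests

URL = 'https://api.github.com/search/repositories'


def prepared_url(params):
    """Build (but do not send) a GET request and return its final URL."""
    return requests.Request('GET', URL, params=params).prepare().url


as_dict = prepared_url({'q': 'requests+language:python'})
as_tuples = prepared_url([('q', 'requests+language:python')])
as_bytes = prepared_url(b'q=requests+language:python')
with_space = prepared_url({'q': 'requests language:python'})

print(as_dict)     # '+' and ':' are percent-encoded
print(as_tuples)   # same as as_dict
print(as_bytes)    # sent verbatim
print(with_space)  # the space becomes '+', ':' is still encoded
```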
request headers
Pass a headers argument when making the request:
response = requests.get(
'https://api.github.com/search/repositories',
params={'q': 'requests+language:python'},
headers={'Accept': 'application/vnd.github.v3.text-match+json'},
)
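Here is a sketch for checking which headers would actually go out, again without sending anything. Session.prepare_request merges the session's default headers (User-Agent, Accept-Encoding, ...) with the ones passed to the call. The result is a case-insensitive dict.

```python
import requests

req = requests.Request(
    'GET',
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
    headers={'Accept': 'application/vnd.github.v3.text-match+json'},
)
# The per-request Accept header overrides the session default ('*/*')
with requests.Session() as s:
    prepared = s.prepare_request(req)

print(prepared.headers['Accept'])
print(prepared.headers['accept'])      # lookup is case-insensitive
print(prepared.headers['User-Agent'])  # added from the session defaults
```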
request body
When the body is passed with data (either a dict or a list of tuples), the request's Content-Type is application/x-www-form-urlencoded.
When it is passed with json, the Content-Type is application/json.
The code and output below make this easy to see:
response = requests.post('https://httpbin.org/post', data={'key': 'value'})
print(response.json().get('form'))
print(response.request.headers['Content-Type'])
print(response.request.body)
print()
response = requests.post('https://httpbin.org/post', json={'key':' value'})
json_response = response.json()
print(type(json_response['data']))
print(response.request.headers['Content-Type'])
print(response.request.body)
print()
response = requests.post('https://httpbin.org/post', data=[('key', 'value')])
print(response.json().get('form'))
print(response.request.headers['Content-Type'])
print(response.request.body)
Output
{'key': 'value'}
application/x-www-form-urlencoded
key=value
<class 'str'>
application/json
b'{"key": " value"}'
{'key': 'value'}
application/x-www-form-urlencoded
key=value
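The same Content-Type rules can be checked offline with requests.Request(...).prepare(). The third case is my own addition. If you pass an already-serialized JSON string via data, requests sets no Content-Type at all. Servers often fail to parse the body for exactly this reason.

```python
import json

import requests

URL = 'https://httpbin.org/post'

form = requests.Request('POST', URL, data={'key': 'value'}).prepare()
as_json = requests.Request('POST', URL, json={'key': 'value'}).prepare()
raw = requests.Request('POST', URL, data=json.dumps({'key': 'value'})).prepare()

print(form.headers['Content-Type'], repr(form.body))        # form-encoded str body
print(as_json.headers['Content-Type'], repr(as_json.body))  # JSON bytes body
print(raw.headers.get('Content-Type'), repr(raw.body))      # no Content-Type set
```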
Inspecting the outgoing request
response.request is a PreparedRequest object, which lets you inspect the request that was actually sent:
response = requests.post('https://httpbin.org/post', json={'key': 'value'})
print(type(response.request.body))
print(response.request.body)
print(response.request.headers)
print()
response = requests.post('https://httpbin.org/post', data={'key': 'value'}, cookies={'xx': 'yy', 'zz': 'aa'})
print(type(response.request))
print(type(response.request.body))
print(response.request.body)
print(response.request.headers)
Output
<class 'bytes'>
b'{"key": "value"}'
{'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '16', 'Content-Type': 'application/json'}
<class 'requests.models.PreparedRequest'>
<class 'str'>
key=value
{'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Cookie': 'xx=yy; zz=aa', 'Content-Length': '9', 'Content-Type': 'application/x-www-form-urlencoded'}
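To inspect or tweak a request before it leaves, you can build the PreparedRequest yourself. This mirrors what Session.request does internally: Session.prepare_request produces the object, and Session.send sends it. The send step needs the network, so it appears only as a comment below. The X-Debug header is a hypothetical example of a last-minute change.

```python
import requests

with requests.Session() as session:
    req = requests.Request(
        'POST',
        'https://httpbin.org/post',
        data={'key': 'value'},
        cookies={'xx': 'yy', 'zz': 'aa'},
    )
    prepared = session.prepare_request(req)
    # Hypothetical header, added after preparation but before sending
    prepared.headers['X-Debug'] = '1'

    print(type(prepared).__name__)
    print(prepared.method, prepared.url)
    print(prepared.body)
    print(prepared.headers['Cookie'])
    # response = session.send(prepared, timeout=5)  # would send it (needs network)
```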
Authentication
Some services require authentication. requests ships three authentication classes (HTTPBasicAuth, HTTPProxyAuth, HTTPDigestAuth), and you can also define your own.
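HTTPBasicAuth works like this: username and password are joined with a colon (username:password), base64-encoded, and sent in the request headers as Authorization: Basic <encoded value>. See https://en.wikipedia.org/wiki/Basic_access_authentication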
from base64 import b64encode
print(b64encode(b':'.join((b'username', b'password'))))
Just pass the auth argument when making the request. A (username, password) tuple is treated as HTTPBasicAuth by default:
from getpass import getpass
# https://en.wikipedia.org/wiki/Basic_access_authentication
response = requests.get('https://api.github.com/user', auth=('username', getpass()))
print(response.status_code)
print(response.request.headers)
response = requests.get('https://api.github.com/user')
print(response.request.headers)
print(response.status_code)
print(response.text)
Output
200
{'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Authorization': 'Basic bWluaXaflrMjAxjp5adfUwNzgdfyMzasfdMA=='}
{'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
401
{"message":"Requires authentication","documentation_url":"https://developer.github.com/v3/users/#get-the-authenticated-user"}
You can also pass the auth object explicitly, with the same effect:
from requests.auth import HTTPBasicAuth
response = requests.get(
'https://api.github.com/user',
auth=HTTPBasicAuth('username', getpass())
)
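The following offline check confirms that the tuple form, HTTPBasicAuth and the hand-rolled base64 string described earlier all produce the same header. basic_auth_header and prepared_auth_header are hypothetical helpers, and 'username'/'password' are placeholder credentials.

```python
from base64 import b64encode

import requests
from requests.auth import HTTPBasicAuth

URL = 'https://api.github.com/user'


def basic_auth_header(username, password):
    """Hypothetical helper: 'Basic ' + base64('username:password')."""
    token = b64encode(f'{username}:{password}'.encode('latin-1')).decode('ascii')
    return f'Basic {token}'


def prepared_auth_header(auth):
    """Prepare (without sending) a GET request and return its Authorization header."""
    return requests.Request('GET', URL, auth=auth).prepare().headers['Authorization']


from_tuple = prepared_auth_header(('username', 'password'))
from_class = prepared_auth_header(HTTPBasicAuth('username', 'password'))
by_hand = basic_auth_header('username', 'password')

print(from_tuple)                           # Basic dXNlcm5hbWU6cGFzc3dvcmQ=
print(from_tuple == from_class == by_hand)  # True
```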
You can also define a custom authentication scheme: subclass AuthBase and implement __call__:
"""
A custom Auth is really just a custom way of setting the authentication data in request.headers
"""
import requests
from requests.auth import AuthBase

# Built-in AuthBase subclasses: HTTPBasicAuth, HTTPProxyAuth (itself a subclass of HTTPBasicAuth)
# and HTTPDigestAuth. Google or Wikipedia explain what each scheme means.
class TokenAuth(AuthBase):
    """Implements a custom authentication scheme."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        """Attach an API token to a custom auth header."""
        r.headers['X-TokenAuth'] = f'{self.token}'  # Python 3.6+
        return r


response = requests.get('https://httpbin.org/get', auth=TokenAuth('12345abcde-token'))
print(response.request.headers['X-TokenAuth'])
# Output
# 12345abcde-token
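A custom auth object is just a callable that edits the outgoing request, so it can be checked without a server. As a variation on TokenAuth, the hypothetical BearerAuth below sends the common "Authorization: Bearer <token>" header. Whether a given API expects that scheme is an assumption to verify against its own docs. The URL and token are placeholders.

```python
import requests
from requests.auth import AuthBase


class BearerAuth(AuthBase):
    """Hypothetical example: sends 'Authorization: Bearer <token>'."""

    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        r.headers['Authorization'] = f'Bearer {self.token}'
        return r


URL = 'https://api.example.com/items'  # placeholder URL

# Per request...
direct = requests.Request('GET', URL, auth=BearerAuth('abc123')).prepare()

# ...or once for every request made through a session
with requests.Session() as s:
    s.auth = BearerAuth('abc123')
    via_session = s.prepare_request(requests.Request('GET', URL))

print(direct.headers['Authorization'])
print(via_session.headers['Authorization'])
```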
SSL and HTTPS
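HTTPS is HTTP over SSL/TLS. When requesting an HTTPS URL, requests verifies the server's SSL certificate by default. To skip verification, set verify=False. This triggers the warning below and is best limited to testing: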
>>> requests.get('https://api.github.com', verify=False)
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
InsecureRequestWarning)
<Response [200]>
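Rather than disabling verification, it is usually better to point verify at the CA bundle that signed the server's certificate, since verify also accepts a file path. The sketch below is a minimal session factory under that assumption. The bundle path is hypothetical, and the urllib3 warning is silenced only when verification is deliberately turned off.

```python
import requests
import urllib3


def make_session(ca_bundle=None, insecure=False):
    """Return a Session that verifies against ca_bundle, or not at all if insecure."""
    session = requests.Session()
    if insecure:
        session.verify = False
        # Silence InsecureRequestWarning only because we opted out on purpose
        urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    elif ca_bundle is not None:
        session.verify = ca_bundle  # path to a PEM CA bundle
    return session


default = make_session()  # verify=True: requests' default CA bundle
corporate = make_session(ca_bundle='/etc/ssl/certs/corp-ca.pem')  # hypothetical path
print(default.verify, corporate.verify)
```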
Session
Official docs: http://docs.python-requests.org/en/master/user/advanced/#session-objects
A Session serves two purposes:
- persist parameters across requests
- When your app makes a connection to a server using a Session, it keeps that connection around in a connection pool. When your app wants to connect to the same server again, it will reuse a connection from the pool rather than establishing a new one.
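Session-level dicts are persisted by the session across requests. Method-level dicts are not persisted. Within that one call they are merged with the session-level values and override them where the two overlap (http://docs.python-requests.org/en/master/user/advanced/#session-objects). The code below shows both session-level and method-level settings: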
s = requests.Session()
s.auth = ('user', 'pass')  # session-level: persisted by the session
s.headers.update({'x-test': 'true'})  # session-level: persisted by the session
s.get('https://httpbin.org/headers', auth=('yangkai', 'pass'))  # method-level: not persisted by the session
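You can observe the persist and override rules offline by preparing requests through the session instead of sending them. The x-test2 header and the 'other' credentials below are hypothetical values for illustration.

```python
import requests

with requests.Session() as s:
    s.auth = ('user', 'pass')             # session-level: persisted
    s.headers.update({'x-test': 'true'})  # session-level: persisted

    # Method-level values are merged in and win, for this request only
    one_off = s.prepare_request(requests.Request(
        'GET', 'https://httpbin.org/headers',
        headers={'x-test2': 'true'},
        auth=('other', 'pass'),
    ))
    # A later request without them falls back to the session-level values
    later = s.prepare_request(requests.Request('GET', 'https://httpbin.org/headers'))

print(one_off.headers['x-test'], one_off.headers.get('x-test2'), one_off.headers['Authorization'])
print(later.headers['x-test'], later.headers.get('x-test2'), later.headers['Authorization'])
```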
I ran some experiments; the code is in my gist: session_test.py
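Note that a Session is not thread-safe (this issue), so multiple threads should not share one session. The OS keeps switching between threads, and a shared session's internal state can end up corrupted. Create one session per thread instead; code within a single thread runs sequentially, so there is nothing to worry about there. Here is one way to do it, using threading.local():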
import concurrent.futures
import requests
import threading
import time

thread_local = threading.local()


def get_session():
    if not getattr(thread_local, "session", None):
        thread_local.session = requests.Session()
    return thread_local.session


def download_site(url):
    session = get_session()
    with session.get(url) as response:
        print(f"Read {len(response.content)} from {url}")


def download_all_sites(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_site, sites)


if __name__ == "__main__":
    sites = [
        "http://www.jython.org",
        "http://olympus.realpython.org/dice",
    ] * 80
    start_time = time.time()
    download_all_sites(sites)
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} in {duration} seconds")
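To check that get_session really gives each worker thread its own Session, and reuses it within that thread, you can record which session each task saw. No network is needed. session_owner is a hypothetical helper for this check.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

import requests

thread_local = threading.local()


def get_session():
    if not hasattr(thread_local, 'session'):
        thread_local.session = requests.Session()
    return thread_local.session


def session_owner(_):
    """Hypothetical helper: report (thread id, session id) for one task."""
    return threading.get_ident(), id(get_session())


# Worker threads stay alive until the executor shuts down, so their sessions coexist
with ThreadPoolExecutor(max_workers=4) as executor:
    pairs = list(executor.map(session_owner, range(40)))

sessions_by_thread = {}
for thread_id, session_id in pairs:
    sessions_by_thread.setdefault(thread_id, set()).add(session_id)

# Each thread used exactly one session, and no two threads shared one
print(all(len(ids) == 1 for ids in sessions_by_thread.values()))
print(len({next(iter(ids)) for ids in sessions_by_thread.values()}) == len(sessions_by_thread))
```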
cookies
Official docs:
http://docs.python-requests.org/en/master/api/#api-cookies
http://cn.python-requests.org/zh_CN/latest/user/quickstart.html#cookie
Setting cookies at session level
with requests.Session() as s:
    jar = requests.cookies.RequestsCookieJar()
    jar.set('a', 'b')
    jar.set('x', 'y')
    s.cookies = jar
Setting cookies at method level
with requests.Session() as s:
    cookies = dict(cookies_are='working')
    r = s.get("http://httpbin.org/cookies", cookies=cookies)
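To see which cookies a request carried, checking its headers is the reliable way:
print(response.request._cookies)  # a RequestsCookieJar; no guarantee the request actually sent these cookies, as the leading `_` hints
# or
print(response.request.headers['Cookie'])  # the headers actually sent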
Viewing the session-level cookies set above:
print(s.cookies)
Viewing Set-Cookie in the response
Use response.cookies:
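with requests.Session() as s:
    r = s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
    print(r.history)
    # history[-1] is the last redirect (the 302 that sets the cookie); history[1] only exists when there are two redirects
    print(r.history[-1].headers)
    print(r.history[-1].cookies)  # per the source, built from that response's `Set-Cookie` header; a RequestsCookieJar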
See my gist: cookies_test.py
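The examples above leave out one detail. Cookies in a RequestsCookieJar can carry a domain and a path, and requests only attaches them to matching URLs. The sketch below uses illustrative domain and path values and inspects prepared requests rather than a live server.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('a', 'b', domain='httpbin.org', path='/cookies')  # only for /cookies...
jar.set('x', 'y', domain='httpbin.org', path='/')         # for the whole site

with requests.Session() as s:
    s.cookies = jar
    to_cookies = s.prepare_request(requests.Request('GET', 'https://httpbin.org/cookies'))
    to_get = s.prepare_request(requests.Request('GET', 'https://httpbin.org/get'))

print(to_cookies.headers['Cookie'])  # carries both cookies
print(to_get.headers['Cookie'])      # only the site-wide cookie
```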
Redirect history
The redirect history is available as response.history, a list of Response objects. The response you get back is actually the one from the final hop.
import requests

url = 'http://httpbin.org/cookies/set/sessioncookie/123456789'
with requests.Session() as s:
    response = s.get(url)
    #: A list of :class:`Response <Response>` objects from
    #: the history of the Request. Any redirect responses will end
    #: up here. The list is sorted from the oldest to the most recent request.
    print(response.history)
    for resp in response.history:
        print(resp, resp.request.url, resp.headers)
    print(response, response.request.url)
The output shows two redirects, i.e. three requests in total:
http://httpbin.org/cookies/set/sessioncookie/123456789 ->
https://httpbin.org/cookies/set/sessioncookie/123456789 ->
https://httpbin.org/cookies
[<Response [301]>, <Response [302]>]
<Response [301]> http://httpbin.org/cookies/set/sessioncookie/123456789 {'Connection': 'close', 'Cache-Control': 'max-age:86400', 'Date': 'Friday, 08-Feb-19 12:49:25 CST', 'Expires': 'Sat, 09 Feb 2019 12:49:25 GMT', 'Keep-Alive': 'timeout=38', 'Location': 'https://httpbin.org/cookies/set/sessioncookie/123456789', 'Content-Length': '0'}
<Response [302]> https://httpbin.org/cookies/set/sessioncookie/123456789 {'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Date': 'Fri, 08 Feb 2019 04:49:26 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '223', 'Location': '/cookies', 'Set-Cookie': 'sessioncookie=123456789; Secure; Path=/', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true', 'Via': '1.1 vegur'}
<Response [200]> https://httpbin.org/cookies
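A related concept: redirect vs. forward.
- A redirect is client-side behavior: the browser makes multiple requests.
- A forward is server-side behavior: the browser makes only one request.
What we have here is a redirect, and the browser does indeed make three requests (in order from bottom to top):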
[Screenshot: the redirect chain in the browser's network panel]
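To see each hop yourself rather than only the final response, you can turn off automatic redirects with allow_redirects=False and follow the Location header by hand. This sketch rests on simplifying assumptions. follow_redirects always re-issues a GET and ignores the method and body rewriting that requests applies for 301/302/303. FakeSession is a hypothetical offline stand-in that replays the chain from the output above; in practice you would pass a real requests.Session.

```python
from urllib.parse import urljoin

import requests
from requests.models import Response


def follow_redirects(url, session, max_hops=10):
    """Follow redirects one hop at a time and return [(status, url), ...]."""
    hops = []
    for _ in range(max_hops):
        resp = session.get(url, allow_redirects=False)
        hops.append((resp.status_code, url))
        if not resp.is_redirect:
            return hops
        # Location may be relative (e.g. '/cookies'), so resolve it against the current URL
        url = urljoin(url, resp.headers['Location'])
    raise requests.TooManyRedirects(f'more than {max_hops} redirects')


class FakeSession:
    """Hypothetical offline stand-in for requests.Session that replays canned responses."""

    def __init__(self, routes):
        self.routes = routes  # url -> (status_code, Location header or None)

    def get(self, url, allow_redirects=True):
        status, location = self.routes[url]
        resp = Response()
        resp.status_code = status
        if location:
            resp.headers['Location'] = location
        return resp


START = 'http://httpbin.org/cookies/set/sessioncookie/123456789'
fake = FakeSession({
    START: (301, 'https://httpbin.org/cookies/set/sessioncookie/123456789'),
    'https://httpbin.org/cookies/set/sessioncookie/123456789': (302, '/cookies'),
    'https://httpbin.org/cookies': (200, None),
})
hops = follow_redirects(START, fake)
for status, hop_url in hops:
    print(status, hop_url)
```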
timeout and max-retries
Timeout
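The time a request takes is roughly connect time plus read time. requests applies no timeout by default. The timeout parameter sets one, either as a single number (used for both the connect and the read timeout) or as a tuple, and an exception is raised when it is exceeded. The read timeout is not a limit on the total download time: it limits how long the client waits for the server to send data.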
You can also pass a tuple to timeout with the first element being a connect timeout (the time it allows for the client to establish a connection to the server), and the second being a read timeout (the time it will wait on a response once your client has established a connection).
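import requests
from requests.exceptions import Timeout, ConnectionError

# Whether each call below actually times out depends on your network latency
try:
    # Connect timeout (raises ConnectTimeout, a subclass of both ConnectionError and Timeout)
    response = requests.get('https://api.github.com', timeout=(0.1, 5))
except ConnectionError as e:
    print(e, type(e))

try:
    # Read timeout (raises ReadTimeout, a subclass of Timeout)
    response = requests.get('https://api.github.com', timeout=(1, 0.1))
except Timeout as e:
    print(e, type(e))

try:
    # Single value; here the connection itself times out
    response = requests.get('https://api.github.com', timeout=0.1)
except Exception as e:
    print(e, type(e))

try:
    # Single value; connecting succeeds but reading times out
    response = requests.get('https://api.github.com', timeout=0.3)
except Exception as e:
    print(e, type(e))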
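requests applies no timeout by default, and Session has no built-in default-timeout setting. A commonly used pattern is a small HTTPAdapter subclass that fills in timeout whenever the caller omits it. The sketch below shows that pattern. TimeoutHTTPAdapter and its default values are my own choices, not part of requests.

```python
import requests
from requests.adapters import HTTPAdapter


class TimeoutHTTPAdapter(HTTPAdapter):
    """HTTPAdapter that applies a default (connect, read) timeout."""

    def __init__(self, *args, timeout=(3, 10), **kwargs):
        self.timeout = timeout
        super().__init__(*args, **kwargs)

    def send(self, request, **kwargs):
        # Session.request passes timeout=None when the caller did not set one
        if kwargs.get('timeout') is None:
            kwargs['timeout'] = self.timeout
        return super().send(request, **kwargs)


session = requests.Session()
adapter = TimeoutHTTPAdapter(timeout=(1, 5))
session.mount('https://', adapter)
session.mount('http://', adapter)
# session.get('https://api.github.com')              # would use timeout=(1, 5)
# session.get('https://api.github.com', timeout=10)  # an explicit value still wins
```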
Max Retries
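By default requests does not retry failed requests. A Transport Adapter lets you set the number of retries, and the code below retries at most 3 times. Per the requests docs, an integer max_retries applies only to failed DNS lookups, socket connections and connection timeouts, never to requests whose data has already reached the server: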
import requests
from requests.adapters import HTTPAdapter
from requests.exceptions import ConnectionError

github_adapter = HTTPAdapter(max_retries=3)
session = requests.Session()
# Use `github_adapter` for all requests to endpoints that start with this URL
session.mount('https://api.github.com', github_adapter)
try:
    session.get('https://api.github.com')
except ConnectionError as ce:
    print(ce)
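For finer control, max_retries also accepts a urllib3 Retry object. It can back off between attempts and also retry on selected status codes; by default it does so only for idempotent methods such as GET. The sketch below uses illustrative numbers and an illustrative status list.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=3,                                # at most 3 retries in total
    backoff_factor=0.5,                     # exponential backoff between attempts
    status_forcelist=[500, 502, 503, 504],  # also retry on these status codes
)
session = requests.Session()
session.mount('https://api.github.com', HTTPAdapter(max_retries=retry))

# The longest matching prefix wins; other URLs keep the default adapter (no retries)
github_adapter = session.get_adapter('https://api.github.com/users')
other_adapter = session.get_adapter('https://example.com/')
print(github_adapter.max_retries.total, other_adapter.max_retries.total)
```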
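Personally, I'd rather not rely on an adapter for retries; a very good library handles this: tenacity. Note, though, that tenacity only decides between attempts whether to stop, so it cannot interrupt a request that hangs. A per-request timeout is still worth setting.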
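import requests
from tenacity import RetryError, retry, stop_after_attempt, stop_after_delay

# Keep retrying until one of these happens: 1. the call succeeds without raising,
# 2. 3 seconds have elapsed, 3. 4 attempts have been made.
# If it still has not succeeded, RetryError is raised.
@retry(stop=(stop_after_delay(3) | stop_after_attempt(4)))
def get_github_api():
    with requests.Session() as session:
        # Only exceptions trigger a retry; an error status such as 500 is returned as-is.
        # The timeout keeps a single attempt from hanging indefinitely.
        return session.get('https://api.github.com', timeout=3)


if __name__ == '__main__':
    try:
        response = get_github_api()
        print(response)
    except RetryError as ce:
        print(ce)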