requests Quick Start


Author: ThomasYoungK | Published 2019-02-08 13:23 | Read 70 times

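    requests is a very handy HTTP library, but until now I only half understood it. This time I put my notes in order, which deepened my understanding, and I hope they help others too.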

    Status codes

    >>> response = requests.get('https://api.github.com')
    >>> response.status_code
    200
    

    A Response also has a boolean value, and it is True for a whole range of status codes:

    if response:  # True unless the status code is 4xx or 5xx (same as response.ok)
        print('Success!')
    else:
        print('An error has occurred.')
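The truth-value rule can be checked without any network access. This minimal sketch builds bare Response objects by hand and sets only status_code. That is a test-only shortcut, not how you normally obtain a Response. It then evaluates bool() on each one. In the requests versions I know of, bool(response) simply returns response.ok:

```python
from requests.models import Response


def truthiness(status_code):
    """Build a bare Response offline and return bool(response)."""
    response = Response()
    response.status_code = status_code  # normally filled in by requests itself
    return bool(response)  # delegates to response.ok


results = {code: truthiness(code) for code in (200, 301, 399, 400, 404, 500)}
print(results)
```

Redirects (3xx) count as truthy. Only client and server errors are falsy.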
    

    Alternatively, call response.raise_for_status(), which raises an exception if the request failed:

    # If the response was successful, no Exception will be raised
    response.raise_for_status()
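Here is a sketch of the usual try/except pattern around raise_for_status(). It runs offline on a hand-built Response, and the URL is a hypothetical placeholder that is never requested. For error statuses the exception is requests.exceptions.HTTPError:

```python
import requests
from requests.models import Response


def status_error(status_code, reason):
    """Return None if raise_for_status() passes, else the HTTPError message."""
    response = Response()
    response.status_code = status_code
    response.reason = reason
    response.url = 'https://example.com/missing'  # hypothetical URL, never requested
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as error:
        return str(error)
    return None


ok_message = status_error(200, 'OK')
error_message = status_error(404, 'Not Found')
print(ok_message)     # None
print(error_message)  # starts with "404 Client Error"
```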
    

    GET requests

    response.content is of type bytes:

    >>> response = requests.get('https://api.github.com')
    >>> response.content
    b'{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
    

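    response.text decodes the body into a str automatically. To choose the encoding, requests uses the charset from the response headers. If the headers don't give one, it guesses with chardet (charset_normalizer in newer versions), and that guess is exposed as response.apparent_encoding.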

    >>> response.text
    '{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
    

    You can also set the encoding yourself:

    >>> response.encoding = 'utf-8' # Optional: requests infers this internally
    >>> response.text
    '{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","notifications_url":"https://api.github.com/notifications","organization_repositories_url":"https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}","organization_url":"https://api.github.com/orgs/{org}","public_gists_url":"https://api.github.com/gists/public","rate_limit_url":"https://api.github.com/rate_limit","repository_url":"https://api.github.com/repos/{owner}/{repo}","repository_search_url":"https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}","current_user_repositories_url":"https://api.github.com/user/repos{?type,page,per_page,sort}","starred_url":"https://api.github.com/user/starred{/owner}{/repo}","starred_gists_url":"https://api.github.com/gists/starred","team_url":"https://api.github.com/teams","user_url":"https://api.github.com/users/{user}","user_organizations_url":"https://api.github.com/user/orgs","user_repositories_url":"https://api.github.com/users/{user}/repos{?type,page,per_page,sort}","user_search_url":"https://api.github.com/search/users?q={query}{&page,per_page,sort,order}"}'
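The relationship between content, text and encoding is easiest to see on non-ASCII data, and you can demonstrate it offline. This sketch fills a bare Response with hand-made UTF-8 bytes by assigning the private _content attribute, which is a testing shortcut and not a public API. It then decodes the same bytes with a correct and a wrong encoding:

```python
from requests.models import Response

response = Response()
response.status_code = 200
# Private attribute, set by hand here only to avoid a network call.
response._content = 'héllo'.encode('utf-8')

raw = response.content          # bytes, never decoded
response.encoding = 'utf-8'     # the correct encoding
correct = response.text
response.encoding = 'latin-1'   # a wrong guess garbles the non-ASCII character
garbled = response.text

print(raw)      # b'h\xc3\xa9llo'
print(correct)  # héllo
print(garbled)  # hÃ©llo
```

Setting response.encoding affects only .text. The bytes in .content stay the same.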
    

    response.json() deserializes the body for you, a shortcut for json.loads(response.text):

    >>> response.json()
    {'current_user_url': 'https://api.github.com/user', 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}', 'authorizations_url': 'https://api.github.com/authorizations', 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}', 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}', 'emails_url': 'https://api.github.com/user/emails', 'emojis_url': 'https://api.github.com/emojis', 'events_url': 'https://api.github.com/events', 'feeds_url': 'https://api.github.com/feeds', 'followers_url': 'https://api.github.com/user/followers', 'following_url': 'https://api.github.com/user/following{/target}', 'gists_url': 'https://api.github.com/gists{/gist_id}', 'hub_url': 'https://api.github.com/hub', 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}', 'issues_url': 'https://api.github.com/issues', 'keys_url': 'https://api.github.com/user/keys', 'notifications_url': 'https://api.github.com/notifications', 'organization_repositories_url': 'https://api.github.com/orgs/{org}/repos{?type,page,per_page,sort}', 'organization_url': 'https://api.github.com/orgs/{org}', 'public_gists_url': 'https://api.github.com/gists/public', 'rate_limit_url': 'https://api.github.com/rate_limit', 'repository_url': 'https://api.github.com/repos/{owner}/{repo}', 'repository_search_url': 'https://api.github.com/search/repositories?q={query}{&page,per_page,sort,order}', 'current_user_repositories_url': 'https://api.github.com/user/repos{?type,page,per_page,sort}', 'starred_url': 'https://api.github.com/user/starred{/owner}{/repo}', 'starred_gists_url': 'https://api.github.com/gists/starred', 'team_url': 'https://api.github.com/teams', 'user_url': 'https://api.github.com/users/{user}', 'user_organizations_url': 'https://api.github.com/user/orgs', 'user_repositories_url': 'https://api.github.com/users/{user}/repos{?type,page,per_page,sort}', 'user_search_url': 'https://api.github.com/search/users?q={query}{&page,per_page,sort,order}'}
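Here is a small offline sketch of response.json(), including what happens when the body is not JSON. The fake_response helper is hypothetical and fills the private _content attribute by hand. In the versions I have seen, the decode error is a ValueError subclass (requests.exceptions.JSONDecodeError in newer releases), so catching ValueError covers both old and new versions:

```python
import json

from requests.models import Response


def fake_response(body):
    """Bare Response with a hand-made body (sets private _content; test-only trick)."""
    response = Response()
    response.status_code = 200
    response.encoding = 'utf-8'
    response._content = body
    return response


response = fake_response(b'{"current_user_url": "https://api.github.com/user"}')
data = response.json()
print(data['current_user_url'])           # https://api.github.com/user
print(data == json.loads(response.text))  # True

# A non-JSON body (e.g. an HTML error page) makes .json() raise.
parse_failed = False
try:
    fake_response(b'<html>not json</html>').json()
except ValueError:
    parse_failed = True
print(parse_failed)  # True
```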
    

    The params argument of get can be given as a dict:

    response = requests.get(
        'https://api.github.com/search/repositories',
        params={'q': 'requests+language:python'},
    )
    print(response.request.url)  
    # Output
    # https://api.github.com/search/repositories?q=requests%2Blanguage%3Apython
    

    A list of tuples works the same way:

    response = requests.get(
        'https://api.github.com/search/repositories',
        params=[('q', 'requests+language:python')],
    )
    print(response.request.url)
    # Output
    # https://api.github.com/search/repositories?q=requests%2Blanguage%3Apython
    

    Both forms above are URL-encoded automatically. If you don't want the query string encoded, pass it as bytes:

    response = requests.get(
        'https://api.github.com/search/repositories',
        params=b'q=requests+language:python',
    )
    print(response.request.url)
    # Output
    # https://api.github.com/search/repositories?q=requests+language:python
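You can reproduce the three URLs above without sending anything. requests.Request(...).prepare() builds the final URL locally. The fourth case is an extra illustration, not from the original. It passes a real space instead of '+': urlencode turns the space into '+' and still escapes ':'. This is worth knowing, because GitHub's search syntax uses '+' to mean a space:

```python
import requests

base = 'https://api.github.com/search/repositories'


def built_url(params):
    """Prepare (but do not send) a GET request and return its final URL."""
    return requests.Request('GET', base, params=params).prepare().url


as_dict = built_url({'q': 'requests+language:python'})
as_pairs = built_url([('q', 'requests+language:python')])
as_bytes = built_url(b'q=requests+language:python')
as_space = built_url({'q': 'requests language:python'})

for url in (as_dict, as_pairs, as_bytes, as_space):
    print(url)
```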
    

    request headers

    You can pass a headers argument when making a request:

    response = requests.get(
        'https://api.github.com/search/repositories',
        params={'q': 'requests+language:python'},
        headers={'Accept': 'application/vnd.github.v3.text-match+json'},
    )
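Custom headers are merged with the defaults requests adds (User-Agent, Accept-Encoding and so on). A custom header with the same name replaces the default. This sketch shows the merge offline with Session.prepare_request, which builds the PreparedRequest that would be sent, without sending it:

```python
import requests

session = requests.Session()
request = requests.Request(
    'GET',
    'https://api.github.com/search/repositories',
    params={'q': 'requests+language:python'},
    headers={'Accept': 'application/vnd.github.v3.text-match+json'},
)
prepared = session.prepare_request(request)  # merge with session defaults, no network

for name, value in prepared.headers.items():
    print(f'{name}: {value}')
```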
    

    request body

    When the body is passed with data (a dict or a list of tuples), the request's Content-Type header is application/x-www-form-urlencoded;
    when it is passed with json, Content-Type is application/json.
    The code and output below make this easy to see:

    response = requests.post('https://httpbin.org/post', data={'key': 'value'})
    print(response.json().get('form'))
    print(response.request.headers['Content-Type'])
    print(response.request.body)
    
    print()
    response = requests.post('https://httpbin.org/post', json={'key':' value'})
    json_response = response.json()
    print(type(json_response['data']))
    print(response.request.headers['Content-Type'])
    print(response.request.body)
    
    print()
    response = requests.post('https://httpbin.org/post', data=[('key', 'value')])
    print(response.json().get('form'))
    print(response.request.headers['Content-Type'])
    print(response.request.body)
    

    Output

    {'key': 'value'}
    application/x-www-form-urlencoded
    key=value
    
    <class 'str'>
    application/json
    b'{"key": " value"}'
    
    {'key': 'value'}
    application/x-www-form-urlencoded
    key=value
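You can make the same comparison without calling httpbin. requests.Request(...).prepare() builds the PreparedRequest locally, so you can inspect its Content-Type header and body directly. Here is a minimal sketch, where the URL is only used to build the requests:

```python
import requests

url = 'https://httpbin.org/post'  # nothing is sent to this URL

form = requests.Request('POST', url, data={'key': 'value'}).prepare()
pairs = requests.Request('POST', url, data=[('key', 'value')]).prepare()
as_json = requests.Request('POST', url, json={'key': 'value'}).prepare()

for prepared in (form, pairs, as_json):
    # Form bodies come out as str, JSON bodies as bytes
    print(prepared.headers['Content-Type'], repr(prepared.body))
```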
    

    Inspecting the request that was sent

    response.request is a PreparedRequest object, and you can use it to inspect the request that was actually sent:

    response = requests.post('https://httpbin.org/post', json={'key': 'value'})
    print(type(response.request.body))
    print(response.request.body)
    print(response.request.headers)
    
    print()
    response = requests.post('https://httpbin.org/post', data={'key': 'value'}, cookies={'xx': 'yy', 'zz': 'aa'})
    print(type(response.request))
    print(type(response.request.body))
    print(response.request.body)
    print(response.request.headers)
    

    Output

    <class 'bytes'>
    b'{"key": "value"}'
    {'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '16', 'Content-Type': 'application/json'}
    
    <class 'requests.models.PreparedRequest'>
    <class 'str'>
    key=value
    {'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Cookie': 'xx=yy; zz=aa', 'Content-Length': '9', 'Content-Type': 'application/x-www-form-urlencoded'}
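You can also build a PreparedRequest without sending anything, which is handy for checking what requests would put on the wire. This sketch prepares the form-plus-cookies request from above locally and reads its Cookie and Content-Length headers. A bare Request.prepare() does not add the default User-Agent and similar headers, because those come from the Session:

```python
import requests

request = requests.Request(
    'POST',
    'https://httpbin.org/post',  # nothing is sent
    data={'key': 'value'},
    cookies={'xx': 'yy', 'zz': 'aa'},
)
prepared = request.prepare()  # same type as response.request

print(type(prepared).__name__)              # PreparedRequest
print(prepared.method, prepared.url)        # POST https://httpbin.org/post
print(prepared.headers.get('Cookie'))
print(prepared.headers['Content-Length'])   # 9, i.e. len('key=value')
```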
    

    Authentication

    Some services require authentication. requests provides three authentication classes (HTTPBasicAuth, HTTPProxyAuth, HTTPDigestAuth), and you can also define your own.

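    HTTPBasicAuth works like this: the username and password are joined with `:`, the result is base64-encoded, and it is sent in the Authorization header as `Basic <encoded value>` (https://en.wikipedia.org/wiki/Basic_access_authentication). The encoding step: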

    from base64 import b64encode
    print(b64encode(b':'.join((b'username', b'password'))))
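Here is the same encoding step as a small reusable function, plus the reverse decoding. The decoding shows that Basic auth only encodes the credentials and does not encrypt them, so it should only be used over HTTPS. basic_auth_header is a hypothetical helper name. Encoding the credentials as UTF-8 is an assumption that only matters for non-ASCII input, where requests' own behavior may differ:

```python
from base64 import b64decode, b64encode


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header value from credentials."""
    token = b64encode(f'{username}:{password}'.encode('utf-8')).decode('ascii')
    return f'Basic {token}'


header = basic_auth_header('username', 'password')
print(header)  # Basic dXNlcm5hbWU6cGFzc3dvcmQ=

# Anyone who sees the header can reverse it:
decoded = b64decode(header.split(' ', 1)[1]).decode('utf-8')
print(decoded)  # username:password
```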
    

    When making the request, just pass the auth argument. A (username, password) tuple is treated as HTTPBasicAuth:

    from getpass import getpass
    # https://en.wikipedia.org/wiki/Basic_access_authentication
    response = requests.get('https://api.github.com/user', auth=('username', getpass()))
    print(response.status_code)
    print(response.request.headers)
    
    response = requests.get('https://api.github.com/user')
    print(response.request.headers)
    print(response.status_code)
    print(response.text)
    

    Output

    200
    {'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Authorization': 'Basic bWluaXaflrMjAxjp5adfUwNzgdfyMzasfdMA=='}
    {'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
    401
    {"message":"Requires authentication","documentation_url":"https://developer.github.com/v3/users/#get-the-authenticated-user"}
    

    You can also name the auth class explicitly, with the same effect:

    from requests.auth import HTTPBasicAuth

    response = requests.get(
        'https://api.github.com/user',
        auth=HTTPBasicAuth('username', getpass())
    )
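To check the "same effect" claim without a GitHub account, this sketch prepares both variants locally and compares the resulting Authorization headers. Nothing is sent, and the literal 'password' replaces getpass() so the code runs non-interactively:

```python
import requests
from requests.auth import HTTPBasicAuth

url = 'https://api.github.com/user'  # only used to build the requests

with_tuple = requests.Request('GET', url, auth=('username', 'password')).prepare()
with_class = requests.Request('GET', url, auth=HTTPBasicAuth('username', 'password')).prepare()

print(with_tuple.headers['Authorization'])  # Basic dXNlcm5hbWU6cGFzc3dvcmQ=
print(with_tuple.headers['Authorization'] == with_class.headers['Authorization'])  # True
```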
    

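    You can also write your own authentication: subclass AuthBase and implement __call__:

    """
    A custom Auth really just customizes the authentication data in request.headers
    """
    import requests
    from requests.auth import AuthBase
    
    # requests ships 3 auth classes derived from AuthBase: HTTPBasicAuth, HTTPProxyAuth, HTTPDigestAuth. Search Google or Wikipedia to see how each scheme is defined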
    class TokenAuth(AuthBase):
        """Implements a custom authentication scheme."""
    
        def __init__(self, token):
            self.token = token
    
        def __call__(self, r):
            """Attach an API token to a custom auth header."""
            r.headers['X-TokenAuth'] = f'{self.token}'  # Python 3.6+
            return r
    
    
    response = requests.get('https://httpbin.org/get', auth=TokenAuth('12345abcde-token'))
    print(response.request.headers['X-TokenAuth'])
    
    # Output
    # 12345abcde-token
    

    SSL and HTTPS

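    HTTPS is HTTP over SSL/TLS. When requesting an HTTPS URL, requests verifies the server's SSL certificate by default. If you don't want verification, set verify=False, and a warning is emitted: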

    >>> requests.get('https://api.github.com', verify=False)
    InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
      InsecureRequestWarning)
    <Response [200]>
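Here is a sketch of the alternatives to verify=False, which sends no request. verify defaults to True on a Session. The default CA bundle comes from certifi via requests.certs.where(). verify may also be a path to your own CA bundle, and the path shown in the comment is a made-up placeholder. Silencing the warning only makes sense when you deliberately accept unverified connections:

```python
import requests
import urllib3
from requests import certs

session = requests.Session()
print(session.verify)  # True: certificates are checked by default
print(certs.where())   # path of the default CA bundle (provided by certifi)

# verify may also point at your own CA bundle (hypothetical path):
# session.verify = '/etc/ssl/certs/my-company-ca.pem'

# If you really must use verify=False, silence only this specific warning:
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
```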
    

    Session

    Official docs: http://docs.python-requests.org/en/master/user/advanced/#session-objects

    A Session serves two purposes:

    1. persist parameters across requests
    2. When your app makes a connection to a server using a Session, it keeps that connection around in a connection pool. When your app wants to connect to the same server again, it will reuse a connection from the pool rather than establishing a new one.

    Session-level settings are persisted by the session, while method-level arguments are not. For a single request, method-level values override the session-level ones, including headers (http://docs.python-requests.org/en/master/user/advanced/#session-objects). The code below shows both levels:

    s = requests.Session()
    s.auth = ('user', 'pass')  # session-level: persisted by the session
    s.headers.update({'x-test': 'true'})  # session-level: persisted by the session
    s.get('https://httpbin.org/headers', auth=('yangkai', 'pass'))  # method-level: not persisted, overrides s.auth for this request only
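The persist/override rules can be verified offline with Session.prepare_request, which applies the same merging as s.get() but does not send anything. The x-once header and the basic helper are hypothetical, added only for this check:

```python
from base64 import b64encode

import requests


def basic(user, password):
    """Expected Authorization value for HTTP Basic auth."""
    return 'Basic ' + b64encode(f'{user}:{password}'.encode()).decode()


s = requests.Session()
s.auth = ('user', 'pass')             # session-level: persisted
s.headers.update({'x-test': 'true'})  # session-level: persisted

url = 'https://httpbin.org/headers'   # only used to build the requests
first = s.prepare_request(
    requests.Request('GET', url, auth=('yangkai', 'pass'), headers={'x-once': '1'})
)
second = s.prepare_request(requests.Request('GET', url))

print(first.headers['Authorization'] == basic('yangkai', 'pass'))  # True: method-level wins
print(second.headers['Authorization'] == basic('user', 'pass'))    # True: session-level again
print('x-test' in second.headers, 'x-once' in second.headers)      # True False
```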
    

    I ran some experiments; the code is in my gist: session_test.py

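    Note that a Session is not thread-safe (this issue), so several threads should not share one Session. The OS keeps switching between threads, and a shared Session's internal state could get mixed up. Instead, create one Session per thread. Code within one thread runs sequentially, so that is safe. Here is one way to do it with threading.local():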

    import concurrent.futures
    import requests
    import threading
    import time
    
    
    thread_local = threading.local()
    
    
    def get_session():
        if not getattr(thread_local, "session", None):
            thread_local.session = requests.Session()
        return thread_local.session
    
    
    def download_site(url):
        session = get_session()
        with session.get(url) as response:
            print(f"Read {len(response.content)} from {url}")
    
    
    def download_all_sites(sites):
        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            executor.map(download_site, sites)
    
    
    if __name__ == "__main__":
        sites = [
            "http://www.jython.org",
            "http://olympus.realpython.org/dice",
        ] * 80
        start_time = time.time()
        download_all_sites(sites)
        duration = time.time() - start_time
        print(f"Downloaded {len(sites)} in {duration} seconds")
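To check that threading.local() really gives each worker thread its own Session, the sketch below reuses the same get_session pattern. It replaces the download with a hypothetical record function that only reports the current thread id and the Session it received, with no network access:

```python
import concurrent.futures
import threading

import requests

thread_local = threading.local()


def get_session():
    # Same pattern as above: one Session per thread, created lazily.
    if getattr(thread_local, 'session', None) is None:
        thread_local.session = requests.Session()
    return thread_local.session


def record(_):
    # No network: just report which thread ran and which Session it got.
    return threading.get_ident(), get_session()


with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    pairs = list(executor.map(record, range(30)))  # keeps every Session alive

sessions_by_thread = {}
for ident, session in pairs:
    sessions_by_thread.setdefault(ident, set()).add(id(session))

print(len(sessions_by_thread), 'threads, each with exactly one Session')
```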
    

    cookies

    Official docs:
    http://docs.python-requests.org/en/master/api/#api-cookies
    http://cn.python-requests.org/zh_CN/latest/user/quickstart.html#cookie

    Setting cookies at the session level

    with requests.Session() as s:
        jar = requests.cookies.RequestsCookieJar()
        jar.set('a', 'b')
        jar.set('x', 'y')
        s.cookies = jar
    

    Setting cookies at the method level

    with requests.Session() as s:
        cookies = dict(cookies_are='working')
        r = s.get("http://httpbin.org/cookies", cookies=cookies)
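The two levels can be compared offline with Session.prepare_request. The session jar is sent with every request, while the method-level dict is used once and is not added to s.cookies. The URL is only used to build the requests:

```python
import requests

url = 'https://httpbin.org/cookies'  # nothing is sent

with requests.Session() as s:
    jar = requests.cookies.RequestsCookieJar()
    jar.set('a', 'b')
    s.cookies = jar  # session-level: attached to every request made with s

    # method-level cookie: used for this request only
    with_extra = s.prepare_request(
        requests.Request('GET', url, cookies={'cookies_are': 'working'})
    )
    plain = s.prepare_request(requests.Request('GET', url))

    print(with_extra.headers['Cookie'])  # contains a=b and cookies_are=working
    print(plain.headers['Cookie'])       # a=b
    session_cookie_names = sorted(s.cookies.keys())
    print(session_cookie_names)          # ['a']
```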
    

    To see which cookies a request carried, checking its headers is the more reliable way:

    print(response.request._cookies)   # a RequestsCookieJar; it does not guarantee these cookies were actually sent, as the leading `_` (private attribute) hints
    # or
    print(response.request.headers['Cookie'])   # the headers that were actually sent
    

    Viewing the session-level cookies set above:

    print(s.cookies)
    

    Viewing Set-Cookie in the response

    Use response.cookies:

    with requests.Session() as s:
        r = s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
        print(r.history)
        print(r.history[1].headers)
        print(r.history[1].cookies)   # per the source code, this holds the value of the response's `Set-Cookie` header, as a RequestsCookieJar
    

    See my gist: cookies_test.py

    Redirect history

    The redirect history is available as response.history, a list of Response objects ordered from oldest to newest. The response you get back is the one from the final request.

    import requests
    
    url = 'http://httpbin.org/cookies/set/sessioncookie/123456789'
    
    with requests.Session() as s:
        response = s.get(url)
        #: A list of :class:`Response <Response>` objects from
        #: the history of the Request. Any redirect responses will end
        #: up here. The list is sorted from the oldest to the most recent request.
        print(response.history)
        for resp in response.history:
            print(resp, resp.request.url, resp.headers)
        print(response, response.request.url)
    

    The output shows that the request went through three URLs (two redirects):
    http://httpbin.org/cookies/set/sessioncookie/123456789 ->
    https://httpbin.org/cookies/set/sessioncookie/123456789 ->
    https://httpbin.org/cookies

    [<Response [301]>, <Response [302]>]
    <Response [301]> http://httpbin.org/cookies/set/sessioncookie/123456789 {'Connection': 'close', 'Cache-Control': 'max-age:86400', 'Date': 'Friday, 08-Feb-19 12:49:25 CST', 'Expires': 'Sat, 09 Feb 2019 12:49:25 GMT', 'Keep-Alive': 'timeout=38', 'Location': 'https://httpbin.org/cookies/set/sessioncookie/123456789', 'Content-Length': '0'}
    <Response [302]> https://httpbin.org/cookies/set/sessioncookie/123456789 {'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Date': 'Fri, 08 Feb 2019 04:49:26 GMT', 'Content-Type': 'text/html; charset=utf-8', 'Content-Length': '223', 'Location': '/cookies', 'Set-Cookie': 'sessioncookie=123456789; Secure; Path=/', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true', 'Via': '1.1 vegur'}
    <Response [200]> https://httpbin.org/cookies
    
    

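    A related pair of concepts: redirect vs. forward.
    A redirect is client-side behavior: the browser sends several requests.
    A forward is server-side behavior: the browser sends only one request.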

    What we have here is a redirect, and the browser does issue three requests (in the screenshot, the order runs from bottom to top):


    [Screenshot: the redirect requests in the browser (重定向.png)]

    timeout and max-retries

    Timeout

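    A request has two timeouts: the connect timeout and the read timeout. By default requests never times out. The timeout argument sets them. A single number is used for both the connect and the read timeout, while a (connect, read) tuple sets each one separately. When a timeout is exceeded, an exception is raised. The read timeout is not a limit on the whole download. It is how long requests waits for the server to send data.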

    You can also pass a tuple to timeout with the first element being a connect timeout (the time it allows for the client to establish a connection to the server), and the second being a read timeout (the time it will wait on a response once your client has established a connection).

    import requests
    from requests.exceptions import Timeout, ConnectionError
    
    try:
        # Connect timeout
        response = requests.get('https://api.github.com', timeout=(0.1, 5))
    except ConnectionError as e:
        print(e, type(e))
    
    try:
        # Read timeout
        response = requests.get('https://api.github.com', timeout=(1, 0.1))
    except Timeout as e:
        print(e, type(e))
    
    try:
        # Single value, used for both connect and read: here the connection itself times out
        response = requests.get('https://api.github.com', timeout=0.1)
    except Exception as e:
        print(e, type(e))
    
    try:
        # Single value again: the connection succeeds, but reading the response times out
        response = requests.get('https://api.github.com', timeout=0.3)
    except Exception as e:
        print(e, type(e))
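The first example catches ConnectionError for a connect timeout because of how requests.exceptions is structured. ConnectTimeout inherits from both ConnectionError and Timeout, while ReadTimeout inherits only from Timeout. So `except Timeout` catches both kinds of timeout. You can check the class relationships offline:

```python
from requests.exceptions import ConnectionError, ConnectTimeout, ReadTimeout, Timeout

# A connect timeout is both a ConnectionError and a Timeout...
print(issubclass(ConnectTimeout, ConnectionError), issubclass(ConnectTimeout, Timeout))  # True True
# ...while a read timeout is only a Timeout.
print(issubclass(ReadTimeout, Timeout), issubclass(ReadTimeout, ConnectionError))  # True False
```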
    

    Max Retries

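    By default requests does not retry a failed request. You can set a retry count through a Transport Adapter, and the code below retries up to 3 times. According to the requests documentation, max_retries only covers failed DNS lookups, socket connections and connection timeouts, never requests whose data has already reached the server. Error status codes such as 500 are not retried: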

    import requests
    from requests.adapters import HTTPAdapter
    from requests.exceptions import ConnectionError
    
    github_adapter = HTTPAdapter(max_retries=3)
    
    session = requests.Session()
    
    # Use `github_adapter` for all requests to endpoints that start with this URL
    session.mount('https://api.github.com', github_adapter)
    
    try:
        session.get('https://api.github.com')
    except ConnectionError as ce:
        print(ce)
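If you also want retries on error status codes, or a pause between attempts, HTTPAdapter accepts a urllib3 Retry object instead of an integer. This is a sketch: the status list and backoff_factor=0.5 are arbitrary example values. session.get_adapter is only used to confirm which adapter the session would pick, and no request is sent:

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Up to 3 retries, also on these status codes, with exponential backoff in between.
retry_policy = Retry(
    total=3,
    backoff_factor=0.5,
    status_forcelist=(500, 502, 503, 504),
)
github_adapter = HTTPAdapter(max_retries=retry_policy)

session = requests.Session()
session.mount('https://api.github.com', github_adapter)

# The longest matching mounted prefix wins.
chosen = session.get_adapter('https://api.github.com/users')
print(chosen is github_adapter, chosen.max_retries.total)  # True 3
```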
    

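    Personally, rather than handling retries with an adapter, I recommend a very good library built for this: tenacity. It does not replace the timeout argument, though. tenacity only checks its stop conditions after an attempt has finished, so a request that hangs is not interrupted unless the request itself has a timeout.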

    import requests
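    from tenacity import RetryError, retry, stop_after_attempt, stop_after_delay
    
    # Keep retrying until one of these happens: 1. the call succeeds without raising,
    # 2. 3 seconds have elapsed, 3. 4 attempts have been made.
    # If it still has not succeeded by then, RetryError is raised.
    @retry(stop=(stop_after_delay(3) | stop_after_attempt(4)))
    def get_github_api():
        with requests.Session() as session:
            # Per-attempt timeout: without it a hung request would never return,
            # so tenacity would never get a chance to retry
            response = session.get('https://api.github.com', timeout=2)
            # Raise on 4xx/5xx so that error responses are retried as well
            response.raise_for_status()
            return response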
    
    
    if __name__ == '__main__':
        try:
            response = get_github_api()
            print(response)
        except RetryError as ce:
            print(ce)
    
    

    References:

    1. https://realpython.com/python-requests/#conclusion
    2. http://docs.python-requests.org/en/master/user/quickstart/
    3. https://tenacity.readthedocs.io/en/latest/


    Article link: https://www.haomeiwen.com/subject/vkmnsqtx.html