Getting Started with the Python requests Library

Author: Cache_wood | Published: 2021-10-04 00:44


    Installing the requests library

    On Windows

    pip install requests
    

    Testing the requests installation

    import requests
    
    r = requests.get("http://www.baidu.com")
    print(r.status_code)
    
    print(r.text)
    
    200
    <!DOCTYPE html>
    <!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>ç¾åº¦ä¸ä¸ï¼ä½ å
    

    The 7 main methods of the requests library

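    Method               Description
    requests.request()   Constructs a request; the base method underlying all of the methods below
    requests.get()       The main method for fetching an HTML page; corresponds to HTTP GET
    requests.head()      Fetches only the header information of an HTML page; corresponds to HTTP HEAD
    requests.post()      Submits a POST request to an HTML page; corresponds to HTTP POST
    requests.put()       Submits a PUT request to an HTML page; corresponds to HTTP PUT
    requests.patch()     Submits a partial-modification request to an HTML page; corresponds to HTTP PATCH
    requests.delete()    Submits a delete request to an HTML page; corresponds to HTTP DELETE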

    The get() method of the requests library

    requests.get(url,params=None,**kwargs)
    
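    • url: URL of the page to fetch
    • params: extra parameters appended to the URL, as a dict or a byte stream; optional
    • **kwargs: 12 optional parameters that control the request
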
    import requests
    
    r = requests.get("http://www.baidu.com")
    print(r.status_code)
    print(type(r))
    print(r.headers)
    
    200
    <class 'requests.models.Response'>
    {'Cache-Control': 'private, no-cache, no-store, proxy-revalidate, no-transform', 'Connection': 'keep-alive', 'Content-Encoding': 'gzip', 'Content-Type': 'text/html', 'Date': 'Fri, 27 Aug 2021 08:13:17 GMT', 'Last-Modified': 'Mon, 23 Jan 2017 13:27:36 GMT', 'Pragma': 'no-cache', 'Server': 'bfe/1.0.8.18', 'Set-Cookie': 'BDORZ=27315; max-age=86400; domain=.baidu.com; path=/', 'Transfer-Encoding': 'chunked'}
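
To make the params argument concrete, here is a minimal sketch that builds a request without sending it, so the final URL can be inspected offline. It assumes the httpbin.org test endpoint used later in this article; the key/value pairs and the client name "demo-client/0.1" are made up for illustration. requests.get() goes through the same preparation step before it sends anything.

```python
import requests

# Build the request object but do not send it; no network access is needed.
req = requests.Request(
    "GET",
    "http://httpbin.org/get",
    params={"key1": "value1", "key2": "value2"},  # illustrative values
    headers={"User-Agent": "demo-client/0.1"},    # hypothetical client name
)
prepared = req.prepare()

# params has been URL-encoded and appended as the query string.
print(prepared.url)  # http://httpbin.org/get?key1=value1&key2=value2

# headers is one of the keyword arguments accepted through **kwargs.
print(prepared.headers["User-Agent"])  # demo-client/0.1
```

Passing the same params and headers to requests.get() should produce the same URL and header on the wire.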
    

    The Response object contains everything the server returned, as well as information about the request that was sent.

    Attributes of the Response object

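    Attribute             Description
    r.status_code         HTTP status code of the response; 200 means success, 404 means the page was not found
    r.text                Response body as a string, i.e. the content of the page at the URL
    r.encoding            Encoding of the response body, guessed from the HTTP headers
    r.apparent_encoding   Encoding of the response body, inferred from the content itself
    r.content             Response body in binary (bytes) form
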
    import requests
    
    r = requests.get("http://www.baidu.com")
    print(r.status_code)
    print(r.text)
    print(r.encoding)
    print(r.apparent_encoding)
    print(r.content)
    
    =cp-feedback>æ
                  è§åé¦</a>&nbsp;京ICPè¯030173å·&nbsp; <img src=//www
    .baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
    
    ISO-8859-1
    utf-8
    b'<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=
    

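    r.encoding: taken from the charset in the Content-Type header. If the header carries no charset, text content is assumed to be ISO-8859-1. r.text decodes the page using r.encoding, which is why the UTF-8 page above is displayed garbled.
    r.apparent_encoding: the encoding inferred by analyzing the page content; it can serve as a fallback for r.encoding.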
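
The effect of a wrong r.encoding can be reproduced without any network access. The sketch below uses only the standard library and takes the page title from the output in this article as sample data. It mimics what r.text does: decoding the same bytes once as ISO-8859-1 and once as UTF-8.

```python
# Sample data: the page title as it appears in the correctly decoded output.
title = "百度一下,你就知道"
raw = title.encode("utf-8")  # the bytes a server would send (what r.content holds)

# Decoding with the wrong encoding: each byte becomes one character, so the text is garbled.
garbled = raw.decode("ISO-8859-1")

# Decoding with the right encoding, as happens after r.encoding = "utf-8".
correct = raw.decode("utf-8")

# Garbled text can be repaired, because ISO-8859-1 maps every byte value to exactly one character.
repaired = garbled.encode("ISO-8859-1").decode("utf-8")

print(len(title), len(raw), len(garbled))  # 9 25 25
print(correct == title, repaired == title)  # True True
```

The garbled string has 25 characters instead of 9, one per byte, which matches the kind of output shown above when r.encoding is ISO-8859-1.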

    A general-purpose template for fetching a web page

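    import requests
    
    def getHTMLText(url):
        """Fetch url and return its text, or an error message on failure."""
        try:
            r = requests.get(url, timeout=30)
            r.raise_for_status()  # raise HTTPError if the status is 4xx or 5xx
            r.encoding = r.apparent_encoding  # decode with the encoding detected from the content
            return r.text
        except requests.RequestException:
            # covers connection errors, timeouts and bad HTTP statuses
            return "An exception occurred"
    
    if __name__ == "__main__":
        url = "http://www.baidu.com"
        print(getHTMLText(url))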
    
    <!DOCTYPE html>
    <!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min.css><title>百度一下,你就知道</title></head> <body link=#0000cc> <div id=wrapper> <div id=head> <div class=head_wrapper> <div class=s_form> <div class=s_form_wrapper> <div id=lg> <img hidefocus=true src=//www.baidu.com/img/bd_logo1.png width=270 height=129> </div> <form id=form name=f action=//www.baidu.com/s class=fm> <input type=hidden name=bdorz_come value=1> <input type=hidden name=ie value=utf-8> <input type=hidden name=f value=8> <input type=hidden name=rsv_bp value=1> <input type=hidden name=rsv_idx value=1> <input type=hidden name=tn value=baidu><span class="bg s_ipt_wr"><input id=kw name=wd class=s_ipt value maxlength=255 autocomplete=off autofocus></span><span class="bg s_btn_wr"><input type=submit id=su value=百度一下 class="bg s_btn"></span> </form> </div> </div> <div id=u1> <a href=http://news.baidu.com name=tj_trnews class=mnav>新闻</a> <a href=http://www.hao123.com name=tj_trhao123 clas/www.baidu.com/bdorz/login.gif?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2f%3fbdorz_come%3d1 name=tj_login class=lb>登录</a> </noscript> <script>document.write('<a href="http://www.baidu.com/bdorz/login.gif?login&tpl=mn&u='+ encodeURIComponent(window.location.href+ (window.location.search === "" ? "?" : "&")+ "bdorz_come=1")+ '" name="tj_login" class="lb">登录</a>');</script> <a href=//www.baidu.com/more/ name=tj_briicon class=bri style="display: block;">更多 
    产品</a> </div> </div> </div> <div id=ftCon> <div id=ftConw> <p id=lh> <a href=http://home.baidu.com>关于百度</a> <a href=http://ir.baidu.com>About Baidu</a> </p> <p id=cp>&copy;2017&nbsp;Baidu&nbsp;<a href=http://www.baidu.com/duty/>使
    用百度前必读</a>&nbsp; <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a>&nbsp;京ICP证030173号&nbsp; <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
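
The template relies on raise_for_status() to turn a bad status code into an exception. The sketch below shows that behavior offline. It builds Response objects by hand, which is done here only for illustration; in real code they come from requests.get() and the other request methods. The helper name check is hypothetical.

```python
import requests

def check(status_code):
    """Return "ok" if raise_for_status() accepts the code, else the exception class name."""
    # Hand-built Response, for illustration only.
    r = requests.Response()
    r.status_code = status_code
    try:
        r.raise_for_status()
    except requests.HTTPError as exc:
        return type(exc).__name__
    return "ok"

for code in (200, 404, 500):
    print(code, check(code))
# 200 ok
# 404 HTTPError
# 500 HTTPError

# HTTPError, Timeout and ConnectionError all derive from RequestException,
# so a single except clause for RequestException covers all three.
for exc_type in (requests.HTTPError, requests.Timeout, requests.ConnectionError):
    print(exc_type.__name__, issubclass(exc_type, requests.RequestException))
```

Catching requests.RequestException rather than using a bare except keeps unrelated errors, such as KeyboardInterrupt, from being swallowed.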
    

    Methods of the requests library

    The head() method of the requests library

    import requests

    r = requests.head("http://httpbin.org/get")
    print(r.headers)
    
    {'Date': 'Fri, 27 Aug 2021 08:35:54 GMT', 'Content-Type': 'application/json', 'Content-Length': '307', 'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true'}
    
    The post() method of the requests library

    import requests

    payload = {'key1':'value1','key2':'value2'}
    r = requests.post("http://httpbin.org/post",data=payload)
    print(r.text)
    
    PS E:\coding> python -u "e:\coding\pa.py"
    {
      "args": {},
      "data": "",
      "files": {},
      "form": {
        "key1": "value1",
        "key2": "value2"
      },
      "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate",
        "Content-Length": "23",
        "Content-Type": "application/x-www-form-urlencoded",
        "Host": "httpbin.org",
        "User-Agent": "python-requests/2.25.1",
        "X-Amzn-Trace-Id": "Root=1-6128a52f-262035902ff7c33c4f61fe1b"
      },
      "json": null,
      "origin": "36.142.152.189",
      "url": "http://httpbin.org/post"
    }
    

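    POSTing a dict to a URL: it is automatically encoded as a form, so it appears under "form" in the response above.
    POSTing a string to a URL: it is sent as the raw request body, so it appears under "data".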
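
The difference between the two cases can be seen by preparing the requests without sending them. This is a minimal sketch: the dict is the payload from the example above, while the string "ABC" is an arbitrary sample value that does not appear in the original.

```python
import requests

URL = "http://httpbin.org/post"

# Case 1: a dict is form-encoded and given a form Content-Type.
form_req = requests.Request(
    "POST", URL, data={"key1": "value1", "key2": "value2"}
).prepare()
print(form_req.body)                       # key1=value1&key2=value2
print(form_req.headers["Content-Type"])    # application/x-www-form-urlencoded
print(form_req.headers["Content-Length"])  # 23

# Case 2: a string is sent unchanged as the body, and no Content-Type is added.
text_req = requests.Request("POST", URL, data="ABC").prepare()  # "ABC" is a sample value
print(text_req.body)                         # ABC
print(text_req.headers.get("Content-Type"))  # None
```

The Content-Length of 23 agrees with the value httpbin echoed back in the response above.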
