Python Request Modules

Author: 山高路陡 | Published: 2020-07-15 16:55

    Request modules for web crawlers

    GET and POST requests with urllib.request

    • GET request: the query parameters appear in the URL

      • headers = {}
      • res = urllib.request.Request(url, headers=headers)
      • urllib.request.urlopen(res)
    • POST request

      • Pass the data parameter to the Request constructor
      • headers = {}
      • data = {}; form data must be submitted as bytes: data = urllib.parse.urlencode(data).encode('utf-8')
      • res = urllib.request.Request(url, data=data, headers=headers)
      • urllib.request.urlopen(res)
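
The GET and POST steps above can be sketched as follows. This is a minimal illustration, not a complete crawler: the URL, the query parameter, and the form fields are placeholder assumptions, and the requests are only built and inspected, not sent.

```python
import urllib.parse
import urllib.request

# Placeholder values, assumed for illustration only.
base_url = 'https://www.baidu.com/s'
headers = {'User-Agent': 'Mozilla/5.0'}

# GET: the query parameters are URL-encoded and appended to the URL.
query = urllib.parse.urlencode({'wd': 'python'})
get_req = urllib.request.Request(base_url + '?' + query, headers=headers)

# POST: the form dict is URL-encoded to a str, then encoded to bytes.
form = {'username': 'demo', 'password': 'secret'}
post_data = urllib.parse.urlencode(form).encode('utf-8')
post_req = urllib.request.Request(base_url, data=post_data, headers=headers)

# Supplying data is what switches the request method from GET to POST.
print(get_req.get_method(), get_req.full_url)
print(post_req.get_method(), post_req.data)

# Sending either request needs network access:
# with urllib.request.urlopen(get_req, timeout=10) as resp:
#     html = resp.read().decode('utf-8')
```

Passing a str instead of bytes as data makes urlopen raise a TypeError, which is why the .encode('utf-8') step matters.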
    • Build an opener from Handlers to handle IP proxies, site logins, and fetching, saving, and loading cookies

      • urllib.request.build_opener(*handlers)  # *handlers means several Handlers can be passed

      • res = urllib.request.Request(url, data=data, headers=headers)

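      • ProxyHandler (proxies)

        • import urllib.request
          url = 'https://www.baidu.com'
          headers = {
              'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
          }
          res = urllib.request.Request(url, headers=headers)
          # Replace 'ip:port' with a real proxy address. A proxy is used only for
          # the URL schemes listed here, so an https URL needs an 'https' entry.
          proxy_handler = urllib.request.ProxyHandler({'http': 'ip:port', 'https': 'ip:port'})
          opener = urllib.request.build_opener(proxy_handler)
          # opener.open() returns a response object; call .read() to get the body
          html = opener.open(res)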
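
As a rough sketch of passing several Handlers to build_opener at once, the example below combines a proxy handler with a cookie handler. The proxy address 127.0.0.1:8888 is a made-up placeholder, and no request is sent; the code only inspects which handlers the opener ended up with.

```python
import http.cookiejar
import urllib.request

# Hypothetical proxy address, used only for illustration.
proxies = {'http': '127.0.0.1:8888', 'https': '127.0.0.1:8888'}

proxy_handler = urllib.request.ProxyHandler(proxies)
cookie_jar = http.cookiejar.CookieJar()
cookie_handler = urllib.request.HTTPCookieProcessor(cookie_jar)

# build_opener accepts any number of handlers; a handler passed here
# replaces the default handler of the same class.
opener = urllib.request.build_opener(proxy_handler, cookie_handler)

handler_names = [type(h).__name__ for h in opener.handlers]
print(handler_names)

# Optional: make plain urllib.request.urlopen() calls use this opener too.
# urllib.request.install_opener(opener)
```

Calling opener.open(request) then routes the request through the proxy and stores any cookies the server sets in cookie_jar.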
          
    • Handling cookies

      • import http.cookiejar, urllib.request
        
        url = 'https://www.baidu.com'
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'
        }
        res = urllib.request.Request(url, headers=headers)
        # Create a CookieJar() object to hold the cookies
        cookie = http.cookiejar.CookieJar()
        
        handler = urllib.request.HTTPCookieProcessor(cookie)
        opener = urllib.request.build_opener(handler)
        result = opener.open(res)
        for item in cookie:
            print(item)
        
        # Save cookies to a file (Mozilla cookies.txt format)
        filename = 'cookies.txt'
        cookie = http.cookiejar.MozillaCookieJar(filename)
        handler = urllib.request.HTTPCookieProcessor(cookie)
        opener = urllib.request.build_opener(handler)
        result = opener.open(res)
        cookie.save(ignore_discard=True, ignore_expires=True)
        
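        # Save in LWP format
        filename = 'cookies.txt'
        cookie = http.cookiejar.LWPCookieJar(filename)
        # Attach the jar to an opener and make a request so there are cookies to save
        handler = urllib.request.HTTPCookieProcessor(cookie)
        opener = urllib.request.build_opener(handler)
        result = opener.open(res)
        cookie.save(ignore_discard=True, ignore_expires=True)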
        
        # Load cookies (the file must have been saved in LWP format)
        cookie = http.cookiejar.LWPCookieJar()
        cookie.load('cookies.txt', ignore_discard=True, ignore_expires=True)
        handler = urllib.request.HTTPCookieProcessor(cookie)
        opener = urllib.request.build_opener(handler)
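
The save-and-load round trip can be tried without a network connection. In the sketch below the cookie is built by hand, which is an assumption made purely for illustration (normally the server's Set-Cookie header fills the jar); the helper make_cookie, the cookie name and value, and the domain example.com are all hypothetical.

```python
import http.cookiejar
import os
import tempfile
import time


def make_cookie(name, value, domain):
    """Build a cookie by hand (hypothetical helper for this demo only)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=int(time.time()) + 3600,
        discard=False, comment=None, comment_url=None, rest={},
    )


# Write to a temporary directory so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'cookies.txt')

# Save: put one cookie in an LWP-format jar and write it to disk.
jar = http.cookiejar.LWPCookieJar(path)
jar.set_cookie(make_cookie('session', 'abc123', 'example.com'))
jar.save(ignore_discard=True, ignore_expires=True)

# Load: a fresh jar of the same class reads the file back.
loaded = http.cookiejar.LWPCookieJar()
loaded.load(path, ignore_discard=True, ignore_expires=True)

restored = {c.name: c.value for c in loaded}
print(restored)
```

The two file formats are not interchangeable: a file written by MozillaCookieJar should be loaded with MozillaCookieJar, and an LWP file with LWPCookieJar.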
        

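    The requests module

    • Install: pip install requests
    • Common requests methods
      • response = requests.get(url, params=None, **kwargs)
        • params adds query parameters to the URL; pass a dict, e.g. params={}
        • **kwargs --> additional arguments, e.g. headers=headers
      • response.text --> str, decoded with an automatically guessed encoding, so the text may come out garbled
      • To set the encoding before decoding: response.encoding = 'utf-8', then read response.text
      • response.content --> bytes, the raw byte stream
      • response.content.decode('utf-8') --> decode with a specific encoding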
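
To see why a wrong encoding guess garbles response.text, the following sketch uses a plain bytes object as a stand-in for response.content. The sample string and the choice of ISO-8859-1 as the wrong guess are assumptions for illustration; no request is made.

```python
# Stand-in for response.content: UTF-8 bytes a server might send.
content = '中文页面'.encode('utf-8')

# Decoding with the wrong codec does not raise here; it silently
# produces unreadable text (mojibake).
garbled = content.decode('iso-8859-1')

# Decoding with the correct codec restores the original text.
text = content.decode('utf-8')

print(repr(garbled))
print(text)
```

Setting response.encoding = 'utf-8' before reading response.text has the same effect as the explicit content.decode('utf-8') call.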
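    • POST requests with requests
      • requests.post(url, data=None, json=None, **kwargs)
      • data is a dict of form data
      • **kwargs --> e.g. headers=headers
    • Proxy settings in requests
      • proxy = {'http': 'http://ip:port', 'https': 'http://ip:port'}
      • res = requests.get(url, proxies=proxy)
    • Handling SSL certificate errors
      • Add verify=False to the request call; this skips certificate verification, so use it only for sites you trust
      • requests.get(url, verify=False)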
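
The requests calls above can be explored offline by preparing requests without sending them. This is a sketch under assumptions: the URL, the query parameter, and the form fields are placeholders, and requests.Request(...).prepare() is used here only to show what requests.get and requests.post would put on the wire.

```python
import json

import requests

# Placeholder values, assumed for illustration only.
url = 'https://www.baidu.com/s'
headers = {'User-Agent': 'Mozilla/5.0'}

# GET: params is encoded into the query string of the URL.
get_req = requests.Request(
    'GET', url, params={'wd': 'python'}, headers=headers).prepare()

# POST with data=: the dict is sent as a URL-encoded form body.
form_req = requests.Request(
    'POST', url, data={'username': 'demo'}, headers=headers).prepare()

# POST with json=: the dict is serialized to a JSON body instead.
json_req = requests.Request(
    'POST', url, json={'username': 'demo'}, headers=headers).prepare()

print(get_req.url)
print(form_req.headers['Content-Type'], form_req.body)
print(json_req.headers['Content-Type'], json.loads(json_req.body))

# Sending a prepared request needs network access:
# with requests.Session() as session:
#     response = session.send(get_req, timeout=10)
#     print(response.status_code)
```

In everyday code, requests.get(url, params=..., headers=...) and requests.post(url, data=..., headers=...) do the prepare and send steps in one call.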
