(2) Learning the requests Library


Author: 费云帆 | Published 2018-12-03 16:48

    requests is a simple, easy-to-use HTTP library for Python, much more concise than urllib. For more advanced operations, urllib makes you create a handler object and then construct an opener around it; requests handles the same cases far more simply.

    This post by another author is excellent, and it is what I followed while learning:
    https://www.cnblogs.com/mzc1997/p/7813801.html

    1. Basic GET scraping. requests.get() returns a Response object; once it is created, its attributes and methods give us the information we need:

    import requests
    
    url = 'http://www.baidu.com'
    response = requests.get(url)
    print(response.status_code)  # HTTP status code
    print(response.url)          # the URL that was requested
    print(response.headers)      # response headers
    print(response.cookies)      # cookies
    print(response.text)         # page source as text (str)
    print(response.content)      # page source as bytes
    
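    A side note on `text` vs `content`: `text` is simply `content` decoded with `response.encoding`. A hand-built Response (using the internal `_content` attribute, an implementation detail used here only for illustration, no network needed) shows the relationship:

```python
import requests

# Build a Response by hand (no network call) to show that `text` is
# `content` decoded with `encoding`. `_content` is an internal
# attribute, set directly here only for illustration.
r = requests.models.Response()
r.status_code = 200
r._content = '百度一下'.encode('utf-8')
r.encoding = 'utf-8'

assert r.content == '百度一下'.encode('utf-8')  # raw bytes
assert r.text == '百度一下'                      # decoded str
assert r.ok                                      # status_code < 400
```

    If `encoding` is wrong or unset, `text` can come out garbled while `content` stays intact, which is why binary downloads use `content` (see section 6).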

    2. The various request methods:

    import requests
    
    requests.get('http://httpbin.org/get')
    requests.post('http://httpbin.org/post')
    requests.put('http://httpbin.org/put')
    requests.delete('http://httpbin.org/delete')
    requests.head('http://httpbin.org/get')
    requests.options('http://httpbin.org/get')
    
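    Each helper just sends a different HTTP verb through the same machinery. Preparing a request without sending it shows what goes on the wire; for example, `post` form data is URL-encoded into the body:

```python
import requests

# prepare() builds the request without sending it, so this runs offline.
p = requests.Request('POST', 'http://httpbin.org/post',
                     data={'name': 'Jim'}).prepare()
print(p.method)                   # POST
print(p.body)                     # name=Jim
print(p.headers['Content-Type'])  # application/x-www-form-urlencoded
```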

    3. The response to a GET request:

    import requests
    
    url='http://httpbin.org/get'
    response=requests.get(url)
    print(response.text)
    
    {
      "args": {}, #表示参数
      "headers": {
        "Accept": "*/*", 
        "Accept-Encoding": "gzip, deflate", 
        "Connection": "close", 
        "Host": "httpbin.org", 
        "User-Agent": "python-requests/2.18.4"
      }, 
      "origin": "117.28.251.74", 
      "url": "http://httpbin.org/get"
    }
    
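    In practice it is worth checking the status before using the body; `raise_for_status()` turns 4xx/5xx codes into exceptions. A hand-built Response demonstrates this without a network call:

```python
import requests

# A Response built by hand (no network) with a 404 status code.
bad = requests.models.Response()
bad.status_code = 404

caught = False
try:
    bad.raise_for_status()          # raises for 4xx/5xx codes
except requests.exceptions.HTTPError:
    caught = True
print(caught)   # True
print(bad.ok)   # False: `ok` is raise_for_status() without the raise
```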

    4. GET parameters. There are two ways to pass them: via the params argument, or directly in the URL:

    import requests
    
    url='http://httpbin.org/get'
    data = {
        'name': 'Jim Green',
        'age': 22
    }
    response=requests.get(url,params=data)
    print(response.text)
    
    {
      "args": {
        "age": "22", 
        "name": "Jim Green"
      }, 
      "headers": {
        "Accept": "*/*", 
        "Accept-Encoding": "gzip, deflate", 
        "Connection": "close", 
        "Host": "httpbin.org", 
        "User-Agent": "python-requests/2.18.4"
      },
      ...
    }
    

    Passing the parameters directly in the URL:

    import requests
    
    url='http://httpbin.org/get?name=Jim+Green&age=22'
    response=requests.get(url)
    print(response.text)
    

    The result is the same.
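    That the two forms really build the same request can be verified offline by preparing the request instead of sending it:

```python
from requests import Request

# Preparing (not sending) the request shows that `params` produces
# exactly the hand-written query string.
p = Request('GET', 'http://httpbin.org/get',
            params={'name': 'Jim Green', 'age': 22}).prepare()
print(p.url)  # http://httpbin.org/get?name=Jim+Green&age=22
```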
    5. JSON parsing:

    import requests
    
    url='http://httpbin.org/get'
    response=requests.get(url)
    print(response.text)
    print(type(response.text))    # <class 'str'>
    print(response.json())        # equivalent to json.loads(response.text)
    print(type(response.json()))  # <class 'dict'>
    
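    Since `response.json()` is essentially `json.loads(response.text)`, its behavior can be sketched with the standard `json` module, using a plain string to stand in for a response body. Note that a non-JSON body makes `json()` raise a `ValueError` (wrapped as `requests.exceptions.JSONDecodeError` in recent requests versions):

```python
import json

# A string standing in for response.text:
body = '{"args": {}, "url": "http://httpbin.org/get"}'
data = json.loads(body)
print(type(data))   # <class 'dict'>
print(data['url'])  # http://httpbin.org/get

# A non-JSON body raises ValueError, so guard the call:
try:
    json.loads('<html>not json</html>')
except ValueError:
    print('body was not JSON')
```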

    6. Saving binary files (images, video, etc.):

    import requests
    
    url='http://github.com/favicon.ico'
    response=requests.get(url)
    print(response.text)     # garbled: the body is not text
    print(response.content)  # the raw bytes
    content = response.content
    with open(r'E:\Bin\Python\picture.ico', 'wb') as file:
        file.write(content)
    
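    The essential points are that `content` is `bytes` and the file must be opened in `'wb'` mode. A minimal offline sketch, with a fake bytes object standing in for `response.content` and a temp path instead of the hard-coded one:

```python
import os
import tempfile

# `fake_icon` stands in for `response.content` so this runs offline.
fake_icon = bytes(range(16))
path = os.path.join(tempfile.gettempdir(), 'picture.ico')

with open(path, 'wb') as f:   # 'wb': binary mode, no text decoding
    f.write(fake_icon)
with open(path, 'rb') as f:
    assert f.read() == fake_icon

# For large files, prefer streaming so the whole body is not held in
# memory (sketch, not executed here):
#   with requests.get(url, stream=True) as r:
#       with open(path, 'wb') as f:
#           for chunk in r.iter_content(chunk_size=8192):
#               f.write(chunk)
```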

    7. Adding request headers:

    import requests
    
    url='http://httpbin.org/get'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Mobile Safari/537.36'
    }
    response=requests.get(url,headers=headers)
    print(response.text)
    
    {
      "args": {}, 
      "headers": {
        "Accept": "*/*", 
        "Accept-Encoding": "gzip, deflate", 
        "Connection": "close", 
        "Host": "httpbin.org", 
        "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Mobile Safari/537.36"
      }, 
      "origin": "117.28.251.74", 
      "url": "http://httpbin.org/get"
    }
    
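    When every request needs the same headers, a `Session` avoids repeating them; it carries headers (and cookies) across requests. The `'my-crawler/0.1'` value below is a made-up User-Agent for illustration:

```python
import requests

# A Session keeps headers across all requests made with it, so the
# User-Agent only has to be set once.
s = requests.Session()
s.headers.update({'User-Agent': 'my-crawler/0.1'})  # hypothetical UA
print(s.headers['User-Agent'])  # my-crawler/0.1
```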

    8. Adding proxy IPs (note: the example below actually fails):

    import requests
    
    url='http://httpbin.org/get'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Mobile Safari/537.36'
    }
    proxies = {
        # keys are URL schemes; the original used 'HTTP' twice, so the
        # first entry was silently overwritten by the second
        'http': 'http://95.66.157.74:34827',
        'https': 'http://93.191.14.103:43981'
    }
    response=requests.get(url,headers=headers,proxies=proxies)
    print(response.text)
    
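    A likely reason the example fails is that free proxies are often already dead. A dead proxy surfaces as an exception, so wrap the call and set a timeout; the relevant exception classes all nest under `RequestException`, which a quick offline check confirms:

```python
import requests

# ProxyError and ConnectTimeout sit under ConnectionError / Timeout,
# so a single except clause can cover a dead or slow proxy.
assert issubclass(requests.exceptions.ProxyError,
                  requests.exceptions.ConnectionError)
assert issubclass(requests.exceptions.ConnectTimeout,
                  requests.exceptions.Timeout)

# Usage sketch (not executed here; the proxy is from the failing example):
#   try:
#       requests.get('http://httpbin.org/get',
#                    proxies={'http': 'http://95.66.157.74:34827'},
#                    timeout=5)
#   except requests.exceptions.RequestException as exc:
#       print('proxy failed:', exc)
```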

    The following example deserves a closer look when time allows:

    import requests
    import re
    
    def get_html(url):
        proxy = {
            # scheme keys; the original repeated 'HTTP', so only the
            # second entry survived
            'http': 'http://95.66.157.74:34827',
            'https': 'http://93.191.14.103:43981'
        }
        heads = {}
        heads['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0'
        req = requests.get(url, headers=heads,proxies=proxy)
        html = req.text
        return html
    
    def get_ipport(html):
        regex = r'<td data-title="IP">(.+)</td>'
        iplist = re.findall(regex, html)
        regex2 = '<td data-title="PORT">(.+)</td>'
        portlist = re.findall(regex2, html)
        regex3 = r'<td data-title="类型">(.+)</td>'
        typelist = re.findall(regex3, html)
        sumray = []
        # pair each IP with its port and type positionally; the original
        # nested loops only ever used the last-seen port and type
        for i, p, t in zip(iplist, portlist, typelist):
            sumray.append(t + ',' + i + ':' + p)
        print('High-anonymity proxies')
        print(sumray)
    
    
    if __name__ == '__main__':
        url = 'http://www.kuaidaili.com/free/'
        get_ipport(get_html(url))
    
    High-anonymity proxies
    ['HTTP,117.191.11.80:8118', 'HTTP,116.7.176.75:8118', 'HTTP,183.129.207.82:8118', 'HTTP,120.77.247.147:8118', 'HTTP,183.129.207.82:8118', 'HTTP,183.129.207.82:8118', 'HTTP,183.129.207.82:8118', 'HTTP,121.232.148.118:8118', 'HTTP,101.76.209.69:8118', 'HTTP,111.75.193.25:8118', 'HTTP,120.26.199.103:8118', 'HTTP,111.43.70.58:8118', 'HTTP,27.214.112.102:8118', 'HTTP,124.42.7.103:8118', 'HTTP,113.69.137.222:8118']
    
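    The pairing logic can be checked offline on a tiny inline HTML sample (the live kuaidaili.com markup may differ), using `zip` to line up each IP with its port and type:

```python
import re

# A small stand-in for the proxy-list page: two rows of three cells.
html = '''
<td data-title="IP">1.2.3.4</td>
<td data-title="PORT">8118</td>
<td data-title="类型">HTTP</td>
<td data-title="IP">5.6.7.8</td>
<td data-title="PORT">8080</td>
<td data-title="类型">HTTP</td>
'''
ips = re.findall(r'<td data-title="IP">(.+?)</td>', html)
ports = re.findall(r'<td data-title="PORT">(.+?)</td>', html)
types = re.findall(r'<td data-title="类型">(.+?)</td>', html)

# zip pairs the i-th IP with the i-th port and type:
pairs = [t + ',' + i + ':' + p for i, p, t in zip(ips, ports, types)]
print(pairs)  # ['HTTP,1.2.3.4:8118', 'HTTP,5.6.7.8:8080']
```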


    Original link: https://www.haomeiwen.com/subject/fwlycqtx.html