Python Web Scraping: the requests Module

By 松鼠大帝 | Published 2019-11-21 00:14

    Getting response information

    import requests
    response = requests.get('http://www.baidu.com')
    print(response.status_code)  # status code
    print(response.url)          # request URL
    print(response.headers)      # response headers
    print(response.cookies)      # cookies
    print(response.content)      # response body as bytes
    print(response.encoding)     # encoding used to decode the body
    response.encoding = 'utf-8'  # override the response encoding
    print(response.text)         # response body as text: response.content decoded with response.encoding
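
    If the server does not declare a charset, requests falls back to ISO-8859-1, which often garbles Chinese pages. A minimal sketch using the apparent_encoding attribute, which guesses the charset from the body itself:

    import requests

    response = requests.get('http://www.baidu.com')
    # Fall back to the encoding detected from the body when the
    # server's declared charset is missing or unreliable
    if response.encoding is None or response.encoding.lower() == 'iso-8859-1':
        response.encoding = response.apparent_encoding
    print(response.text[:200])  # decoded with the corrected encoding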
    

    Sending GET requests

    GET without parameters

    response = requests.get('http://www.baidu.com')
    print(response.text)
    

    GET with parameters

    Writing parameters directly into the URL

    Append a ? to the URL to introduce the parameters, separating each key=value pair with &. For example:
    https://www.bilibili.com/video/av4050443?from=search&seid=17321873743047145176
    Note: URLs are typically limited to about 2048 characters, and query-string data travels in plain sight, so this is not suitable for sensitive values.
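
    A quick sketch of this style; httpbin.org/get simply echoes the request back, which makes it convenient for testing:

    import requests

    # Parameters written directly into the URL after the ?
    response = requests.get('https://httpbin.org/get?name=xiaoming&age=26')
    print(response.url)  # full URL, including the query string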

    Passing parameters as a dict

    data = {'name': 'xiaoming',  'age': 26}
    response = requests.get('http://www.abcd.com', params=data)
    print(response.text)
    

    Sending POST requests

    Form data is passed as a dict; note that the keyword is data, not params.

    data = {'name': 'xiaoming',  'age': 26}
    response = requests.post('http://www.abcd.com', data=data)
    print(response.text)
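
    To see how the form data travels, you can POST to httpbin.org/post, which echoes the request back (a usage sketch, not part of the original example):

    import requests

    data = {'name': 'xiaoming', 'age': 26}
    response = requests.post('https://httpbin.org/post', data=data)
    # The dict is sent as form-encoded fields in the request body
    print(response.json()['form'])  # {'age': '26', 'name': 'xiaoming'}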
    

    Adding headers

    heads = {}
    heads['User-Agent'] = 'Mozilla/5.0 ' \
                              '(Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 ' \
                              '(KHTML, like Gecko) Version/5.1 Safari/534.50'
    response = requests.get('http://www.baidu.com', headers=heads)
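
    To confirm the header was actually sent, httpbin.org/headers echoes the request headers back (a quick check; the User-Agent value here is illustrative):

    import requests

    heads = {'User-Agent': 'my-crawler/1.0'}
    response = requests.get('https://httpbin.org/headers', headers=heads)
    print(response.json()['headers']['User-Agent'])  # my-crawler/1.0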
    

    Using a proxy

    url = 'http://www.baidu.com'
    proxy = {'http': 'http://49.89.84.106:9999', 'https': 'http://49.89.84.106:9999'}
    heads = {}
    heads['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.221 Safari/537.36 SE 2.X MetaSr 1.0'
    req = requests.get(url, proxies=proxy, headers=heads)
    print(req.text)
    

    Using an authenticated proxy

    from requests.auth import HTTPProxyAuth

    url = 'http://www.baidu.com'
    proxies = {'http': 'http://127.0.0.1:8888', 'https': 'http://127.0.0.1:8888'}
    auth = HTTPProxyAuth('user', 'pwd')  # adds a Proxy-Authorization header
    requests.get(url, proxies=proxies, auth=auth)
    

    Alternatively, embed the credentials directly in the proxy URL:

    proxies = {"http": "http://user:pass@10.10.1.10:3128/",}
    req = requests.get(url, proxies=proxy, headers=heads)
    

    Cookie

    Getting cookies

    import requests
    response = requests.get("http://www.baidu.com")
    print(type(response.cookies))
    # Convert the CookieJar object to a plain dict
    cookies = requests.utils.dict_from_cookiejar(response.cookies)
    print(cookies)
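
    The conversion also works in the other direction with requests.utils.cookiejar_from_dict, which is useful when cookies were stored as a plain dict (the cookie name and value here are illustrative):

    import requests

    # Rebuild a CookieJar from a dict of name -> value pairs
    jar = requests.utils.cookiejar_from_dict({'token': 'abc123'})
    print(type(jar))  # <class 'requests.cookies.RequestsCookieJar'>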
    

    Sending cookies

    cookie = {"Cookie":"xxxxxxxx"}
    response = requests.get(url,cookies=cookie)
    

    Session

    import requests

    session = requests.Session()
    # The first request sets a cookie, which the session stores...
    session.get('http://httpbin.org/cookies/set/number/12345')
    # ...and automatically sends on subsequent requests
    response = session.get('http://httpbin.org/cookies')
    print(response.text)
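
    A Session can also carry default headers that are merged into every request it makes; a minimal sketch (the User-Agent value is illustrative):

    import requests

    session = requests.Session()
    # Headers set here apply to all requests made through this session
    session.headers.update({'User-Agent': 'my-crawler/1.0'})
    response = session.get('http://httpbin.org/headers')
    print(response.json()['headers']['User-Agent'])  # my-crawler/1.0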
    

    Setting a timeout

    import requests
    from requests.exceptions import ReadTimeout

    try:
        response = requests.get('https://www.baidu.com', timeout=1)
        print(response.status_code)
    except ReadTimeout:
        print('No response within the given time')
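
    timeout also accepts a (connect, read) tuple to limit the connection and read phases separately; this is standard requests behavior:

    import requests
    from requests.exceptions import ConnectTimeout, ReadTimeout

    try:
        # 3.05s to establish the connection, 10s to wait for each read
        response = requests.get('https://www.baidu.com', timeout=(3.05, 10))
        print(response.status_code)
    except (ConnectTimeout, ReadTimeout):
        print('Connection or read timed out')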
    

    Parsing a JSON response

    The response.json() method parses a JSON response body into a Python object; json.loads(response.text) achieves the same result.

    response = requests.get('http://www.abcd.com')  # the endpoint must return a JSON body
    print(response.text)
    print(response.json())        # raises an exception if the body is not valid JSON
    print(type(response.json()))
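
    The json.loads equivalent mentioned above looks like this (using httpbin.org/get as a stand-in endpoint that returns JSON):

    import json
    import requests

    response = requests.get('https://httpbin.org/get')
    obj = json.loads(response.text)  # same result as response.json()
    print(obj == response.json())    # True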
    

    To learn more about programming and development and grow along with me, follow my WeChat official account "松果仓库", where I share all kinds of resources for geeks and programmers. Thanks!
