Web Crawling and Information Extraction 1-1 - Getting Started with the Requests Library


Author: hongXkeX | Published: 2017-10-07 16:39 | Views: 12

You are a breeze drifting idly through the hall, yet proud and alone you stir up a mountain flood

[Figure: course progress]

Installing requests

conda install requests

http://www.python-requests.org/en/master/


# -*- coding: utf-8 -*-
import requests
r = requests.get("http://www.mi.com")
print(r.status_code)
r.encoding = 'utf-8'
print(r.text)

The 7 main methods of the requests library:

[Figure: the 7 main methods of the requests library]

requests - get()

r = requests.get(url)

r = requests.get("http://www.mi.com")
[Figures: get() 1-3; the requests library]
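To make concrete what get() assembles before anything is sent, here is a minimal offline sketch that builds a comparable request with requests.Request and prepare(). The httpbin.org URL and the params values are assumptions chosen for illustration; they do not come from the figures above.

```python
import requests

# Hypothetical target and query parameters, chosen only for illustration.
url = "http://httpbin.org/get"
params = {"key1": "value1", "key2": "value2"}

# Build the request without sending it. requests.get(url, params=params)
# prepares a similar request (plus session defaults such as User-Agent)
# and then sends it, returning a Response object.
prepared = requests.Request("GET", url, params=params).prepare()

print(prepared.method)  # GET
print(prepared.url)     # http://httpbin.org/get?key1=value1&key2=value2
```

Nothing here touches the network, so it is a safe way to inspect how the final URL is composed.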

Attributes of the Response object:

[Figures: attributes of the Response object; status_code]
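As a small sketch of how these attributes might be inspected together, the helper below gathers the commonly used ones into a dict. The function name summarize and the choice of fields are assumptions for illustration, not something defined by Requests or by the article.

```python
import requests

def summarize(r):
    """Collect commonly used attributes of a Response into a dict."""
    return {
        "status_code": r.status_code,              # HTTP status; 200 means success
        "encoding": r.encoding,                    # encoding taken from the headers
        "apparent_encoding": r.apparent_encoding,  # encoding guessed from the body
        "content_length": len(r.content),          # body as raw bytes
        "text_preview": r.text[:20],               # body decoded to str
    }

# Usage (requires network access):
#     r = requests.get("http://www.mi.com", timeout=30)
#     print(summarize(r))
```

Checking status_code first is the usual habit: the other attributes are only meaningful when the request actually succeeded.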

Understanding the Response encoding:

[Figure: understanding the Response encoding]
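The following sketch illustrates why r.encoding and r.apparent_encoding can disagree. It uses requests.utils.get_encoding_from_headers, the helper Requests relies on to read the charset from Content-Type, and then decodes the same bytes two ways. The sample headers and the sample text are assumptions for illustration.

```python
from requests.utils import get_encoding_from_headers

# With no charset in Content-Type, Requests falls back to ISO-8859-1 for text.
print(get_encoding_from_headers({"content-type": "text/html"}))                 # ISO-8859-1
print(get_encoding_from_headers({"content-type": "text/html; charset=utf-8"}))  # utf-8

raw = "小米".encode("utf-8")         # sample bytes as a server might send them
garbled = raw.decode("ISO-8859-1")  # wrong guess: unreadable characters
correct = raw.decode("utf-8")       # right encoding: the original text

print(garbled != correct)  # True
print(correct)             # 小米
```

This is the reason the framework later in the article sets r.encoding = r.apparent_encoding before reading r.text.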

A general code framework for fetching web pages:

  • Network connections are risky
  • Exception handling is essential

Understanding Requests exceptions:

[Figures: Requests exceptions 1-2]
# -*- coding: utf-8 -*-

import requests

def getHTMLText(url):
    try:
        r = requests.get(url, timeout=30)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
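    except requests.RequestException:
        # Covers connection errors, timeouts, and HTTP error statuses
        return "An exception occurred"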

if __name__ == "__main__":
    url = "http://www.mi.com"
    print(getHTMLText(url))
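A possible variation on the framework above distinguishes the kinds of failure instead of returning one generic message. This is only a sketch: the function name fetch and the returned messages are assumptions, while the exception classes are the ones exposed in requests.exceptions.

```python
import requests
from requests import exceptions

def fetch(url, timeout=30):
    """Return the page text, or a short message naming the kind of failure."""
    try:
        r = requests.get(url, timeout=timeout)
        r.raise_for_status()               # raises HTTPError for 4xx/5xx statuses
        r.encoding = r.apparent_encoding
        return r.text
    except exceptions.HTTPError as e:
        return f"HTTP error: {e}"
    except exceptions.Timeout:             # also catches ConnectTimeout
        return "timed out"
    except exceptions.ConnectionError:     # DNS failure, refused connection, ...
        return "connection failed"
    except exceptions.RequestException as e:  # base class of all Requests errors
        return f"request failed: {type(e).__name__}"

# A URL without a scheme is rejected before any network traffic happens.
print(fetch("no-scheme-url"))  # request failed: MissingSchema
```

The order of the except clauses matters: the specific classes come first and the RequestException base class last, otherwise the base class would swallow everything.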

The HTTP protocol and Requests library methods:

The HTTP protocol:

[Figures: the HTTP protocol; URL format]

HTTP operations on resources:

[Figures: HTTP operations on resources 1-2]

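HTTP locates resources by URL and manages them through the six operations above (GET, HEAD, POST, PUT, PATCH, DELETE). Each operation is independent and stateless: there is no relationship between two consecutive operations.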
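To illustrate how a URL locates a resource, the sketch below splits one into its parts with the standard library's urllib.parse.urlsplit. The example URL is hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical URL in the form http://host[:port][path]
parts = urlsplit("http://www.example.com:8080/docs/index.html")

print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /docs/index.html
```

The host and port identify the server, and the path identifies the resource on that server that each HTTP operation acts on.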


[Figure: differences]

head() in the Requests library:

# -*- coding: utf-8 -*-
import requests
r = requests.head("http://www.mi.com")
print(r.status_code)
# A HEAD response carries headers only, so there is no body to decode
print(r.headers)

post() in the Requests library:

[Figure: post()]
# -*- coding: utf-8 -*-

import requests

payload = {'key1':'value1', 'key2':'value2'}

r = requests.post('http://httpbin.org/post', data=payload)

print(r.text)

Output:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "key1": "value1", 
    "key2": "value2"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "23", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.14.2"
  }, 
  "json": null, 
  "origin": "43.247.4.53", 
  "url": "http://httpbin.org/post"
}
[Figure: data]
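The type of value passed to post() changes how the body is encoded. The offline sketch below prepares three POST requests without sending them; the string "ABC" and the json example are assumptions added for comparison, while the dict payload is the one used above.

```python
import requests

url = "http://httpbin.org/post"

# A dict passed as data= is form-encoded.
form = requests.Request("POST", url, data={"key1": "value1", "key2": "value2"}).prepare()
print(form.body)                       # key1=value1&key2=value2
print(form.headers["Content-Type"])    # application/x-www-form-urlencoded
print(form.headers["Content-Length"])  # 23

# A plain string passed as data= is sent as-is, with no Content-Type set.
text = requests.Request("POST", url, data="ABC").prepare()
print(text.body)                       # ABC
print("Content-Type" in text.headers)  # False

# A dict passed as json= is serialized to JSON.
as_json = requests.Request("POST", url, json={"key1": "value1"}).prepare()
print(as_json.headers["Content-Type"])  # application/json
```

The Content-Length of 23 matches the value httpbin echoed in the output above, which is a quick check that the prepared form body is the same one that was actually sent.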

put() in the Requests library:

[Figure: put()]

The main methods of the Requests library in detail:

[Figures: main methods in detail, 0-10]
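The convenience functions are thin wrappers: requests.get(url, **kwargs) behaves like requests.request("GET", url, **kwargs), and the same holds for the other methods. The sketch below prepares one request per HTTP method offline to show that only the method changes while keyword arguments such as params and headers are handled identically. The URL, the page parameter, and the User-Agent value are assumptions for illustration.

```python
import requests

url = "http://httpbin.org/anything"      # hypothetical endpoint
headers = {"User-Agent": "Mozilla/5.0"}  # hypothetical custom header
params = {"page": 1}                     # hypothetical query parameter

prepared = []
for method in ("GET", "HEAD", "POST", "PUT", "PATCH", "DELETE"):
    # Build, but do not send, one request per HTTP method.
    p = requests.Request(method, url, headers=headers, params=params).prepare()
    prepared.append(p)
    print(p.method, p.url, p.headers["User-Agent"])
```

Sending any of them for real would be a matter of calling requests.request(method, url, headers=headers, params=params, timeout=30).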

get() is the most commonly used!

get() head() post() put() patch() delete()

Summary:

[Figures: getting started with the library; code framework]

Every pursuit in the world is born of love
An IT devotee who loves coding, loves life, and loves sharing
— hongXkeX
