4. Web Scraping: The requests Library (GET Requests, POST Requests, Responses)

Author: 那是个好男孩 | Published 2019-04-12 11:59

The requests library is easier to use than urllib.

0. Request methods

import requests
requests.post('http://httpbin.org/post') 
requests.put('http://httpbin.org/put')
requests.delete('http://httpbin.org/delete')
requests.head('http://httpbin.org/get')
requests.options('http://httpbin.org/get')

* http://httpbin.org is a site for testing HTTP requests.

1. GET requests

GET request with parameters (the two approaches are equivalent)

import requests

response = requests.get("http://httpbin.org/get?name=germey&age=22")
print(response.text)

########################

import requests

data = {
    'name': 'germey',
    'age': 22
}
response = requests.get("http://httpbin.org/get", params=data)
print(response.text)

Output:

{
  "args": {
    "age": "22", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "origin": "182.148.156.45, 182.148.156.45", 
  "url": "https://httpbin.org/get?name=germey&age=22"
}

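params=data attaches extra information to the GET request as query-string parameters. This information is usually stored in a dict, and it shows up in the args field of the returned result.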
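To see what params= does to the URL without sending anything over the network, one option is to build the request and inspect it before it goes out. This is only a sketch: it assumes the installed requests exposes requests.Request and its prepare() method, and the variable names are made up for illustration.

```python
import requests

# Build (but do not send) a GET request with query parameters.
params = {'name': 'germey', 'age': 22}
prepared = requests.Request('GET', 'http://httpbin.org/get', params=params).prepare()

# The dict is URL-encoded and appended to the URL as a query string.
print(prepared.url)  # http://httpbin.org/get?name=germey&age=22
```

The resulting URL is the same one that was written out by hand in the first approach.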

Parsing JSON

import requests
import json

response = requests.get("http://httpbin.org/get")
print(type(response.text))
print(response.json())
print(json.loads(response.text))
print(type(response.json()))

Output:

<class 'str'>
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.18.4'}, 'origin': '182.148.156.45, 182.148.156.45', 'url': 'https://httpbin.org/get'}
{'args': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Host': 'httpbin.org', 'User-Agent': 'python-requests/2.18.4'}, 'origin': '182.148.156.45, 182.148.156.45', 'url': 'https://httpbin.org/get'}
<class 'dict'>

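response.text is always a str, but some endpoints return a JSON-formatted string. Calling .json() parses such a string into a dict.
As you can see, response.json() and json.loads(response.text) print the same result.
If the response body is not valid JSON, parsing fails and a json.decoder.JSONDecodeError exception is raised.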
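The failure case can be handled with an ordinary try/except. The sketch below uses only the standard json module on plain strings, so it runs without a network connection; parse_json_or_none is a hypothetical helper name, not part of requests. Catching ValueError works because json.JSONDecodeError is a subclass of it, and as far as I know the exception raised by response.json() is a ValueError as well.

```python
import json

def parse_json_or_none(text):
    """Return the parsed JSON value, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None

print(parse_json_or_none('{"name": "germey"}'))     # {'name': 'germey'}
print(parse_json_or_none('<html>not json</html>'))  # None
```

In a scraper the same pattern would wrap response.json() or json.loads(response.text).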

Fetching binary data

import requests

response = requests.get("https://github.com/favicon.ico")
print(type(response.text), type(response.content))
print(response.text)
print(response.content)

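# Fetch and save binary data (images, video, audio, and other files)
with open('filename', 'wb') as f:
    f.write(response.content)

* A site's icon is usually placed in the web root and named favicon.ico.
* To get text data, use response.text.
* To get binary data such as images, use response.content.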
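The saving step is easy to try on its own. The sketch below uses stand-in bytes in place of response.content and writes them to a temporary directory; save_bytes, fake_content, and target are illustrative names, not anything from requests.

```python
import os
import tempfile

def save_bytes(content, path):
    """Write raw bytes (for example response.content) to path in binary mode."""
    with open(path, 'wb') as f:
        f.write(content)

# Stand-in bytes; in the article these would come from response.content.
fake_content = b'\x00\x00\x01\x00 pretend icon bytes'
target = os.path.join(tempfile.mkdtemp(), 'favicon.ico')
save_bytes(fake_content, target)

# Read the file back in binary mode to confirm nothing was altered.
with open(target, 'rb') as f:
    print(f.read() == fake_content)  # True
```

The 'wb' mode matters: opening the file in text mode would expect str rather than bytes.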

Adding headers

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
response = requests.get("https://www.zhihu.com/explore", headers=headers)
print(response.text)
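To check which User-Agent will actually be sent, the request can be prepared through a Session without sending it. This is a rough sketch assuming Session.prepare_request behaves as documented; the User-Agent string here is a shortened, made-up value rather than the one from the article.

```python
import requests

headers = {'User-Agent': 'Mozilla/5.0 (illustrative value)'}
url = 'http://httpbin.org/get'

session = requests.Session()
# Without custom headers, the session's default User-Agent is used.
default = session.prepare_request(requests.Request('GET', url))
# With headers=, the custom value replaces the default one.
custom = session.prepare_request(requests.Request('GET', url, headers=headers))

print(default.headers['User-Agent'])  # python-requests/<installed version>
print(custom.headers['User-Agent'])   # Mozilla/5.0 (illustrative value)
```

Some sites reject the default python-requests User-Agent, which is why a browser-like value is set above.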

2. POST requests

POST request with parameters

import requests

data = {'name': 'germey', 'age': '22'}
response = requests.post("http://httpbin.org/post", params=data)
print(response.text)

Adding headers

import requests

data = {'name': 'germey', 'age': '22'}
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
}
response = requests.post("http://httpbin.org/post", params=data, headers=headers)
print(response.text)
print(response.json())

Output:

{
  "args": {
    "age": "22", 
    "name": "germey"
  }, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "0", 
    "Host": "httpbin.org", 
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"
  }, 
  "json": null, 
  "origin": "101.206.170.234, 101.206.170.234", 
  "url": "https://httpbin.org/post?name=germey&age=22"
}

{'args': {'age': '22', 'name': 'germey'}, 'data': '', 'files': {}, 'form': {}, 'headers': {'Accept': '*/*', 'Accept-Encoding': 'gzip, deflate', 'Content-Length': '0', 'Host': 'httpbin.org', 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}, 'json': None, 'origin': '101.206.170.234, 101.206.170.234', 'url': 'https://httpbin.org/post?name=germey&age=22'}

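As you can see, GET and POST requests take parameters and headers in much the same way. Note that params= always goes into the URL query string, which is why the values above appear under args while form is empty. To send them in the POST body as form fields, pass data= instead.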
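The difference between params= and data= on a POST can be seen by preparing both requests and comparing them, again without sending anything. This is a sketch under the assumption that requests.Request(...).prepare() is available; via_params and via_data are illustrative names.

```python
import requests

data = {'name': 'germey', 'age': '22'}
url = 'http://httpbin.org/post'

# params= goes into the URL query string; the body stays empty.
via_params = requests.Request('POST', url, params=data).prepare()
print(via_params.url)   # http://httpbin.org/post?name=germey&age=22
print(via_params.body)  # None

# data= is form-encoded into the request body; the URL is unchanged.
via_data = requests.Request('POST', url, data=data).prepare()
print(via_data.url)                      # http://httpbin.org/post
print(via_data.body)                     # name=germey&age=22
print(via_data.headers['Content-Type'])  # application/x-www-form-urlencoded
```

With data=, httpbin would report the values under form rather than args.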

3. Responses

Response attributes

import requests

response = requests.get('http://www.jianshu.com')
print(type(response.status_code), response.status_code)  # status code
print(type(response.headers), response.headers)  # response headers
print(type(response.cookies), response.cookies)  # cookies
print(type(response.url), response.url)  # final URL
print(type(response.history), response.history)  # redirect history

Output:

<class 'int'> 403
<class 'requests.structures.CaseInsensitiveDict'> {'Server': 'Tengine', 'Content-Type': 'text/html', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Date': 'Tue, 09 Apr 2019 13:14:23 GMT', 'Vary': 'Accept-Encoding', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Encoding': 'gzip', 'x-alicdn-da-ups-status': 'endOs,0,403', 'Via': 'cache29.l2cm12-6[17,0], cache8.cn389[77,0]', 'Timing-Allow-Origin': '*', 'EagleId': '7d412b4815548156634056180e'}
<class 'requests.cookies.RequestsCookieJar'> <RequestsCookieJar[]>
<class 'str'> https://www.jianshu.com/
<class 'list'> [<Response [301]>]
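The output shows that response.headers is a CaseInsensitiveDict, so header names can be looked up in any casing. The sketch below builds one by hand with two of the header values from the output above, so it runs offline.

```python
from requests.structures import CaseInsensitiveDict

# Two headers copied from the sample output, stored with their original casing.
headers = CaseInsensitiveDict({'Content-Type': 'text/html', 'Server': 'Tengine'})

# Lookups ignore the casing of the key.
print(headers['content-type'])        # text/html
print(headers.get('SERVER'))          # Tengine
print('content-encoding' in headers)  # False
```

This is handy because servers are free to send header names in whatever casing they like.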

Checking the status code

The status codes and their corresponding lookup names are listed below:

# Informational.
100: ('continue',),
101: ('switching_protocols',),
102: ('processing',),
103: ('checkpoint',),
122: ('uri_too_long', 'request_uri_too_long'),

# Success.
200: ('ok', 'okay', 'all_ok', 'all_okay', 'all_good', '\\o/', '✓'),
201: ('created',),
202: ('accepted',),
203: ('non_authoritative_info', 'non_authoritative_information'),
204: ('no_content',),
205: ('reset_content', 'reset'),
206: ('partial_content', 'partial'),
207: ('multi_status', 'multiple_status', 'multi_stati', 'multiple_stati'),
208: ('already_reported',),
226: ('im_used',),

# Redirection.
300: ('multiple_choices',),
301: ('moved_permanently', 'moved', '\\o-'),
302: ('found',),
303: ('see_other', 'other'),
304: ('not_modified',),
305: ('use_proxy',),
306: ('switch_proxy',),
307: ('temporary_redirect', 'temporary_moved', 'temporary'),
308: ('permanent_redirect',
      'resume_incomplete', 'resume',), # These 2 to be removed in 3.0

# Client Error.
400: ('bad_request', 'bad'),
401: ('unauthorized',),
402: ('payment_required', 'payment'),
403: ('forbidden',),
404: ('not_found', '-o-'),
405: ('method_not_allowed', 'not_allowed'),
406: ('not_acceptable',),
407: ('proxy_authentication_required', 'proxy_auth', 'proxy_authentication'),
408: ('request_timeout', 'timeout'),
409: ('conflict',),
410: ('gone',),
411: ('length_required',),
412: ('precondition_failed', 'precondition'),
413: ('request_entity_too_large',),
414: ('request_uri_too_large',),
415: ('unsupported_media_type', 'unsupported_media', 'media_type'),
416: ('requested_range_not_satisfiable', 'requested_range', 'range_not_satisfiable'),
417: ('expectation_failed',),
418: ('im_a_teapot', 'teapot', 'i_am_a_teapot'),
421: ('misdirected_request',),
422: ('unprocessable_entity', 'unprocessable'),
423: ('locked',),
424: ('failed_dependency', 'dependency'),
425: ('unordered_collection', 'unordered'),
426: ('upgrade_required', 'upgrade'),
428: ('precondition_required', 'precondition'),
429: ('too_many_requests', 'too_many'),
431: ('header_fields_too_large', 'fields_too_large'),
444: ('no_response', 'none'),
449: ('retry_with', 'retry'),
450: ('blocked_by_windows_parental_controls', 'parental_controls'),
451: ('unavailable_for_legal_reasons', 'legal_reasons'),
499: ('client_closed_request',),

# Server Error.
500: ('internal_server_error', 'server_error', '/o\\', '✗'),
501: ('not_implemented',),
502: ('bad_gateway',),
503: ('service_unavailable', 'unavailable'),
504: ('gateway_timeout',),
505: ('http_version_not_supported', 'http_version'),
506: ('variant_also_negotiates',),
507: ('insufficient_storage',),
509: ('bandwidth_limit_exceeded', 'bandwidth'),
510: ('not_extended',),
511: ('network_authentication_required', 'network_auth', 'network_authentication'),

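The status code tells you whether a request succeeded. requests also provides a built-in status code lookup object, requests.codes. (Either of the two forms below works.)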

import requests

response = requests.get('http://www.jianshu.com')
exit() if not response.status_code == requests.codes.ok else print('Request Successfully')

###################################
import requests

response = requests.get('http://www.jianshu.com')
exit() if not response.status_code == 200 else print('Request Successfully')
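The names in the table can be tried out directly on requests.codes, with no request involved. A small sketch follows; is_ok is a hypothetical helper added here for illustration.

```python
import requests

# Each name in the table maps back to its numeric status code.
print(requests.codes.ok)            # 200
print(requests.codes.not_found)     # 404
print(requests.codes['forbidden'])  # 403

def is_ok(status_code):
    """Return True only for a plain 200 response."""
    return status_code == requests.codes.ok

print(is_ok(200), is_ok(403))  # True False
```

Comparing against requests.codes.ok and comparing against the literal 200 are the same check; the named form just reads more clearly.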
