01 requests

Author: wqjcarnation | Published: 2019-02-25 13:16

Reference: https://www.jianshu.com/p/2065f0292de6

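Goals

Basic workflow
Introduction to requests
Installing requests
GET request examples with requests
- Example 1
- Difference between response.text and response.content
- Example 2
The seven main methods of the requests library
A few method call examples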

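Basic crawler workflow

Send the request
Use an HTTP library to send a Request to the target site. The request can carry extra information such as headers. Then wait for the server to respond.
Related components: requests, scrapy

Get the response content
If the server responds normally, you get a Response whose body is the page content you are after. It may be HTML, a JSON string, or binary data (such as an image or a video).
Related components: requests, scrapy

Parse the content
HTML can be parsed with regular expressions or a page-parsing library. JSON can be converted directly into a JSON object. Binary data can be saved or processed further.
Related components: xpath

Save the data
Data can be saved in many forms: as text, in a database, or as a file in a specific format.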
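
The four steps above can be sketched end to end. This is a minimal illustration rather than a full crawler: the `fetch` parameter, the function names, and the output filename are hypothetical, and the sample HTML is inlined so the sketch runs without network access. In a real run, `fetch` would presumably wrap something like `requests.get(url, timeout=10).text`.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable

# Inlined sample page (an assumption, used instead of a live site).
SAMPLE_HTML = "<html><head><title>Example page</title></head><body></body></html>"


def fake_fetch(url: str) -> str:
    """Stand-in for steps 1 and 2: send the request and return the body."""
    return SAMPLE_HTML


def parse_title(html: str) -> str:
    """Step 3: pull the <title> text out with a regular expression."""
    match = re.search(r"<title>(.*?)</title>", html, re.S)
    return match.group(1).strip() if match else ""


def crawl(url: str, out_path: Path, fetch: Callable[[str], str] = fake_fetch) -> str:
    html = fetch(url)                             # steps 1-2: request and response
    title = parse_title(html)                     # step 3: parse
    out_path.write_text(title, encoding="utf-8")  # step 4: save
    return title


out_file = Path(tempfile.gettempdir()) / "crawl_title.txt"
title = crawl("http://example.com/", out_file)
print(title)
```

Passing the fetch function in as a parameter keeps the parse and save steps testable without touching the network.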

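Introduction to requests

Requests is an HTTP library written in Python, built on urllib3 and released under the Apache 2.0 open-source license. It is more concise and convenient than the standard library's urllib, and it supports Python 3.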

Installing requests

$ pip install requests
In PyCharm: Settings > Project Interpreter > click the plus sign > type requests > Install
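
One quick way to check the install is to import the package and print its version. The version string you see depends on your environment.

```python
import requests

# If the import succeeds, the package is installed for this interpreter.
print(requests.__version__)
```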

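GET request examples with requests

1. The most basic GET request can use the get method directly

response = requests.get("http://www.baidu.com/")

It can also be written like this

response = requests.request("get", "http://www.baidu.com/")

2. Adding headers and passing parameters

To add headers, pass the headers argument to set header fields on the request. To pass parameters in the URL, use the params argument.
d:\python\demo3\requestsdemo.py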

import requests

kw = {'wd':'长城'}

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

# params accepts a dict or a string of query parameters; a dict is URL-encoded automatically, so urlencode() is not needed
response = requests.get("http://www.baidu.com/s", params=kw, headers=headers)

# Show the response body; response.text returns decoded text (str)
print(response.text)

# Show the response body; response.content returns the raw bytes
print(response.content)

# Show the full URL
print(response.url)

# Show the character encoding inferred from the response headers
print(response.encoding)

# Show the status code
print(response.status_code)

Output

......

......

'http://www.baidu.com/s?wd=%E9%95%BF%E5%9F%8E'

'utf-8'

200

A list can also be passed as a value

parameter = {
            "key1":"value1",
            "key2":["value21","value22"]
}
response3 = requests.get("http://httpbin.org/get",params = parameter)
print(response3.url)
# http://httpbin.org/get?key1=value1&key2=value21&key2=value22
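
To see how params ends up in the URL without sending anything, a request can be prepared offline. The sketch below uses `requests.Request(...).prepare()`; the helper name `build_url` is hypothetical, and the two URLs are the ones used in the examples above.

```python
import requests


def build_url(url, params):
    """Prepare a GET request without sending it and return the final URL."""
    return requests.Request("GET", url, params=params).prepare().url


# A non-ASCII value is percent-encoded as UTF-8.
single = build_url("http://www.baidu.com/s", {"wd": "长城"})
print(single)  # http://www.baidu.com/s?wd=%E9%95%BF%E5%9F%8E

# A list value repeats the key once per item.
multi = build_url(
    "http://httpbin.org/get",
    {"key1": "value1", "key2": ["value21", "value22"]},
)
print(multi)  # http://httpbin.org/get?key1=value1&key2=value21&key2=value22
```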

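When you use response.text, Requests decodes the response body automatically, using the text encoding it infers from the HTTP response; most Unicode character sets are decoded seamlessly.
When you use response.content, you get the raw bytes of the server's response, which is what you need for saving binary files such as images.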

Example 1

Fetch the Sina homepage with requests

demo3\requestsina.py

import requests
response = requests.get("http://www.sina.com.cn")
print(response.request.headers)
print(response.content.decode())

Output

{'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
<!DOCTYPE html>
<!-- [ published at 2019-02-25 14:24:00 ] -->
<html>
<head>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <title>新浪首页</title>
    <meta name="keywords" content="新浪,新浪网,SINA,sina,sina.com.cn,新浪首页,门户,资讯" />
    <meta name="description" content="新浪网为全球用户24小时提供全面及时的中文资讯，内容覆盖国内外突发新闻事件、体坛赛事、娱乐时尚、产业资讯、实用信息等，设有新闻、体育、娱乐、财经、科技、房产、汽车等30多个内容频道，同时开设博客、视频、论坛等自由互动交流空间。" />
    <meta content="always" name="referrer">
    <link rel="mask-icon" sizes="any" href="//www.sina.com.cn/favicon.svg" color="red">
    ......

The same request using response.text

import requests
response = requests.get("http://www.sina.com.cn")
print(response.request.headers)
print(response.text)

Output

{'User-Agent': 'python-requests/2.12.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
<!DOCTYPE html>
<!-- [ published at 2017-06-09 15:18:10 ] -->
<html>
<head>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <title>æ–°æµªé¦–é¡µ</title>
    <meta name="keywords" content="æ–°æµª,æ–°æµªç½‘,SINA,sina,sina.com.cn,æ–°æµªé¦–é¡µ,é—¨æˆ·,èµ„è®¯" />
    <meta name="description" content="æ–°æµªç½‘ä¸ºå…¨ç�ƒç”¨æˆ·24å°�æ—¶æ��ä¾›å…¨é�¢å�Šæ—¶çš„ä¸æ–‡èµ„è®¯ï¼Œå†…å®¹è¦†ç›–å›½å†…å¤–çª�å�‘æ–°é—»äº‹ä»¶ã€�ä½“å�›èµ›äº‹ã€�å¨±ä¹�æ—¶å°šã€�äº§ä¸šèµ„è®¯ã€�å®žç”¨ä¿¡æ�¯ç‰ï¼Œè®¾æœ‰æ–°é—»ã€�ä½“è‚²ã€�å¨±ä¹�ã€�è´¢ç»�ã€�ç§‘æŠ€ã€�æˆ¿äº§ã€�æ±½è½¦ç‰30å¤šä¸ªå†…å®¹é¢‘é�“ï¼Œå�Œæ—¶å¼€è®¾å�šå®¢ã€�è§†é¢‘ã€�è®ºå�›ç‰è‡ªç”±äº’åŠ¨äº¤æµ�ç©ºé—´ã€‚" />
    <link rel="mask-icon" sizes="any" href="//www.sina.com.cn/favicon.svg" color="red">

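Difference between response.text and response.content (root cause)

The garbled Chinese above is most likely an encoding mismatch. When the response headers declare no charset, Requests falls back to ISO-8859-1 for text content, while the page itself is UTF-8.
Try setting the response encoding before print(response.text)

response.encoding = 'utf-8'

Key points

response.text returns str
response.content returns bytes; call decode() to convert bytes to str

Recommended: use response.content.decode() to get the page's HTML (decode() defaults to UTF-8)

Further detail

response.text
Decoding: uses the text encoding Requests infers, an educated guess based on the HTTP headers
To change the encoding: response.encoding = 'utf-8'
response.content
Decoding: none; the bytes are returned as received
To decode with a specific encoding: response.content.decode('utf8')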
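
The mismatch can be reproduced with plain bytes, no network needed. This is a sketch of the mechanism only: it assumes the server sent UTF-8 bytes and the client decoded them as ISO-8859-1, which is one plausible cause of the garbled title shown earlier, not a confirmed diagnosis of that response.

```python
# Bytes as a server would send them for a UTF-8 page title.
raw = "新浪首页".encode("utf-8")

garbled = raw.decode("iso-8859-1")  # wrong codec: one character per byte
correct = raw.decode("utf-8")       # right codec: the original text

print(repr(garbled))
print(correct)

# The damage is reversible as long as no bytes were lost along the way.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired == correct)
```

Setting response.encoding = 'utf-8' before reading response.text has the same effect as choosing the right codec here.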

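Example 2: request, response, save

Fetch an image from the web with requests and save it locally

import requests
img_url = "http://imglf1.ph.126.net/pWRxzh6FRrG2qVL3JBvrDg==/6630172763234505196.png"
response = requests.get(img_url)
# Open in 'wb' mode so a rerun overwrites the file instead of appending to it;
# the with block closes the file automatically
with open('baidu_tieba.png', 'wb') as f:
    f.write(response.content)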
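
For larger files, one common variation is to stream the body in chunks and check the status code before writing. The sketch below is an assumption-laden illustration, not part of the original example: `download_file`, its `get` parameter, and the chunk size are hypothetical names and values. The `get` parameter exists only so the function can be exercised without network access.

```python
import requests


def download_file(url, path, get=requests.get, chunk_size=8192):
    """Stream a URL to disk and return the number of bytes written."""
    response = get(url, stream=True, timeout=10)
    response.raise_for_status()  # raise on 4xx/5xx rather than saving an error page
    written = 0
    with open(path, "wb") as f:  # "wb" overwrites any previous file
        for chunk in response.iter_content(chunk_size=chunk_size):
            written += f.write(chunk)
    return written
```

A call would look like download_file(img_url, "baidu_tieba.png"), with img_url defined as in the example above.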

The seven main methods of the requests library

[Figure: table of the seven main methods (image not included)]
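
The table itself is not available in this copy. As an assumption, the seven functions usually listed for the top-level requests API are request, get, head, post, put, patch, and delete. The sketch below only checks that each one exists in the installed package; it sends nothing.

```python
import requests

# Assumed list: the table from the original article is not available.
METHODS = ["request", "get", "head", "post", "put", "patch", "delete"]

available = {name: callable(getattr(requests, name, None)) for name in METHODS}
for name, ok in available.items():
    print(f"requests.{name}: {'available' if ok else 'missing'}")
```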

A few request call examples

# Send an HTTP POST request:
r = requests.post("http://httpbin.org/post", data={'key':'value'})
r = requests.delete('http://httpbin.org/delete')    # Send an HTTP DELETE request
r = requests.head('http://httpbin.org/get')         # Send an HTTP HEAD request
r = requests.options('http://httpbin.org/get')      # Send an HTTP OPTIONS request
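
To inspect what such calls would put on the wire without contacting httpbin.org, the requests can be prepared instead of sent. This is a sketch using `requests.Request(...).prepare()`; the variable names are illustrative.

```python
import requests

# A dict passed as data is form-encoded into the request body.
post = requests.Request(
    "POST", "http://httpbin.org/post", data={"key": "value"}
).prepare()
print(post.method)                   # POST
print(post.headers["Content-Type"])  # application/x-www-form-urlencoded
print(post.body)                     # key=value

# The other verbs are prepared the same way; with no data, the body is None.
for verb in ("DELETE", "HEAD", "OPTIONS"):
    prepared = requests.Request(verb, "http://httpbin.org/get").prepare()
    print(prepared.method, prepared.url, prepared.body)
```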

Article link: https://www.haomeiwen.com/subject/bjowyqtx.html
