Requesting Web Pages with Python Requests

Author: __豆约翰__ | Published 2018-12-19 07:08


Installation

Install with pip:

$ pip install requests

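Basic GET requests (the headers and params parameters)

1. The simplest GET request calls the get method directly

import requests

response = requests.get("http://www.baidu.com/")

# Equivalent form using the generic request method
# response = requests.request("get", "http://www.baidu.com/")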
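In practice a bare get call can hang or raise if the host is unreachable or the URL is malformed. The following is a minimal sketch of a more defensive call; the helper name fetch_status and the 5-second timeout are illustrative assumptions, not part of the original article.

```python
import requests

def fetch_status(url, timeout=5):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException as exc:
        # Base class for malformed URLs, DNS failures, timeouts and connection errors.
        print(f"request failed: {exc}")
        return None
    return response.status_code

# Usage (needs network access):
# print(fetch_status("http://www.baidu.com/"))
```

Passing a timeout is worth the extra argument because Requests does not apply one by default.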

2. Adding headers and query parameters

To send custom request headers, pass a dict as the headers argument. To pass parameters in the URL query string, use the params argument.


import requests

kw = {'wd': '长城'}

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"}

# params accepts a dict or a string of query parameters; a dict is URL-encoded automatically, so urlencode() is not needed
response = requests.get("http://www.baidu.com/s?", params=kw, headers=headers)

# Response body as text; response.text returns a Unicode str
print(response.text)

# Response body as raw bytes
print(response.content)

# Full URL that was requested
print(response.url)

# Encoding Requests will use to decode the body, taken from the response headers
print(response.encoding)

# HTTP status code
print(response.status_code)

Output

......

......

'http://www.baidu.com/s?wd=%E9%95%BF%E5%9F%8E'

'utf-8'

200
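The URL encoding of params can be checked without sending anything over the network by preparing the request locally. This is a small sketch; the helper name build_url is hypothetical, and the base URL is written without the trailing "?" since Requests adds the separator itself.

```python
import requests

def build_url(base, params):
    """Return the final URL Requests would use, without sending the request."""
    prepared = requests.Request("GET", base, params=params).prepare()
    return prepared.url

url = build_url("http://www.baidu.com/s", {"wd": "长城"})
print(url)  # http://www.baidu.com/s?wd=%E9%95%BF%E5%9F%8E
```

The non-ASCII value is encoded as UTF-8 and then percent-escaped, which matches the URL printed in the output above.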

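With response.text, Requests decodes the response body automatically, using the text encoding indicated by the HTTP response. Most Unicode character sets are decoded seamlessly.

With response.content, you get the raw bytes of the server's response, which is what you want when saving images and other binary files.

A small example

Fetching the Sina home page with requests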


import requests
response = requests.get("http://www.sina.com.cn")
print(response.request.headers)
print(response.content.decode())

Result

{'User-Agent': 'python-requests/2.12.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
<!DOCTYPE html>
<!-- [ published at 2017-06-09 15:15:23 ] -->
<html>
<head>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <title>新浪首页</title>
    <meta name="keywords" content="新浪,新浪网,SINA,sina,sina.com.cn,新浪首页,门户,资讯" />
  ...


import requests
response = requests.get("http://www.sina.com.cn")
print(response.request.headers)
print(response.text)

Result

{'User-Agent': 'python-requests/2.12.4', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
<!DOCTYPE html>
<!-- [ published at 2017-06-09 15:18:10 ] -->
<html>
<head>
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <title>æ–°æµªé¦–é¡µ</title>
    <meta name="keywords" content="æ–°æµª,æ–°æµªç½‘,SINA,sina,sina.com.cn,æ–°æµªé¦–é¡µ,é—¨æˆ·,èµ„è®¯" />
    <meta name="description" content="æ–°æµªç½‘ä¸ºå…¨ç�ƒç”¨æˆ·24å°�æ—¶æ��ä¾›å…¨é�¢å�Šæ—¶çš„ä¸æ–‡èµ„è®¯ï¼Œå†…å®¹è¦†ç›–å›½å†…å¤–çª�å�‘æ–°é—»äº‹ä»¶ã€�ä½“å�›èµ›äº‹ã€�å¨±ä¹�æ—¶å°šã€�äº§ä¸šèµ„è®¯ã€�å®žç”¨ä¿¡æ�¯ç‰ï¼Œè®¾æœ‰æ–°é—»ã€�ä½“è‚²ã€�å¨±ä¹�ã€�è´¢ç»�ã€�ç§‘æŠ€ã€�æˆ¿äº§ã€�æ±½è½¦ç‰30å¤šä¸ªå†…å®¹é¢‘é�“ï¼Œå�Œæ—¶å¼€è®¾å�šå®¢ã€�è§†é¢‘ã€�è®ºå�›ç‰è‡ªç”±äº’åŠ¨äº¤æµ�ç©ºé—´ã€‚" />
    <link rel="mask-icon" sizes="any" href="//www.sina.com.cn/favicon.svg" color="red">

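Cause analysis

A first guess is that the default Accept-Encoding header sent by requests, or Sina serving a compressed page by default, causes the problem. That does not hold up: Requests decompresses gzip and deflate bodies transparently, which is why response.content.decode() works fine.
The garbling comes from the character encoding used by response.text. When a response arrives, Requests picks an encoding to decode the body with when you access response.text. It first looks for a charset in the HTTP headers. If the Content-Type is a text type but names no charset, Requests falls back to ISO-8859-1; response.apparent_encoding offers a guess made by chardet.detect (or charset_normalizer in newer releases), which can be wrong. Here the page is UTF-8 (see its meta tag) but was most likely decoded as ISO-8859-1.
The recommended approach is response.content.decode(), which decodes as UTF-8 by default. Setting response.encoding before reading response.text also works.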
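The mismatch can be reproduced offline with plain bytes, which makes the cause easier to see. The sketch below assumes the page bytes are UTF-8 and that the wrong guess is ISO-8859-1; the helper read_text is a hypothetical name introduced for illustration.

```python
# Bytes as a UTF-8 server would send them.
raw = "新浪首页".encode("utf-8")

# Decoding with the wrong encoding produces mojibake instead of raising an error.
garbled = raw.decode("iso-8859-1")
print(repr(garbled))

# Decoding with the right encoding recovers the text.
correct = raw.decode("utf-8")
print(correct)

# ISO-8859-1 maps every byte to one character, so the damage is reversible.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

def read_text(response, encoding="utf-8"):
    """Decode a Requests response with an explicitly chosen encoding."""
    response.encoding = encoding  # overrides the encoding taken from the headers
    return response.text

# Usage (needs network access):
# import requests
# print(read_text(requests.get("http://www.sina.com.cn", timeout=10))[:200])
```

Hard-coding "utf-8" is only safe when the page is known to be UTF-8; for unknown pages the charset in the page's own meta tag is a better source than a guess.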

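Downloading an image with requests

import requests

img_url = "http://imglf1.ph.126.net/pWRxzh6FRrG2qVL3JBvrDg==/6630172763234505196.png"
response = requests.get(img_url)
# Open in 'wb' mode so a rerun overwrites the file instead of appending to it.
# The URL points to a PNG, so the file is saved with a .png extension.
with open('baidu_tieba.png', 'wb') as f:
    f.write(response.content)
# No explicit f.close() is needed; the with block closes the file.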
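For larger files, reading the whole body into memory through response.content can be wasteful. A possible alternative is to stream the body in chunks. This is a sketch under assumptions: the helper names save_response and download, the chunk size and the timeout are all illustrative choices.

```python
import requests

def save_response(response, path, chunk_size=8192):
    """Write a streamed response body to path and return the number of bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            written += f.write(chunk)
    return written

def download(url, path, timeout=10):
    """Fetch url with streaming enabled and save the body to path."""
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()  # fail early on 4xx/5xx instead of saving an error page
        return save_response(response, path)

# Usage (needs network access):
# download("http://imglf1.ph.126.net/pWRxzh6FRrG2qVL3JBvrDg==/6630172763234505196.png", "image.png")
```

Splitting the file writing from the network call keeps save_response usable with any object that offers iter_content.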

Article link: https://www.haomeiwen.com/subject/mddrkqtx.html
