Web Scraping: Using urllib

Author: qianxun0921 | Source: published 2018-12-23 18:02


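Basic usage of the urllib library

Using the urlopen method

urllib.request.urlopen takes the following commonly used parameters:
url:

The target URL, given as a string or a Request object.

data:

The request body, as bytes. If it is None (the default), a GET request is sent; if data is supplied, a POST request is sent.

timeout:

The timeout, in seconds.

context:

An ssl.SSLContext instance that specifies the SSL settings. It can be configured to skip certificate verification, for example when a site's certificate was not issued by a trusted CA.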
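The parameters above can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the helper names build_form_data and fetch, the verify_ssl flag, and the httpbin.org endpoints in the commented calls are assumptions made for the example.

```python
import ssl
import urllib.parse
import urllib.request


def build_form_data(fields):
    """URL-encode a dict and convert it to bytes, as the data parameter requires."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def fetch(url, fields=None, timeout=10, verify_ssl=True):
    """Send a GET when fields is None, otherwise POST the encoded form data."""
    data = build_form_data(fields) if fields is not None else None

    context = ssl.create_default_context()
    if not verify_ssl:
        # Turns off certificate checks; only do this for hosts you trust.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE

    with urllib.request.urlopen(url, data=data, timeout=timeout,
                                context=context) as response:
        return response.status, response.read().decode("utf-8")


# Example calls (they need network access, so they are left commented out):
# status, body = fetch("https://httpbin.org/get")
# status, body = fetch("https://httpbin.org/post", fields={"name": "urllib"})
```

Passing fields switches the request from GET to POST, because urlopen sends a POST whenever data is not None.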

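Request

Parameters of the Request class:
url:

The target URL of the request.

data:

The request body, as bytes. If it is None (the default), a GET request is sent; if data is supplied, a POST request is sent.

headers:

A dictionary of request headers to add.

unverifiable:

Indicates whether the request is unverifiable as defined by RFC 2965, that is, a request for a URL the user had no chance to approve (such as an image fetched automatically). It defaults to False and is used by the cookie policy; it does not disable SSL verification.

method:

The HTTP method to use, as a string such as 'GET', 'POST', or 'PUT'. If it is omitted, the method is GET when data is None and POST otherwise.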
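As a short sketch of how these parameters interact, the snippet below builds three Request objects and prints the method each one would use, without sending anything. The URL and the User-Agent string are illustrative assumptions, not values from the original article.

```python
import urllib.parse
import urllib.request

URL = "https://httpbin.org/anything"  # assumed test endpoint
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; urllib-demo)"}  # illustrative value

# No data: the request defaults to GET.
get_request = urllib.request.Request(URL, headers=HEADERS)

# data must be bytes; supplying it makes the default method POST.
form = urllib.parse.urlencode({"keyword": "python"}).encode("utf-8")
post_request = urllib.request.Request(URL, data=form, headers=HEADERS)

# method overrides the default chosen from data.
put_request = urllib.request.Request(URL, data=form, headers=HEADERS, method="PUT")

for request in (get_request, post_request, put_request):
    # Request stores header names capitalized, hence "User-agent".
    print(request.get_method(), request.full_url, request.get_header("User-agent"))

# To actually send one of them (needs network access):
# with urllib.request.urlopen(post_request, timeout=10) as response:
#     print(response.read().decode("utf-8"))
```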

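Exception handling in urllib

The main causes of exceptions in urllib are:

No network connection

The connection to the server failed

The specified server could not be found

Each of these is raised as urllib.error.URLError.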
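A common way to handle these failures is to catch urllib.error.HTTPError first (it is a subclass of URLError and carries the HTTP status code) and then URLError. The sketch below assumes a hypothetical helper named safe_fetch. To show the URLError branch without needing a network, it is called with a deliberately unsupported URL scheme.

```python
import socket
import urllib.error
import urllib.request


def safe_fetch(url, timeout=5):
    """Return the decoded response body, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        print(f"HTTP error: {err.code} {err.reason}")
    except urllib.error.URLError as err:
        # No network, connection refused, unknown host, unsupported scheme, ...
        print(f"URL error: {err.reason}")
    except socket.timeout:
        # A timeout while reading the body is raised directly, not wrapped.
        print("Timed out while reading the response")
    return None


# "foo" is not a scheme urllib can open, so this raises URLError locally.
result = safe_fetch("foo://example.invalid/")
print(result)
```

HTTPError has to come before URLError in the except chain; otherwise the more general clause would catch it first.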

Advanced usage of urllib

Custom opener
Example code:

import urllib.request

# Build an HTTPHandler object, which handles HTTP requests
http_handler = urllib.request.HTTPHandler()

# Build an HTTPSHandler object, which handles HTTPS requests
# http_handler = urllib.request.HTTPSHandler()

# Call urllib.request.build_opener() to create an opener that uses the handler
opener = urllib.request.build_opener(http_handler)

# Build the Request object
request = urllib.request.Request("http://www.baidu.com/")

# Call the custom opener's open() method to send the request
response = opener.open(request)

# Read the server's response body
print(response.read().decode())
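Building on the custom opener above, here is a hedged sketch of two further options that urllib.request provides: default headers attached to the opener, and install_opener, which makes plain urlopen calls use the custom opener. The User-Agent value is an illustrative assumption.

```python
import urllib.request

# debuglevel=1 makes the underlying http.client print request/response headers.
http_handler = urllib.request.HTTPHandler(debuglevel=1)
https_handler = urllib.request.HTTPSHandler(debuglevel=1)

opener = urllib.request.build_opener(http_handler, https_handler)

# Headers listed here are sent with every request made through this opener.
opener.addheaders = [("User-Agent", "Mozilla/5.0 (compatible; urllib-demo)")]

# After install_opener, urllib.request.urlopen() goes through this opener too.
urllib.request.install_opener(opener)

# Example call (needs network access):
# response = urllib.request.urlopen("http://www.baidu.com/", timeout=10)
# print(response.status)
```

Installing the opener globally is convenient for small scripts. In larger programs, calling opener.open() explicitly keeps the behavior local.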

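Setting a proxy

Example code:

# Set proxies with the requests module (a third-party library, not part of urllib)
import requests

# A dict keeps only one value per key, so list a single proxy for each scheme
proxies = {
    'http': 'http://219.238.186.188:8118',
    'https': 'http://222.76.204.110:808',
    # For a proxy that requires authentication, use this form instead:
    # 'http': 'http://username:password@ip:port',
    # 'https': 'https://username:password@ip:port',
}

url = 'https://httpbin.org/get'

response = requests.get(url, proxies=proxies, timeout=10)

print(response.text)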
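The same idea can be expressed with urllib itself through urllib.request.ProxyHandler. This is a minimal sketch: the proxy address 127.0.0.1:8118 and the helper name fetch_via_proxy are placeholders assumed for illustration, so substitute a proxy you actually have access to.

```python
import urllib.request

# Hypothetical proxy address; replace it with a real proxy before sending requests.
PROXIES = {
    "http": "http://127.0.0.1:8118",
    "https": "http://127.0.0.1:8118",
    # Authenticated form: "http": "http://username:password@127.0.0.1:8118",
}

# ProxyHandler routes requests for each scheme through the matching proxy.
proxy_handler = urllib.request.ProxyHandler(PROXIES)
opener = urllib.request.build_opener(proxy_handler)


def fetch_via_proxy(url, timeout=10):
    """Open url through the proxy-enabled opener and return the decoded body."""
    with opener.open(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (needs network access and a working proxy):
# print(fetch_via_proxy("https://httpbin.org/get"))
```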

Using cookiejar

Example code:

import urllib.request
from http import cookiejar

# Create a CookieJar instance to store the cookies
cookie_jar = cookiejar.CookieJar()

# Create a cookie handler with HTTPCookieProcessor(), passing in the CookieJar object
handler = urllib.request.HTTPCookieProcessor(cookie_jar)

# Build the opener with build_opener()
opener = urllib.request.build_opener(handler)

# Visit the page with a GET request; the cookies are saved to cookie_jar automatically
opener.open("http://www.baidu.com")

# Print the saved cookies in the standard "name=value" format
cookie_str = ""
for item in cookie_jar:
    cookie_str = cookie_str + item.name + "=" + item.value + ";"

# Drop the trailing semicolon
print(cookie_str[:-1])

The approach above saves the cookies in the CookieJar object.
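To try the CookieJar flow without depending on an external site, the following sketch starts a tiny local HTTP server that sets one cookie and then collects it through an opener. The handler class, the cookie name and value, and the collect_cookies helper are all hypothetical and exist only for this demonstration.

```python
import threading
import urllib.request
from http import cookiejar
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieDemoHandler(BaseHTTPRequestHandler):
    """Local demo handler that answers every GET with one Set-Cookie header."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def collect_cookies(url):
    """Open url and return the received cookies as a 'name=value; ...' string."""
    jar = cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({}),  # ignore proxy settings from the environment
        urllib.request.HTTPCookieProcessor(jar),
    )
    with opener.open(url, timeout=5) as response:
        response.read()
    return "; ".join(f"{cookie.name}={cookie.value}" for cookie in jar)


# Port 0 lets the operating system pick a free port for the demo server.
server = HTTPServer(("127.0.0.1", 0), CookieDemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

cookie_string = collect_cookies(f"http://127.0.0.1:{server.server_port}/")

server.shutdown()
server.server_close()
print(cookie_string)
```

Joining with "; " avoids having to trim a trailing separator afterwards.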
