Urllib Library

Author: 苦瓜1512 | Published: 2017-12-20 10:21

Urllib is Python's built-in HTTP request library. It consists of the following modules:

urllib.request: the request module
urllib.error: the exception-handling module
urllib.parse: the URL parsing module
urllib.robotparser: the robots.txt parsing module

1. `urllib.request`

1.1 `urllib.request.urlopen()`

urllib.request.urlopen(url, data=None, [timeout,]*, ...)

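url: the URL to open (a string or a `Request` object)
data=None: extra data to send with the request, e.g. the body of a POST request; it must be bytes
timeout: the timeout in seconds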

1.1.1 `url`

import urllib.request
response = urllib.request.urlopen('http://www.baidu.com')
print(response.read().decode('utf-8'))

response.read() returns bytes, which must be decoded into a string using the matching encoding.
This is a GET request.
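
Hard-coding `utf-8` works for this page, but a response usually declares its charset in the `Content-Type` header. The following is a minimal sketch built around a hypothetical helper `read_text()`, which falls back to UTF-8 when no charset is declared. It is demonstrated on a `data:` URL, which `urlopen()` also accepts, so it runs without network access:

```python
import urllib.request

def read_text(url, fallback='utf-8'):
    """Hypothetical helper: fetch url and decode it with the declared charset."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # always bytes
        charset = response.headers.get_content_charset() or fallback
    return raw.decode(charset)

# A data: URL stands in for a real page so the example works offline
print(read_text('data:text/plain;charset=utf-8,%E4%BD%A0%E5%A5%BD'))  # 你好
```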

1.1.2 `data`

import urllib.request
import urllib.parse

data = bytes(urllib.parse.urlencode({'world':'hello'}), encoding='utf-8')
respon = urllib.request.urlopen('http://httpbin.org/post', data=data)
print(respon.read())

Because the data argument is supplied, this is a POST request.
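
The following offline check shows two rules: the `data` body must be bytes, and a request's method depends on whether `data` is present. Nothing is sent here, and the httpbin URL is only used to build the objects:

```python
import urllib.parse
import urllib.request

# urlencode() returns a str; encode it to bytes before using it as data
form = urllib.parse.urlencode({'world': 'hello', 'lang': 'python 3'})
body = form.encode('utf-8')
print(body)  # b'world=hello&lang=python+3'

get_req = urllib.request.Request('http://httpbin.org/post')
post_req = urllib.request.Request('http://httpbin.org/post', data=body)
print(get_req.get_method(), post_req.get_method())  # GET POST
```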

1.1.3 `timeout`

import socket
import urllib.request
import urllib.error

try:
    respon = urllib.request.urlopen('http://www.baidu.com', timeout=1)
except urllib.error.URLError as e:
    if isinstance(e.reason, socket.timeout):
        print('Time Out')
try:
    respon = urllib.request.urlopen('http://www.baidu.com', timeout=0.01)
except urllib.error.URLError as e:
    if isinstance(e.reason, socket.timeout):
        print('Time Out')
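
`urlopen()` wraps a timeout during the connection phase in `URLError`. A timeout while waiting for or reading the response, however, can surface as a bare `socket.timeout`, which has been an alias of `TimeoutError` since Python 3.10. The code above would not catch that case. The sketch below handles both, using a hypothetical helper `fetch_or_none()`. The demo simulates an unresponsive server with a local socket that accepts connections but never replies, so it should print `None`:

```python
import socket
import urllib.error
import urllib.request

def fetch_or_none(url, timeout):
    """Hypothetical helper: return the body, or None if the request timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as e:
        # Timeouts while connecting arrive wrapped in URLError
        if isinstance(e.reason, socket.timeout):
            return None
        raise
    except socket.timeout:
        # Timeouts while waiting for or reading the response arrive unwrapped
        return None

# A listening socket that never answers simulates a stalled server
silent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
silent.bind(('127.0.0.1', 0))
silent.listen(1)
result = fetch_or_none(f'http://127.0.0.1:{silent.getsockname()[1]}/', timeout=0.5)
silent.close()
print(result)
```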

1.2 Responses

1.2.1 Response type

import urllib.request
respon = urllib.request.urlopen('https://www.python.org')
print(type(respon))
Output:
>> <class 'http.client.HTTPResponse'>

1.2.2 Status code and response headers

import urllib.request
respon = urllib.request.urlopen('http://www.python.org')
print(respon.status)
print(respon.getheaders())
print(respon.getheader('Server'))

respon.status: the status code
respon.getheaders(): all response headers, as a list of (name, value) tuples
respon.getheader('Server'): the value of one specific response header
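
`getheader()` looks up header names case-insensitively and accepts a default value for headers that are missing. The self-contained sketch below serves one response from a throwaway local `http.server` so it runs offline. The handler class and the `describe()` helper are hypothetical:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that returns a short plain-text body."""
    def do_GET(self):
        body = b'hello'
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

def describe(url):
    with urllib.request.urlopen(url, timeout=5) as response:
        return {
            'status': response.status,
            # Header lookup is case-insensitive
            'content_type': response.getheader('content-type'),
            # The second argument is returned when the header is absent
            'missing': response.getheader('X-Not-There', 'n/a'),
            'header_names': [name for name, _ in response.getheaders()],
        }

server = HTTPServer(('127.0.0.1', 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
info = describe(f'http://127.0.0.1:{server.server_port}/')
server.shutdown()
server.server_close()
print(info)
```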

1.3 The `Request` object

1.3.1 Sending a request with a `Request` object

import urllib.request as rq
requ = rq.Request('http://www.baidu.com')
resp = rq.urlopen(requ)
print(resp.read().decode('utf-8'))

Create a Request object
Pass the Request object to urllib.request.urlopen()
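
A `Request` object only describes a request until it is passed to `urlopen()`, so you can inspect its parts beforehand. Here is a minimal offline sketch, where the query URL is only illustrative:

```python
import urllib.request

req = urllib.request.Request('http://www.baidu.com/s?wd=python')

# Nothing has been sent yet; the object only describes the request
print(req.full_url)      # http://www.baidu.com/s?wd=python
print(req.host)          # www.baidu.com
print(req.selector)      # /s?wd=python
print(req.get_method())  # GET
```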

1.3.2 Sending a request with extra data using a `Request` object

from urllib import request, parse
url = 'http://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/57.0'
}
form = {
    'name': 'doggy'
}
# Encode the form fields as bytes for the request body
data = bytes(parse.urlencode(form), encoding='utf-8')
req = request.Request(url=url, data=data, headers=headers, method='POST')
resp = request.urlopen(req)
print(resp.read().decode('utf-8'))
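
The same pattern works for other body formats. To send JSON instead of form fields, serialize and encode the payload yourself and set `Content-Type` explicitly. This is a sketch: whether a particular server accepts JSON is an assumption about that server. The request is only built and inspected here, not sent:

```python
import json
import urllib.request

payload = {'name': 'doggy'}
req = urllib.request.Request(
    url='http://httpbin.org/post',
    data=json.dumps(payload).encode('utf-8'),  # the body must be bytes
    headers={'Content-Type': 'application/json'},
    method='POST',
)
print(req.get_method())    # POST
print(req.data)            # b'{"name": "doggy"}'
print(req.header_items())  # [('Content-type', 'application/json')]
```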

1.3.3 Adding headers with a `Request` object

from urllib import request, parse
url = 'http://httpbin.org/post'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; …) Gecko/20100101 Firefox/57.0'
}
form = {
    'name': 'doggy'
}
data = bytes(parse.urlencode(form), encoding='utf-8')
req = request.Request(url=url, data=data, method='POST')
# add_header() takes one header name and one value per call
for key, value in headers.items():
    req.add_header(key, value)
resp = request.urlopen(req)
print(resp.read().decode('utf-8'))
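
`Request` stores header names after passing them through `str.capitalize()`, so `'User-Agent'` is kept as `'User-agent'`. This matters when you read headers back with `get_header()` or `has_header()`, because neither applies any further normalization. Here is a quick offline check, where the agent string is a placeholder:

```python
import urllib.request

req = urllib.request.Request('http://httpbin.org/post')
req.add_header('User-Agent', 'demo-agent/1.0')

print(req.header_items())            # [('User-agent', 'demo-agent/1.0')]
print(req.has_header('User-agent'))  # True
print(req.has_header('User-Agent'))  # False: lookups use the stored, capitalized key
print(req.get_header('User-agent'))  # demo-agent/1.0
```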

2. `urllib.error`

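The urllib.error module defines three exception classes:

urllib.error.URLError
- reason: the cause of the error
urllib.error.HTTPError
- code: the HTTP status code
- reason: the cause of the error (the HTTP reason phrase)
- headers: the HTTP response headers
- urllib.error.HTTPError is a subclass of urllib.error.URLError
urllib.error.ContentTooShortError
- raised by urlretrieve() when less data was downloaded than expected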
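
Because `HTTPError` is a subclass of `URLError`, an `except HTTPError` clause must come before `except URLError`, or it will never run. The sketch below demonstrates the ordering against a throwaway local server that answers every request with 404, plus an unsupported URL scheme. The handler class and the `fetch()` helper are hypothetical:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET with 404."""
    def do_GET(self):
        self.send_error(404)

    def log_message(self, *args):
        pass  # silence per-request logging

def fetch(url):
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return ('ok', response.status)
    except urllib.error.HTTPError as e:
        # Must come first: the server answered, but with an error status
        return ('http_error', e.code)
    except urllib.error.URLError as e:
        # No usable HTTP response: bad scheme, DNS failure, refused connection...
        return ('url_error', str(e.reason))

server = HTTPServer(('127.0.0.1', 0), NotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
missing = fetch(f'http://127.0.0.1:{server.server_port}/missing')
server.shutdown()
server.server_close()
bad_scheme = fetch('foo://example')
print(missing, bad_scheme)
```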

3. `urllib.parse`

3.1 `urllib.parse.urlencode()`

urllib.parse.urlencode() converts a dictionary into the query string of a GET request.

from urllib.parse import urlencode
params = {
    'name': 'yindf',
    'age':22
}
base_url = 'http://www.baidu.com?'
url = base_url + urlencode(params)
print(url)
Output:
>> http://www.baidu.com?name=yindf&age=22
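
`urlencode()` also percent-encodes non-ASCII text as UTF-8 and turns spaces into `+`. It expands list values into repeated keys only when `doseq=True` is passed. `parse_qs()` reverses the encoding. The values below are only illustrative:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {'q': 'python 爬虫', 'page': 2}
query = urlencode(params)
print(query)  # q=python+%E7%88%AC%E8%99%AB&page=2

# Lists become repeated keys only with doseq=True
print(urlencode({'tag': ['a', 'b']}, doseq=True))  # tag=a&tag=b

# parse_qs() decodes the query; every value comes back as a list of str
url = 'http://www.baidu.com/s?' + query
print(parse_qs(urlsplit(url).query))  # {'q': ['python 爬虫'], 'page': ['2']}
```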

4. `urllib.robotparser`
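
`urllib.robotparser.RobotFileParser` decides whether a crawler may fetch a URL according to a site's robots.txt. Normally you would call `set_url()` and then `read()` to download the file. In this minimal sketch the rules are passed to `parse()` directly so it runs offline. The rules, crawler name and URLs are made up:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would call
# rp.set_url('http://example.com/robots.txt') and then rp.read()
rules = [
    'User-agent: *',
    'Disallow: /private/',
    'Allow: /',
]

rp = RobotFileParser()
rp.parse(rules)
print(rp.can_fetch('MyCrawler', 'http://example.com/index.html'))    # True
print(rp.can_fetch('MyCrawler', 'http://example.com/private/data'))  # False
```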

Article link: https://www.haomeiwen.com/subject/qmowwxtx.html