[For Dignity] Web Crawlers (Part 2)

Author: 歌兮舞兮独酌兮 | Published: 2018-09-30 23:48

I. urllib

**urllib is Python's built-in HTTP request library; it does not need to be installed separately.**

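request: simulates sending requests; pass a URL and any extra parameters to its functions
error: exception-handling module; lets you catch request errors and then retry or handle them
parse: utility module for working with URLs (splitting, joining, encoding)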
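
Since parse is only described in one line above, here is a minimal sketch of three of its functions. The URL is a made-up example, not one from this article, and nothing is sent over the network.

```python
from urllib.parse import urlparse, urlencode, urljoin

# Hypothetical URL, used only for illustration.
url = 'https://www.example.com/search?word=hello#top'

# urlparse() splits a URL into its components.
parts = urlparse(url)
print(parts.scheme)    # https
print(parts.netloc)    # www.example.com
print(parts.path)      # /search
print(parts.query)     # word=hello
print(parts.fragment)  # top

# urlencode() turns a dict into a query string.
query = urlencode({'word': 'hello', 'page': 2})
print(query)  # word=hello&page=2

# urljoin() resolves a relative link against a base URL.
next_page = urljoin(url, '/about')
print(next_page)  # https://www.example.com/about
```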
1. urlopen() (making a basic request and fetching a page)
(1) Code example (printing information about the response)

import urllib.request

response = urllib.request.urlopen('https://www.python.org')
# print(response.read().decode('utf-8'))  # read() returns the whole page body
print(type(response), '\n')  # type of the response object
print(response.status)  # HTTP status code
print(response.getheaders())  # all response headers
print(response.getheader('Server'))  # value of the Server response header

Function signature

urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

Once the data parameter is supplied, the request method changes from GET to POST.
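
The switch to POST can be checked without sending anything. The sketch below builds urllib.request.Request objects, which are not otherwise used in this section, and asks each one which method it would use. It only illustrates the rule above.

```python
import urllib.parse
import urllib.request

url = 'http://httpbin.org/post'

# Without data, urllib treats the request as a GET.
plain = urllib.request.Request(url)
print(plain.get_method())  # GET

# With a bytes payload in data, the same URL is requested with POST.
data = urllib.parse.urlencode({'word': 'hello'}).encode('utf-8')
with_data = urllib.request.Request(url, data=data)
print(with_data.get_method())  # POST
print(data)  # b'word=hello'
```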

(2) Code example (with the data parameter)

import urllib.parse
import urllib.request

data = bytes(urllib.parse.urlencode({'word': 'hello'}), encoding='utf-8')
# bytes() needs a str as its first argument here; urllib.parse.urlencode() turns the dict into one
response = urllib.request.urlopen('http://httpbin.org/post', data=data)
# The submitted parameter shows up under "form" in the response: a form submission simulated with POST
print(response.read())

(3) Code example (timeout)

import socket
import urllib.request
import urllib.error

# response = urllib.request.urlopen('http://httpbin.org/get', timeout=1)
# The timeout here is 1 s
# print(response.read())

try:
    response = urllib.request.urlopen('http://httpbin.org/get', timeout=0.1)
except urllib.error.URLError as e:
    # isinstance() checks whether the underlying cause of the error is a timeout
    if isinstance(e.reason, socket.timeout):
        print('time out')
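
One case the example above may miss: a timeout that fires while waiting for the response can be raised as socket.timeout directly instead of being wrapped in URLError. The sketch below covers both cases. The helper names is_timeout and fetch are hypothetical, introduced only for illustration.

```python
import socket
import urllib.error
import urllib.request


def is_timeout(exc):
    """Return True if exc looks like a timeout raised by urlopen()."""
    # A timeout can surface directly as socket.timeout ...
    if isinstance(exc, socket.timeout):
        return True
    # ... or wrapped in URLError, with the real cause stored in .reason.
    if isinstance(exc, urllib.error.URLError):
        return isinstance(exc.reason, socket.timeout)
    return False


def fetch(url, timeout=1.0):
    """Return the response body as bytes, or None if the request timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (urllib.error.URLError, socket.timeout) as e:
        if is_timeout(e):
            print('time out')
            return None
        raise  # not a timeout: let the caller see the original error
```

Calling fetch('http://httpbin.org/get', timeout=0.1) should print 'time out' and return None whenever the server cannot answer within 0.1 s. Other failures, such as an HTTP error status, are re-raised unchanged.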

2.request

Article link: https://www.haomeiwen.com/subject/ngjgoftx.html
