Python Web Crawler Notes - urllib

Author: SWJTU_CC | Published: 2018-04-11 20:26


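urllib is part of Python's standard library and provides a set of tools for working with URLs: opening them, adding request headers, and encoding form data.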

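```python
from urllib import request  # import the request module from urllib

# Open the page. You can pass a URL string straight to urlopen,
# or build a Request object first and pass that to urlopen
resp = request.urlopen("http://www.baidu.com")

print(resp.read().decode("UTF-8"))  # read the raw bytes and decode them as UTF-8
```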
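The snippet above hard-codes UTF-8. A slightly more defensive sketch (the helper name `fetch_text` is hypothetical) closes the response with a `with` block and takes the charset from the response's Content-Type header, falling back to UTF-8 only when none is declared. To stay runnable offline, the usage line feeds it a `data:` URL, which `urlopen` also understands; against a real site you would pass something like `http://www.baidu.com` instead.

```python
from urllib import request


def fetch_text(url, default_encoding="utf-8"):
    """Fetch url and return its body as str (hypothetical helper)."""
    with request.urlopen(url) as resp:
        raw = resp.read()
        # Prefer the charset declared in the Content-Type header, if any
        charset = resp.headers.get_content_charset() or default_encoding
    return raw.decode(charset)


if __name__ == "__main__":
    # data: URLs are handled by urllib's built-in DataHandler, so no network is needed
    print(fetch_text("data:text/plain;charset=utf-8,hello%20urllib"))
```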

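To send a browser-like User-Agent, build a Request first, add the header, and then pass the Request to urlopen:

```python
from urllib import request

req = request.Request("http://www.baidu.com")

# Imitate a browser so the site does not identify the request as coming from a crawler
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko")

resp = request.urlopen(req)

print(resp.read().decode("UTF-8"))
```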
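Why bother with the header? By default urllib identifies itself as `Python-urllib/3.x`, which is easy for a site to filter. The sketch below makes that visible without touching a real website: it starts a throwaway local HTTP server whose handler simply echoes back the User-Agent it received, then requests it once without and once with `add_header`. `EchoUAHandler` and `fetch_ua` are hypothetical names for illustration, and the sketch uses an opener with an empty `ProxyHandler` (instead of plain `request.urlopen`) only so that any proxy configured in the environment cannot intercept the local request.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request


class EchoUAHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: replies with the User-Agent header it received."""

    def do_GET(self):
        body = self.headers.get("User-Agent", "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


# ProxyHandler({}) ignores *_proxy environment variables, so 127.0.0.1 is reached directly
_opener = request.build_opener(request.ProxyHandler({}))


def fetch_ua(url, user_agent=None):
    """Request url, optionally with a custom User-Agent, and return the echoed header."""
    req = request.Request(url)
    if user_agent is not None:
        req.add_header("User-Agent", user_agent)
    with _opener.open(req) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), EchoUAHandler)  # port 0: the OS picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_port
    print(fetch_ua(url))  # urllib's default, e.g. Python-urllib/3.x
    print(fetch_ua(url, "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"))
    server.shutdown()
    server.server_close()
```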

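Take http://www.thsrc.com.tw/tw/TimeTable/SearchResult as an example. In the browser's developer tools, open the Network tab, filter by Doc, and select the SearchResult request: its Origin and User-Agent request headers are the ones we need to copy so that the site does not recognize us as a crawler.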
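Instead of one `add_header` call per header, `Request` also accepts a `headers` dictionary, which is convenient when copying several headers out of the DevTools panel. The sketch below only builds the request (nothing is sent), so it runs offline; the header values are the ones used in this post's example and `BROWSER_HEADERS` is just an illustrative name. Note that urllib normalizes header names with `str.capitalize()`, so `User-Agent` is stored and looked up as `User-agent`.

```python
from urllib import request

# Headers copied from the browser's Network panel (values from this post's example)
BROWSER_HEADERS = {
    "Origin": "http://www.thsrc.com.tw",
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36"),
}

req = request.Request(
    "http://www.thsrc.com.tw/tw/TimeTable/SearchResult",
    headers=BROWSER_HEADERS,
)

# Building a Request sends nothing; list the headers that would be sent
for name, value in req.header_items():
    print(name, "=", value)

# Names are stored via str.capitalize(), so look the header up as "User-agent"
print(req.get_header("User-agent"))
```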

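A note on import styles, since each one changes how names are written later:

If you import with `from urllib import request`, you must then write `request.Request` and `request.urlopen`.

If you import with `from urllib.request import urlopen`, call `urlopen(...)` directly; the `request.` prefix is not available unless you also import `request`.

The same applies to `from urllib.request import Request`: write `Request(...)`, not `request.Request(...)`.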
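All three styles reach the very same objects; only the name you type changes. A quick check:

```python
# Three ways of importing the same things from urllib
from urllib import request
from urllib.request import urlopen
from urllib.request import Request

# The same function and class, just bound to different names
print(urlopen is request.urlopen)  # True
print(Request is request.Request)  # True

# With the direct imports, no "request." prefix is needed
req = Request("http://www.baidu.com")
print(type(req).__name__)  # Request
```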

```python
from urllib import request
# from urllib.request import urlopen
# from urllib.request import Request
from urllib import parse

req = request.Request("http://www.thsrc.com.tw/tw/TimeTable/SearchResult")
# A Request object can carry extra headers (add_header) and a request body
# (its data attribute, or the data argument to urlopen)

# Form fields for the timetable search, URL-encoded into "key=value&..." form
postData = parse.urlencode([
    ("StartStation", "2f940836-cedc-41ef-8e28-c2336ac8fe68"),
    ("EndStation", "977abb69-413a-4ccf-a109-0272c24fd490"),
    ("SearchDate", "2018/04/11"),
    ("SearchTime", "19:00"),
    ("SearchWay", "DepartureInMandarin"),
])

req.add_header("Origin", "http://www.thsrc.com.tw")
# Imitate browser behaviour
req.add_header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36")

# Passing data makes urlopen send a POST; the body must be bytes, hence .encode()
resp = request.urlopen(req, data=postData.encode("utf-8"))

print(resp.read().decode("UTF-8"))
```
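Two details in this example are easy to miss. First, `parse.urlencode` percent-encodes reserved characters, so the date and time travel as `2018%2F04%2F11` and `19%3A00`. Second, supplying `data` is what turns the request into a POST, and urllib adds `Content-Type: application/x-www-form-urlencoded` when you have not set one. Rather than depend on the 2018 THSR endpoint (which may have changed since), the sketch below checks both points against a throwaway local server; `EchoPostHandler`, `post_form` and `FORM` are hypothetical names for illustration, and the empty `ProxyHandler` only keeps environment proxies out of the way.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import parse, request


class EchoPostHandler(BaseHTTPRequestHandler):
    """Hypothetical test handler: echoes method, Content-Type and body as three lines."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        reply = "%s\n%s\n%s" % (self.command, self.headers.get("Content-Type"), body)
        data = reply.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, format, *args):
        pass  # silence per-request logging


# Same form fields as the THSR example above
FORM = [
    ("StartStation", "2f940836-cedc-41ef-8e28-c2336ac8fe68"),
    ("EndStation", "977abb69-413a-4ccf-a109-0272c24fd490"),
    ("SearchDate", "2018/04/11"),
    ("SearchTime", "19:00"),
    ("SearchWay", "DepartureInMandarin"),
]

# Bypass any proxy configured in the environment so 127.0.0.1 is reached directly
_opener = request.build_opener(request.ProxyHandler({}))


def post_form(url, fields):
    """URL-encode fields, POST them, and return the echoed (method, content_type, body)."""
    body = parse.urlencode(fields).encode("utf-8")
    req = request.Request(url, data=body)  # data present -> method is POST
    with _opener.open(req) as resp:
        method, content_type, echoed = resp.read().decode("utf-8").split("\n", 2)
    return method, content_type, echoed


if __name__ == "__main__":
    print(parse.urlencode(FORM))
    server = HTTPServer(("127.0.0.1", 0), EchoPostHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(post_form("http://127.0.0.1:%d/" % server.server_port, FORM))
    server.shutdown()
    server.server_close()
```

Against the real site you would swap the local address for the SearchResult URL and add the Origin and User-Agent headers as in the snippet above.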


Article link: https://www.haomeiwen.com/subject/tpnzhftx.html
