A summary of several crawler techniques using only Python's standard library


Author: 伍只蚊 | Published: 2017-07-30 16:35

1. The most basic GET request

import urllib2

url = 'http://www.baidu.com'
response = urllib2.urlopen(url)  # no data argument, so this is a GET
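The snippet above targets Python 2. On Python 3, `urllib2` no longer exists and its functions live in `urllib.request`. The following is a minimal sketch of the same GET under that assumption; the helper name `fetch_text` and the 10-second timeout are illustrative choices, not part of the original post.

```python
import urllib.request


def fetch_text(url, timeout=10):
    """Send a GET request and return the response body as a str.

    Falls back to UTF-8 when the response does not declare a charset.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # bytes
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Calling `fetch_text('http://www.baidu.com')` would return the page source as text. Decoding is explicit here because `read()` returns bytes on Python 3.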

2. POST request

import urllib, urllib2

data = {'key': 'key', 'p': 3}
data = urllib.urlencode(data)          # encode the dict as a form body
response = urllib2.urlopen(url, data)  # passing data makes this a POST
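For readers on Python 3, here is a hedged sketch of the same POST. It assumes Python 3, where `urlencode` lives in `urllib.parse` and the request body must be bytes rather than str. The helper name `build_post_request` is hypothetical. The request is only built, not sent, so the result can be inspected offline.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Build (but do not send) a form-encoded POST request."""
    # urlencode returns a str; Python 3 requires the body as bytes.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(url, data=body)


request = build_post_request("http://www.baidu.com", {"key": "key", "p": 3})
print(request.get_method())  # the method switches to POST once data is set
print(request.data)
```

To actually send it, pass the object to `urllib.request.urlopen(request)`.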

3. Request with custom headers

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'
}
request = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(request)
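A small sketch of the same idea, assuming Python 3's `urllib.request`. It shows one detail that is easy to trip over: `Request` stores header names in capitalized form, so `User-Agent` is looked up as `User-agent`. The User-Agent string below is a shortened placeholder.

```python
import urllib.request

# Placeholder value; any browser-like User-Agent string can be used here.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) Gecko/20091201 Firefox/3.5.6"}

request = urllib.request.Request("http://www.baidu.com", headers=headers)

# Header names are normalized with str.capitalize() when stored.
print(request.header_items())
print(request.get_header("User-agent"))
```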

4. Request through a proxy IP

To send requests through a proxy IP, use the ProxyHandler class from the urllib2 module.

handler = urllib2.ProxyHandler({'http': '127.0.0.1:5555'})
opener = urllib2.build_opener(handler)
urllib2.install_opener(opener)  # make this opener the default for urlopen
response = urllib2.urlopen(url)
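`install_opener` changes the behaviour of every later `urlopen` call in the process. As an alternative sketch, assuming Python 3's `urllib.request`, the opener can be kept local and used through its own `open` method. The helper name `make_proxy_opener` is hypothetical, and the proxy address is the one from the example above with an explicit scheme added.

```python
import urllib.request


def make_proxy_opener(proxy_url):
    """Return an opener that routes plain HTTP requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    # Passing a ProxyHandler instance replaces the default one.
    return urllib.request.build_opener(handler)


opener = make_proxy_opener("http://127.0.0.1:5555")
# Only this opener uses the proxy; urllib.request.urlopen is unaffected:
# response = opener.open("http://www.baidu.com", timeout=10)
```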

5. Request that stores cookies

Python provides the cookielib module for handling cookies.

import urllib2, cookielib

# The CookieJar keeps cookies set by the server and sends them back on later requests
cookie_support = urllib2.HTTPCookieProcessor(cookielib.CookieJar())
opener = urllib2.build_opener(cookie_support)
urllib2.install_opener(opener)
content = urllib2.urlopen(url).read()
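The example above creates the `CookieJar` inline, so the stored cookies cannot be inspected afterwards. Below is a sketch that keeps a reference to the jar, assuming Python 3, where `cookielib` was renamed to `http.cookiejar`. The helper name `make_cookie_opener` is hypothetical.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar); the jar collects cookies from every response."""
    jar = http.cookiejar.CookieJar()
    processor = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(processor)
    return opener, jar


opener, jar = make_cookie_opener()
# After opener.open(url), cookies set by the server can be listed with:
# for cookie in jar:
#     print(cookie.name, cookie.value)
print(len(jar))  # number of cookies currently stored
```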

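Alternatively, send the cookie directly in a request header (here request is the urllib2.Request object built in section 3).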

cookie = "PHPSESSID=91rurfqm2329bopnosfu4fvmu7; kmsign=55d2c12c9b1e3; KMUID=b6Ejc1XSwPq9o756AxnBAg="
request.add_header("Cookie", cookie)
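When the cookies are held as name/value pairs rather than a ready-made string, the header value can be assembled first. This is a minimal sketch assuming Python 3's `urllib.request`; the helper `cookie_header` and the cookie values are made up for illustration.

```python
import urllib.request


def cookie_header(cookies):
    """Join name/value pairs into a single Cookie header value."""
    return "; ".join("{}={}".format(name, value) for name, value in cookies.items())


# Illustrative values only, not real session data.
cookies = {"PHPSESSID": "example-session-id", "kmsign": "example-sign"}

request = urllib.request.Request("http://www.baidu.com")
request.add_header("Cookie", cookie_header(cookies))
print(request.get_header("Cookie"))
```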

Link to this article: https://www.haomeiwen.com/subject/sxhklxtx.html