002 - User-Agent spoofing, response information, and GET requests


Author: 豆瓣奶茶 | Published 2018-12-16 01:13

    Inspecting the network response in detail

    Differences between urllib and urllib2 in Python 2
    Reference: https://blog.csdn.net/qq_34327480/article/details/79161794

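    In Python 2, urllib and urllib2 are both modules for issuing URL requests, but they provide different functionality. The two most notable differences are:
    1. urllib2 accepts an instance of the Request class, which lets you set the headers of a URL request, whereas urllib accepts only a URL. This means that with urllib you cannot spoof your User-Agent string, among other things.
    2. urllib provides urlencode() for building GET query strings, while urllib2 does not. This is why urllib is often used together with urllib2.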
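In Python 3 the two modules were merged and reorganized into the urllib package. The header-aware Request class now lives in urllib.request, and urlencode moved to urllib.parse. Below is a minimal sketch of the first difference. It only builds the request and does no network I/O. The make_request name, the example URL and the "my-crawler/0.1" string are hypothetical placeholders.

```python
import urllib.request

def make_request(url, user_agent):
    """Build a Request that carries a custom User-Agent header (nothing is sent)."""
    # Passing headers= is exactly what Python 2's plain urllib could not do
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

req = make_request("http://www.example.com/", "my-crawler/0.1")
# Request stores header names via str.capitalize(), so the key becomes "User-agent"
print(req.get_header("User-agent"))  # my-crawler/0.1
print(req.full_url)                  # http://www.example.com/
```

Note the lookup key "User-agent": because of that capitalize() normalization, req.get_header("User-Agent") would return None.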

    Using urllib in Python 3

    #py3
    import urllib.request

    # Tip: in PyCharm, "Go to Declaration" on urlopen jumps to its source code
    def download(url):
        response = urllib.request.urlopen(url, timeout=5)
        print(type(response))   # <class 'http.client.HTTPResponse'>
        print(response.info())  # an http.client.HTTPMessage holding the response headers
        return response.read()  # the page source, as bytes

    print(download("http://www.baidu.com"))
    
    

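    Using urllib2 in Python 2

    Python 2 has no urllib.request, so simply use urllib2 in its place.
    Also add the coding: utf-8 declaration at the top of the file. Python 2 needs it whenever the source contains non-ASCII text such as Chinese.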

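    #py2
    #coding:utf-8
    from __future__ import print_function  # make print() behave as in Python 3
    import urllib2


    def download(url):
        response = urllib2.urlopen(url, timeout=5)
        print(type(response))   # in Python 2, response is a file-like urllib.addinfourl object
        print(response.info())  # the response headers: detailed information about the site
        return response.read()  # the page source; read(n) would return only the first n bytes

    # When writing a crawler, always wrap requests in try/except
    try:
        print(download("http://www.google.com"))
    except urllib2.URLError as e:
        print("Network error:", e)  # the caught exception object is bound to e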
    
    

    Alternatively, add headers one at a time with add_header():

    import urllib2
    req = urllib2.Request('http://www.example.com/')
    req.add_header('Referer', 'http://www.python.org/')
    # Customize the default User-Agent header value:
    req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)')
    r = urllib2.urlopen(req)
    

    Response information

    Here is the output of printing response.info() for a request to Baidu:

    Bdpagetype: 1
    Bdqid: 0xe33c14ce00005740
    Cache-Control: private
    Content-Type: text/html
    Cxy_all: baidu+7b2f0340f919578bfe3264aa8c0016f8
    Date: Sun, 02 Dec 2018 03:03:24 GMT
    Expires: Sun, 02 Dec 2018 03:03:01 GMT
    P3p: CP=" OTI DSP COR IVA OUR IND COM "
    Server: BWS/1.1
    Set-Cookie: BAIDUID=00813CE4F82EFFA6488DB896F424E587:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
    Set-Cookie: BIDUPSID=00813CE4F82EFFA6488DB896F424E587; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
    Set-Cookie: PSTM=1543719804; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
    Set-Cookie: delPer=0; path=/; domain=.baidu.com
    Set-Cookie: BDSVRTM=0; path=/
    Set-Cookie: BD_HOME=0; path=/
    Set-Cookie: H_PS_PSSID=1441_25810_21122_22159; path=/; domain=.baidu.com
    Vary: Accept-Encoding
    X-Ua-Compatible: IE=Edge,chrome=1
    Connection: close
    Transfer-Encoding: chunked
    

    Note the Bdqid field among the headers. It appears to be a unique identifier that Baidu assigns to each visitor.
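In Python 3, response.info() returns an http.client.HTTPMessage, which is a subclass of email.message.Message. You can therefore look up a field by name (case-insensitively) and collect repeated fields such as Set-Cookie with get_all(). To keep the sketch offline, it does not make a real request. Instead it parses a few header lines copied from the output above with email.parser.Parser into that same class. The parse_header_block name is hypothetical.

```python
import email.parser
import http.client

# A few header lines copied from the Baidu response above
RAW_HEADERS = (
    "Content-Type: text/html\r\n"
    "Server: BWS/1.1\r\n"
    "Set-Cookie: BDSVRTM=0; path=/\r\n"
    "Set-Cookie: BD_HOME=0; path=/\r\n"
    "Connection: close\r\n"
    "\r\n"
)

def parse_header_block(raw):
    """Parse a raw header block into the same HTTPMessage type response.info() returns."""
    return email.parser.Parser(_class=http.client.HTTPMessage).parsestr(raw)

headers = parse_header_block(RAW_HEADERS)
print(headers["Content-Type"])        # text/html
print(headers.get_all("Set-Cookie"))  # ['BDSVRTM=0; path=/', 'BD_HOME=0; path=/']
```

On a live response, the equivalent calls are response.info()["Content-Type"] and response.info().get_all("Set-Cookie").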

    response.read()

    Returns the entire page source.

    response.read(100)

    Returns the first 100 bytes of the page source.
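With no argument, read() returns everything that is left in the response. read(n) returns at most n bytes, and a later read() continues from where it stopped. In Python 3 both return bytes, which must be decoded to get text.

This is a minimal offline sketch. It writes a small temporary file and opens it through urlopen with a file:// URL, which exposes the same read() interface as an HTTP response. The peek_then_rest name and the file contents are made up for the example.

```python
import pathlib
import tempfile
import urllib.request

def peek_then_rest(url, n=100):
    """Return (first n bytes, remaining bytes) of the resource at url."""
    with urllib.request.urlopen(url) as response:
        head = response.read(n)  # at most n bytes
        rest = response.read()   # everything that remains
    return head, rest

with tempfile.TemporaryDirectory() as tmp:
    page = pathlib.Path(tmp) / "page.html"
    page.write_bytes(b"<html><body>hello</body></html>")
    head, rest = peek_then_rest(page.as_uri(), n=6)
    print(head)                           # b'<html>'
    print((head + rest).decode("utf-8"))  # <html><body>hello</body></html>
```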

    When writing a crawler, wrap requests in try/except (Python's form of try/catch) generously.
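In Python 3 the exceptions live in urllib.error. HTTPError means the server answered with an error status such as 404. It is a subclass of URLError, which means the request could not be completed at all. You must therefore catch HTTPError first if you want to handle the two cases differently.

Below is a minimal sketch. The safe_download name, and the choice to print a message and return None on failure, are assumptions of this example.

```python
import urllib.error
import urllib.request

def safe_download(url, timeout=5):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as e:
        # The server replied, but with an error status code
        print("HTTP error", e.code, e.reason)
    except urllib.error.URLError as e:
        # No usable reply at all: DNS failure, refused connection, missing local file, ...
        print("Network error", e.reason)
    return None
```

For example, safe_download("http://www.baidu.com") returns the page bytes when the network is reachable and None otherwise. The crawler can then skip that URL instead of crashing.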

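    User-Agent

    Like the big bad wolf disguising itself as the big white rabbit, a crawler sends a browser's User-Agent string so the server treats it as an ordinary browser.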

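    #encoding: utf-8
    import urllib
    import urllib2

    def download(url):
        # Pretend to be Firefox on Windows (any string from the lists below works)
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"}
        request = urllib2.Request(url, headers=headers)  # build the request
        data = urllib2.urlopen(request).read()  # open the request and fetch the data
        return data

    searchname = "python"  # hypothetical search keyword; replace with your own
    url = "https://sou.zhaopin.com/?jl=613&kw=" + urllib.quote(searchname) + "&kt=3"
    print download(url)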
    
    

    The code above builds a Request object that carries a custom User-Agent header (Python 2).
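Why the disguise matters: unless you set User-Agent yourself, urllib announces itself as "Python-urllib/X.Y", which a site can easily detect. In Python 3 this default is visible on an opener's addheaders list.

Instead of attaching the header to every Request, you can also replace it once on an opener, so that every request made through that opener is disguised. This is a sketch under those assumptions. The Firefox string comes from the desktop list below, and the zhaopin URL with kw=python is only an illustrative usage.

```python
import urllib.request

FIREFOX_UA = "Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1"

# Out of the box, every opener announces itself as Python-urllib/X.Y
opener = urllib.request.build_opener()
print(opener.addheaders)  # e.g. [('User-agent', 'Python-urllib/3.12')]

# Replace the default so every request made through this opener looks like Firefox
opener.addheaders = [("User-agent", FIREFOX_UA)]

# Usage (requires network access):
# html = opener.open("https://sou.zhaopin.com/?jl=613&kw=python&kt=3", timeout=5).read()
```

If you call urllib.request.install_opener(opener), plain urllib.request.urlopen() calls will also use this opener. A User-Agent set directly on a Request still takes precedence over the opener's default.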

    Common User-Agent strings

    Mobile

    Safari, iOS 4.3.3 – iPhone
    User-Agent: Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5
    Safari, iOS 4.3.3 – iPod Touch
    User-Agent: Mozilla/5.0 (iPod; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5
    Safari, iOS 4.3.3 – iPad
    User-Agent: Mozilla/5.0 (iPad; U; CPU OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5
    Android, Nexus One (N1)
    User-Agent: Mozilla/5.0 (Linux; U; Android 2.3.7; en-us; Nexus One Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1
    Android, QQ Browser for Android
    User-Agent: MQQBrowser/26 Mozilla/5.0 (Linux; U; Android 2.3.7; zh-cn; MB200 Build/GRJ22; CyanogenMod-7) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1
    Android, Opera Mobile
    User-Agent: Opera/9.80 (Android 2.3.4; Linux; Opera Mobi/build-1107180945; U; en-GB) Presto/2.8.149 Version/11.10
    Android tablet, Motorola Xoom
    User-Agent: Mozilla/5.0 (Linux; U; Android 3.0; en-us; Xoom Build/HRI39) AppleWebKit/534.13 (KHTML, like Gecko) Version/4.0 Safari/534.13
    BlackBerry
    User-Agent: Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; en) AppleWebKit/534.1+ (KHTML, like Gecko) Version/6.0.0.337 Mobile Safari/534.1+
    webOS, HP TouchPad
    User-Agent: Mozilla/5.0 (hp-tablet; Linux; hpwOS/3.0.0; U; en-US) AppleWebKit/534.6 (KHTML, like Gecko) wOSBrowser/233.70 Safari/534.6 TouchPad/1.0
    Nokia N97
    User-Agent: Mozilla/5.0 (SymbianOS/9.4; Series60/5.0 NokiaN97-1/20.0.019; Profile/MIDP-2.1 Configuration/CLDC-1.1) AppleWebKit/525 (KHTML, like Gecko) BrowserNG/7.1.18124
    Windows Phone Mango
    User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows Phone OS 7.5; Trident/5.0; IEMobile/9.0; HTC; Titan)
    UC (bare)
    User-Agent: UCWEB7.0.2.37/28/999
    UC standard
    User-Agent: NOKIA5700/UCWEB7.0.2.37/28/999
    UC Openwave
    User-Agent: Openwave/UCWEB7.0.2.37/28/999
    UC Opera
    User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; ) Opera/UCWEB7.0.2.37/28/999
    
    

    Desktop

    Safari 5.1 – Mac
    User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50
    Safari 5.1 – Windows
    User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50
    IE 9.0
    User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)
    IE 8.0
    User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0)
    IE 7.0
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0)
    IE 6.0
    User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
    Firefox 4.0.1 – Mac
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
    Firefox 4.0.1 – Windows
    User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
    Opera 11.11 – Mac
    User-Agent: Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11
    Opera 11.11 – Windows
    User-Agent: Opera/9.80 (Windows NT 6.1; U; en) Presto/2.8.131 Version/11.11
    Chrome 17.0 – Mac
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_0) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.56 Safari/535.11
    Maxthon (傲游)
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Maxthon 2.0)
    Tencent TT
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; TencentTraveler 4.0)
    TheWorld (世界之窗) 2.x
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)
    TheWorld (世界之窗) 3.x
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; The World)
    Sogou Browser 1.x
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SE 2.X MetaSr 1.0; SE 2.X MetaSr 1.0; .NET CLR 2.0.50727; SE 2.X MetaSr 1.0)
    360 Browser
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; 360SE)
    Avant
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Avant Browser)
    GreenBrowser
    User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)
    

    Simulating a mobile browser
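Many sites decide whether to serve their mobile pages by looking at the User-Agent string. Simulating a phone is therefore mostly a matter of sending one of the mobile strings listed above. The sketch below picks one at random for each request. The random rotation, the three-entry pool and the mobile_request name are assumptions of this example, not something the original post does.

```python
import random
import urllib.request

# A few mobile User-Agent strings from the list above
MOBILE_UAS = [
    "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5",
    "Mozilla/5.0 (Linux; U; Android 2.3.7; en-us; Nexus One Build/FRF91) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1",
    "Mozilla/5.0 (BlackBerry; U; BlackBerry 9800; en) AppleWebKit/534.1+ (KHTML, like Gecko) Version/6.0.0.337 Mobile Safari/534.1+",
]

def mobile_request(url, rng=random):
    """Build a Request that claims to come from a randomly chosen mobile browser."""
    return urllib.request.Request(url, headers={"User-Agent": rng.choice(MOBILE_UAS)})

req = mobile_request("http://www.baidu.com")
print(req.get_header("User-agent"))
# urllib.request.urlopen(req).read() would fetch whatever page the server sends to that phone UA
```

The rng parameter accepts a seeded random.Random instance, which makes the choice reproducible when testing.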

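    Simulating a Baidu search with a GET request

    Encoding and decoding with urllib

    The garbled-looking sequences often seen in a browser's address bar (such as %E5%BE%90) are not corruption. They are URL (percent) encoding of non-ASCII characters.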

    import urllib
    words = {'wd': '徐晓峰'}
    urllib.urlencode(words)  # 'wd=%E5%BE%90%E6%99%93%E5%B3%B0'; note: this is urllib, not urllib2
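The heading above also promises decoding. In Python 3 both directions live in urllib.parse. urlencode percent-encodes the UTF-8 bytes of each value, and unquote and parse_qs reverse the process. Here is a minimal round trip with the same query:

```python
from urllib.parse import parse_qs, unquote, urlencode

words = {"wd": "徐晓峰"}
query = urlencode(words)  # each character becomes its UTF-8 bytes, percent-encoded
print(query)                         # wd=%E5%BE%90%E6%99%93%E5%B3%B0
print(unquote(query.split("=")[1]))  # 徐晓峰
print(parse_qs(query))               # {'wd': ['徐晓峰']}
```

Each Chinese character takes three UTF-8 bytes, which is why three characters produce nine %XX escapes.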
    
    
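    #coding:utf-8
    import urllib
    import urllib2

    url = 'http://www.baidu.com/s'
    word = {'wd': '徐晓峰'}
    newurl = url + '?' + urllib.urlencode(word)  # http://www.baidu.com/s?wd=%E5%BE%90%E6%99%93%E5%B3%B0
    request = urllib2.Request(newurl)
    # Extra headers can be added freely (note: urllib2 itself forces Connection: close when sending)
    request.add_header("Connection", "keep-alive")
    print urllib2.urlopen(request).read()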
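For reference, here is the same GET request in Python 3, where urlencode comes from urllib.parse and Request from urllib.request. The function only builds the request, so you can inspect it without touching the network. The build_search_request name and its base parameter are made up for this sketch.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_search_request(keyword, base="http://www.baidu.com/s"):
    """Build a GET Request for base?wd=<keyword>; nothing is sent yet."""
    url = base + "?" + urlencode({"wd": keyword})
    return Request(url)  # with no data attached, the method defaults to GET

req = build_search_request("徐晓峰")
print(req.full_url)      # http://www.baidu.com/s?wd=%E5%BE%90%E6%99%93%E5%B3%B0
print(req.get_method())  # GET
# To send it: urllib.request.urlopen(req).read()
```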
    


          Article link: https://www.haomeiwen.com/subject/lknzhqtx.html