美文网首页
1122|urllib

1122|urllib

作者: 喵在野 | 来源:发表于2015-11-22 16:53 被阅读0次

    http://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/001432688314740a0aed473a39f47b09c8c7274c9ab6aee000

    https://docs.python.org/3/library/urllib.request.html?highlight=request.urlopen#urllib.request.urlopen

    urllib.request的说明:
    https://docs.python.org/3.5/library/urllib.request.html

    urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)
    Open the URL url, which can be either a string or a Request object.

    data must be a bytes object specifying additional data to be sent to the server, or None if no such data is needed. data may also be an iterable object and in that case Content-Length value must be specified in the headers. Currently HTTP requests are the only ones that use data; the HTTP request will be a POST instead of a GET when the data parameter is provided.

    
    Python 3 抓取网页资源的几种办法:
    
    1、最简单
    import urllib.request
    response = urllib.request.urlopen('http://python.org/')
    html = response.read()
    
    2、使用 Request
    import urllib.request
    
    req = urllib.request.Request('http://python.org/')
    response = urllib.request.urlopen(req)
    the_page = response.read()
    
    3、发送数据
    复制代码
    #! /usr/bin/env python3
    
    import urllib.parse
    import urllib.request
    
    url = 'http://localhost/login.php'
    user_agent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'
    values = {
    'act' : 'login',
    'login[email]' : 'yzhang@i9i8.com',
    'login[password]' : '123456'
    }
    
    data = urllib.parse.urlencode(values)
    req = urllib.request.Request(url, data)
    req.add_header('Referer', 'http://www.python.org/')
    response = urllib.request.urlopen(req)
    the_page = response.read()
    
    print(the_page.decode("utf8"))
    复制代码

    相关文章

      网友评论

          本文标题:1122|urllib

          本文链接:https://www.haomeiwen.com/subject/ypqrhttx.html