Python Basics Study Notes 18

Author: ericblue | Published: 2019-03-07 15:48

    Installing the requests library

    ~  pip install requests
    Collecting requests
      Downloading https://files.pythonhosted.org/packages/7d/e3/20f3d364d6c8e5d2353c72a67778eb189176f08e873c9900e10c0287b84b/requests-2.21.0-py2.py3-none-any.whl (57kB)
        100% |████████████████████████████████| 61kB 66kB/s
    Collecting certifi>=2017.4.17 (from requests)
      Downloading https://files.pythonhosted.org/packages/9f/e0/accfc1b56b57e9750eba272e24c4dddeac86852c2bebd1236674d7887e8a/certifi-2018.11.29-py2.py3-none-any.whl (154kB)
        100% |████████████████████████████████| 163kB 110kB/s
    Collecting chardet<3.1.0,>=3.0.2 (from requests)
      Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
        100% |████████████████████████████████| 143kB 149kB/s
    Requirement already satisfied: idna<2.9,>=2.5 in ./.virtualenvs/py3env/lib/python3.6/site-packages (from requests) (2.7)
    Collecting urllib3<1.25,>=1.21.1 (from requests)
      Downloading https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl (118kB)
        100% |████████████████████████████████| 122kB 193kB/s
    Installing collected packages: certifi, chardet, urllib3, requests
    Successfully installed certifi-2018.11.29 chardet-3.0.4 requests-2.21.0 urllib3-1.24.1
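
To confirm from Python that the installation worked, one option is to ask the import system whether the package can be located. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and is not part of pip or requests.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located on the current import path."""
    return importlib.util.find_spec(package_name) is not None


# Reports whether requests is importable in the active environment
print("requests installed:", is_installed("requests"))
```

Run it with the same interpreter (for example, inside the same virtualenv) that pip installed into; otherwise the check may report False even though the install succeeded elsewhere.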
    

    Making GET and POST requests with the requests library:

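    # GET request
    import requests
    url = 'http://httpbin.org/get'
    data = {'key': 'value', 'abc': 'xyz'}
    # requests.get sends a GET request; the dict is passed as params and is
    # URL-encoded into the query string automatically, so no manual encoding is needed
    response = requests.get(url, params=data)
    print(response.text)
    
    # POST request
    import requests
    url = 'http://httpbin.org/post'
    data = {'key': 'value', 'abc': 'xyz'}
    # requests.post sends a POST request; the dict is sent form-encoded in the request body
    response = requests.post(url, data=data)
    # httpbin replies with JSON; .json() parses the response body into a Python dict
    print(response.json())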
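
For the GET case, the dictionary ends up in the URL as a query string. The sketch below reproduces that encoding step offline with the standard library's `urlencode`, as an illustration of the idea rather than of how requests is implemented internally; the variable names `query_string` and `full_url` are only for this example.

```python
from urllib.parse import urlencode

url = 'http://httpbin.org/get'
data = {'key': 'value', 'abc': 'xyz'}

# Turn the dict into "key=value&abc=xyz" and append it to the base URL
query_string = urlencode(data)
full_url = url + '?' + query_string
print(full_url)  # http://httpbin.org/get?key=value&abc=xyz
```

With a POST request the same key/value pairs travel in the request body instead of the URL, which is why httpbin echoes them under `form` rather than `args`.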
    

    Scraping example:

    import requests
    import re
    content = requests.get('http://www.cnu.cc/discoveryPage/hot-人像').text
    # print(content)
    
    # Sample of the page HTML to match:
    # <div class="grid-item work-thumbnail">
    # <a href="http://www.cnu.cc/works/332291"
    # class="thumbnail" target="_blank">
    # <div class="title">On the STREET of daylight.</div>
    # <div class="author">摄影师Gin</div>
    
    # Regular expression (the two parenthesized groups capture the link and the title):
    # <div class="grid-item work-thumbnail">
    # <a href="(.*?)".*?title">(.*?)</div>
    # <div class="author">LynnWei</div>
    
    # re.S lets '.' match newlines, so one match can span several lines of HTML
    pattern = re.compile(r'<a href="(.*?)".*?title">(.*?)</div>', re.S)
    results = re.findall(pattern, content)  # a list of (url, title) tuples
    # print(results)
    
    for result in results:
        url, name = result
        # Remove all whitespace (newlines and inner spaces alike) from the title
        print(url, re.sub(r'\s', '', name))
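
The pattern can be tried offline before hitting the live site. The sketch below runs it against a small HTML string reconstructed from the commented excerpt above; the closing tags and line breaks in `sample_html` are assumptions, and the real page may differ.

```python
import re

# Hypothetical snippet modeled on the excerpt quoted in the comments above
sample_html = '''
<div class="grid-item work-thumbnail">
<a href="http://www.cnu.cc/works/332291" class="thumbnail" target="_blank">
<div class="title">
    On the STREET of daylight.
</div>
<div class="author">摄影师Gin</div>
</a>
</div>
'''

pattern = re.compile(r'<a href="(.*?)".*?title">(.*?)</div>', re.S)
results = pattern.findall(sample_html)  # list of (url, title) tuples

# strip() trims only the surrounding whitespace and keeps spaces between words
cleaned = [(url, title.strip()) for url, title in results]
print(cleaned)
# [('http://www.cnu.cc/works/332291', 'On the STREET of daylight.')]
```

Note the difference from `re.sub(r'\s', '', name)`, which deletes every whitespace character and would turn this title into `OntheSTREETofdaylight.`; `strip()` is one alternative when the spaces inside the title should be kept.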
    

        Article link: https://www.haomeiwen.com/subject/czngeqtx.html