Python Basics Study Notes 18

Author: ericblue | Published 2019-03-07 15:48


Installing the requests library

~  pip install requests
Collecting requests
  Downloading https://files.pythonhosted.org/packages/7d/e3/20f3d364d6c8e5d2353c72a67778eb189176f08e873c9900e10c0287b84b/requests-2.21.0-py2.py3-none-any.whl (57kB)
    100% |████████████████████████████████| 61kB 66kB/s
Collecting certifi>=2017.4.17 (from requests)
  Downloading https://files.pythonhosted.org/packages/9f/e0/accfc1b56b57e9750eba272e24c4dddeac86852c2bebd1236674d7887e8a/certifi-2018.11.29-py2.py3-none-any.whl (154kB)
    100% |████████████████████████████████| 163kB 110kB/s
Collecting chardet<3.1.0,>=3.0.2 (from requests)
  Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
    100% |████████████████████████████████| 143kB 149kB/s
Requirement already satisfied: idna<2.9,>=2.5 in ./.virtualenvs/py3env/lib/python3.6/site-packages (from requests) (2.7)
Collecting urllib3<1.25,>=1.21.1 (from requests)
  Downloading https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl (118kB)
    100% |████████████████████████████████| 122kB 193kB/s
Installing collected packages: certifi, chardet, urllib3, requests
Successfully installed certifi-2018.11.29 chardet-3.0.4 requests-2.21.0 urllib3-1.24.1
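
After pip reports success, it can be handy to confirm from Python itself that the package is visible in the active environment. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical, and importlib.metadata needs Python 3.8 or newer (the virtualenv in the log above is Python 3.6, where this module is not available).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if requests is installed in this environment, else None
print(installed_version('requests'))
```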

Making GET and POST requests with the requests library:

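# GET request
import requests
url = 'http://httpbin.org/get'
data = {'key': 'value', 'abc': 'xyz'}
# requests.get sends a GET request; the dict is passed as params and
# encoded into the query string, so no extra processing is needed
response = requests.get(url, params=data)
print(response.text)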
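
To see what "no extra processing" saves you, here is a rough sketch of the encoding step done by hand with the standard library. It is only an approximation of what requests does internally for a simple dict, and the helper name build_get_url is hypothetical.

```python
from urllib.parse import urlencode


def build_get_url(base_url, params):
    """Append a dict of parameters to base_url as a query string."""
    query = urlencode(params)
    return f"{base_url}?{query}" if query else base_url


full_url = build_get_url('http://httpbin.org/get', {'key': 'value', 'abc': 'xyz'})
print(full_url)  # http://httpbin.org/get?key=value&abc=xyz
```

With requests, the same dict is simply handed to requests.get and the library builds the final URL for you.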

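# POST request
import requests
url = 'http://httpbin.org/post'
data = {'key': 'value', 'abc': 'xyz'}
# requests.post sends a POST request; the dict is sent as form data in the request body
response = requests.post(url, data=data)
# httpbin responds with JSON; .json() parses it into a Python dict
print(response.json())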
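
For a POST, the dict travels in the request body rather than in the URL. The sketch below uses only the standard library to show roughly what a form-encoded body looks like and how a server could decode it again. It is an illustration of the encoding under that assumption, not the internals of requests.

```python
from urllib.parse import parse_qs, urlencode

data = {'key': 'value', 'abc': 'xyz'}

# Form-encoded body, normally sent with
# Content-Type: application/x-www-form-urlencoded
body = urlencode(data)

# What the receiving side could recover from that body
decoded = {k: v[0] for k, v in parse_qs(body).items()}

print(body)     # key=value&abc=xyz
print(decoded)  # {'key': 'value', 'abc': 'xyz'}
```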

Scraping example:

import requests
import re
content = requests.get('http://www.cnu.cc/discoveryPage/hot-人像').text
# print(content)

# Original HTML the regex is meant to match
# <div class="grid-item work-thumbnail">
# <a href="http://www.cnu.cc/works/332291"
# class="thumbnail" target="_blank">
# <div class="title">On the STREET of daylight.</div>
# <div class="author">摄影师Gin</div>

# Regular expression
# <div class="grid-item work-thumbnail">
# <a href="(.*?)".*?title">(.*?)</div>  (the two parenthesized groups capture the link and the title)
# <div class="author">LynnWei</div>

pattern = re.compile(r'<a href="(.*?)".*?title">(.*?)</div>', re.S)  # two capture groups: link and title; re.S lets . match newlines
results = re.findall(pattern, content)  # returns a list of (url, title) tuples
# print(results)

for result in results:
    url, name = result
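    # Remove all whitespace (including newlines) from the title; the raw string avoids an invalid-escape warning
    print(url, re.sub(r'\s', '', name))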
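
The pattern can be tried offline before pointing it at the live page. The sketch below runs it against a hand-written fragment modeled on the HTML quoted in the comments above. The fragment is an assumption, not a capture of the real page, and the whitespace around the title was added on purpose to show the cleanup step.

```python
import re

# Hand-written fragment modeled on the HTML quoted in the comments above
# (an assumption, not a capture of the live page).
sample_html = '''
<div class="grid-item work-thumbnail">
<a href="http://www.cnu.cc/works/332291" class="thumbnail" target="_blank">
<div class="title">
    On the STREET of daylight.
</div>
<div class="author">摄影师Gin</div>
</a>
</div>
'''

pattern = re.compile(r'<a href="(.*?)".*?title">(.*?)</div>', re.S)
results = pattern.findall(sample_html)  # list of (url, title) tuples

url, name = results[0]
squeezed = re.sub(r'\s', '', name)   # removes every whitespace character
collapsed = ' '.join(name.split())   # trims the ends, keeps single spaces between words

print(url)        # http://www.cnu.cc/works/332291
print(squeezed)   # OntheSTREETofdaylight.
print(collapsed)  # On the STREET of daylight.
```

Note that re.sub(r'\s', '', name) also deletes the spaces between words. That is harmless for Chinese titles, but for English titles the split-and-join variant may read better.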