Installing the requests library
~ pip install requests
Collecting requests
Downloading https://files.pythonhosted.org/packages/7d/e3/20f3d364d6c8e5d2353c72a67778eb189176f08e873c9900e10c0287b84b/requests-2.21.0-py2.py3-none-any.whl (57kB)
100% |████████████████████████████████| 61kB 66kB/s
Collecting certifi>=2017.4.17 (from requests)
Downloading https://files.pythonhosted.org/packages/9f/e0/accfc1b56b57e9750eba272e24c4dddeac86852c2bebd1236674d7887e8a/certifi-2018.11.29-py2.py3-none-any.whl (154kB)
100% |████████████████████████████████| 163kB 110kB/s
Collecting chardet<3.1.0,>=3.0.2 (from requests)
Downloading https://files.pythonhosted.org/packages/bc/a9/01ffebfb562e4274b6487b4bb1ddec7ca55ec7510b22e4c51f14098443b8/chardet-3.0.4-py2.py3-none-any.whl (133kB)
100% |████████████████████████████████| 143kB 149kB/s
Requirement already satisfied: idna<2.9,>=2.5 in ./.virtualenvs/py3env/lib/python3.6/site-packages (from requests) (2.7)
Collecting urllib3<1.25,>=1.21.1 (from requests)
Downloading https://files.pythonhosted.org/packages/62/00/ee1d7de624db8ba7090d1226aebefab96a2c71cd5cfa7629d6ad3f61b79e/urllib3-1.24.1-py2.py3-none-any.whl (118kB)
100% |████████████████████████████████| 122kB 193kB/s
Installing collected packages: certifi, chardet, urllib3, requests
Successfully installed certifi-2018.11.29 chardet-3.0.4 requests-2.21.0 urllib3-1.24.1
Making GET and POST requests with the requests library:
# GET request
import requests
url = 'http://httpbin.org/get'
data = {'key': 'value', 'abc': 'xyz'}
# requests.get sends a GET request to url; the dict needs no extra
# processing and is encoded into the query string
response = requests.get(url, params=data)
print(response.text)
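# POST request
import requests
url = 'http://httpbin.org/post'
data = {'key': 'value', 'abc': 'xyz'}
# requests.post sends a POST request; the dict is sent as form data in the body
response = requests.post(url, data=data)
# The response body is JSON; .json() parses it into a Python dict
print(response.json())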
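To see where the dict actually ends up in each case, the two requests above can be built without sending them. This is only a minimal sketch: it uses `requests.Request(...).prepare()`, which does not appear in the original examples, and the names `get_req` and `post_req` are made up for illustration.

```python
import requests

data = {'key': 'value', 'abc': 'xyz'}

# Build the requests without sending them, so no network access is needed.
get_req = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()
post_req = requests.Request('POST', 'http://httpbin.org/post', data=data).prepare()

# GET: the dict is encoded into the query string and there is no body.
print(get_req.url)
print(get_req.body)

# POST: the URL is unchanged and the dict is form-encoded into the body.
print(post_req.url)
print(post_req.body)
print(post_req.headers['Content-Type'])
```

With GET the parameters are visible in the URL, while with POST they travel in the request body as `application/x-www-form-urlencoded` data.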
Scraping example:
import requests
import re
content = requests.get('http://www.cnu.cc/discoveryPage/hot-人像').text
# print(content)
# Original markup that the regex has to match:
# <div class="grid-item work-thumbnail">
# <a href="http://www.cnu.cc/works/332291"
# class="thumbnail" target="_blank">
# <div class="title">On the STREET of daylight.</div>
# <div class="author">摄影师Gin</div>
# Regular expression laid over that markup:
# <div class="grid-item work-thumbnail">
# <a href="(.*?)".*?title">(.*?)</div>   (the parentheses capture two groups)
# <div class="author">LynnWei</div>
# The two groups capture the link URL and the title; re.S lets . match newlines
pattern = re.compile(r'<a href="(.*?)".*?title">(.*?)</div>', re.S)
# findall returns a list of tuples, one (url, title) tuple per match
results = re.findall(pattern, content)
# print(results)
for result in results:
    url, name = result
    # Strip every whitespace character (spaces included) from the title
    print(url, re.sub(r'\s', '', name))
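The regex can also be tried offline against a fixed piece of markup, which is handy if the live page changes. The following is a sketch only: `SAMPLE_HTML` is a hand-written sample modeled on the markup quoted in the comments above (not a real response from the site), and `extract_works` is a hypothetical helper name. It swaps `re.sub(r'\s', '', name)` for `str.strip()`, so the spaces between words in the title are kept.

```python
import re

# Hand-written sample modeled on the markup quoted above (an assumption,
# not an actual response from the site).
SAMPLE_HTML = '''
<div class="grid-item work-thumbnail">
<a href="http://www.cnu.cc/works/332291" class="thumbnail" target="_blank">
<div class="title">
    On the STREET of daylight.
</div>
<div class="author">摄影师Gin</div>
</a>
</div>
'''

# Same pattern as in the scraping example: group 1 is the link, group 2 the title.
pattern = re.compile(r'<a href="(.*?)".*?title">(.*?)</div>', re.S)

def extract_works(html):
    """Return (url, title) pairs with leading and trailing whitespace trimmed."""
    return [(url, title.strip()) for url, title in pattern.findall(html)]

works = extract_works(SAMPLE_HTML)
for url, title in works:
    print(url, title)
```

Removing all whitespace, as the original loop does, would turn this title into `OntheSTREETofdaylight.`; trimming only the ends is one possible alternative when titles contain spaces.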