Living in Shanghai, the site I open every day is Baidu's real-time epidemic dashboard (百度实时疫情大数据).
It has statistics along almost every dimension. So how can we pull out the data changes for every region in the country?
Step 1: locate the data, i.e., analyze the data source (see the quick check sketched right after this list)
Step 2: request the data
Step 3: parse the data
Step 4: save the data
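For step 1, the useful observation is that the dashboard embeds all of its statistics as a JSON blob directly in the page HTML, under a "component" key, so no separate API endpoint is needed. A minimal sketch to confirm this before writing the full scraper (it uses the same URL as the complete code below; the shortened user-agent is just an illustration):

# Quick check for step 1: the dashboard embeds its statistics as JSON in the page HTML
import requests

url = 'https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_aladin_banner'
html = requests.get(url, headers={'user-agent': 'Mozilla/5.0'}).text

# If this prints True, the JSON payload ("component" key) is right in the page source
print('"component":[' in html)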
Complete code:
# Load modules
import requests
import re
import json
import csv
# Browser headers to disguise the request (not strictly necessary here)
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.82 Safari/537.36'
}
# Request URL
url = 'https://voice.baidu.com/act/newpneumonia/newpneumonia/?from=osari_aladin_banner'
# Send the request
response = requests.get(url=url, headers=headers)
# Parse the data
data_html = response.text
# findall returns a list; [0] takes the first match and turns it into a str -- regex is powerful
json_str = re.findall(r'"component":\[(.*)\],', data_html)[0]
# Convert the JSON string into a dict
json_dict = json.loads(json_str)
caseList = json_dict['caseList']
# Write to a CSV file ('w' so reruns don't append duplicate rows)
with open('./data.csv', mode='w', encoding='utf-8', newline='') as f:
    csv_writer = csv.writer(f)
    # Header row
    csv_writer.writerow(['area', 'confirmed', 'curConfirm', 'confirmedRelative', 'nativeRelative', 'overseasInputRelative', 'asymptomatic', 'asymptomaticRelative', 'crued', 'curedRelative', 'died', 'diedRelative'])
    for case in caseList:
        area = case['area']                                    # region name
        confirmed = case['confirmed']                          # cumulative confirmed
        curConfirm = case['curConfirm']                        # currently confirmed
        asymptomatic = case['asymptomatic']                    # asymptomatic cases
        crued = case['crued']                                  # cumulative cured ('crued' is the API's own spelling)
        died = case['died']                                    # cumulative deaths
        confirmedRelative = case['confirmedRelative']          # newly confirmed
        diedRelative = case['diedRelative']                    # new deaths
        curedRelative = case['curedRelative']                  # newly cured
        asymptomaticRelative = case['asymptomaticRelative']    # new asymptomatic
        nativeRelative = case['nativeRelative']                # new local cases
        overseasInputRelative = case['overseasInputRelative']  # new imported cases
        # Print to check:
        # print(area, confirmed, curConfirm, confirmedRelative, nativeRelative, overseasInputRelative, asymptomatic, asymptomaticRelative, crued, curedRelative, died, diedRelative)
        csv_writer.writerow([area, confirmed, curConfirm, confirmedRelative, nativeRelative, overseasInputRelative, asymptomatic, asymptomaticRelative, crued, curedRelative, died, diedRelative])
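A quick sanity check before comparing with the site is to read the CSV back with pandas. A minimal sketch, assuming data.csv was produced by the script above:

# Read the saved CSV back and inspect the first rows
import pandas as pd

df = pd.read_csv('./data.csv')
print(df.head())
print(df.shape)  # number of regions x number of columns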
Finally, compare the output with the page: the numbers match~
[Screenshots: the script's output side by side with the Baidu dashboard]