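Workflow:
1. Use the DrissionPage + Selenium automated crawlers to collect about 2 million comic records (videos, details, tags, and so on) and store them in a MySQL database;
2. Use MapReduce to clean the collected anime data and split its data items, convert the result to .csv files, and upload them to HDFS on the Hadoop cluster;
3. Create the database and tables in Hive and load the .csv anime data;
4. Compute half of the metrics with Hive SQL and the other half with Spark (Scala);
5. Use Sqoop to export the analysis results to the MySQL database;
6. Build the visualization dashboard with Flask + ECharts;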
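The cleaning and splitting in step 2 can be illustrated with a small Python sketch. This is not the project's MapReduce job. The field names `title` and `tags`, the comma-separated tag format, and the sample rows are all assumptions made for illustration. It only shows the kind of per-record logic such a job might apply before the .csv files are written: drop unusable rows, normalize fields, and split a multi-valued field into one row per value.

```python
import csv
import io


def clean_record(raw):
    """Normalize one raw record; return None when it is unusable."""
    title = (raw.get("title") or "").strip()
    if not title:
        return None  # drop rows without a title (assumed rule)
    # Split the tag string on ASCII or full-width commas, dropping
    # blanks and duplicates while preserving first-seen order.
    tags = []
    for tag in (raw.get("tags") or "").replace(",", ",").split(","):
        tag = tag.strip()
        if tag and tag not in tags:
            tags.append(tag)
    return {"title": title, "tags": tags}


def explode_tags(record):
    """Emit one (title, tag) row per tag so each tag can be counted on its own."""
    return [(record["title"], tag) for tag in record["tags"]]


def to_csv(rows):
    """Serialize rows to CSV text with Unix line endings."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


# Hypothetical sample input; the real crawler output will differ.
raw_rows = [
    {"title": "  Example A ", "tags": "action, comedy,,action"},
    {"title": "", "tags": "drama"},
    {"title": "Example B", "tags": "drama,school"},
]
cleaned = [r for r in (clean_record(x) for x in raw_rows) if r is not None]
csv_text = to_csv([row for rec in cleaned for row in explode_tags(rec)])
print(csv_text)
```

Writing one row per tag keeps the later Hive table flat, so a per-tag count becomes a plain `GROUP BY` rather than a string-splitting query.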
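Innovations: combined use of Python's new DrissionPage library with Selenium as dual crawlers, large-scale data, crawling, a visualization dashboard, and a dual implementation with offline Hive + real-time Spark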
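For the visualization dashboard, the backend mainly needs to turn analysis results into the option object that ECharts renders. The sketch below is one possible shape, not the project's actual code. The function name `bar_option`, the chart title, and the sample counts are made up for illustration. The keys used (`title`, `tooltip`, `xAxis`, `yAxis`, `series`) are standard ECharts option fields.

```python
import json


def bar_option(title, rows):
    """Build an ECharts bar-chart option from (category, value) pairs."""
    categories = [name for name, _ in rows]
    values = [value for _, value in rows]
    return {
        "title": {"text": title},
        "tooltip": {},
        "xAxis": {"type": "category", "data": categories},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": values}],
    }


# Made-up rows standing in for a result table read back from MySQL.
rows = [("action", 120), ("comedy", 95), ("drama", 60)]
option = bar_option("Titles per tag", rows)

# JSON text that a web endpoint could return to the page.
payload = json.dumps(option, ensure_ascii=False)
print(payload)
```

In a Flask view the same dictionary could be returned with `jsonify(option)`, and the page would pass the parsed response to `chart.setOption(...)`.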
![](https://img.haomeiwen.com/i21576447/1c92533050711865.png)
![](https://img.haomeiwen.com/i21576447/043628ea6a468dc6.png)
![](https://img.haomeiwen.com/i21576447/0da941cdcaac0f28.png)
![](https://img.haomeiwen.com/i21576447/80007a672122a676.png)
![](https://img.haomeiwen.com/i21576447/b0d42e6ea561383b.png)
![](https://img.haomeiwen.com/i21576447/dd574329dc26e296.png)
![](https://img.haomeiwen.com/i21576447/5b93087df83c08f7.png)
![](https://img.haomeiwen.com/i21576447/8989f949753ce53c.png)
![](https://img.haomeiwen.com/i21576447/3bdceaf394a0b820.png)
![](https://img.haomeiwen.com/i21576447/f48ec96493599e40.png)
![](https://img.haomeiwen.com/i21576447/ee341c17825de80b.png)
![](https://img.haomeiwen.com/i21576447/51ef5bf6b1cd544d.png)
![](https://img.haomeiwen.com/i21576447/4b37b3051e9fe43b.png)
![](https://img.haomeiwen.com/i21576447/1a9c9d61d98a1e46.png)
![](https://img.haomeiwen.com/i21576447/5cc1edb9ddedd4e2.png)
![](https://img.haomeiwen.com/i21576447/e3e3b2600eee6f50.png)
![](https://img.haomeiwen.com/i21576447/a66618c680dc2dda.png)
![](https://img.haomeiwen.com/i21576447/574791f28afe6f89.png)
![](https://img.haomeiwen.com/i21576447/2a99b2d4cb77e096.png)
![](https://img.haomeiwen.com/i21576447/e9845726ef684524.png)
![](https://img.haomeiwen.com/i21576447/1e3779f7f7ce4f3c.png)
![](https://img.haomeiwen.com/i21576447/e5929679388965e0.png)
![](https://img.haomeiwen.com/i21576447/51b563509271c192.png)
![](https://img.haomeiwen.com/i21576447/ffb59a2e69f59c6c.png)
![](https://img.haomeiwen.com/i21576447/c5b52ee49900da65.png)
![](https://img.haomeiwen.com/i21576447/c7d6ae82ac91409e.png)