Analyzing 6,000 Apps with Python: A Surprising Number of Hidden Gems

Author: 14e61d025165 | Published: 2019-04-14 15:38

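1. Background

1.1. Why Coolapk

If GitHub is paradise for programmers, then Coolapk (酷安) is paradise for mobile app enthusiasts, the people who love tinkering with their phones. Compared with conventional app stores, Coolapk stands out in three ways:

First, you can search for and download all kinds of excellent niche apps that are almost impossible to find in other app stores. Examples mentioned in earlier articles include the terminal-style launcher「Aris」, the Android reader「Moon+ Reader」(静读天下), and the RSS reader「Feedme」.

Second, you can find cracked versions of many apps. We advocate paying for good software, but some apps are a real pain to use, Baidu Netdisk (百度网盘) being one example, and cracked versions of many such apps can be found here.

Third, you can find historical versions of apps. Many people like to run the latest version and upgrade as soon as an update appears, but many apps are becoming more commercial, more bloated with every release, and full of ads. It can be better to go back to basics and use an early version that is small, focused, and ad-free.

As an app enthusiast, I have discovered many good apps on Coolapk, and the more I use it, the more I feel that what I know is only the tip of the iceberg. I wanted to find out how many good things the site really holds. Searching by hand one app at a time is clearly unrealistic, so the natural answer was a crawler. To that end I recently learned the Scrapy framework and crawled about 6,000 apps from the site. The analysis turned up the best apps in a range of areas, and the rest of this article walks through it.

1.2. Analysis Scope

An overall analysis of the ratings, download counts, package sizes, and other metrics of the 6,000 apps.
A breakdown of the apps into 10 categories based on everyday usage scenarios (system tools, news and reading, social and entertainment, and so on), selecting the best apps in each category.

1.3. Tools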

Python
Scrapy
MongoDB
Pyecharts
Matplotlib

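2. Data Scraping

The Coolapk mobile app has anti-scraping measures in place, and an attempt with Charles showed that its traffic could not be captured. As a fallback, Scrapy is used to scrape app information from the website instead. The data was collected up to November 23, 2018 and covers 6,086 apps with 8 fields each: app name, download count, rating, number of ratings, number of comments, number of followers, package size, and category tags.

2.1. Analyzing the Target Site

On the target page, paging through the results reveals two useful facts:

Each page lists 10 apps and there are 610 pages in total, which means roughly 6,100 apps.
The pages are requested with GET, and the URL has a single incrementing page-number parameter, so constructing the pagination is very simple.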
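
Because the page number is the only part of the URL that changes, the full set of listing URLs can be generated up front. The following is a minimal sketch, assuming the `https://www.coolapk.com/apk/?page=N` pattern that appears later in this article; `build_page_urls` and `TOTAL_PAGES` are names made up for illustration.

```python
# Hypothetical helper: build every listing-page URL from the page count.
BASE_URL = "https://www.coolapk.com/apk/?page={}"
TOTAL_PAGES = 610  # page count observed on the site at crawl time


def build_page_urls(total_pages=TOTAL_PAGES):
    # range() excludes its upper bound, so add 1 to include the last page
    return [BASE_URL.format(page) for page in range(1, total_pages + 1)]


page_urls = build_page_urls()
print(len(page_urls))  # number of listing pages
print(page_urls[0])    # first listing page
print(page_urls[-1])   # last listing page
```

Note the `total_pages + 1`: an upper bound of 610 would silently skip the last page.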

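Next, let us decide which information to scrape. The listing page shows the app name, download count, rating, and so on. Clicking an app icon opens its detail page, which offers more complete information, including category tags, the number of ratings, and the number of followers. Since the apps will later be classified and filtered, the category tags are very useful, so the required fields are scraped from each app's detail page.

With that settled, the crawl flow is clear: first walk through the listing pages and collect the detail-page URLs of the 10 apps on each page, then scrape each app's metrics from its detail page. In total this means fetching about 6,000 pages, which is a fair amount of work, so the Scrapy framework is used for the job.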

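2.2. Introduction to the Scrapy Framework

Pyspider, a handy crawling tool written by a developer in China, has more than 10K GitHub stars, but its overall feature set is still relatively thin. Is there a more powerful framework? There is: Scrapy. With more than 30K GitHub stars, it is the most widely used crawling framework in the Python world, and anyone who works with crawlers should know it.

There are plenty of official documents and tutorials about Scrapy online. Here are a few:

Scrapy Chinese documentation

Cui Qingcai's Scrapy column

Scraping Lagou with Scrapy

Scraping Douban Movies with Scrapy

Scrapy is somewhat more complex than Pyspider. It has separate processing modules, a project consists of several files, and different parts of the crawler go into different files. At first the code can feel scattered and confusing, so the following approach is recommended for getting started quickly:

First, skim the reference tutorials above to understand Scrapy's crawling logic and how the individual files are used and work together.
Next, read the two hands-on examples above to see how a crawler is written in Scrapy.
Finally, pick a website that interests you as a crawling project, and consult the tutorials or Google whenever you get stuck.

This learning path is quick and effective, and much better than poring over tutorials without writing any code. Below, Coolapk serves as the example site to crawl with Scrapy.

2.3. Scraping the Data

To begin, install Scrapy. On Windows with Anaconda already installed this is very simple: open the Anaconda Prompt window and enter the command below, and conda installs Scrapy together with all of its dependencies.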

<pre>conda install scrapy
</pre>

2.3.1. Creating the Project

Next, create a crawler project. First switch from the root directory to the working directory where the project should live (here the path is E:\my_Python\training\kuan), then enter the command below to create the kuan project:

<pre># switch to the working directory
e:
cd E:\my_Python\training\kuan
# create the project
scrapy startproject kuan
</pre>

Running the commands above generates a Scrapy project named kuan that contains the following files:

<pre>scrapy.cfg             # configuration file used when deploying Scrapy
kuan/                  # the project module; code is imported from here
    __init__.py
    items.py           # defines the data structure to scrape
    middlewares.py     # middlewares
    pipelines.py       # item pipelines, used later for storage
    settings.py        # settings
    spiders/           # folder for the spider programs
        __init__.py
</pre>

Next, create the main spider program, kuan.py, inside the spiders folder. Running the following two commands does this:

<pre>cd kuan                                # enter the kuan project folder generated above
scrapy genspider kuan www.coolapk.com  # generate the spider file kuan.py
# If Scrapy rejects the name because it matches the project name, generate the
# spider under a different name and set name = 'kuan' in the file afterwards
</pre>

2.3.2. Declaring the Item

With the project files in place, the crawler itself can be written.

First, define the names of the fields to scrape in items.py, as shown below:

<pre>import scrapy


class KuanItem(scrapy.Item):
    # define the fields for your item here like:
    name = scrapy.Field()       # app name
    volume = scrapy.Field()     # package size
    download = scrapy.Field()   # download count
    follow = scrapy.Field()     # number of followers
    comment = scrapy.Field()    # number of comments
    tags = scrapy.Field()       # category tags
    score = scrapy.Field()      # rating
    num_score = scrapy.Field()  # number of ratings
</pre>

These are the 8 fields identified on the web page earlier: name is the app name, volume is the package size, download is the download count, and so on. Once defined here, the fields are used by the main spider program.

2.3.3. The Main Spider

After the spider file is generated, Scrapy provides part of the crawling code automatically. What remains is to add the field-parsing logic to the parse method.

<pre>import scrapy


class KuanspiderSpider(scrapy.Spider):
    name = 'kuan'
    allowed_domains = ['www.coolapk.com']
    start_urls = ['http://www.coolapk.com/']

    def parse(self, response):
        pass
</pre>

Open Dev Tools on the listing page and locate the node of each metric to scrape. The values can then be extracted with CSS selectors, XPath, or regular expressions. Scrapy supports all of them, so choose freely. CSS selectors are used here, but note that Scrapy's CSS syntax differs slightly from the CSS syntax used earlier with pyquery. A few examples make the comparison clear.

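First, locate the URL node of the first app. It is an a node under the div node whose class is app_left_list, and its href attribute is the URL we need. The value is a relative address that becomes a complete URL once it is joined with the site address.

Next, open an app's detail page on Coolapk and locate the app name. The name is the text of the p node whose class is detail_app_title.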

With these two nodes located, the fields can be extracted with CSS selectors. Here is a comparison of the conventional style and the Scrapy style:

<pre># conventional (pyquery) style
url = item('.app_left_list>a').attr('href')
name = item('.detail_app_title').text()
# Scrapy style
url = item.css('::attr("href")').extract_first()
name = item.css('.detail_app_title::text').extract_first()
</pre>

As shown, the href attribute or the text is selected with `::`, for example `::text` for the text. extract_first() returns the first matching element; when there are several elements, use extract() to get all of them. The parsing code for all 8 fields can be written along the same lines.

First the list of app URLs is extracted from the listing page, and then each app's detail page is visited to extract the 8 fields.

<pre>def parse(self, response):
    contents = response.css('.app_left_list>a')
    for content in contents:
        url = content.css('::attr("href")').extract_first()
        url = response.urljoin(url)  # join the relative URL into an absolute URL
        yield scrapy.Request(url, callback=self.parse_url)
</pre>

Here response.urljoin() turns the extracted relative URL into a complete URL, and scrapy.Request() builds the request for each app's detail page. Two arguments are passed: url and callback. url is the detail-page URL, and callback is the callback function: when the response to that detail-page request comes back, Scrapy passes it to parse_url(), the method dedicated to parsing the fields, shown below:
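
For a plain relative link, `response.urljoin()` resolves the link against the URL of the page being parsed, much like `urllib.parse.urljoin` in the standard library. The sketch below shows that resolution on its own; the relative path is a made-up example, not a value taken from the site.

```python
from urllib.parse import urljoin

# URL of the listing page being parsed (pattern taken from this article)
page_url = "https://www.coolapk.com/apk/?page=1"
# Hypothetical relative href, as it might appear on an <a> node
relative_href = "/apk/com.example.app"

# A path starting with "/" replaces the path and query of the base URL
detail_url = urljoin(page_url, relative_href)
print(detail_url)
```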

<pre># needed at the top of kuan.py
import re
from kuan.items import KuanItem


def parse_url(self, response):
    item = KuanItem()
    item['name'] = response.css('.detail_app_title::text').extract_first()
    results = self.get_comment(response)
    item['volume'] = results[0]
    item['download'] = results[1]
    item['follow'] = results[2]
    item['comment'] = results[3]
    item['tags'] = self.get_tags(response)
    item['score'] = response.css('.rank_num::text').extract_first()
    num_score = response.css('.apk_rank_p1::text').extract_first()
    item['num_score'] = re.search('共(.*?)个评分', num_score).group(1)
    yield item


def get_comment(self, response):
    messages = response.css('.apk_topba_message::text').extract_first(default='')
    # \s+ matches one or more whitespace characters
    result = re.findall(r'\s+(.*?)\s+/\s+(.*?)下载\s+/\s+(.*?)人关注\s+/\s+(.*?)个评论.*?', messages)
    if result:  # at least one match
        return list(result[0])  # the first match, as a list of four strings
    return [None] * 4  # keeps parse_url() from failing when nothing matches


def get_tags(self, response):
    data = response.css('.apk_left_span2')
    tags = [tag.css('::text').extract_first() for tag in data]
    return tags
</pre>

Two separate methods are defined here: get_comment() and get_tags().

get_comment() uses a regular expression to extract the four fields volume, download, follow, and comment. The match results look like this:

<pre>result = re.findall(r'\s+(.*?)\s+/\s+(.*?)下载\s+/\s+(.*?)人关注\s+/\s+(.*?)个评论.*?', messages)
print(result)  # results for the first page
# output:
[('21.74M', '5218万', '2.4万', '5.4万')]
[('75.53M', '2768万', '2.3万', '3.0万')]
[('46.21M', '1686万', '2.3万', '3.4万')]
[('54.77M', '1603万', '3.8万', '4.9万')]
[('3.32M', '1530万', '1.5万', '3343')]
[('75.07M', '1127万', '1.6万', '2.2万')]
[('92.70M', '1108万', '9167', '1.3万')]
[('68.94M', '1072万', '5718', '9869')]
[('61.45M', '935万', '1.1万', '1.6万')]
[('23.96M', '925万', '4157', '1956')]
</pre>

The four values are then picked out with results[0], results[1], and so on. Taking volume as an example, the values extracted from the first page are:

<pre>item['volume'] = results[0]
print(item['volume'])
21.74M
75.53M
46.21M
54.77M
3.32M
75.07M
92.70M
68.94M
61.45M
23.96M
</pre>
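
The regular expressions used in `parse_url()` and `get_comment()` can be tried outside Scrapy. This is a minimal sketch with lazy `(.*?)` groups; the two sample strings are reconstructed from the output shown in this article and are assumptions about what the page text looks like, not text copied from the site.

```python
import re

# Assumed shape of the statistics line on a detail page
messages = "\n    21.74M / 5218万下载 / 2.4万人关注 / 5.4万个评论\n"
# Assumed shape of the rating-count text
num_score_text = "共1.4万个评分"

# \s+ matches one or more whitespace characters; (.*?) captures lazily
pattern = r'\s+(.*?)\s+/\s+(.*?)下载\s+/\s+(.*?)人关注\s+/\s+(.*?)个评论.*?'
result = re.findall(pattern, messages)
volume, download, follow, comment = result[0]

num_score = re.search('共(.*?)个评分', num_score_text).group(1)

print(result)
print(volume, download, follow, comment, num_score)
```

With four capture groups, `re.findall` returns a list of 4-tuples, which is why the article indexes the first match before reading the individual fields.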

With that, every field of the 10 apps on the first page has been extracted and is returned through the yield item generator. Its contents look like this:

<pre>[
{'name': '酷安', 'volume': '21.74M', 'download': '5218万', 'follow': '2.4万', 'comment': '5.4万', 'tags': "['酷市场', '酷安', '市场', 'coolapk', '装机必备']", 'score': '4.4', 'num_score': '1.4万'},
{'name': '微信', 'volume': '75.53M', 'download': '2768万', 'follow': '2.3万', 'comment': '3.0万', 'tags': "['微信', 'qq', '腾讯', 'tencent', '即时聊天', '装机必备']", 'score': '2.3', 'num_score': '1.1万'},
...
]
</pre>

2.3.4. Crawling All Pages

So far only the first page has been crawled. The next step is to walk through all 610 pages, and there are two ways to do it:

The first is to extract the pagination node, build the request for the next page, and call parse again to handle it, repeating until the last page has been parsed.
The second is to construct the URLs of all 610 pages up front and then call parse on them in bulk.

Code for both approaches follows. The first is simple: append a few lines to the parse method:

<pre>def parse(self, response):
    contents = response.css('.app_left_list>a')
    for content in contents:
        ...

    next_page = response.css('.pagination li:nth-child(8) a::attr(href)').extract_first()
    if next_page:  # stop when there is no next-page link
        url = response.urljoin(next_page)
        yield scrapy.Request(url, callback=self.parse)
</pre>

For the second approach, define a start_requests() method before parse() to generate the URLs of all 610 pages, and hand each one to parse() through the callback argument of scrapy.Request().

<pre>def start_requests(self):
    pages = []
    for page in range(1, 611):  # 610 pages in total; range() excludes the upper bound
        url = 'https://www.coolapk.com/apk/?page=%s' % page
        request = scrapy.Request(url, callback=self.parse)
        pages.append(request)
    return pages
</pre>

That covers crawling all of the pages. Once the data is scraped it needs to be stored. MongoDB is used here; compared with MySQL it is far less hassle to set up for this kind of data.

2.3.5. Storing the Results

The storage logic is defined in pipelines.py. MongoDB parameters such as the address and database name are kept separately in settings.py and read from the pipeline.

<pre>import pymongo


class MongoPipeline(object):
    def __init__(self, mongo_url, mongo_db):
        self.mongo_url = mongo_url
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        return cls(
            mongo_url=crawler.settings.get('MONGO_URL'),
            mongo_db=crawler.settings.get('MONGO_DB')
        )

    def open_spider(self, spider):
        self.client = pymongo.MongoClient(self.mongo_url)
        self.db = self.client[self.mongo_db]

    def process_item(self, item, spider):
        name = item.__class__.__name__  # the collection is named after the item class
        self.db[name].insert_one(dict(item))  # insert() was removed in PyMongo 4
        return item

    def close_spider(self, spider):
        self.client.close()
</pre>

The MongoPipeline storage class defines several methods. A brief explanation of each:

from_crawler() is a class method, marked with @classmethod. Its main job is to read the following parameters that we set in settings.py:

<pre>MONGO_URL = 'localhost'
MONGO_DB = 'KuAn'
ITEM_PIPELINES = {
    'kuan.pipelines.MongoPipeline': 300,
}
</pre>
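
The way `from_crawler()` hands settings to the constructor can be seen without running Scrapy. In the sketch below, `DemoPipeline` and `fake_crawler` are stand-ins invented for illustration; in a real crawl Scrapy supplies the crawler object and its settings.

```python
from types import SimpleNamespace


class DemoPipeline:
    """Stand-in for MongoPipeline that only stores its settings."""

    def __init__(self, mongo_url, mongo_db):
        self.mongo_url = mongo_url
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        # Read the values from the crawler's settings and pass them to __init__
        return cls(
            mongo_url=crawler.settings.get('MONGO_URL'),
            mongo_db=crawler.settings.get('MONGO_DB'),
        )


# Hypothetical replacement for the crawler object that Scrapy would provide;
# a plain dict is enough here because only .get() is called on the settings
fake_crawler = SimpleNamespace(settings={'MONGO_URL': 'localhost', 'MONGO_DB': 'KuAn'})

pipeline = DemoPipeline.from_crawler(fake_crawler)
print(pipeline.mongo_url, pipeline.mongo_db)
```

Keeping the connection details in settings rather than in the pipeline class means they can be changed without touching the storage code.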

open_spider() performs initialization and is called when the spider is opened.

process_item() is the most important method: it inserts the data into MongoDB.

With the code above in place, the command below starts the whole scraping and storage process. On a single machine, 6,000 pages take quite a while to finish, so be patient.

<pre>scrapy crawl kuan
</pre>

Two additional points are worth noting.

First, to reduce the load on the website, it is best to add a delay of a few seconds between requests. Add the following lines at the top of the spider class (KuanspiderSpider):

<pre>custom_settings = {
    "DOWNLOAD_DELAY": 3,  # wait 3 s between requests; the default is 0 (no delay)
    "CONCURRENT_REQUESTS_PER_DOMAIN": 8  # max concurrent requests per domain; 8 is the default, lower it if needed
}
</pre>
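
As a rough sanity check on how long such a crawl takes, the request count can be multiplied by the delay. This back-of-the-envelope sketch assumes requests to the site are spaced about `DOWNLOAD_DELAY` seconds apart and ignores response time and any randomization of the delay, so treat the result as an order of magnitude only.

```python
# Figures taken from the article: 610 listing pages and 6086 app detail pages
LIST_PAGES = 610
DETAIL_PAGES = 6086
DOWNLOAD_DELAY = 3  # seconds between requests, matching the setting above

total_requests = LIST_PAGES + DETAIL_PAGES
estimated_hours = total_requests * DOWNLOAD_DELAY / 3600

print(total_requests)
print(round(estimated_hours, 1))
```

Under these assumptions a full crawl runs for several hours, which is why the delay is a trade-off between politeness to the site and total crawl time.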

Second, to monitor the crawler more easily while it runs, it is worth writing a log file. Python's built-in logging package can do this:

<pre>import logging

logging.basicConfig(filename='kuan.log', filemode='w', level=logging.WARNING,
                    format='%(asctime)s %(message)s', datefmt='%Y/%m/%d %I:%M:%S %p')
logging.warning("warn message")
logging.error("error message")
</pre>

The level parameter sets the log level. In increasing order of severity the levels are DEBUG < INFO < WARNING < ERROR < CRITICAL. To keep the log file from recording too much, choose a higher level. It is set to WARNING here, which means only messages at WARNING level or above are written to the log.

The datefmt parameter adds a timestamp in front of each log entry, which is very useful.
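
The effect of the level threshold can be checked with a small self-contained sketch. It logs to an in-memory buffer rather than to `kuan.log`, and the logger name `kuan_demo` is made up for the demonstration.

```python
import io
import logging

# Write to an in-memory buffer instead of a file so the result is easy to inspect
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter('%(asctime)s %(message)s',
                                       datefmt='%Y/%m/%d %I:%M:%S %p'))

logger = logging.getLogger('kuan_demo')  # hypothetical logger name
logger.setLevel(logging.WARNING)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output away from the root logger

logger.info("info message")     # below WARNING: dropped
logger.warning("warn message")  # at WARNING: recorded
logger.error("error message")   # above WARNING: recorded

logged_lines = buffer.getvalue().splitlines()
print(logged_lines)
```

Only the warning and error lines reach the buffer, each prefixed with the timestamp produced by `datefmt`.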

That completes the data scraping. With the data in hand the analysis can begin, but first the data needs some simple cleaning and processing.

3. Data Cleaning

First, read the data from MongoDB into a DataFrame and take a look at its basic properties.

<pre>import pandas as pd
import pymongo


def parse_kuan():
    client = pymongo.MongoClient(host='localhost', port=27017)
    db = client['KuAn']
    collection = db['KuanItem']  # the pipeline names the collection after the item class
    # load the database records into a DataFrame
    df = pd.DataFrame(list(collection.find()))
    print(df.head())
    print(df.shape)
    df.info()  # info() prints its report itself
    print(df.describe())
    return df
</pre>

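The first 5 rows printed by head() show that, apart from the score column, which is float, all columns are of the object (text) type.

In the four columns comment, download, follow, and num_score, some rows carry a「万」(10,000) suffix, which has to be removed before the values can be converted to numbers. The volume column carries either an「M」or a「K」suffix; to put all sizes on one scale, the「K」values are divided by 1024 to convert them to「M」.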

整个数据一共有 6086 行 x 8 列，每列均没有缺失值。

df.describe() 方法对 score 列做了基本统计，可以看到，所有 App 的平均得分是 3.9 分（5 分制），最低得分 1.6 分，最高得分 4.8 分。
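The checks described above (missing values per column, column types, and score statistics) can be reproduced on any DataFrame. The sketch below is only an illustration: the three sample rows are made up and are not taken from the scraped data, and one score is deliberately left empty to show what a missing value looks like in the output.

```python
import pandas as pd

# Hypothetical sample rows; the real data is read from MongoDB
sample = pd.DataFrame({
    "name": ["App A", "App B", "App C"],
    "score": [4.5, 3.2, None],
    "download": ["1.2万", "300", "45万"],
})

# Number of missing values in each column
missing_per_column = sample.isnull().sum()
print(missing_per_column)

# Column types: score is numeric, the other columns hold text
print(sample.dtypes)

# Summary statistics for the score column (missing values are skipped)
score_stats = sample["score"].describe()
print(score_stats[["mean", "min", "max"]])
```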

Next, we convert these text columns to numeric columns. The code is as follows:

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import pandas as pd

def data_processing(df):
    # Convert 'comment', 'download', 'follow', 'num_score' and 'volume' from text to numbers
    suffix = '_ori'
    cols = ['comment', 'download', 'follow', 'num_score', 'volume']
    for col in cols:
        df[col + suffix] = df[col]  # keep a copy of the original column
        if col != 'volume':
            df[col] = clean_symbol(df, col)   # handle the '万' suffix
        else:
            df[col] = clean_symbol2(df, col)  # handle the 'M' / 'K' suffixes
    # Express download in units of 10,000 (万)
    df['download'] = df['download'] / 10000
    return df

def clean_symbol(df, col):
    # Rows ending in '万': strip the suffix and multiply by 10,000
    con = df[col].str.contains('万$')
    df.loc[con, col] = pd.to_numeric(df.loc[con, col].str.replace('万', '')) * 10000
    df[col] = pd.to_numeric(df[col])
    return df[col]

def clean_symbol2(df, col):
    # Strip the trailing 'M'
    df[col] = df[col].str.replace('M$', '', regex=True)
    # Rows ending in 'K': strip the suffix and divide by 1024 to get M
    con = df[col].str.contains('K$')
    df.loc[con, col] = pd.to_numeric(df.loc[con, col].str.replace('K$', '', regex=True)) / 1024
    df[col] = pd.to_numeric(df[col])
    return df[col]
</pre>
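The same conversion can also be written without assigning a mix of text and numbers back into the column. The following is a minimal sketch under stated assumptions: the sample values are made up, and to_number and to_megabytes are hypothetical helper names rather than functions from the article.

```python
import pandas as pd

def to_number(series, unit="万", factor=10000):
    """Strip a trailing unit and scale only the rows that carried it."""
    has_unit = series.str.endswith(unit)
    values = pd.to_numeric(series.str.replace(unit, "", regex=False))
    return values.where(~has_unit, values * factor)

def to_megabytes(series):
    """Convert sizes such as '12.5M' or '512K' to a number of megabytes."""
    is_kb = series.str.endswith("K")
    values = pd.to_numeric(series.str.replace("[MK]$", "", regex=True))
    return values.where(~is_kb, values / 1024)

# Hypothetical sample values in the same text format as the scraped columns
sample = pd.DataFrame({
    "download": ["1.2万", "300", "45万"],
    "volume": ["12.5M", "512K", "3M"],
})
sample["download"] = to_number(sample["download"])
sample["volume"] = to_megabytes(sample["volume"])
print(sample)
```

Each helper converts the whole column to numbers first and then rescales the flagged rows with where(), so the column never holds strings and floats at the same time.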

That completes the conversion of the text columns. Let's look at the basic statistics again:

[Image: summary statistics after the conversion]

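A few things stand out:

The download column is the number of downloads. The most-downloaded App has 51.9 million downloads, the least-downloaded shows 0 (that is, very few), and the average is 140,000.
The volume column is the App size. The largest App is close to 300M, the smallest is almost 0, and the average is around 18M.
The comment column is the number of comments. The most-commented App has more than 50,000, and the average is a little over 200.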

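That completes the basic data cleaning. Next comes the exploratory analysis.

4. Data analysis

We analyse download counts, scores, size and other metrics along two dimensions: overall and by category.

4.1. Overall picture

4.1.1. Download ranking

Let's start with downloads. The download count is often an important reference when deciding whether to install an App. Because the vast majority of Apps have relatively few downloads, a histogram shows no visible trend, so we instead split the data into intervals and plot the discretised counts as a bar chart, using Pyecharts.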

[Image: bar chart of the number of Apps in each download interval]

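As many as 5517 Apps (84% of the total) have fewer than 100,000 downloads, while only 20 have more than 5 million. For an App that is meant to make money, the number of user downloads is critical; seen in that light, most Apps are in an awkward position, at least on the 酷安 platform.

The code is as follows: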

<pre spellcheck="false" style="box-sizing: border-box; margin: 5px 0px; padding: 5px 10px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 16px; line-height: inherit; font-family: inherit; vertical-align: baseline; cursor: text; counter-reset: list-1 0 list-2 0 list-3 0 list-4 0 list-5 0 list-6 0 list-7 0 list-8 0 list-9 0; background-color: rgb(240, 240, 240); border-radius: 3px; white-space: pre-wrap; color: rgb(34, 34, 34); letter-spacing: normal; orphans: 2; text-align: left; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import pandas as pd
from pyecharts import Bar  # pyecharts 0.5.x import style

# Distribution of download counts; df is the cleaned DataFrame, with download in units of 10,000
bins = [0, 10, 100, 500, 10000]
group_names = ['<=10万', '10-100万', '100-500万', '>500万']
cats = pd.cut(df['download'], bins, labels=group_names)  # split into intervals with pd.cut()
cats = cats.value_counts()  # number of Apps per interval
bar = Bar('App 下载数量区间分布', '绝大部分 App 下载量低于 10 万')
# bar.use_theme('macarons')
bar.add(
    'App 数量 (个)',
    list(cats.index),
    list(cats.values),
    is_label_show=True,
    is_splitline_show=False,
)
bar.render(path='download_interval.png', pixel_ratio=1)
</pre>
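One detail of pd.cut() is worth keeping in mind when binning counts like these: by default each interval is open on the left and closed on the right, so a value exactly equal to the lowest bin edge (here 0) falls outside every interval and becomes NaN unless include_lowest=True is passed. The sketch below shows the difference on made-up download values (in units of 10,000); it is an illustration only and does not use the scraped data.

```python
import pandas as pd

# Hypothetical download counts in units of 10,000 (万)
downloads = pd.Series([0, 0.5, 10, 10.1, 250, 800])

bins = [0, 10, 100, 500, 10000]
labels = ["<=10万", "10-100万", "100-500万", ">500万"]

# Default behaviour: intervals are (0, 10], (10, 100], ... so 0 is not binned
default_cut = pd.cut(downloads, bins, labels=labels)

# include_lowest=True closes the first interval on the left, so 0 is counted
inclusive_cut = pd.cut(downloads, bins, labels=labels, include_lowest=True)

# sort_index() orders the counts by bin rather than by frequency
counts = inclusive_cut.value_counts().sort_index()
print(default_cut.isna().sum())
print(counts)
```

Sorting by index keeps the bars in interval order, which is usually easier to read than the frequency order that value_counts() returns by default.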

Next, let's see which 20 Apps have the most downloads:


Article link: https://www.haomeiwen.com/subject/htcswqtx.html
