Analysis of the Google Play Store App Dataset

Author: 任海亮 | Published 2018-12-21 19:09

Google Play Store App Analysis Report

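The dataset comes from Kaggle and contains app data scraped from the Google Play Store.
Today we explore the data and look at which factors may influence the user rating (Rating).

Environment: Python 3.6, Windows 10, Jupyter Notebook

First, import the analysis packages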

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

#Load the dataset
data =pd.read_csv('googleplaystore.csv')

Exploring the data

# First look at the head of the data
data.head()

image.png

#Overall summary
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10841 entries, 0 to 10840
Data columns (total 13 columns):
App               10841 non-null object
Category          10841 non-null object
Rating            9367 non-null float64
Reviews           10841 non-null object
Size              10841 non-null object
Installs          10841 non-null object
Type              10840 non-null object
Price             10841 non-null object
Content Rating    10840 non-null object
Genres            10841 non-null object
Last Updated      10841 non-null object
Current Ver       10833 non-null object
Android Ver       10838 non-null object
dtypes: float64(1), object(12)
memory usage: 1.1+ MB

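The dataset has 10,841 rows and 13 fields, including app name, category, rating, installs, number of reviews, free/paid type, price, last update date, and version.

First we need to convert the data into usable types: Reviews, Size, and Price need to become numeric, and Last Updated needs to become a datetime.

#Convert to numeric
#data.Reviews.value_counts()
pd.to_numeric(data['Reviews'])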

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

pandas\src\inference.pyx in pandas.lib.maybe_convert_numeric (pandas\lib.c:55708)()


ValueError: Unable to parse string "3.0M"


During handling of the above exception, another exception occurred:


ValueError                                Traceback (most recent call last)

<ipython-input-5-e509e4352e56> in <module>()
      1 #改变为数值型
      2 #data.Reviews.value_counts()
----> 3 pd.to_numeric(data['Reviews'])


C:\Users\renhl1\Anaconda3\lib\site-packages\pandas\tools\util.py in to_numeric(arg, errors, downcast)
    193             coerce_numeric = False if errors in ('ignore', 'raise') else True
    194             values = lib.maybe_convert_numeric(values, set(),
--> 195                                                coerce_numeric=coerce_numeric)
    196 
    197     except Exception:


pandas\src\inference.pyx in pandas.lib.maybe_convert_numeric (pandas\lib.c:56097)()


ValueError: Unable to parse string "3.0M" at position 10472

#Row 10472 is the problem; take a look at it
data.loc[10472,]

App               Life Made WI-Fi Touchscreen Photo Frame
Category                                              1.9
Rating                                                 19
Reviews                                              3.0M
Size                                               1,000+
Installs                                             Free
Type                                                    0
Price                                            Everyone
Content Rating                                        NaN
Genres                                  February 11, 2018
Last Updated                                       1.0.19
Current Ver                                    4.0 and up
Android Ver                                           NaN
Name: 10472, dtype: object

#The values in this row are shifted one column to the left (Category is missing), so just drop it
data.drop(10472,inplace=True)

data['Reviews']=data['Reviews'].astype(int)
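
Instead of waiting for the traceback, the bad rows can also be located up front. The following is a minimal sketch on a made-up toy Series (not the real dataset), assuming that `pd.to_numeric` with `errors='coerce'` is acceptable for this check:

```python
import pandas as pd

# Toy stand-in for data['Reviews']; the values are made up for illustration.
reviews = pd.Series(['159', '967', '3.0M', '87510'])

# errors='coerce' turns strings that cannot be parsed into NaN instead of raising.
parsed = pd.to_numeric(reviews, errors='coerce')

# Entries that were present but failed to parse are the suspicious ones.
bad_rows = reviews[parsed.isna() & reviews.notna()]
print(bad_rows)
```

Applied to the real column, the same mask should point at row 10472 without having to read the error message.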

#Convert Size to numeric; first look at the distinct values
data.Size.unique()

array(['19M', '14M', '8.7M', '25M', '2.8M', '5.6M', '29M', '33M', '3.1M',
       '28M', '12M', '20M', '21M', '37M', '2.7M', '5.5M', '17M', '39M',
       '31M', '4.2M', '7.0M', '23M', '6.0M', '6.1M', '4.6M', '9.2M',
       '5.2M', '11M', '24M', 'Varies with device', '9.4M', '15M', '10M',
       '1.2M', '26M', '8.0M', '7.9M', '56M', '57M', '35M', '54M', '201k',
       '3.6M', '5.7M', '8.6M', '2.4M', '27M', '2.5M', '16M', '3.4M',
       '8.9M', '3.9M', '2.9M', '38M', '32M', '5.4M', '18M', '1.1M', '2.2M',
       '4.5M', '9.8M', '52M', '9.0M', '6.7M', '30M', '2.6M', 
       ……
       '892k', '154k', '860k', '364k', '387k', '626k', '161k', '879k',
       '39k', '970k', '170k', '141k', '160k', '144k', '143k', '190k',
       '376k', '193k', '246k', '73k', '658k', '992k', '253k', '420k',
       '404k', '470k', '226k', '240k', '89k', '234k', '257k', '861k',
       '467k', '157k', '44k', '676k', '67k', '552k', '885k', '1020k',
       '582k', '619k'], dtype=object)

# 'Varies with device' is not a number; replace it with NaN
data['Size'] = data['Size'].replace('Varies with device', np.nan)

data['Size'].isnull().sum() #number of missing values

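#Size values end in 'k' or 'M'; use a regular expression to split the number from the unit
import re  #regular expression module

#Convert a size string to bytes: k -> 1,000 and M -> 1,000,000
def change(i):
    if pd.isna(i):
        return np.nan  #keep missing values as NaN
    A,B=re.split('[kM]+',i)    #A is the numeric part
    C,D=re.split('[0-9.]+',i)  #D is the unit suffix
    if D=='M':
        A=float(A)*1000000
    elif D =='k':
        A=float(A)*1000
    return A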

#Convert the Size column to numeric
data['Size'] =data['Size'].apply(change)
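
The same conversion can be written without a row-by-row function. This is only a sketch of a vectorized alternative on a toy Series, not the method used in this post; the names `parts`, `multiplier`, and `size_bytes` are made up for the example:

```python
import pandas as pd

# Toy stand-in for data['Size'] before cleaning.
sizes = pd.Series(['19M', '8.7M', '201k', 'Varies with device'])

# Split each value into a number and a unit suffix; rows that do not match become NaN.
parts = sizes.str.extract(r'^([0-9.]+)([kM])$')

# Map the suffix to a multiplier and scale the numeric part.
multiplier = parts[1].map({'k': 1_000, 'M': 1_000_000})
size_bytes = parts[0].astype(float) * multiplier
print(size_bytes)
```

With this approach 'Varies with device' falls out as NaN automatically, so the separate replace step is not needed.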

#Fill missing values with the mean Size of the app's category
data['Size'] = data['Size'].fillna(data.groupby('Category')['Size'].transform('mean'))
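
To see what the group-wise fill does, here is a small sketch on made-up numbers. `transform('mean')` returns a Series aligned with the original rows, so each missing value receives the mean of its own category:

```python
import numpy as np
import pandas as pd

# Toy data: one missing Size in each category (values are illustrative only).
df = pd.DataFrame({
    'Category': ['GAME', 'GAME', 'GAME', 'TOOLS', 'TOOLS'],
    'Size': [10.0, 30.0, np.nan, 4.0, np.nan],
})

# Per-row category mean, ignoring NaN when averaging.
group_mean = df.groupby('Category')['Size'].transform('mean')

# Only the missing entries are replaced.
df['Size'] = df['Size'].fillna(group_mean)
print(df)
```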

#data['Price'].value_counts()
#Check which values appear in Price

#Convert Price to float (strip the leading currency symbol)
data['Price']=data['Price'].apply(lambda x: float(x[1:]) if x !='0' else 0 )
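
A vectorized variant of the price conversion, shown as a sketch on a toy Series and assuming the paid prices carry a leading '$' as the slicing above implies:

```python
import pandas as pd

# Toy stand-in for data['Price']: free apps are '0', paid apps carry a '$' prefix.
prices = pd.Series(['0', '$4.99', '$0.99', '$399.99'])

# Strip the currency symbol, then parse everything as a number in one pass.
price_num = pd.to_numeric(prices.str.lstrip('$'))
print(price_num)
```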

#How many distinct apps are there?
len(data.App.unique())

#Fewer than the number of rows, so there are duplicates; see which apps they are
data.App.value_counts()

ROBLOX                                                9
CBS Sports App - Scores, News, Stats & Watch Live     8
Candy Crush Saga                                      7
ESPN                                                  7
Duolingo: Learn Languages Free                        7
……

#Look at the first of these apps
data[data['App']=='ROBLOX']

image.png

#The duplicated rows differ in Reviews; drop the duplicates
#For apps listed under several categories, keep only one category (100+ apps)
data=data.drop_duplicates(subset=['App'])
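
`drop_duplicates` keeps the first row it meets for each app, which may not be the most recent snapshot. One possible alternative, sketched here on made-up rows, is to keep the row with the highest review count for each app:

```python
import pandas as pd

# Toy data with a duplicated app; the review counts are illustrative only.
apps = pd.DataFrame({
    'App': ['ROBLOX', 'ROBLOX', 'ESPN', 'ROBLOX'],
    'Category': ['GAME', 'FAMILY', 'SPORTS', 'GAME'],
    'Reviews': [100, 300, 50, 200],
})

# Sort so the most-reviewed row of each app comes first,
# keep that row, then restore the original row order.
deduped = (apps.sort_values('Reviews', ascending=False)
               .drop_duplicates(subset=['App'])
               .sort_index())
print(deduped)
```

Whether the largest review count really marks the latest scrape is an assumption; the rest of this post uses the simpler keep-first rule.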

data.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 9659 entries, 0 to 10840
Data columns (total 13 columns):
App               9659 non-null object
Category          9659 non-null object
Rating            8196 non-null float64
Reviews           9659 non-null int32
Size              9659 non-null float64
Installs          9659 non-null object
Type              9658 non-null object
Price             9659 non-null float64
Content Rating    9659 non-null object
Genres            9659 non-null object
Last Updated      9659 non-null object
Current Ver       9651 non-null object
Android Ver       9657 non-null object
dtypes: float64(3), int32(1), object(9)
memory usage: 1018.7+ KB

Analyzing each field

#Analyze Category

cate= data['Category'].groupby(data['Category']).count()

cate=cate.sort_values(ascending=False)

plt.figure(figsize=(15,10))
sns.barplot(x=cate.index,y=cate.values)
plt.xticks(rotation=90)
plt.xlabel('Category')
plt.ylabel('App qty')
plt.title("App qty by category")

<matplotlib.text.Text at 0x1b76369a2e8>

image.png

labels=data['Category'].value_counts().index
sizes= data['Category'].value_counts().values
#Pie chart of each category's share
plt.figure(figsize = (10,10))
plt.pie(sizes, labels=labels, autopct='%1.1f%%')
plt.title('App qty by category',color = 'blue',fontsize = 15)

<matplotlib.text.Text at 0x1b7638f8128>

image.png

Conclusion: by app count, the top three categories are FAMILY (19.6%), GAME (9.9%), and TOOLS (8.5%), clearly ahead of the categories that follow.
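
The percentages can also be read as numbers rather than off the pie chart. A minimal sketch with a toy Series (the counts are made up, so the shares differ from the real ones):

```python
import pandas as pd

# Toy stand-in for data['Category'].
categories = pd.Series(['FAMILY', 'FAMILY', 'GAME', 'TOOLS',
                        'FAMILY', 'GAME', 'TOOLS', 'FAMILY'])

# normalize=True returns fractions instead of counts; convert to percent.
share = categories.value_counts(normalize=True).mul(100).round(1)
print(share)
```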

#Analyze Genres
len(data.Genres.value_counts())

#There are 120 distinct genres in total
genr= data['Genres'].groupby(data['Genres']).count()

genr=genr.sort_values(ascending=False)

genr.index[:15] #take the top 15 genres

Index(['Tools', 'Entertainment', 'Education', 'Business', 'Medical',
       'Personalization', 'Productivity', 'Lifestyle', 'Finance', 'Sports',
       'Communication', 'Action', 'Health & Fitness', 'Photography',
       'News & Magazines'],
      dtype='object', name='Genres')

plt.figure(figsize=(15,10))
sns.barplot(x=genr.index[:15],y=genr.values[:15])
plt.xticks(rotation=90)
plt.xlabel('Genres')
plt.ylabel('App qty')
plt.title("App qty by Genres")

<matplotlib.text.Text at 0x1b764a7c278>

image.png

data.describe()

image.png

#Distribution of Rating
fig=plt.figure(figsize=(15,6))
ax1 = fig.add_subplot(131)
ax2 = fig.add_subplot(132)
ax3 = fig.add_subplot(133)
sns.violinplot(y=data['Rating'],data=data,ax=ax1)
sns.kdeplot(data.Rating,ax=ax2,shade=True)
sns.boxplot(y=data.Rating,ax=ax3)

image.png

Conclusion: the middle 50% of apps are rated between about 4 and 4.5, and the mean rating is 4.17.
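
The "middle 50%" is the interquartile range that the box plot draws. As a sketch on made-up ratings, the quartiles can be computed directly:

```python
import numpy as np
import pandas as pd

# Toy ratings including one missing value; quantile() skips NaN.
ratings = pd.Series([3.0, 4.0, 4.2, 4.4, 4.5, 5.0, np.nan])

# 25th, 50th and 75th percentiles: the box edges and the median line.
q1, median, q3 = ratings.quantile([0.25, 0.5, 0.75])
print(q1, median, q3, ratings.mean())
```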

#Look at Reviews
#data['Reviews'].value_counts()

fig=plt.figure(figsize=(12,8))
sns.kdeplot(data.Reviews,shade=True)  #density of Reviews

image.png

The density is piled up near zero: most apps have only a handful of reviews, while a few have tens of millions.

#Look more closely at apps with fewer than 200 reviews
a=list(range(0,200,5))  #bin edges in steps of 5

fig=plt.figure(figsize=(15,8))
plt.hist(data['Reviews'],a,histtype="bar",rwidth=0.8,alpha=0.4)
plt.xticks(np.arange(0, 200, step=5))

image.png
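
For a column this skewed, a log transform is a common way to make the whole range visible in one plot. A minimal sketch, using a few illustrative values plus the largest review count from this dataset:

```python
import numpy as np
import pandas as pd

# A few illustrative review counts; 78158306 is the maximum seen in the data.
reviews = pd.Series([0, 9, 99, 999, 78158306])

# Adding 1 before taking the log keeps apps with zero reviews defined.
log_reviews = np.log10(reviews + 1)
print(log_reviews)
```

Plotting `log_reviews` instead of the raw counts would spread the mass near zero across the axis.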

#Find the largest review counts: list the Reviews values from largest to smallest
b=data['Reviews'].value_counts()
b.sort_index(ascending=False)

78158306      1
69119316      1
66577313      1
56642847      1
44891723      1
42916526      1
27722264      1
25655305      1
24900999      1
23133508      1
22426677      1
 ……
16           35
15           30
14           41
13           49
12           58
11           52
10           62
9            64
8            72
7            88
6            945           
4           137
3           170
2           213
1           272
0           593
Name: Reviews, dtype: int64
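
If only the top rows are of interest, `nlargest` gets there directly. A sketch on a toy frame (the app names are placeholders; the two large counts are taken from the output above):

```python
import pandas as pd

# Toy data; 'A'..'D' are placeholder app names.
apps = pd.DataFrame({
    'App': ['A', 'B', 'C', 'D'],
    'Reviews': [10, 78158306, 5, 69119316],
})

# The n rows with the largest Reviews, already sorted in descending order.
top = apps.nlargest(2, 'Reviews')
print(top)
```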

data[data['Reviews']>20000000]

image.png

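Among the most-reviewed apps, apart from four games, most surprisingly belong to the Facebook family; YouTube is the only Google app on the list, and the last two are from Cheetah Mobile.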

Next we look at the effect of pricing, which involves two fields: Type and Price.


a=data.Type.value_counts()


labels=data['Type'].value_counts().index
explode = [0.2,0]  #offset of each wedge from the center
sizes= data['Type'].value_counts().values
#colors = ['grey','blue','red','yellow','green','brown']

plt.figure(figsize = (9,9))
plt.pie(sizes, labels=labels, autopct='%1.1f%%',explode=explode)
plt.rcParams.update({'font.size': 10})
plt.title('App qty by type',color = 'blue',fontsize = 20)

<matplotlib.text.Text at 0x1b765455208>

image.png

92.2% of the apps are free and 7.8% are paid.

#Look at Price
data['Price'].value_counts()

0.00      8903
0.99       145
2.99       124
1.99        73
4.99        70
3.99        57
1.49        46
5.99        26
2.49        25
9.99        19
399.99      12
6.99        11
14.99        9
4.49         9  
          ... 
Name: Price, dtype: int64


price = data['Price'].value_counts()
price.drop(0,inplace=True)  #drop the free apps and look at paid apps only

price=price.sort_values(ascending=False)

fig = plt.figure(figsize=(15,10))
sns.kdeplot(data[data['Price']!=0]['Price']) #density of paid app prices

image.png

The vast majority of paid apps cost less than 30 USD, but there is a bump around 400 USD. Let's select those apps and see what is going on.

data[data['Price']==399.99]

image.png

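After searching online, I found that this is a joke app with no real function. Its Play Store page does show several thousand reviews and 100,000 downloads, though I don't understand why it has so many downloads; if anyone knows, please tell me.

These outliers can be removed in the later price analysis.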

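#Look at all paid price points in more detail
#num = str(a.tolist()).count("1")
#num
#Most apps are priced at 0.99, 1.99, 2.99 and so on; for a cleaner analysis, drop the prices used by only one app (63 such values in total)
price =price[price>1]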

#a=data['Price'].value_counts().values

fig = plt.figure(figsize=(12,10))
sns.kdeplot(price.values,shade=True)

<matplotlib.axes._subplots.AxesSubplot at 0x1b765bcbb38>

image.png


fig = plt.figure(figsize=(18,10))
sns.barplot(x=price.index,y=price.values)

image.png

Most paid apps cost less than 10 USD; the five most common prices are, in order, 0.99, 2.99, 1.99, 4.99, and 3.99 USD.

#Convert Last Updated to datetime
data['Last Updated']=pd.to_datetime(data['Last Updated'])
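
The dates in this column look like 'February 11, 2018'. Passing an explicit format is optional, but it makes the parsing rule visible and fails loudly on unexpected strings. A sketch, where the second date is made up and English month names are assumed:

```python
import pandas as pd

# The first value appears in the dataset; the second is made up for illustration.
dates = pd.Series(['February 11, 2018', 'January 7, 2018'])

# %B = full month name, %d = day of month, %Y = four-digit year.
parsed = pd.to_datetime(dates, format='%B %d, %Y')
print(parsed)
```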

fig = plt.figure(figsize=(10,7))
plt.plot(data['Last Updated'],'.')

image.png

#Look at Installs
data['Installs'].value_counts()

1,000,000+        1417
100,000+          1112
10,000+           1031
10,000,000+        937
1,000+             888
100+               710
5,000,000+         607
500,000+           505
50,000+            469
5,000+             468
10+                385
500+               328
50+                204
50,000,000+        202
100,000,000+       188
5+                  82
1+                  67
500,000,000+        24
1,000,000,000+      20
0+                  14
0                    1
Name: Installs, dtype: int64

install=data['Installs'].groupby(data['Installs']).count()
install =install.sort_values(ascending=False)

fig = plt.figure(figsize=(9,12))
sns.barplot(x=install.values,y=install.index)
plt.ylabel('installed times')
plt.xlabel('App qty')
plt.title("App qty by installed times")

image.png

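The most common bucket is 1,000,000+ installs. Another curious detail: buckets that start with 5 (5,000+, 50,000+, and so on) contain noticeably fewer apps than the buckets that start with 1.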

# Apps with more than one billion installs
data[data['Installs']=='1,000,000,000+']

image.png

Most of the apps with more than one billion installs are Google products.

#Check whether Installs is related to Reviews
reviews=data['Reviews'].groupby(data['Installs']).mean()

fig = plt.figure(figsize=(15,9))
sns.barplot(x=reviews.values,y=reviews.index)
plt.ylabel('installed times')
plt.xlabel('reviews')
plt.title("avg. reviews by installed times")
plt.xscale('log') #use a log scale

image.png

Install count and review count are indeed positively correlated.
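
To put a number on this relationship, the Installs labels can be turned into integers and a rank correlation computed. This is a sketch on made-up rows, not part of the original analysis; Spearman's coefficient is obtained here as the Pearson correlation of the ranks, and `installs_num` is a name invented for the example:

```python
import pandas as pd

# Toy data in the same format as the Installs and Reviews columns.
df = pd.DataFrame({
    'Installs': ['100+', '1,000+', '1,000,000+', '10,000+'],
    'Reviews': [3, 40, 90000, 500],
})

# Remove the thousands separators and the trailing '+' before converting.
df['installs_num'] = df['Installs'].str.replace(r'[+,]', '', regex=True).astype(int)

# Spearman rank correlation = Pearson correlation of the ranks.
rho = df['installs_num'].rank().corr(df['Reviews'].rank())
print(rho)
```

A rank correlation suits this case because Installs is a coarse bucket rather than an exact count.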

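Exploring which fields may be related to Rating

First drop the apps that have no rating and assign the result to a new dataset

#data['Rating'].value_counts()

newdata=data[data['Rating'].notnull()].copy()  #drop rows without a rating; copy so new columns can be added safely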

newdata.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 8196 entries, 0 to 10840
Data columns (total 13 columns):
App               8196 non-null object
Category          8196 non-null object
Rating            8196 non-null float64
Reviews           8196 non-null int32
Size              8196 non-null float64
Installs          8196 non-null object
Type              8196 non-null object
Price             8196 non-null float64
Content Rating    8196 non-null object
Genres            8196 non-null object
Last Updated      8196 non-null datetime64[ns]
Current Ver       8192 non-null object
Android Ver       8194 non-null object
dtypes: datetime64[ns](1), float64(3), int32(1), object(8)
memory usage: 864.4+ KB

#Check whether Last Updated is related to Rating
fig = plt.figure(figsize=(10,7))
plt.plot(newdata['Last Updated'],newdata['Rating'],'.')

[<matplotlib.lines.Line2D at 0x1b7670ec5f8>]

image.png

#Extract the year into a new column
newdata['updated_year']=newdata['Last Updated'].dt.year


fig = plt.figure(figsize=(15,9))
sns.boxplot(x=newdata['updated_year'],y=newdata['Rating'])
plt.xlabel('updated year')
plt.ylabel('rating')
plt.title('rating with different updated year')

image.png

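The median rating rises with the year of the last update; in 2018, for the first time, more than 75% of apps score above 4. This may suggest that, as the mobile app ecosystem matures, low-quality apps are losing their market.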
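
The two figures read off the box plot (the median and the share of apps above 4) can also be tabulated per year. A sketch on made-up ratings, assuming a pandas version that supports named aggregation (0.25 or later):

```python
import pandas as pd

# Toy data; years and ratings are illustrative only.
df = pd.DataFrame({
    'updated_year': [2016, 2016, 2017, 2017, 2018, 2018, 2018, 2018],
    'Rating': [3.5, 4.2, 3.9, 4.3, 4.1, 4.4, 4.6, 3.8],
})

# Per year: the median rating and the fraction of apps rated above 4.
summary = df.groupby('updated_year')['Rating'].agg(
    median='median',
    share_above_4=lambda s: (s > 4).mean(),
)
print(summary)
```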

plt.figure(figsize=(12,9))
sns.boxplot(x=newdata['Type'],y=newdata['Rating'],data=newdata)

image.png

Paid apps are rated higher than free apps.

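#Check whether Reviews and Rating are correlated
#Pearson correlation ranges from -1 to +1: +1 is a perfect positive correlation, -1 a perfect negative one, 0 no linear correlation
plt.figure(figsize=(10,10))
sns.jointplot(x=newdata['Reviews'],y=newdata['Rating'],kind='reg',height=7)  #height was called size before seaborn 0.9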

image.png

#Check whether Size and Rating are correlated
plt.figure(figsize=(10,10))
sns.jointplot(x=newdata['Size'],y=newdata['Rating'],kind='reg',height=7)  #height was called size before seaborn 0.9

image.png

Conclusion: Rating shows no clear linear correlation with either Reviews or Size.

#Category vs. Rating
fig =plt.figure(figsize=(15,12))
sns.boxplot(y=newdata['Category'],x=newdata['Rating'],data=newdata)
#plt.xticks(rotation=90)
plt.ylabel('category')
plt.xlabel('rating')
plt.title('rating distribution by category')

<matplotlib.text.Text at 0x1b768aabb00>

image.png

The lowest-rated category is DATING :), while ART_AND_DESIGN, EVENTS, PERSONALIZATION, and PARENTING are among the higher-rated ones.

#Installs vs. Rating: number of rated apps in each install bucket
installrate =newdata['Rating'].groupby(newdata['Installs']).count()
installrate

Installs
1+                   3
1,000+             697
1,000,000+        1415
1,000,000,000+      20
10+                 69
10,000+            987
10,000,000+        937
100+               303
100,000+          1094
100,000,000+       188
5+                   9
5,000+             425
5,000,000+         607
50+                 56
50,000+            457
50,000,000+        202
500+               199
500,000+           504
500,000,000+        24
Name: Rating, dtype: int64

#Drop the buckets with very few installs; keep only apps with 100+ installs
selected =newdata.loc[~newdata['Installs'].isin(['1+','5+','10+','50+'])]

#Installs vs. Rating
fig =plt.figure(figsize=(15,9))
sns.boxplot(x=selected['Installs'],y=selected['Rating'])
plt.xticks(rotation=45)
plt.xlabel('Installed qty')
plt.ylabel('rating')
plt.title('rating distribution by installs')

image.png
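
Because Installs is stored as text, the boxes are not drawn in numeric order. A small sketch of one way to build a sorted label list; the helper name `installs_to_int` is made up for this example:

```python
# A few install labels in arbitrary order, in the dataset's format.
labels = ['1,000+', '100+', '1,000,000+', '500+', '10,000+']

def installs_to_int(label):
    """Turn a label such as '1,000+' into the integer 1000."""
    return int(label.replace(',', '').rstrip('+'))

# Sort the text labels by their numeric value.
ordered = sorted(labels, key=installs_to_int)
print(ordered)
```

The resulting list can be passed to the `order` argument of `sns.boxplot` so the buckets appear from smallest to largest.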

#Ratings cluster around 4-4.5; Rating shows no strong relationship with Installs

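#Now Price: the Type comparison above was effectively price = 0 vs. price > 0; here we look at differences among paid price points
#Drop the free apps and the outlying "I'm Rich" apps priced at 399.99
selected =newdata.loc[(newdata['Price']!=0) & (newdata['Price']<200)]

#Price vs. Rating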
fig =plt.figure(figsize=(15,9))
sns.jointplot(x=selected['Price'],y=selected['Rating'],kind='reg')
#xplt.xticks(rotation=45)
plt.xlabel('Price')
plt.ylabel('rating')
plt.title('rating distribution vs. price')

image.png

The Pearson coefficient is -0.029, so Price and Rating are essentially uncorrelated.
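
The coefficient does not have to be read off the plot; as far as I know, newer seaborn releases no longer print it on the joint plot. A minimal sketch with made-up prices and ratings, so the value differs from the -0.029 above:

```python
import numpy as np
import pandas as pd

# Toy paid apps; prices and ratings are illustrative only.
paid = pd.DataFrame({
    'Price': [0.99, 1.99, 2.99, 4.99],
    'Rating': [4.5, 4.0, 4.2, 3.9],
})

# Series.corr defaults to the Pearson coefficient.
r = paid['Price'].corr(paid['Rating'])

# Cross-check against NumPy's correlation matrix.
r_np = np.corrcoef(paid['Price'], paid['Rating'])[0, 1]
print(r, r_np)
```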

#Category vs. Genres
data['App'].groupby([data['Category'],data['Genres']]).count()

Category             Genres                               
ART_AND_DESIGN       Art & Design                              57
                     Art & Design;Action & Adventure            1
                     Art & Design;Creativity                    5
                     Art & Design;Pretend Play                  1
AUTO_AND_VEHICLES    Auto & Vehicles                           85
BEAUTY               Beauty                                    53
BOOKS_AND_REFERENCE  Books & Reference                        222
BUSINESS             Business                                 420
COMICS               Comics                                    55
                     Comics;Creativity                          1
COMMUNICATION        Communication                            315
DATING               Dating                                   171
EDUCATION            Education                                 99
                     Education;Action & Adventure               1
                     Education;Brain Games                      3
                     Education;Creativity                       3
                     Education;Education                        8
                     Education;Music & Video                    1
                     Education;Pretend Play                     4
ENTERTAINMENT        Entertainment                             92
                     Entertainment;Brain Games                  2
                     Entertainment;Creativity                   1
                     Entertainment;Music & Video                7
EVENTS               Events                                    64
FAMILY               Action;Action & Adventure                  9
                     Adventure;Action & Adventure               4
                     Adventure;Brain Games                      1
                     Adventure;Education                        1
                     Arcade;Action & Adventure                 14
                     Arcade;Pretend Play                        1
                                                             ... 
GAME                 Simulation;Education                       1
                     Sports                                     6
                     Strategy                                  17
                     Trivia                                    38
                     Word                                      23
HEALTH_AND_FITNESS   Health & Fitness                         288
HOUSE_AND_HOME       House & Home                              74
LIBRARIES_AND_DEMO   Libraries & Demo                          84
LIFESTYLE            Lifestyle                                368
                     Lifestyle;Pretend Play                     1
MAPS_AND_NAVIGATION  Maps & Navigation                        131
MEDICAL              Medical                                  395
NEWS_AND_MAGAZINES   News & Magazines                         254
PARENTING            Parenting                                 46
                     Parenting;Brain Games                      1
                     Parenting;Education                        7
                     Parenting;Music & Video                    6
PERSONALIZATION      Personalization                          376
PHOTOGRAPHY          Photography                              281
PRODUCTIVITY         Productivity                             374
SHOPPING             Shopping                                 202
SOCIAL               Social                                   239
SPORTS               Sports                                   325
TOOLS                Tools                                    826
                     Tools;Education                            1
TRAVEL_AND_LOCAL     Travel & Local                           218
                     Travel & Local;Action & Adventure          1
VIDEO_PLAYERS        Video Players & Editors                  162
                     Video Players & Editors;Music & Video      1
WEATHER              Weather                                   79
Name: App, dtype: int64

#Paid vs. free app counts in each category
a=data['App'].groupby([data['Category'],data['Type']]).count()

c=[]
d=[]
for i in a.index.values:
    c.append(i[0])
    d.append(i[1])

typedata=pd.DataFrame({'Category':c,'Type':d,'values':list(a.values)})

fig =plt.figure(figsize=(15,12))
sns.barplot(y=typedata[typedata['Type']=='Paid']['Category'],x=typedata[typedata['Type']=='Paid']['values'],color='yellow',alpha=0.8,label='Paid')
sns.barplot(y=typedata[typedata['Type']=='Free']['Category'],x=typedata[typedata['Type']=='Free']['values'],color='green',alpha = 0.2,label='Free')

<matplotlib.axes._subplots.AxesSubplot at 0x1b76968add8>

image.png

From the chart, the categories where paid apps appear to take the largest share are ENTERTAINMENT, LIBRARIES_AND_DEMO, BEAUTY, and SHOPPING.
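
Overlaid bars with transparency are hard to read, so it may help to compute the paid share of each category as a number. A sketch using `pd.crosstab` on made-up rows; the real shares have to come from running it on `data`:

```python
import pandas as pd

# Toy data; the category/type combinations are illustrative only.
df = pd.DataFrame({
    'Category': ['MEDICAL'] * 4 + ['BEAUTY'] * 2 + ['GAME'] * 4,
    'Type': ['Paid', 'Free', 'Free', 'Paid',
             'Free', 'Free',
             'Free', 'Free', 'Free', 'Paid'],
})

# normalize='index' divides each row by its total, giving shares per category.
share = pd.crosstab(df['Category'], df['Type'], normalize='index')

# Categories ranked by the fraction of paid apps.
paid_share = share['Paid'].sort_values(ascending=False)
print(paid_share)
```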

#Android version vs. Rating
newdata['Android Ver'].value_counts()

4.1 and up            1811
4.0.3 and up          1141
4.0 and up            1042
Varies with device     947
4.4 and up             713
2.3 and up             547
5.0 and up             447
4.2 and up             316
2.3.3 and up           232
2.2 and up             203
3.0 and up             201
4.3 and up             185
2.1 and up             112
1.6 and up              87
6.0 and up              42
7.0 and up              41
3.2 and up              31
2.0 and up              27
5.1 and up              16
1.5 and up              16
3.1 and up               8
2.0.1 and up             7
4.4W and up              5
8.0 and up               5
7.1 and up               3
4.0.3 - 7.1.1            2
1.0 and up               2
5.0 - 8.0                2
4.1 - 7.1.1              1
7.0 - 7.1.1              1
5.0 - 6.0                1
Name: Android Ver, dtype: int64

fig = plt.figure(figsize=(15,9))
sns.boxplot(x=newdata['Rating'],y=newdata['Android Ver'])
plt.xlabel('rating')
plt.ylabel('android ver')

<matplotlib.text.Text at 0x1b76a0e3240>

image.png

The supported Android version shows no particular relationship with Rating.
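
With more than 30 text labels the box plot is crowded. One option, sketched here on a few labels from the list above, is to pull out the minimum supported version as a number so it can be grouped or binned; keeping only the major.minor part is a simplification made for this example:

```python
import pandas as pd

# A few labels in the format of the Android Ver column.
versions = pd.Series(['4.1 and up', 'Varies with device',
                      '4.0.3 - 7.1.1', '4.4W and up'])

# Capture the leading major.minor number; non-matching labels become NaN.
min_version = versions.str.extract(r'^(\d+\.\d+)', expand=False).astype(float)
print(min_version)
```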

#Content Rating vs. Rating
data['Content Rating'].value_counts()

Everyone           7903
Teen               1036
Mature 17+          393
Everyone 10+        322
Adults only 18+       3
Unrated               2
Name: Content Rating, dtype: int64

fig = plt.figure(figsize=(15,9))
sns.boxplot(x=newdata['Content Rating'],y=newdata['Rating'])
plt.xlabel('content rating')
plt.ylabel('rating')

<matplotlib.text.Text at 0x1b76b76c198>

image.png

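Conclusion

This post analyzed the Google Play Store app dataset, 9,659 apps in total after removing duplicates.

The mean Rating is 4.17, and the middle 50% of apps score between 4 and 4.5.

The top three categories by share of all apps are FAMILY (19.6%), GAME (9.9%), and TOOLS (8.5%).

Paid apps make up 7.8% of the total; from the chart, ENTERTAINMENT, LIBRARIES_AND_DEMO, BEAUTY, and SHOPPING appear to have the largest paid share. Most paid apps cost less than 10 USD, and the five most common prices are, in order, 0.99, 2.99, 1.99, 4.99, and 3.99 USD.

Most apps require Android 4.0 or later; few apps still support Android 2.x or 3.x.

Most apps with more than one billion installs are Google products, but the most-reviewed apps mostly belong to the Facebook family.

The factors that appear to affect Rating are Type, Category, and updated year.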
