美文网首页大数据 爬虫Python AI SqlPython小哥哥
药品是真的贵!利用Python对药品销售进行数据分析!

药品是真的贵!利用Python对药品销售进行数据分析!

作者: 14e61d025165 | 来源:发表于2019-04-18 15:15 被阅读0次

    我们可以用Python中的numpy、pandas、matplotlib等包对数据进行可视化分析。

    一、数据分析的基本步骤

    image

    欢迎加入新手技术交流基地:1004391443 群里有大牛解答,有资源,有源码,学不学的会就看你了!

    二、提出问题

    一切的数据分析目的是为了解决问题,问题明确了才能为后续的分析过程确定了方向,帮助我们有效的选取数据,分析研究。

    本案的分析目标是从销售数据中分析出以下业务指标:

    1)月均消费次数

    2)月均消费金额

    3)客单价

    4)消费趋势

    三、理解问题

    1、导入数据

    需提前安装numpy 、pandas包

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    import numpy as np
    import pandas as pd
    
    

    </pre>

    加载数据文件

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    x1= pd.ExcelFile(r'D:\数据分析\朝阳医院2018年销售数据.xlsx', dtype='object')
    
    

    </pre>

    读取数据表

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF =x1.parse('Sheet1',dtype='object')
    
    

    </pre>

    2、查看数据基本状况

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF.head(10)
    
    

    </pre>

    image

    查看数据的基本大小

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF.shape
    
    

    </pre>

    (6578, 7)

    查看数据数据表中的每项的基本类型

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF.dtypes
    
    

    </pre>

    image

    用描述函数查看各项的基本状况

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF.describe()
    
    

    </pre>

    image

    通过以上操作我们可以了解到,该项目的有七项基本项目(购药时间、社保卡号、商品编码、商品名称、销售数量、应收金额、实收金额),项目行数为6578。

    四、数据清洗

    获得数据后,不能马上进行数据分析。往往第一手获得数据并不符合我们的数据分析的要求,而且数据表格可能会含有缺失值、异常值等,这使得我们数据分析产生偏差。这就要求我们在数据分析前需要对数据进行清洗。而且在数据分析中,有大约60%的时间花在数据清洗过程。

    数据清洗步骤:

    1.选择子集

    2.列名重命名

    3.缺失数据处理

    4.数据类型转换

    5.数据排序

    6.异常值处理

    1、选择子集

    有事遇到项目较多,分析中只需用到某几项,我们需要选择数据集的子集作为研究对象。

    本案不需要选择子集,下面示范怎么选择子集(一般方法):

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF1=salesDF.loc[0:10,'社保卡号':'销售数量']
    salesDF1
    
    

    </pre>

    image

    2、列名重命名

    如果数据列名符合习惯名称和数据分析,这就需要对列名进行重命名。

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    #要用花括号,购药时间->销售时间
    namechang={'购药时间':'销售时间'}
    salesDF.rename(columns = namechang,inplace=True) 
    #参数inplace=True表示覆盖元数据集
    
    

    </pre>

    改好后,再次查看数据

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF.head()
    
    

    </pre>

    image

    3、缺失数据处理

    对于缺失数据,如果缺失数据较少时我们可以直接删除数据。如果缺失数据较多,可以通过建立模型插值来填充数据。

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    #查看缺失值的数量
    salesDF[salesDF[['销售时间','社保卡号']].isnull().values == True]
    
    

    </pre>

    image

    序号6574因为销售时间和社保卡号都缺失,所以出现了两次,需要去掉重复数据。

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    NaDF = salesDF[salesDF[['销售时间','社保卡号']].isnull().values ==True].drop_duplicates()
    NaDF
    
    

    </pre>

    image

    查看缺失值数量

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    NaDF.shape
    
    

    </pre>

    (3, 7)

    查看原数据规模

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF.shape
    
    

    </pre>

    (6578, 7)

    删除缺失值

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF = salesDF.dropna(subset=['销售时间','社保卡号'],how = 'any')
    #how='any' 在给定的任何一列中有缺失值就删除,
    #how='all' 在给定的所有列中都有缺失值就删除。
    
    

    </pre>

    数据删除后并不能自动更新数据序号,如未能更新会导致后续数据合并等操作出错。

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    #重新更新序号,使得序号修改为从0到N按顺序的索引值
    salesDF=salesDF.reset_index(drop=True)
    
    

    </pre>

    4、数据类型转换

    把数量、金额项目从字符串类型转换为浮点型类型。

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF['销售数量'] = salesDF['销售数量'].astype('float')
    salesDF['应收金额'] = salesDF['应收金额'].astype('float')
    salesDF['实收金额'] = salesDF['实收金额'].astype('float')
    print('转换后的数据类型:\n',salesDF.dtypes)
    
    

    </pre>

    image

    把日期项目从字符串类型转换为日期类型

    销售日期中含有日期和星期,这里只需用到日期,此需要对数据分割。

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    #日期转换
    def dateChange(dateLaw):
     dateList = [] #建立空列表
     for i in dateLaw:
     #例如2018-01-01 星期五,分割后为:2018-01-01
     str = i.split(' ')[0] #按空格分割,取第一列
     dateList.append(str)
     dateChangeOut = pd.Series(dateList) #数组化
     return dateChangeOut
    dateChangeOut = dateChange(salesDF['销售时间'])
    dateChangeOut
    
    

    </pre>

    0 2018-01-01

    1 2018-01-02

    2 2018-01-06

    3 2018-01-11

    4 2018-01-15

    5 2018-01-20

    6 2018-01-31

    7 2018-02-17

    8 2018-02-22

    9 2018-02-24

    10 2018-03-05

    11 2018-03-05

    12 2018-03-05

    13 2018-03-07

    14 2018-03-09

    15 2018-03-15

    16 2018-03-15

    17 2018-03-15

    18 2018-03-20

    19 2018-03-22

    20 2018-03-23

    21 2018-03-24

    22 2018-03-24

    23 2018-03-28

    24 2018-03-29

    25 2018-04-05

    26 2018-04-07

    27 2018-04-13

    28 2018-04-22

    29 2018-05-01

    ...

    6545 2018-04-05

    6546 2018-04-05

    6547 2018-04-09

    6548 2018-04-10

    6549 2018-04-10

    6550 2018-04-10

    6551 2018-04-12

    6552 2018-04-13

    6553 2018-04-13

    6554 2018-04-14

    6555 2018-04-15

    6556 2018-04-15

    6557 2018-04-15

    6558 2018-04-15

    6559 2018-04-16

    6560 2018-04-17

    6561 2018-04-18

    6562 2018-04-21

    6563 2018-04-22

    6564 2018-04-24

    6565 2018-04-25

    6566 2018-04-25

    6567 2018-04-25

    6568 2018-04-26

    6569 2018-04-26

    6570 2018-04-27

    6571 2018-04-27

    6572 2018-04-27

    6573 2018-04-27

    6574 2018-04-28

    Length: 6575, dtype: object

    替换原销售时间数据

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF['销售时间'] = dateChangeOut
    
    

    </pre>

    查看替换情况

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF.head()
    
    

    </pre>

    image

    在转换后查询是否产生新的缺失值

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    salesDF['销售时间'].isnull().any()
    
    

    </pre>

    False

    没有产生新的缺失值

    把销售时间的数据类型转换为日期型

    <pre style="margin: 0px; padding: 0px; border: 0px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-variant-numeric: inherit; font-variant-east-asian: inherit; font-weight: 400; font-stretch: inherit; font-size: 18px; line-height: inherit; font-family: inherit; vertical-align: baseline; word-break: break-word; color: rgb(93, 93, 93); letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; background-color: rgb(255, 255, 255); text-decoration-style: initial; text-decoration-color: initial;">

    dateOut=pd.to_datetime(salesDF['销售时间'], format = '%Y-%m-%d', errors='coerce')
    #format 是原始数据中日期的格式
    #errors='cperce' 如果原始数据不符合日期的格式,这输出值为NaT
    dateOut
    

    </pre>

    相关文章

      网友评论

        本文标题:药品是真的贵!利用Python对药品销售进行数据分析!

        本文链接:https://www.haomeiwen.com/subject/cfktgqtx.html