美文网首页
记一次面试题:excel数据处理

记一次面试题:excel数据处理

作者: 弦好想断 | 来源:发表于2020-08-26 20:54 被阅读0次

Sales表:

Institution表:

Product表:

Price表:

Question:求各省份2017全年销售金额(全部产品)Top1的医院;(请保留计算过程)

import pandas as pd
import openpyxl 
path = r'C:\Users\谢谢你的爱1998\Desktop\Excel能力测试解答-章文格.xlsx'
df = pd.read_excel(path,sheet_name= 'Sales')
df = df[pd.to_datetime(df.SalesDate,format='%Y-%m-%d').dt.year==2017]
df2 = pd.read_excel(path,sheet_name= 'Institution')
# print(df2.info())
#检查并发现两表之间连接键出现不一致
for i in range(len(df)):
    if df.loc[i,'InstitutionCode']== 'F6302 ':
        df.loc[i,'InstitutionCode'] = 'F6302'
    else:
        pass
df_12 = pd.merge(df,df2,on='InstitutionCode',how='left')
# print(df_12.info())
# a = df_12[df_12.Province.isnull()] 检查是否还存在缺失值
# a.InstitutionCode.unique()
df4 = pd.read_excel(path,sheet_name= 'Price')
df4 = df4[pd.to_datetime(df4.EffectiveDate,format='%Y-%m-%d').dt.year==2017]
df4 = df4.drop_duplicates(['SKU'])#删去重复SKU值
ddff = pd.merge(df_12,df4,on='SKU')
ddff['money'] = ddff['Quantity']*ddff['Price']
# print(ddff.money.sum())#销售额总和
groupby_ddff = ddff.groupby(['Province','InstitutionName'])['money'].sum().reset_index()
groupby_ddff['group_sort']=groupby_ddff['money'].groupby(groupby_ddff['Province']).rank(ascending=False,method='first')
groupby_ddff_top1 = groupby_ddff[groupby_ddff['group_sort']==1]
groupby_ddff_top1

写入到原excel

# import xlwings as xw
book = openpyxl.load_workbook(path)   #读取你要写入的workbook
#和pd.read_excel() 用于将Dataframe写入excel。xls用xlwt。xlsx用openpyxl
writer = pd.ExcelWriter(path, engine='openpyxl')   
##此时的writer里还只是读写器. 然后将上面读取的book复制给writer  
writer.book = book
#转化为字典的形式
writer.sheets = dict((ws.title, ws) for ws in book.worksheets)
#将data写入writer
groupby_ddff_top1.to_excel(writer,sheet_name="Answer_Python",index=False)
writer.save()

相关文章

网友评论

      本文标题:记一次面试题:excel数据处理

      本文链接:https://www.haomeiwen.com/subject/mugfsktx.html