2019-08-23 Data Cleaning Assignment

Author: 有人喜欢你 | Published 2019-08-23 21:03
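'''
Assignment requirements:
1. Successfully read the file "商铺数据.csv"
2. Parse the data into a list of dicts: [{'var1':value1,'var2':value2,'var3':value3,...},...,{}]
3. Data cleaning:
① Convert the comment and price fields to numbers
② Remove records with missing fields
③ Split commentlist into three fields and convert them to numbers
4. Save the result as a .pkl file

'''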
import numpy as np
import pandas as pd

shop = pd.read_csv(r'C:\Users\heart\Documents\Tencent Files\592409588\FileRecv\【非常重要】课程资料\CLASSDATA_ch01数据思维导论:如何从数据中挖掘价值?\CLASSDATA_ch01数据思维导论:如何从数据中挖掘价值?\CLASSDATA_ch01数据思维导论:如何从数据中挖掘价值?\练习01_商铺数据加载及存储_资料\商铺数据.csv'
                   ,engine='python'
                   ,sep=','
                   ,header=1
                   ,encoding='utf8'
                   ,names=['classify','name','comment','star','price','address','commentlist'])

shop.head()
shc = list(shop.columns)
shc
list(shop['classify'].values)

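# Build the list of dicts row by row: one dict per shop, keyed by column name
lst = []

for _, row in shop.iterrows():
    # Create a fresh dict for every row; reusing a single dict would make
    # every list entry point at the same object
    dic = {col: row[col] for col in shc}
    lst.append(dic)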
    
len(lst)
lst[300]

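# Clean the data: strip the text around the numbers
shop['comment'][1]   # inspect one raw value
# Use the .str accessor so every row is split, not just row 0;
# the number is the part before the long run of spaces
shop['comment'] = shop['comment'].str.split(r'\s{2,}').str[0]
shop['comment'] = pd.to_numeric(shop['comment'], errors='coerce')
# The price is the part after the '¥' sign
shop['price'] = shop['price'].str.split('¥').str[1]
# Assign the converted column back; values that are not numbers become NaN
shop['price'] = pd.to_numeric(shop['price'], errors='coerce')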

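# Split commentlist into three fields and convert them to numbers
shop['commentlist'] = shop['commentlist'].str.split(r'\s{2,}')
# Each part is a two-character label followed by a score; [2:] drops the label.
# .str[i] picks the i-th part of every row, not just row 0.
shop['commentlist_zl'] = pd.to_numeric(shop['commentlist'].str[0].str[2:], errors='coerce')
shop['commentlist_hj'] = pd.to_numeric(shop['commentlist'].str[1].str[2:], errors='coerce')
shop['commentlist_hw'] = pd.to_numeric(shop['commentlist'].str[2].str[2:], errors='coerce')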

shop.head()

# Save the result as a .pkl file
import pickle

#shop.to_csv('shop.csv',index=False,sep=',')

# Write and read the same file name
with open('shop.pkl','wb') as f:
    pickle.dump(shop,f)    # write

with open('shop.pkl','rb') as fo:
    data1 = pickle.load(fo)   # read back
    
data1.head()
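
The assignment also asks for records with missing fields to be removed (requirement 3②), a step the code above leaves out, and for the result to be a list of dicts. The following is a minimal standard-library sketch of those steps, not the course's reference solution. The sample records are invented for illustration, the helpers `first_number` and `clean_record` are hypothetical names, and the raw layout (a number surrounded by text, and three labelled scores in `commentlist`) is an assumption about the real file.

```python
import pickle
import re

# Invented sample records in the ASSUMED raw layout; the real CSV may differ.
raw_records = [
    {'name': 'shop A', 'comment': '120    reviews', 'price': 'avg    ¥56',
     'commentlist': 'AA8.5    BB8.0    CC7.9'},
    {'name': 'shop B', 'comment': 'no reviews yet', 'price': 'avg    -',
     'commentlist': 'AA7.0    BB7.1    CC7.2'},
]

NUMBER = re.compile(r'\d+(?:\.\d+)?')


def first_number(text):
    """Return the first number in text as a float, or None if there is none."""
    match = NUMBER.search(text or '')
    return float(match.group()) if match else None


def clean_record(record):
    """Return a cleaned copy of record, or None if any field is missing."""
    scores = [float(s) for s in NUMBER.findall(record.get('commentlist') or '')]
    cleaned = {
        'name': record.get('name'),
        'comment': first_number(record.get('comment')),
        'price': first_number(record.get('price')),
    }
    # Drop the record when a field is missing or commentlist lacks three scores.
    if len(scores) != 3 or None in cleaned.values():
        return None
    (cleaned['commentlist_zl'],
     cleaned['commentlist_hj'],
     cleaned['commentlist_hw']) = scores
    return cleaned


cleaned_records = [c for c in map(clean_record, raw_records) if c is not None]

# Round-trip through pickle to confirm the cleaned list can be read back.
restored = pickle.loads(pickle.dumps(cleaned_records))
```

Writing to disk works the same way with `pickle.dump(cleaned_records, f)` on a file opened in `'wb'` mode. Returning `None` for incomplete records keeps the filtering step in one place, so the cleaning rules can change without touching the loop.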
