美文网首页
数据预处理(Pandas&Numpy部分)

数据预处理(Pandas&Numpy部分)

作者: 数据与风控 | 来源:发表于2019-11-27 11:25 被阅读0次

整理了一些利用pandas和numpy对文件进行预处理的常用方法,数据为加州房价预测数据,仅供参考(to be continued,我太懒了- -!)

#载入数据函数
def load_housing_data(path):
    return pd.read_csv(path)

#载入数据,查看前五行
path = "D://housing.csv"
df = load_housing_data(path)
print(df['total_rooms'].head(5))  #查看某列的前五行
print(df.dtypes)  #查看数据类型
print(df.index)  #查看行
print(df.columns)  #查看列
print(df.describe()) #数据集统计描述
print(df.T)  #数据集转置
print(df.sort_values(by = 'total_bedrooms',ascending = False).head(20)) #按照某列累加并降序排列,取前20
print(df.housing_median_age.head(4))#某列前四行
#about pandas
print(df.iloc[0:3,0:10])  #数据切片(索引,连续)
print(df.iloc[0:6,[1,3,6,9]]) #数据切片数据切片(索引,不连续)
print(df.ix[0:6,[1,3,6,8,9]])  #ix完美兼容loc和iloc,推荐
print(df.ix[:3,["longitude","latitude","housing_median_age","total_rooms"]])
print(df.ix[:,["longitude","latitude","housing_median_age","total_rooms"]])
print(df[df["housing_median_age"] > 41].sort_values(by='housing_median_age',ascending=True))  #根据某列条件进行判断
df.iloc[0,0] = 99999      #某个值置为新数字
print(df.head(2))
print(np.shape(df))   #数据集形状
df[df.housing_median_age>41] = 1000
print(df.ix[[1,2,4],[1,3,6,8,9]])  ##ix通用行列切分
print(df.head(10))
df['total_rooms'] = np.nan  #置为null
df = df.head(10)
print(df.isnull())  把空值标记为True
df = df.dropna(axis=0)  #清洗null数据
print(df.head(10))
df.to_csv("D://housing12345.csv") #导出文件到D盘

df1 = df.ix[0:4,0:3]
df2 = df.ix[8:10,0:3]
print(df1)
print("xxxxxxxxx")
print(df2)
print("xxxxxxxxx")
print(pd.concat([df1,df2],axis=0)) #按行concat连接
print("inner")
print(pd.concat([df1,df2],axis=1,join='inner')) #按列concat连接,inner,outer=full out类似于sql的连表方式
print("outer")
print(pd.concat([df1,df2],axis=1,join='outer'))
print(df1.append(df2))
print(df2.append(df1))

left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key': ['K0', 'K1', 'K5', 'K4'],
                      'C': ['C0', 'C1', 'C2',  'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
print(left)
print(right)
print(pd.merge(left,right,on='key',how='inner')) #类似于sql中的inner/outer/left/right join on='key',参加merge函数完美兼容join,类似于ix兼容lioc和loc,推荐
print(pd.merge(left,right,on='key',how='outer'))
print(pd.merge(left,right,on='key',how='left'))
print(pd.merge(left,right,on='key',how='right'))

相关文章

网友评论

      本文标题:数据预处理(Pandas&Numpy部分)

      本文链接:https://www.haomeiwen.com/subject/jyqpwctx.html