python版本:3.6.2
编辑器:jupyter notebook
完整代码如下:
import numpy as np
import pandas as pd
df = pd.read_excel('D:\\python_test\\num.xls')
df.sort_values(by=['Number'])# 按Number列排序
# 使用merge函数实现excel表格的合并,合并方式为inner
df_inner = pd.merge(df, df1, how='inner')
#设置索引列
df.set_index('id')
# 使用where对数据进行判断和分组,并使用group字段进行标记
df['group'] = np.where(df['Number'] > 50, 'high', 'low')# where条件语句:添加group列,若Number>50则为high,否则为low
# 对符合多个条件的数据进行分组标记
df.loc[(df['Author2'] == '刘宏伟') & (df['Number'] >= 50), 'sign']=1# 对Author2列等于刘宏伟且Number列大于等于50的数据标记为1
# 取单行数据
df.loc[1]
# 按索引区域提取
df.loc[0:5]
print(df)
运行结果截图:
# group和sign标记后
data:image/s3,"s3://crabby-images/b6140/b6140e155e797eabe39ce799ce2c8817ef09f809" alt=""
# df.loc[1]
data:image/s3,"s3://crabby-images/ff707/ff707387ffbbfefe181e2153ca7e8229291275c0" alt=""
# df.loc[0:5]
data:image/s3,"s3://crabby-images/5e28b/5e28bbad8d7ac2e74cd792589dc5daeba462202a" alt=""
补充:
# 条件筛选
df.loc[df['Number']== 51,['Author2','Number']]
data:image/s3,"s3://crabby-images/74887/748876188bbf3a152f8ee5c1ca408a9262c28a77" alt=""
可能出现的报错1:报错798: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison result = getattr(x, name)(y)
可能出现的报错2:TypeError: invalid type comparison
无效。。。因为之前把51加了引号。。。‘51’为无效类型。。。只有str类型才加引号,例如'张三’。论基础知识的重要性......
# 条件筛选
df.loc[(df['Number']>66) | (df['Author2']== '刘宏伟'),['Author2','Number']]
data:image/s3,"s3://crabby-images/f6429/f6429a871795f83895334426f526aa45381b74d3" alt=""
# 条件筛选并排序
df.loc[(df['Number']>66) | (df['Author2']== '刘宏伟'),['Author2','Number']].sort(['Number'])
data:image/s3,"s3://crabby-images/985f7/985f79ad2f0aaa8d8842c07233d4760e162654e3" alt=""
悲剧来了,报错了。。。AttributeError: 'DataFrame' object has no attribute 'sort'
【解决方案】把sort改为sort_values
data:image/s3,"s3://crabby-images/371fc/371fc23fe95ac764a329863c6731c7abfdf8dc3a" alt=""
# 将所筛选字段Number列求和
df.loc[(df['Number']>66) | (df['Author2']== '刘宏伟'),['Author2','Number']].sort_values(['Number']).Number.sum()
data:image/s3,"s3://crabby-images/6f92e/6f92ee4df59eb47e1fe22fbc67e6760d7ef7c40f" alt=""
# 将所筛选字段Number列计数
df.loc[(df['Number']>66) | (df['Author2']== '刘宏伟'),['Author2','Number']].sort_values(['Number']).Number.count()
data:image/s3,"s3://crabby-images/873f3/873f3e87a3f32939a3ee1395babdb72d5f4dfbc9" alt=""
data:image/s3,"s3://crabby-images/a67e1/a67e18b888c768057e595bd053c41be1d2bcbbb8" alt=""
data:image/s3,"s3://crabby-images/a8c12/a8c128b2dda594bacab71b2e96c7f0e1124467dd" alt=""
data:image/s3,"s3://crabby-images/50886/50886b61b64e1762766e1083ebb68024733434f5" alt=""
DataFrame的增删查改
data:image/s3,"s3://crabby-images/47476/47476dbe6b28aa32a2745a41082235c5a2cde86a" alt=""
data:image/s3,"s3://crabby-images/d965b/d965b439dccee8008c073b52f53c30bdb70b3205" alt=""
data:image/s3,"s3://crabby-images/411e7/411e7f776849b1cb966f23fec04a307fa5a01a28" alt=""
# 随机采样
df.sample(n=3)
# 对数据表进行描述性统计,小数位为2,并进行转置
df.describe().round(2).T
# 输出为excel格式
df.to_excel('D:\\python_test\\num.xlsx', sheet_name='practice')
# 输出为csv文件
df.to_csv('D:\\python_test\\num.csv')
网友评论