The groupby Operation in Pandas

Author: ledao | Published: 2017-07-25 00:12

Purpose

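In data analysis, the data usually comes from a database, so grouping rows by some key is a frequent step. For example, to forecast a residential community's electricity bill over a coming period, we need to group the data by community and then sort each group by time. The groupby operation handles this task neatly.
Assume the table cellfee has the following columns:
reportdate, cityid, cellid, fee.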
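To try the idea without a database, a small in-memory DataFrame with the same four columns can stand in for cellfee. This is only a sketch: the name `sample` and every value in it are made up for illustration, and the columns are assumed to hold a datetime, two string ids, and a float fee.

```python
import pandas as pd

# Hypothetical rows that mimic the cellfee table; the values are invented
# for illustration and do not come from a real database.
sample = pd.DataFrame({
    "reportdate": pd.to_datetime(
        ["2017-07-22", "2017-07-20", "2017-07-21", "2017-07-21", "2017-07-20"]
    ),
    "cityid": ["1", "1", "1", "1", "1"],
    "cellid": ["1", "1", "1", "2", "2"],
    "fee": [10.0, 10.0, 10.0, 7.5, 8.0],
})

# Sort once by date. groupby keeps the row order inside each group,
# so every per-cell frame comes out in chronological order.
ordered = sample.sort_values("reportdate")
for (cityid, cellid), cell_rows in ordered.groupby(["cityid", "cellid"]):
    print(cityid, cellid, cell_rows["fee"].tolist())
```

Sorting before grouping is one possible order of operations; sorting each group after the split gives the same per-cell result.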

Reading the table data

import pandas as pd
from sqlalchemy import create_engine
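# Connect to the local MySQL database pandas_learn via the pymysql driver
engine = create_engine('mysql+pymysql://ledao:ledao123@localhost/pandas_learn')
# Load the whole cellfee table into a DataFrame
original_data = pd.read_sql_table('cellfee', engine)
# In a notebook, the bare name displays the DataFrame
original_data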

Using groupby to collect all rows of each category

# Group by city and cell; each iteration yields the key tuple and that group's rows
for k, v in original_data.groupby(['cityid', 'cellid']):
    print('key: {}, type is {}'.format(k, type(k)))
    print('value:\n {}, \ntype is {}'.format(v, type(v)))

The output of the code above is:
key: ('1', '1'), type is <class 'tuple'>
value:
   reportdate cityid cellid   fee
0 2017-07-20      1      1  10.0
1 2017-07-21      1      1  10.0
2 2017-07-22      1      1  10.0
3 2017-07-23      1      1  10.0,
type is <class 'pandas.core.frame.DataFrame'>
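With a single groupby call, the tabular data stored in the database is gathered, according to the grouping keys, into one DataFrame per group. Each of these DataFrames can then go through the usual data analysis steps, such as building feature vectors, sorting, or further aggregation.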
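The follow-up steps named here (feature vectors, sorting, further aggregation) could look roughly like the sketch below. The frame `df` and its values are hypothetical, and using the chronologically ordered fees as a "feature vector" is only one possible reading of that step.

```python
import pandas as pd

# Hypothetical data with the same columns as cellfee (values invented).
df = pd.DataFrame({
    "reportdate": pd.to_datetime(
        ["2017-07-21", "2017-07-20", "2017-07-20", "2017-07-21"]
    ),
    "cityid": ["1", "1", "1", "1"],
    "cellid": ["1", "1", "2", "2"],
    "fee": [12.0, 10.0, 8.0, 7.5],
})

grouped = df.groupby(["cityid", "cellid"])

# Further aggregation: total and mean fee per (cityid, cellid) pair.
summary = grouped["fee"].agg(["sum", "mean"])
print(summary)

# One feature vector per cell: its fees sorted by report date.
features = {
    key: rows.sort_values("reportdate")["fee"].to_numpy()
    for key, rows in grouped
}
print(features)
```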
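When it comes to groupby, the only style I am really convinced by is that of Scala and Kotlin, namely groupBy followed by map (flatMap). I hope pandas will support this functional style in the future.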
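As a rough approximation of that style in today's pandas, iterating over a GroupBy object yields (key, DataFrame) pairs, so ordinary Python comprehensions can play the role of map and flatMap. This is a sketch with hypothetical data, not a claim that it matches the Scala or Kotlin API.

```python
import pandas as pd

# Hypothetical fees for two cells (values invented for illustration).
df = pd.DataFrame({
    "cellid": ["1", "1", "2"],
    "fee": [10.0, 12.0, 8.0],
})

# "map": compute one result per group from the (key, frame) pairs.
max_fee = {cellid: rows["fee"].max() for cellid, rows in df.groupby("cellid")}

# "flatMap": each group yields several values, flattened into one list.
all_fees = [fee for _, rows in df.groupby("cellid") for fee in rows["fee"]]

print(max_fee)
print(all_fees)
```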