美文网首页大数据 爬虫Python AI Sqlpython入门基础学习
Python数据处理:比Excel更方便的,pandas数据透视

Python数据处理:比Excel更方便的,pandas数据透视

作者: 编程新视野 | 来源:发表于2019-01-28 14:48 被阅读2次

数据透视表(pivot table)

数据透视表与GroupBy抽象类,操作方法类似,常见于 Excel 表格应用中。数据透视表,将每一列数据作为输入,将数据不断细分成,多个维度累计信息的二维数据表。两者之间的区别:是数据透视表更像是一种多维的 GroupBy累计操作。也就是说,分割与组合不是发生在一维索引上,而是在二维网格上(行列同时分组)。

一、DataFrame 的 pivot_table 方法:

<pre class="ql-align-justify" style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">DataFrame.pivot_table(data, values=None, index=None, columns=None,

aggfunc='mean', fill_value=None, margins=False,

dropna=True, margins_name='All')

</pre>

二、导入数据:

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">import numpy as np

import pandas as pd

import seaborn as sns

titanic= sns.load_dataset('titanic')

titanic.head()

</pre>

​​代码结果:

三、数据透视表语法:

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">titanic.pivot_table('survived', index='sex', columns='class')

</pre>

​​代码结果:​

1.多级数据透视表:

与GroupBy类似,数据透视表中的分组,也可以通过各种参数,指定多个等级。

列如把年龄('age')也加进去作为第三个维度,这就可以通过 pd.cut 函数将年龄进行分段。

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">age= pd.cut(titanic['age'], [0, 18, 80])

titanic.pivot_table('survived', ['sex', age], 'class')

</pre>

​​代码结果:

对列也可以使用同样的操作,用pd.qcut将船票价格按照计数项等分为两份。

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">fare= pd.qcut(titanic['fare'], 2)

titanic.pivot_table('survived', ['sex', age], [fare, 'class'])

</pre>

​代码结果:

2.其他数据透视表选项:

1.aggfunc 参数

用于设置累计函数类型,默认值是均值(mean)。常见的字符串'sum'、'mean'、'count'、'min'、'max' 等,也可以用标准的累计函数np.sum()、min()、sum() 等。

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">titanic.pivot_table('survived',index='sex', columns='class',

aggfunc='count')

</pre>

​代码结果:

2.字典

用字典为不同的列,指定不同的累计函数。

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">titanic.pivot_table(index='sex', columns='class',

aggfunc={'survived':sum, 'fare':'mean'})

</pre>

​代码结果:

3.margins参数

计算每一组的总数时,可以通过margins参数来设置。

<pre style="-webkit-tap-highlight-color: transparent; box-sizing: border-box; font-family: Consolas, Menlo, Courier, monospace; font-size: 16px; white-space: pre-wrap; position: relative; line-height: 1.5; color: rgb(153, 153, 153); margin: 1em 0px; padding: 12px 10px; background: rgb(244, 245, 246); border: 1px solid rgb(232, 232, 232); font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-color: initial;">titanic.pivot_table('survived', index='sex', columns='class', margins=True)

</pre>

​代码结果:

好啦,本文就分享到这里,跟多python学习可以关注我们哦

注:python学习关注我们企鹅qun: 696541369 各类入门学习资料免费分享哦!

相关文章

网友评论

    本文标题:Python数据处理:比Excel更方便的,pandas数据透视

    本文链接:https://www.haomeiwen.com/subject/fruajqtx.html