Python from Scratch, Chapter 3 (Data Processing and Analysis): dplyr in Python

Author: 柳叶刀与小鼠标 | Published 2019-01-01 03:39


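The arguments of separate() are:

column: the column to split.
into: the names of the new columns.
sep: either a string to split on or a list of integer positions at which to split.
remove: whether to drop the original column.
convert: whether to convert the new columns to an appropriate type (the same as in spread()).
extra: what to do with surplus pieces: 'drop' discards them, 'merge' joins them into the last column.
fill: either 'right', which pads missing pieces with np.nan in the rightmost columns, or 'left', which pads with np.nan in the leftmost columns.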
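The examples that follow print a DataFrame d without showing how it was built. The sketch below assumes it was created directly from a list of strings, and shows roughly how the extra='drop', fill='right' case could be written in plain pandas. Note that str.split and to_numeric are pandas calls standing in for dfply's separate(), not dfply's own implementation.

```python
import pandas as pd

# Assumed construction of the example DataFrame (the article only prints it).
d = pd.DataFrame({'a': ['1-a-3', '1-b', '1-c-3-4', '9-d-1', '10']})

# Split on '-' into as many columns as needed; short rows are padded on the right.
parts = d['a'].str.split('-', expand=True)

# Keep only the first two pieces (like extra='drop') and name them.
out = parts.iloc[:, :2].copy()
out.columns = ['col1', 'col2']

# Rough counterpart of convert=True for the first column.
out['col1'] = pd.to_numeric(out['col1'])
print(out)
```

The last row has no second piece, so its col2 value is missing, which corresponds to fill='right'.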

print(d)

         a
0    1-a-3
1      1-b
2  1-c-3-4
3    9-d-1
4       10

d >> separate(X.a, ['col1', 'col2'], remove=True, convert=True,
              extra='drop', fill='right')

   col1 col2
0     1    a
1     1    b
2     1    c
3     9    d
4    10  NaN

d >> separate(X.a, ['col1', 'col2'], remove=True, convert=True,
              extra='drop', fill='left')

   col1 col2
0   1.0    a
1   1.0    b
2   1.0    c
3   9.0    d
4   NaN   10

d >> separate(X.a, ['col1', 'col2'], remove=False, convert=True,
              extra='merge', fill='right')

         a  col1   col2
0    1-a-3     1    a-3
1      1-b     1      b
2  1-c-3-4     1  c-3-4
3    9-d-1     9    d-1
4       10    10    NaN
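To make the interaction between extra and fill concrete, here is a small pure-Python sketch that mimics the outputs shown so far for a single string. The helper name separate_value is hypothetical and is not part of dfply; it only illustrates the rules.

```python
def separate_value(value, n, sep='-', extra='drop', fill='right'):
    """Split one string into exactly n pieces (hypothetical helper, not dfply)."""
    pieces = value.split(sep)
    if len(pieces) > n:
        if extra == 'merge':
            # Glue the surplus pieces back onto the last kept piece.
            pieces = pieces[:n - 1] + [sep.join(pieces[n - 1:])]
        else:
            # extra='drop': discard the surplus pieces.
            pieces = pieces[:n]
    # Pad with None on the chosen side when there are too few pieces.
    missing = [None] * (n - len(pieces))
    return pieces + missing if fill == 'right' else missing + pieces


print(separate_value('1-c-3-4', 2))                  # ['1', 'c']
print(separate_value('1-c-3-4', 2, extra='merge'))   # ['1', 'c-3-4']
print(separate_value('10', 2, fill='left'))          # [None, '10']
```

None plays the role of NaN here, and no type conversion is attempted.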

d >> separate(X.a, ['col1', 'col2', 'col3'], sep=[2,4], remove=True, convert=True,
              extra='merge', fill='right')

  col1 col2 col3
0   1-   a-    3
1   1-    b  NaN
2   1-   c-  3-4
3   9-   d-    1
4   10  NaN  NaN
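When sep is a list of integers, the string is cut at those character positions rather than at a delimiter. A minimal sketch of that slicing in plain Python, using a hypothetical helper split_at (not a dfply function) and None in place of NaN:

```python
def split_at(value, positions):
    """Cut a string at the given character positions (hypothetical helper)."""
    starts = [0] + list(positions)
    ends = list(positions) + [None]  # the last piece runs to the end of the string
    pieces = [value[s:e] for s, e in zip(starts, ends)]
    # Empty slices mean the string was too short: report them as missing.
    return [p if p else None for p in pieces]


for s in ['1-a-3', '1-b', '1-c-3-4', '9-d-1', '10']:
    print(split_at(s, [2, 4]))
```

Because the final slice runs to the end of the string, '1-c-3-4' yields '3-4' as its third piece, matching the extra='merge' output above.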

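The unite() function

unite(colname, *args, sep='_', remove=True, na_action='maintain') is the inverse of separate(): it joins columns together with a separator. Any non-string columns are converted to strings. The arguments of unite() are:

* colname: the name of the new joined column.
* *args: the columns to join, given as strings, symbols, or integer column positions.
* sep: the string separator used to join the columns.
* remove: whether to drop the original columns used in the join.
* na_action: one of 'maintain' (the default), 'ignore', or 'as_string'. 'maintain' makes the new value NaN if any of the original cells in that row is NaN. 'ignore' leaves NaN values out when joining, so the row 3, c, NaN becomes 3*c with sep='*'. 'as_string' converts NaN values to the string 'nan' before joining.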

print(d)

   a  b      c
0  1  a   True
1  2  b  False
2  3  c    NaN

d >> unite('united', X.a, 'b', 2, remove=False, na_action='maintain')

   a  b      c     united
0  1  a   True   1_a_True
1  2  b  False  2_b_False
2  3  c    NaN        NaN

d >> unite('united', ['a','b','c'], remove=True, na_action='ignore', sep='*')

      united
0   1*a*True
1  2*b*False
2        3*c

d >> unite('united', d.columns, remove=True, na_action='as_string')

      united
0   1_a_True
1  2_b_False
2    3_c_nan
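The three na_action modes can be approximated in plain pandas as follows. This is only a sketch: unite_rows is a hypothetical helper, not dfply's implementation, and the construction of d is assumed from the printed table.

```python
import numpy as np
import pandas as pd

# Assumed construction of the example DataFrame.
d = pd.DataFrame({'a': [1, 2, 3],
                  'b': ['a', 'b', 'c'],
                  'c': [True, False, np.nan]})


def unite_rows(df, cols, sep='_', na_action='maintain'):
    """Join the given columns row by row into one string Series."""
    out = []
    for values in zip(*(df[c] for c in cols)):
        has_na = any(pd.isna(v) for v in values)
        if has_na and na_action == 'maintain':
            out.append(np.nan)  # any missing cell makes the whole result missing
        elif na_action == 'ignore':
            out.append(sep.join(str(v) for v in values if not pd.isna(v)))
        else:
            # 'as_string', or 'maintain' with nothing missing: str(nan) is 'nan'
            out.append(sep.join(str(v) for v in values))
    return pd.Series(out, index=df.index)


print(unite_rows(d, ['a', 'b', 'c'], sep='*', na_action='ignore'))
print(unite_rows(d, ['a', 'b', 'c'], na_action='as_string'))
```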

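`Joining` functions

* inner_join(other, by='column')
* outer_join(other, by='column') (same as full_join())
* right_join(other, by='column')
* left_join(other, by='column')
* semi_join(other, by='column')
* anti_join(other, by='column')

These functions behave much like their counterparts in R, so it is easiest to go straight to the examples.

The example DataFrames below illustrate what each join function does.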

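import pandas as pd
from dfply import *  # provides the >> pipe and the join verbs used below

a = pd.DataFrame({
    'x1': ['A', 'B', 'C'],
    'x2': [1, 2, 3]
})
b = pd.DataFrame({
    'x1': ['A', 'B', 'D'],
    'x3': [True, False, True]
})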

`inner_join()`

inner_join() joins on values present in both DataFrames' by columns.

a >> inner_join(b, by='x1')

  x1  x2     x3
0  A   1   True
1  B   2  False
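For readers without dfply installed, the same inner join can be sketched with pandas' own merge(). This is a pandas equivalent of the result, not dfply's implementation.

```python
import pandas as pd

a = pd.DataFrame({'x1': ['A', 'B', 'C'], 'x2': [1, 2, 3]})
b = pd.DataFrame({'x1': ['A', 'B', 'D'], 'x3': [True, False, True]})

# Keep only the keys that appear in both frames.
inner = a.merge(b, on='x1', how='inner')
print(inner)
```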

`outer_join()` or `full_join()`

outer_join() merges DataFrames together on values present in either frame's by columns.

a >> outer_join(b, by='x1')

  x1   x2     x3
0  A  1.0   True
1  B  2.0  False
2  C  3.0    NaN
3  D  NaN   True
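The x2 column prints as 1.0, 2.0, 3.0 because an integer column cannot hold NaN, so it is upcast to float once a row without a match appears. A short pandas sketch (using merge() as a stand-in for outer_join()) makes the dtype change visible:

```python
import pandas as pd

a = pd.DataFrame({'x1': ['A', 'B', 'C'], 'x2': [1, 2, 3]})
b = pd.DataFrame({'x1': ['A', 'B', 'D'], 'x3': [True, False, True]})

# Keep every key that appears in either frame.
outer = a.merge(b, on='x1', how='outer')

print(a['x2'].dtype)      # integer before the join
print(outer['x2'].dtype)  # float after the join, because of the NaN for 'D'
print(outer)
```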

`left_join()`

left_join merges on the values present in the left DataFrame's by columns.

a >> left_join(b, by='x1')

  x1  x2     x3
0  A   1   True
1  B   2  False
2  C   3    NaN

`right_join()`

right_join merges on the values present in the right DataFrame's by columns.

a >> right_join(b, by='x1')

  x1   x2     x3
0  A  1.0   True
1  B  2.0  False
2  D  NaN   True
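Left and right joins differ only in which frame's keys are kept. Assuming plain pandas rather than dfply, the how argument of merge() expresses both:

```python
import pandas as pd

a = pd.DataFrame({'x1': ['A', 'B', 'C'], 'x2': [1, 2, 3]})
b = pd.DataFrame({'x1': ['A', 'B', 'D'], 'x3': [True, False, True]})

left = a.merge(b, on='x1', how='left')    # keys of a: A, B, C
right = a.merge(b, on='x1', how='right')  # keys of b: A, B, D

print(left)
print(right)
```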

`semi_join()`

semi_join() returns all of the rows in the left DataFrame that have a match in the right DataFrame in the by columns.

a >> semi_join(b, by='x1')

  x1  x2
0  A   1
1  B   2

`anti_join()`

anti_join() returns all of the rows in the left DataFrame that do not have a match in the right DataFrame within the by columns.

a >> anti_join(b, by='x1')

  x1  x2
2  C   3
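Semi and anti joins only filter the left DataFrame; no columns from the right one are added. One way to sketch them in plain pandas (an equivalent for a single key column, not dfply's implementation) is a boolean mask built with isin():

```python
import pandas as pd

a = pd.DataFrame({'x1': ['A', 'B', 'C'], 'x2': [1, 2, 3]})
b = pd.DataFrame({'x1': ['A', 'B', 'D'], 'x3': [True, False, True]})

# True where the left key also occurs in the right frame.
has_match = a['x1'].isin(b['x1'])

semi = a[has_match]    # rows of a with a match in b
anti = a[~has_match]   # rows of a without a match in b

print(semi)
print(anti)
```

The anti-join result keeps the original row label 2, which is why the output above starts at index 2.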

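Binding functions

dfply also has functions for combining DataFrames by rows and by columns, similar to pandas.concat(). bind_rows(other, join='outer', ignore_index=False) does the same as pandas.concat([df, other], join=join, ignore_index=ignore_index, axis=0): it stacks the DataFrames vertically.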

a >> bind_rows(b, join='inner')

  x1
0  A
1  B
2  C
0  A
1  B
2  D

a >> bind_rows(b, join='outer')

  x1   x2     x3
0  A  1.0    NaN
1  B  2.0    NaN
2  C  3.0    NaN
0  A  NaN   True
1  B  NaN  False
2  D  NaN   True
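Since bind_rows() is described as a wrapper around pandas.concat(), the effect of join and ignore_index can be checked directly in pandas. A brief sketch:

```python
import pandas as pd

a = pd.DataFrame({'x1': ['A', 'B', 'C'], 'x2': [1, 2, 3]})
b = pd.DataFrame({'x1': ['A', 'B', 'D'], 'x3': [True, False, True]})

# join='outer' keeps all columns; the original row labels are preserved.
stacked = pd.concat([a, b], join='outer', ignore_index=False, axis=0)

# join='inner' keeps only the columns both frames share.
shared = pd.concat([a, b], join='inner', axis=0)

# ignore_index=True renumbers the rows from 0.
renumbered = pd.concat([a, b], ignore_index=True, axis=0)

print(list(stacked.index))
print(list(shared.columns))
print(list(renumbered.index))
```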

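Note the index of the result: each DataFrame keeps its original row labels, so 0, 1, 2 appear twice.

bind_cols(other, join='outer', ignore_index=False) is similar to pandas.concat([df, other], join=join, ignore_index=ignore_index, axis=1): it combines the DataFrames horizontally.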

a >> bind_cols(b)

  x1  x2 x1     x3
0  A   1  A   True
1  B   2  B  False
2  C   3  D   True
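One consequence of binding columns this way is that the result has two columns named x1. A short pandas sketch, using pandas.concat() with axis=1 as the stand-in for bind_cols(), shows what that means when selecting by name:

```python
import pandas as pd

a = pd.DataFrame({'x1': ['A', 'B', 'C'], 'x2': [1, 2, 3]})
b = pd.DataFrame({'x1': ['A', 'B', 'D'], 'x3': [True, False, True]})

# Place the frames side by side, aligned on the row index.
wide = pd.concat([a, b], axis=1)

print(list(wide.columns))
# Selecting the duplicated label returns both columns as a DataFrame.
print(wide['x1'].shape)
```

Rows are matched by index label, not by the x1 value, so 'C' and 'D' end up on the same row.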

`separate()`