Basic Functionality

Author: 庵下桃花仙 | Published 2019-02-03 12:16

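    Reindexing (changing the index order)

    reindex is an important method on pandas objects: it creates a new object whose data conforms to the new index. Labels that are absent from the original, such as 'e' in the session here, get NaN.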

    In [1]: import pandas as pd
    
    In [2]: obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
    
    In [3]: obj
    Out[3]:
    d    4.5
    b    7.2
    a   -5.3
    c    3.6
    dtype: float64
    
    In [4]: obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
    
    In [5]: obj2
    Out[5]:
    a   -5.3
    b    7.2
    c    3.6
    d    4.5
    e    NaN
    dtype: float64
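
    As a minimal sketch of the same call, the snippet here checks that reindex leaves the original Series alone, and shows the fill_value argument of reindex, which the session above does not use. The variable name obj_filled is introduced only for illustration.

```python
import pandas as pd

obj = pd.Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

# reindex returns a new Series; labels missing from obj get NaN by default.
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])

# fill_value substitutes a chosen value for missing labels instead of NaN.
obj_filled = obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)

print(obj.index.tolist())   # the original keeps its own label order
print(obj2['e'])            # missing label, filled with NaN
print(obj_filled['e'])      # missing label, filled with fill_value
```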
    

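    The optional method argument fills the gaps that reindexing creates. With method='ffill', each new label takes the value of the nearest preceding label (forward fill).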

    In [6]: obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])
    
    In [7]: obj3
    Out[7]:
    0      blue
    2    purple
    4    yellow
    dtype: object
    
    In [8]: obj3.reindex(range(6), method='ffill')
    Out[8]:
    0      blue
    1      blue
    2    purple
    3    purple
    4    yellow
    5    yellow
    dtype: object
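
    For contrast, a small sketch that runs forward fill next to backward fill (method='bfill', not used in the session above) on the same Series. The names forward and backward are illustrative only.

```python
import pandas as pd

obj3 = pd.Series(['blue', 'purple', 'yellow'], index=[0, 2, 4])

# Forward fill: each new label copies the value of the previous existing label.
forward = obj3.reindex(range(6), method='ffill')

# Backward fill: each new label copies the value of the next existing label.
# Label 5 has nothing after it, so it stays missing.
backward = obj3.reindex(range(6), method='bfill')

print(forward.tolist())
print(backward.tolist())
```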
    

    With a DataFrame, reindex can change the row index, the columns, or both. When passed only a sequence, it reindexes the rows.

    In [9]: import numpy as np
    
    In [16]: frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
        ...:                      index=['a', 'b', 'c'],
        ...:                      columns=['Ohio', 'Texas', 'California'])
    
    In [17]: frame
    Out[17]:
       Ohio  Texas  California
    a     0      1           2
    b     3      4           5
    c     6      7           8
    
    In [18]: frame2 = frame.reindex(['a', 'b', 'c', 'd'])
    
    In [20]: frame2
    Out[20]:
       Ohio  Texas  California
    a   0.0    1.0         2.0
    b   3.0    4.0         5.0
    c   6.0    7.0         8.0
    d   NaN    NaN         NaN
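
    The values in frame2 print as floats (0.0, 1.0, ...) although frame holds integers. A short sketch of why, assuming the default NumPy-backed integer columns: NaN cannot be stored in such a column, so pandas converts each column to float64 when the new row 'd' is added.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'b', 'c'],
                     columns=['Ohio', 'Texas', 'California'])

# Row 'd' does not exist in frame, so every column gains a NaN.
frame2 = frame.reindex(['a', 'b', 'c', 'd'])

print(frame.dtypes)    # integer columns
print(frame2.dtypes)   # float64 columns, because of the NaN row
```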
    

    The columns can be reindexed with the columns keyword.

    In [21]: states = ['Texas', 'Utah', 'California']
    
    In [22]: frame.reindex(columns=states)
    Out[22]:
       Texas  Utah  California
    a      1   NaN           2
    b      4   NaN           5
    c      7   NaN           8
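
    Rows and columns can also be reindexed in one call by passing both the index and columns keywords. A minimal sketch using the same frame; the name both is illustrative only.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'b', 'c'],
                     columns=['Ohio', 'Texas', 'California'])
states = ['Texas', 'Utah', 'California']

# Reindex both axes at once: row 'd' and column 'Utah' are new, so they hold NaN.
both = frame.reindex(index=['a', 'b', 'c', 'd'], columns=states)

print(both)
```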
    

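    Many people prefer loc for more concise label-based indexing. As the warning in the output says, though, passing labels that are missing from the index to loc is deprecated; pandas 1.0 and later raise KeyError instead, so use reindex when some labels may be missing.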

    In [23]: frame.loc[['a', 'b', 'c', 'd'], states]
    c:\users\a\appdata\local\programs\python\python36\lib\site-packages\pandas\core\indexing.py:1494: FutureWarning:
    Passing list-likes to .loc or [] with any missing label will raise
    KeyError in the future, you can use .reindex() as an alternative.
    
    See the documentation here:
    https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
      return self._getitem_tuple(key)
    Out[23]:
       Texas  Utah  California
    a    1.0   NaN         2.0
    b    4.0   NaN         5.0
    c    7.0   NaN         8.0
    d    NaN   NaN         NaN
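
    A hedged sketch of how the call above can be written so that it works on both old and new pandas versions: try loc first and fall back to reindex if the missing labels ('d' and 'Utah') raise KeyError. The names rows, selected and existing are illustrative only.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'b', 'c'],
                     columns=['Ohio', 'Texas', 'California'])
states = ['Texas', 'Utah', 'California']
rows = ['a', 'b', 'c', 'd']

try:
    # Older pandas: works, but emits the FutureWarning shown above.
    selected = frame.loc[rows, states]
except KeyError:
    # Newer pandas: missing labels raise, so use reindex to get NaN instead.
    selected = frame.reindex(index=rows, columns=states)

# loc is still the concise choice when every requested label exists.
existing = frame.loc[['a', 'b'], ['Texas', 'California']]

print(selected)
print(existing)
```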
    

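    Dropping entries from an axis

    If you already have an array or list of index labels to remove, the drop method returns a new object with those labels deleted from the axis.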

    In [24]: obj = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
    
    In [25]: obj
    Out[25]:
    a    0
    b    1
    c    2
    d    3
    e    4
    dtype: int32
    
    In [26]: new_obj = obj.drop('c')
    
    In [27]: new_obj
    Out[27]:
    a    0
    b    1
    d    3
    e    4
    dtype: int32
    
    In [28]: obj
    Out[28]:
    a    0
    b    1
    c    2
    d    3
    e    4
    dtype: int32
    
    In [29]: obj.drop(['d', 'c'])
    Out[29]:
    a    0
    b    1
    e    4
    dtype: int32
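
    One case the session above does not cover is a label that is not in the index. As a sketch: drop raises KeyError by default, and the errors='ignore' argument of drop skips unknown labels instead. The label 'z' and the names same and partial are made up for illustration.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])

# Dropping a label that does not exist raises KeyError by default.
try:
    obj.drop('z')
except KeyError as exc:
    print('KeyError:', exc)

# errors='ignore' drops the labels that exist and silently skips the rest.
same = obj.drop('z', errors='ignore')
partial = obj.drop(['c', 'z'], errors='ignore')

print(partial.index.tolist())
```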
    

    With a DataFrame, index values can be dropped from either axis.

    Dropping rows

    In [30]: data = pd.DataFrame(np.arange(16).reshape((4, 4)),
        ...:                     index=['Ohio', 'Colorado', 'Utah', 'New York'],
        ...:                     columns=['one', 'two', 'three', 'four'])
    
    In [31]: data
    Out[31]:
              one  two  three  four
    Ohio        0    1      2     3
    Colorado    4    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15
    
    In [32]: data.drop(['Colorado', 'Ohio'])
    Out[32]:
              one  two  three  four
    Utah        8    9     10    11
    New York   12   13     14    15
    

    Dropping columns (pass axis=1 or axis='columns')

    In [33]: data.drop('two', axis=1)
    Out[33]:
              one  three  four
    Ohio        0      2     3
    Colorado    4      6     7
    Utah        8     10    11
    New York   12     14    15
    
    In [34]: data.drop(['two', 'four'], axis='columns')
    Out[34]:
              one  three
    Ohio        0      2
    Colorado    4      6
    Utah        8     10
    New York   12     14
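
    As an alternative to the axis argument, drop also accepts index and columns keywords, which name the axis explicitly and allow dropping from both axes in one call. A minimal sketch on the same data; the names trimmed and no_two are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# Drop two rows and two columns in a single call.
trimmed = data.drop(index=['Colorado', 'Ohio'], columns=['two', 'four'])

# Equivalent to data.drop('two', axis=1).
no_two = data.drop(columns='two')

print(trimmed)
```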
    

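    drop changes the size or shape of a Series or DataFrame. By default it returns a new object and leaves the original untouched; with inplace=True it modifies the original object in place instead of returning a new one.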

    In [35]: obj.drop('c')
    Out[35]:
    a    0
    b    1
    d    3
    e    4
    dtype: int32
    
    In [36]: obj
    Out[36]:
    a    0
    b    1
    c    2
    d    3
    e    4
    dtype: int32
    
    In [37]: obj.drop('c', inplace=True)
    
    In [38]: obj
    Out[38]:
    a    0
    b    1
    d    3
    e    4
    dtype: int32
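
    A short sketch of a common pitfall with inplace=True: the call returns None, so its result should not be assigned back to the variable. Reassigning the result of a plain drop gives the same outcome. The names result and other are illustrative only.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])

# In-place drop: obj itself changes and the call returns None.
result = obj.drop('c', inplace=True)
print(result)
print(obj.index.tolist())

# Equivalent without inplace: rebind the name to the new object.
other = pd.Series(np.arange(5), index=['a', 'b', 'c', 'd', 'e'])
other = other.drop('c')
print(other.index.tolist())
```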
    
