pandas Data Structures - DataFra


Author: 闫_锋 | Published: 2018-12-04 11:23

    A DataFrame represents a rectangular table of data and contains an ordered
    collection of columns, each of which can be a different value type (numeric,
    string, boolean, etc.).

    It can be thought of as a dict of Series all sharing the same index.

    import numpy as np
    import pandas as pd

    data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada', 'Nevada'],
            'year': [2000, 2001, 2002, 2001, 2002, 2003],
            'pop': [1.5, 1.7, 3.6, 2.4, 2.9, 3.2]}
    frame = pd.DataFrame(data)
    
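    The resulting DataFrame gets a default integer index. The output below shows
    the columns sorted by name, which is how older pandas versions handled dict
    input; recent versions keep the dict's insertion order (state, year, pop).

    In [45]: frame
    Out[45]:
       pop   state  year
    0  1.5    Ohio  2000
    1  1.7    Ohio  2001
    2  3.6    Ohio  2002
    3  2.4  Nevada  2001
    4  2.9  Nevada  2002
    5  3.2  Nevada  2003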
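
The "dict of Series sharing one index" view can be checked directly. The following is a minimal sketch, not part of the original notes; the names `index`, `state`, `pop`, and `small` are made up for illustration.

```python
import pandas as pd

index = ['a', 'b', 'c']

# Two Series with different value types but the same index
state = pd.Series(['Ohio', 'Ohio', 'Nevada'], index=index)
pop = pd.Series([1.5, 1.7, 2.4], index=index)

# The dict keys become column names; the shared index becomes the row index
small = pd.DataFrame({'state': state, 'pop': pop})

# Selecting a column hands back a Series that carries that same index
same_index = small['pop'].index.equals(small.index)
print(small)
print(same_index)
```

Each column keeps its own dtype, which is what lets one table mix strings and floats.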
    
    If you specify a sequence of columns, the DataFrame's columns are arranged in
    that order:

    In [47]: pd.DataFrame(data, columns=['year', 'state', 'pop'])
    Out[47]:
       year   state  pop
    0  2000    Ohio  1.5
    1  2001    Ohio  1.7
    2  2002    Ohio  3.6
    3  2001  Nevada  2.4
    4  2002  Nevada  2.9
    5  2003  Nevada  3.2
    
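    The remaining examples use frame2, which holds the same data with labeled rows
    and an extra debt column. Because debt is not a key in data, it is filled with
    NaN:

    frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'],
                          index=['one', 'two', 'three', 'four', 'five', 'six'])

    A column can be retrieved as a Series either by dict-like notation or by
    attribute:

    In [51]: frame2['state']
    Out[51]:
    one        Ohio
    two        Ohio
    three      Ohio
    four     Nevada
    five     Nevada
    six      Nevada
    Name: state, dtype: object

    In [52]: frame2.year
    Out[52]:
    one      2000
    two      2001
    three    2002
    four     2001
    five     2002
    six      2003
    Name: year, dtype: int64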
    

    Note that the returned Series have the same index as the DataFrame, and their name
    attribute has been appropriately set.
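
The two claims in the note above (shared index, name set to the column label) are easy to verify. This is a small sketch with a made-up two-row frame called `df`. It also shows one caveat of attribute access: a column named `pop` collides with the `DataFrame.pop` method, so only bracket notation reaches it.

```python
import pandas as pd

df = pd.DataFrame({'state': ['Ohio', 'Nevada'],
                   'year': [2000, 2001],
                   'pop': [1.5, 2.4]},
                  index=['one', 'two'])

col = df['state']   # dict-style access works for any column name
attr = df.state     # attribute access works only for valid, non-clashing names

name_is_set = col.name == 'state'
index_matches = col.index.equals(df.index)

# df.pop is the DataFrame.pop method, not the 'pop' column
pop_is_method = callable(df.pop)
pop_column = df['pop']
```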

    Rows can also be retrieved by label with the special loc attribute (and by
    integer position with iloc).

    In [53]: frame2.loc['three']
    Out[53]:
    year     2002
    state    Ohio
    pop       3.6
    debt      NaN
    Name: three, dtype: object
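
To make the label versus position distinction concrete, here is a short sketch on a made-up three-row frame named `df`: `loc` takes the row label, `iloc` takes the zero-based integer position, and both return the row as a Series named after its label.

```python
import pandas as pd

df = pd.DataFrame({'year': [2000, 2001, 2002],
                   'pop': [1.5, 1.7, 3.6]},
                  index=['one', 'two', 'three'])

by_label = df.loc['three']   # label-based lookup
by_position = df.iloc[2]     # position-based lookup (third row)

print(by_label)
print(by_label.equals(by_position))
```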
    
    Columns can be modified by assignment. A scalar is broadcast to every row,
    while a list or array must have the same length as the DataFrame:

    In [54]: frame2['debt'] = 16.5

    In [55]: frame2
    Out[55]:
           year   state  pop  debt
    one    2000    Ohio  1.5  16.5
    two    2001    Ohio  1.7  16.5
    three  2002    Ohio  3.6  16.5
    four   2001  Nevada  2.4  16.5
    five   2002  Nevada  2.9  16.5
    six    2003  Nevada  3.2  16.5

    In [56]: frame2['debt'] = np.arange(6.)

    In [57]: frame2
    Out[57]:
           year   state  pop  debt
    one    2000    Ohio  1.5   0.0
    two    2001    Ohio  1.7   1.0
    three  2002    Ohio  3.6   2.0
    four   2001  Nevada  2.4   3.0
    five   2002  Nevada  2.9   4.0
    six    2003  Nevada  3.2   5.0
    
    If you assign a Series, its labels are realigned to the DataFrame's index, and
    rows whose labels are missing from the Series get NaN:

    In [58]: val = pd.Series([-1.2, -1.5, -1.7], index=['two', 'four', 'five'])

    In [59]: frame2['debt'] = val

    In [60]: frame2
    Out[60]:
           year   state  pop  debt
    one    2000    Ohio  1.5   NaN
    two    2001    Ohio  1.7  -1.2
    three  2002    Ohio  3.6   NaN
    four   2001  Nevada  2.4  -1.5
    five   2002  Nevada  2.9  -1.7
    six    2003  Nevada  3.2   NaN
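
The following sketch contrasts the two assignment behaviours on a made-up frame `df`. A Series is aligned by label, so missing labels become NaN and labels unknown to the frame (the hypothetical `'zzz'` here) are dropped. A plain list has no labels, so its length must match, and pandas raises `ValueError` otherwise.

```python
import pandas as pd

df = pd.DataFrame({'year': [2000, 2001, 2002]},
                  index=['one', 'two', 'three'])

# 'zzz' is not a row label of df, so its value is silently discarded
val = pd.Series([-1.2, -1.5], index=['two', 'zzz'])
df['debt'] = val

# Label alignment behaves like reindexing the Series to the frame's index
expected = val.reindex(df.index)

# A list of the wrong length cannot be aligned and is rejected
try:
    df['bad'] = [1, 2]
    length_error = False
except ValueError:
    length_error = True
```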
    

    Another common form of data is a nested dict of dicts:

    In [65]: pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       ....:        'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
    

    If the nested dict is passed to the DataFrame, pandas will interpret the outer dict keys
    as the columns and the inner keys as the row indices:

    In [66]: frame3 = pd.DataFrame(pop)

    In [67]: frame3
    Out[67]:
          Nevada  Ohio
    2000     NaN   1.5
    2001     2.4   1.7
    2002     2.9   3.6
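
Here is a small sketch of the same nested-dict construction that checks where the missing value lands. The row index is the union of the inner keys, and its order may differ between pandas versions, so this sketch sorts it explicitly with `sort_index()` (an addition that is not in the original notes).

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# Outer keys -> columns, inner keys -> row labels
frame3 = pd.DataFrame(pop).sort_index()

# Nevada has no entry for 2000, so that cell is NaN
missing = frame3['Nevada'].isna()
print(frame3)
print(missing)
```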
    
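    Index objects hold the axis labels, and one Index can be shared among several
    data structures. In pandas 2.0 and later, Out[81] prints as
    Index([0, 1, 2], dtype='int64'), because Int64Index was removed.

    In [80]: labels = pd.Index(np.arange(3))

    In [81]: labels
    Out[81]: Int64Index([0, 1, 2], dtype='int64')

    In [82]: obj2 = pd.Series([1.5, -2.5, 0], index=labels)

    In [83]: obj2
    Out[83]:
    0    1.5
    1   -2.5
    2    0.0
    dtype: float64

    In [84]: obj2.index is labels
    Out[84]: True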
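
Sharing one Index between objects is safe because an Index cannot be modified in place. The sketch below (the names `s1` and `s2` are made up) reuses one Index for two Series and shows that item assignment on it raises `TypeError`. It compares labels with `equals` rather than `is`, since object identity is an implementation detail.

```python
import numpy as np
import pandas as pd

labels = pd.Index(np.arange(3))

# Two Series built on the same labels
s1 = pd.Series([1.5, -2.5, 0.0], index=labels)
s2 = pd.Series(['a', 'b', 'c'], index=labels)

# Both carry equal labels, so lookups line up row by row
shared = s1.index.equals(s2.index)

# Index objects are immutable: item assignment is rejected
try:
    labels[1] = 10
    mutated = True
except TypeError:
    mutated = False
```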
    
