Pandas Notes (Part 1)

Author: xpz_39f8 | Published: 2019-03-07 22:44

    To use pandas, first import the relevant modules:

    In [1]: import numpy as np
    In [2]: import pandas as pd
    

    Object creation

    A Series is a one-dimensional labeled array capable of holding any data type (integers, strings, floating-point numbers, Python objects, etc.). The axis labels are collectively referred to as the index. The basic way to create a Series is:

    >>> s = pd.Series(data, index=index)
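
The call above leaves `data` and `index` unspecified. As a minimal sketch, assuming a small list of floats and three string labels (both hypothetical, chosen only for illustration), a Series could be built and queried like this:

```python
import numpy as np
import pandas as pd

# Hypothetical values and labels, used only for illustration.
data = [1.5, 2.5, 3.5]
index = ["a", "b", "c"]

s = pd.Series(data, index=index)

print(s.index.tolist())  # the axis labels that make up the index
print(s["b"])            # look up a value by its label
print(s.dtype)           # element type inferred from the data
```

If `index` is omitted, pandas assigns the default integer labels 0, 1, 2, and so on.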
    

    Selecting data
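
The examples in this part use `df` and `dates`, which these notes never define. The outputs look like those in the official "10 minutes to pandas" tutorial, so the objects were presumably built roughly as sketched here; this is an assumption, and because the values are random they will not match the numbers printed in these notes.

```python
import numpy as np
import pandas as pd

# Six consecutive daily timestamps starting on 2013-01-01.
dates = pd.date_range("20130101", periods=6)

# 6 rows x 4 columns of standard-normal random numbers,
# indexed by date and with columns named A, B, C, D.
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

print(df.shape)
print(df.index[0], df.index[-1])
```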

    Selecting a single column yields a Series:

    In [23]: df['A']
    Out[23]: 
    2013-01-01    0.469112
    2013-01-02    1.212112
    2013-01-03   -0.861849
    2013-01-04    0.721555
    2013-01-05   -0.424972
    2013-01-06   -0.673690
    Freq: D, Name: A, dtype: float64
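
A small check of that claim, as a sketch on a stand-in frame (the `np.arange` values are an assumption made so the result is reproducible; they are not the data shown above):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
# Stand-in data: 0.0 .. 23.0 laid out row by row.
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4),
                  index=dates, columns=list("ABCD"))

col = df["A"]

print(type(col).__name__)          # class of the returned object
print(col.name)                    # the Series keeps the column label as its name
print(col.index.equals(df.index))  # and shares the row index of the frame
```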
    

    Slicing with [] selects rows:

    In [24]: df[0:3]
    Out[24]: 
                       A         B         C         D
    2013-01-01  0.469112 -0.282863 -1.509059 -1.135632
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    
    In [25]: df['20130102':'20130104']
    Out[25]: 
                       A         B         C         D
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    2013-01-04  0.721555 -0.706771 -1.039575  0.271860
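
The two slices above differ in how they treat the end point: the integer slice stops before position 3, while the date-string slice includes its last label. A minimal sketch on stand-in data (the `np.arange` values are an assumption for reproducibility):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4),
                  index=dates, columns=list("ABCD"))

by_position = df[0:3]                     # positions 0, 1, 2 (end excluded)
by_label = df["20130102":"20130104"]      # Jan 2 through Jan 4 (end included)

print(by_position.index[-1])
print(by_label.index[-1])
```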
    

    Selection by label

    Getting a cross section using a label:

    In [26]: df.loc[dates[0]]
    Out[26]: 
    A    0.469112
    B   -0.282863
    C   -1.509059
    D   -1.135632
    Name: 2013-01-01 00:00:00, dtype: float64
    

    Selecting on multiple axes by label:

    In [27]: df.loc[:, ['A', 'B']]
    Out[27]: 
                       A         B
    2013-01-01  0.469112 -0.282863
    2013-01-02  1.212112 -0.173215
    2013-01-03 -0.861849 -2.104569
    2013-01-04  0.721555 -0.706771
    2013-01-05 -0.424972  0.567020
    2013-01-06 -0.673690  0.113648
    

    Label slicing includes both endpoints:

    In [28]: df.loc['20130102':'20130104', ['A', 'B']]
    Out[28]: 
                       A         B
    2013-01-02  1.212112 -0.173215
    2013-01-03 -0.861849 -2.104569
    2013-01-04  0.721555 -0.706771
    

    Reducing the dimensions of the returned object:

    In [29]: df.loc['20130102', ['A', 'B']]
    Out[29]: 
    A    1.212112
    B   -0.173215
    Name: 2013-01-02 00:00:00, dtype: float64
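
To make the reduction concrete: a single row label returns a one-dimensional Series, whereas a slice that happens to cover only that label still returns a two-dimensional DataFrame. A sketch on stand-in data (the `np.arange` values are an assumption):

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4),
                  index=dates, columns=list("ABCD"))

row = df.loc["20130102", ["A", "B"]]               # single label: Series
block = df.loc["20130102":"20130102", ["A", "B"]]  # one-row slice: DataFrame

print(row.ndim, block.ndim)
print(block.shape)
```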
    

    Getting a scalar value:

    In [30]: df.loc[dates[0], 'A']
    Out[30]: 0.46911229990718628
    

    For fast access to a scalar (equivalent to the previous method):

    In [31]: df.at[dates[0], 'A']
    Out[31]: 0.46911229990718628
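
A short sketch checking that the two accessors agree, again on stand-in data (the `np.arange` values are an assumption). `at` is meant for exactly one row label and one column label, which is what lets it skip the more general handling that `loc` performs:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4),
                  index=dates, columns=list("ABCD"))

via_loc = df.loc[dates[0], "A"]  # general label-based lookup
via_at = df.at[dates[0], "A"]    # scalar-only label-based lookup

print(via_loc == via_at)
```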
    

    Selection by position

    Selecting via the position of the passed integer:

    In [32]: df.iloc[3]
    Out[32]: 
    A    0.721555
    B   -0.706771
    C   -1.039575
    D    0.271860
    Name: 2013-01-04 00:00:00, dtype: float64
    

    Integer slices behave as they do in NumPy/Python:

    In [33]: df.iloc[3:5, 0:2]
    Out[33]: 
                       A         B
    2013-01-04  0.721555 -0.706771
    2013-01-05 -0.424972  0.567020
    

    Lists of integer positions, similar to NumPy/Python style:

    In [34]: df.iloc[[1, 2, 4], [0, 2]]
    Out[34]: 
                       A         C
    2013-01-02  1.212112  0.119209
    2013-01-03 -0.861849 -0.494929
    2013-01-05 -0.424972  0.276232
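
As a sketch of how the two lists are interpreted: the first picks rows by position, the second picks columns by position, and the result keeps the original labels. The stand-in `np.arange` data is an assumption for reproducibility:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4),
                  index=dates, columns=list("ABCD"))

# Rows at positions 1, 2, 4 and columns at positions 0, 2.
picked = df.iloc[[1, 2, 4], [0, 2]]

print(picked.columns.tolist())
print(picked.index.tolist())
```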
    

    Slicing rows explicitly:

    In [35]: df.iloc[1:3, :]
    Out[35]: 
                       A         B         C         D
    2013-01-02  1.212112 -0.173215  0.119209 -1.044236
    2013-01-03 -0.861849 -2.104569 -0.494929  1.071804
    

    Slicing columns explicitly:

    In [36]: df.iloc[:, 1:3]
    Out[36]: 
                       B         C
    2013-01-01 -0.282863 -1.509059
    2013-01-02 -0.173215  0.119209
    2013-01-03 -2.104569 -0.494929
    2013-01-04 -0.706771 -1.039575
    2013-01-05  0.567020  0.276232
    2013-01-06  0.113648 -1.478427
    

    Getting a single value explicitly:

    In [37]: df.iloc[1, 1]
    Out[37]: -0.17321464905330858
    

    For fast access to a scalar (equivalent to the previous method):

    In [38]: df.iat[1, 1]
    Out[38]: -0.17321464905330858
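
To tie the positional and label-based accessors together, this sketch checks that position (1, 1) is the same cell as label (`dates[1]`, `'B'`). The stand-in `np.arange` data is an assumption:

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.arange(24, dtype=float).reshape(6, 4),
                  index=dates, columns=list("ABCD"))

via_iloc = df.iloc[1, 1]           # general position-based lookup
via_iat = df.iat[1, 1]             # scalar-only position-based lookup
via_label = df.loc[dates[1], "B"]  # same cell addressed by its labels

print(via_iloc == via_iat == via_label)
```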
    

    Boolean indexing is not covered here for now.


    Article link: https://www.haomeiwen.com/subject/saihpqtx.html