pandas describe: understanding and using its parameters

Author: 精灵鼠小弟_fb22 | Published: 2020-05-03 17:24

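    percentiles: optional. A list-like of numbers, each between 0 and 1. The default is [.25, .5, .75], which returns the 25th, 50th, and 75th percentiles.

    include: optional. The dtypes to include in the summary: 'all', or a list-like of dtypes. The default is None, in which case only the numeric columns of a mixed-type DataFrame are summarized.

    exclude: optional. A list-like of dtypes to leave out of the summary. The default is None, which excludes nothing.

    Usage: DataFrame.describe(percentiles=None, include=None, exclude=None)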
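
The examples in this post never pass percentiles explicitly, so here is a minimal sketch of that parameter. The Series `s` and the chosen percentiles are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical data, chosen so the percentiles are easy to check by hand
s = pd.Series([1, 2, 3, 4, 5])

# Ask for the 10th and 90th percentiles instead of the default quartiles
summary = s.describe(percentiles=[0.1, 0.9])
print(summary)
```

Each requested percentile appears as its own row in the result, labelled 10% and 90% here.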
    
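    import numpy as np
    import pandas as pd

    # One column per kind of dtype: categorical, numeric, and object
    info = pd.DataFrame({'categorical': pd.Categorical(['s', 't', 'u']),
                         'numeric': [1, 2, 3], 'object': ['p', 'q', 'r']})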
    
    print(info.describe(),'\n')
              numeric
    count      3.0
    mean       2.0
    std        1.0
    min        1.0
    25%        1.5
    50%        2.0
    75%        2.5
    max        3.0
    
    print(info.describe(include='all'),'\n')       
               categorical  numeric object
    count            3      3.0      3
    unique           3      NaN      3
    top              u      NaN      p
    freq             1      NaN      1
    mean           NaN      2.0    NaN
    std            NaN      1.0    NaN
    min            NaN      1.0    NaN
    25%            NaN      1.5    NaN
    50%            NaN      2.0    NaN
    75%            NaN      2.5    NaN
    max            NaN      3.0    NaN
    
    print(info.numeric.describe(),'\n')
    count    3.0
    mean     2.0
    std      1.0
    min      1.0
    25%      1.5
    50%      2.0
    75%      2.5
    max      3.0
    Name: numeric, dtype: float64
    
    print(info.describe(include=[np.number]),'\n')       
              numeric
    count      3.0
    mean       2.0
    std        1.0
    min        1.0
    25%        1.5
    50%        2.0
    75%        2.5
    max        3.0
    
    # The np.object alias was removed in NumPy 1.24; pass the builtin object instead
    print(info.describe(include=[object]),'\n')
             object
    count       3
    unique      3
    top         p
    freq        1
    
    print(info.describe(include=['category']),'\n')       
                categorical
    count            3
    unique           3
    top              u
    freq             1
    
    print(info.describe(exclude=[np.number]),'\n')       
               categorical object
    count            3      3
    unique           3      3
    top              u      p
    freq             1      1
    
    # The np.object alias was removed in NumPy 1.24; pass the builtin object instead
    print(info.describe(exclude=[object]),'\n')
               categorical  numeric
    count            3      3.0
    unique           3      NaN
    top              u      NaN
    freq             1      NaN
    mean           NaN      2.0
    std            NaN      1.0
    min            NaN      1.0
    25%            NaN      1.5
    50%            NaN      2.0
    75%            NaN      2.5
    max            NaN      3.0
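
The include and exclude filters appear to select columns much as DataFrame.select_dtypes does, so one way to preview which columns describe will summarize is to call select_dtypes with the same argument. A small sketch under that assumption, rebuilding the same `info` frame:

```python
import numpy as np
import pandas as pd

info = pd.DataFrame({'categorical': pd.Categorical(['s', 't', 'u']),
                     'numeric': [1, 2, 3], 'object': ['p', 'q', 'r']})

# Columns picked by select_dtypes with a numeric filter ...
selected = list(info.select_dtypes(include=[np.number]).columns)
# ... compared with the columns describe summarizes for the same filter
described = list(info.describe(include=[np.number]).columns)

print(selected, described)
```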
    
    Understanding and using pandas .loc
    
    >>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
    ...      index=['cobra', 'viper', 'sidewinder'],
    ...      columns=['max_speed', 'shield'])
    >>> df
                max_speed  shield
    cobra               1       2
    viper               4       5
    sidewinder          7       8
    
    
    Single label. This selects one row (not a column) and returns it as a Series.
    
    >>> df.loc['viper']
    max_speed    4
    shield       5
    Name: viper, dtype: int64
    
    
    List of labels. Note that using ``[[]]`` returns a DataFrame.
    
    
    >>> df.loc[['viper', 'sidewinder']]
                max_speed  shield
    viper               4       5
    sidewinder          7       8
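
To make the Series versus DataFrame distinction concrete, the following sketch checks the type and shape returned by a single label and by a one-element list of labels. The variable names `row` and `frame` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

row = df.loc['viper']       # single label: one row as a Series
frame = df.loc[['viper']]   # one-element list: a one-row DataFrame

print(type(row).__name__, row.shape)
print(type(frame).__name__, frame.shape)
```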
    
    
    Single label for row and column. This returns a single element.
    
    
    >>> df.loc['cobra', 'shield']
    2
    
    
    Slice with labels for the rows and a single label for the column. Unlike
    ordinary Python slices, a label slice is a closed interval: both the start
    and the stop are included.
    
    >>> df.loc['cobra':'viper', 'max_speed']
    cobra    1
    viper    4
    Name: max_speed, dtype: int64
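
The closed interval is easiest to see next to a positional slice. This sketch compares a label slice with .loc against a position slice with .iloc on the same frame; .iloc is not covered in this post and is used here only for contrast.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

by_label = df.loc['cobra':'viper', 'max_speed']  # stop label is included
by_position = df.iloc[0:1, 0]                    # stop position is excluded

print(list(by_label.index))
print(list(by_position.index))
```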
    
    
    Boolean list with the same length as the row axis. Each entry says whether
    the corresponding row is kept.
    
    >>> df.loc[[False, False, True]]
                max_speed  shield
    sidewinder          7       8
    
    
    Conditional that returns a boolean Series. Only the rows where it is True are returned.
    
    >>> df.loc[df['shield'] > 6]
                max_speed  shield
    sidewinder          7       8
    
    
    Conditional that returns a boolean Series with column labels specified
    
    
    >>> df.loc[df['shield'] > 6, ['max_speed']]
                max_speed
    sidewinder          7
    
    
    Callable that returns a boolean Series. The callable receives the DataFrame,
    and the boolean Series it returns selects the rows.
    
    >>> df.loc[lambda df: df['shield'] == 8]
                max_speed  shield
    sidewinder          7       8
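
A callable is mostly useful in a method chain, where the intermediate frame has no name to refer to. A minimal sketch, assuming a hypothetical derived column `total` that is not part of the original example:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# 'total' is a hypothetical column added only for this illustration.
# The lambda passed to .loc receives the frame produced by assign.
fast = (
    df.assign(total=lambda d: d['max_speed'] + d['shield'])
      .loc[lambda d: d['total'] > 10]
)
print(fast)
```

The original `df` is left unchanged, because assign returns a new frame.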
    
    
    **Setting values**
    
    
    Set value for all items matching the lists of row and column labels
    
    >>> df.loc[['viper', 'sidewinder'], ['shield']] = 50
    >>> df
                max_speed  shield
    cobra               1       2
    viper               4      50
    sidewinder          7      50
    
    
    Set value for an entire row
    
    >>> df.loc['cobra'] = 10
    >>> df
                max_speed  shield
    cobra              10      10
    viper               4      50
    sidewinder          7      50
    
    
    Set value for an entire column. The column label goes after the comma; the
    part before the comma selects the rows, and ``:`` selects all of them.
    
    >>> df.loc[:, 'max_speed'] = 30
    >>> df
                max_speed  shield
    cobra              30      10
    viper              30      50
    sidewinder         30      50
    
    
    Set value for rows matching a boolean condition. Every column of the
    matching rows is overwritten.
    
    >>> df.loc[df['shield'] > 35] = 0
    >>> df
                max_speed  shield
    cobra              30      10
    viper               0       0
    sidewinder          0       0
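
Assigning through a row condition alone overwrites every column in the matching rows. To change a single column, add the column label after the comma. A small sketch that starts from the frame as it looked before the last assignment:

```python
import pandas as pd

df = pd.DataFrame([[30, 10], [30, 50], [30, 50]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

# Overwrite only 'shield' in the rows where shield > 35;
# 'max_speed' is left untouched
df.loc[df['shield'] > 35, 'shield'] = 0
print(df)
```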
    
    
    **Getting values on a DataFrame with an index that has integer labels**
    
    
    Another example using integers for the index labels
    
    >>> df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
    ...      index=[7, 8, 9], columns=['max_speed', 'shield'])
    >>> df
       max_speed  shield
    7          1       2
    8          4       5
    9          7       8
    
    
    Slice with integer labels for rows. The integers are treated as labels, not
    positions, so both the start and the stop of the slice are included.
    
    
    >>> df.loc[7:9]
       max_speed  shield
    7          1       2
    8          4       5
    9          7       8
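
With an integer index it is easy to confuse labels and positions. The sketch below contrasts .loc (labels) with .iloc (positions) on the same frame and shows that a label that does not exist raises KeyError. The flag name `label_zero_found` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=[7, 8, 9], columns=['max_speed', 'shield'])

by_label = df.loc[7:8]       # labels 7 and 8, both included
by_position = df.iloc[0:2]   # positions 0 and 1, stop excluded

# Position 0 exists, but there is no label 0, so .loc raises KeyError
try:
    df.loc[0]
    label_zero_found = True
except KeyError:
    label_zero_found = False

print(by_label.index.tolist(), by_position.index.tolist(), label_zero_found)
```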
    
    
    **Getting values with a MultiIndex**
    
    A number of examples using a DataFrame with a MultiIndex
    
    
    >>> tuples = [
    ...    ('cobra', 'mark i'), ('cobra', 'mark ii'),
    ...    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
    ...    ('viper', 'mark ii'), ('viper', 'mark iii')
    ... ]
    >>> index = pd.MultiIndex.from_tuples(tuples)
    >>> values = [[12, 2], [0, 4], [10, 20],
    ...         [1, 4], [7, 1], [16, 36]]
    >>> df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)
    >>> df
                         max_speed  shield
    cobra      mark i           12       2
               mark ii           0       4
    sidewinder mark i           10      20
               mark ii           1       4
    viper      mark ii           7       1
               mark iii         16      36
    
    
    Single label. Note this returns a DataFrame with a single index.
    
    
    >>> df.loc['cobra']
             max_speed  shield
    mark i          12       2
    mark ii          0       4
    
    
    Single index tuple. Note this returns a Series.
    
    >>> df.loc[('cobra', 'mark ii')]
    max_speed    0
    shield       4
    Name: (cobra, mark ii), dtype: int64
    
    
    Two labels, one per index level. Similar to passing in a tuple, this
    returns a Series.
    
    >>> df.loc['cobra', 'mark i']
    max_speed    12
    shield        2
    Name: (cobra, mark i), dtype: int64
    
    
    Single tuple. Note using ``[[]]`` returns a DataFrame.
    
    >>> df.loc[[('cobra', 'mark ii')]]
                   max_speed  shield
    cobra mark ii          0       4
    
    
    Single tuple for the index with a single label for the column. This returns
    a single element.
    
    >>> df.loc[('cobra', 'mark i'), 'shield']
    2
    
    
    Slice from index tuple to single label. This returns a DataFrame that runs
    through every row under the stop label.
    
    >>> df.loc[('cobra', 'mark i'):'viper']
                         max_speed  shield
    cobra      mark i           12       2
               mark ii           0       4
    sidewinder mark i           10      20
               mark ii           1       4
    viper      mark ii           7       1
               mark iii         16      36
    
    
    Slice from index tuple to index tuple. The result stops at
    ('viper', 'mark ii'), so unlike the previous slice it omits
    ('viper', 'mark iii').
    
    >>> df.loc[('cobra', 'mark i'):('viper', 'mark ii')]
                        max_speed  shield
    cobra      mark i          12       2
               mark ii          0       4
    sidewinder mark i          10      20
               mark ii          1       4
    viper      mark ii          7       1
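
All of the MultiIndex examples above select from the first level outward. To select on the second level alone, two options that pandas provides are DataFrame.xs and pd.IndexSlice. Neither appears in the original examples, so treat this as a supplementary sketch:

```python
import pandas as pd

tuples = [
    ('cobra', 'mark i'), ('cobra', 'mark ii'),
    ('sidewinder', 'mark i'), ('sidewinder', 'mark ii'),
    ('viper', 'mark ii'), ('viper', 'mark iii'),
]
index = pd.MultiIndex.from_tuples(tuples)
values = [[12, 2], [0, 4], [10, 20], [1, 4], [7, 1], [16, 36]]
df = pd.DataFrame(values, columns=['max_speed', 'shield'], index=index)

# xs selects on the second level and drops that level from the result
mark_ii = df.xs('mark ii', level=1)

# IndexSlice keeps the full MultiIndex; sort first, because slicing
# a MultiIndex this way generally expects a sorted index
idx = pd.IndexSlice
mark_ii_full = df.sort_index().loc[idx[:, 'mark ii'], :]

print(mark_ii)
print(mark_ii_full)
```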
    

    The data and explanations are taken from the official pandas documentation.
