23 Hand-Picked Pandas Functions

Author: 皮皮大 | Published: 2022-01-14 10:31

    WeChat official account: 尤而小屋
    Author: Peter
    Editor: Peter

    Hi everyone, I'm Peter.

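    This post picks 23 commonly used Pandas functions, one per letter of the alphabet, and shows how to use each of them. No function was chosen for the letters o, y, and z.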

    import pandas as pd
    import numpy as np
    

    Each function is introduced below. For more detail, see the official documentation: https://pandas.pydata.org/docs/reference/general_functions.html

    assign function

    df = pd.DataFrame({
        'temp_c': [17.0, 25.0]},
        index=['Portland', 'Berkeley'])
    df
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>temp_c</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>Portland</th>
    <td>17.0</td>
    </tr>
    <tr>
    <th>Berkeley</th>
    <td>25.0</td>
    </tr>
    </tbody>
    </table>

    </div>

    # Create a new column
    
    df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>temp_c</th>
    <th>temp_f</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>Portland</th>
    <td>17.0</td>
    <td>62.6</td>
    </tr>
    <tr>
    <th>Berkeley</th>
    <td>25.0</td>
    <td>77.0</td>
    </tr>
    </tbody>
    </table>

    </div>

    df  # the original DataFrame is unchanged
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>temp_c</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>Portland</th>
    <td>17.0</td>
    </tr>
    <tr>
    <th>Berkeley</th>
    <td>25.0</td>
    </tr>
    </tbody>
    </table>

    </div>

    df["temp_f1"] = df["temp_c"] * 9 / 5 + 32
    df
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>temp_c</th>
    <th>temp_f1</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>Portland</th>
    <td>17.0</td>
    <td>62.6</td>
    </tr>
    <tr>
    <th>Berkeley</th>
    <td>25.0</td>
    <td>77.0</td>
    </tr>
    </tbody>
    </table>

    </div>
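
    assign also accepts callables, which receive the DataFrame itself, so several columns can be added in one call. A minimal sketch on the same Portland/Berkeley data; the temp_k column is an illustrative addition and not part of the original example:

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]}, index=['Portland', 'Berkeley'])

# Each keyword becomes a new column; a callable receives the DataFrame.
result = df.assign(
    temp_f=lambda d: d['temp_c'] * 9 / 5 + 32,
    temp_k=lambda d: d['temp_c'] + 273.15,  # hypothetical extra column
)
print(result)
print(df.columns.tolist())  # this freshly built df still has only temp_c
```

    Because assign returns a new DataFrame, it fits well into method chains where the input should stay untouched.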


    bool function

    Returns the boolean value (True or False) of a Series or DataFrame that contains exactly one element.

    pd.Series([True]).bool()
    
    True
    
    pd.Series([False]).bool()
    
    False
    
    pd.DataFrame({'col': [True]}).bool()
    
    True
    
    pd.DataFrame({'col': [False]}).bool()
    
    False
    
    # Calling bool() on more than one element raises a ValueError
    
    # pd.DataFrame({'col': [True,False]}).bool()
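
    In pandas 2.1 and later, Series.bool() and DataFrame.bool() are deprecated. A hedged alternative sketch: Series.item() returns the single element of a one-element Series, and iloc[0, 0] reads the only cell of a one-cell DataFrame. The DataFrame replacement is a suggestion of this sketch, not something from the original post:

```python
import pandas as pd

s = pd.Series([True])
value_from_series = s.item()  # raises ValueError unless the Series has exactly one element

df_single = pd.DataFrame({'col': [False]})
# Check the shape explicitly, because iloc[0, 0] alone would not fail on a larger frame.
assert df_single.shape == (1, 1)
value_from_frame = bool(df_single.iloc[0, 0])

print(value_from_series, value_from_frame)
```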
    

    concat function

    This function concatenates multiple DataFrames, either vertically or horizontally.

    df1 = pd.DataFrame({
        "sid":["s1","s2"],
        "name":["xiaoming","Mike"]})
    df1
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>xiaoming</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>Mike</td>
    </tr>
    </tbody>
    </table>

    </div>

    df2 = pd.DataFrame({
        "sid":["s3","s4"],
        "name":["Tom","Peter"]})
    df2
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s3</td>
    <td>Tom</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s4</td>
    <td>Peter</td>
    </tr>
    </tbody>
    </table>

    </div>

    df3 = pd.DataFrame({
        "address":["北京","深圳"],             
        "sex":["Male","Female"]})
    df3
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>address</th>
    <th>sex</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>北京</td>
    <td>Male</td>
    </tr>
    <tr>
    <th>1</th>
    <td>深圳</td>
    <td>Female</td>
    </tr>
    </tbody>
    </table>

    </div>

    # Usage 1: vertical (stack rows)
    pd.concat([df1,df2])
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>xiaoming</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>Mike</td>
    </tr>
    <tr>
    <th>0</th>
    <td>s3</td>
    <td>Tom</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s4</td>
    <td>Peter</td>
    </tr>
    </tbody>
    </table>

    </div>

    # Usage 2: horizontal (side by side)
    pd.concat([df1,df3],axis=1)
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    <th>address</th>
    <th>sex</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>xiaoming</td>
    <td>北京</td>
    <td>Male</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>Mike</td>
    <td>深圳</td>
    <td>Female</td>
    </tr>
    </tbody>
    </table>

    </div>
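
    The vertical result keeps each frame's original index, so the labels 0 and 1 appear twice. A small sketch, using the same df1 and df2, of how ignore_index=True produces a fresh 0..n-1 index instead:

```python
import pandas as pd

df1 = pd.DataFrame({"sid": ["s1", "s2"], "name": ["xiaoming", "Mike"]})
df2 = pd.DataFrame({"sid": ["s3", "s4"], "name": ["Tom", "Peter"]})

# Discard the original row labels and renumber the stacked rows.
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
```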

    dropna function

    Drops rows (or columns) that contain missing values.

    df4 = pd.DataFrame({
        "sid":["s1","s2", np.nan],             
        "name":["xiaoming",np.nan, "Mike"]})
    df4
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>xiaoming</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>NaN</td>
    </tr>
    <tr>
    <th>2</th>
    <td>NaN</td>
    <td>Mike</td>
    </tr>
    </tbody>
    </table>

    </div>

    df4.dropna()
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>xiaoming</td>
    </tr>
    </tbody>
    </table>

    </div>

    df4.dropna(subset=["name"])
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>xiaoming</td>
    </tr>
    <tr>
    <th>2</th>
    <td>NaN</td>
    <td>Mike</td>
    </tr>
    </tbody>
    </table>

    </div>
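
    By default dropna drops a row as soon as any of its values is missing. The how and thresh parameters relax that rule; a short sketch on the same df4:

```python
import numpy as np
import pandas as pd

df4 = pd.DataFrame({
    "sid": ["s1", "s2", np.nan],
    "name": ["xiaoming", np.nan, "Mike"]})

# how="all": drop a row only if every value in it is missing
kept_all = df4.dropna(how="all")

# thresh=1: keep rows that have at least one non-missing value
kept_thresh = df4.dropna(thresh=1)

print(len(kept_all), len(kept_thresh))
```

    No row of df4 is entirely empty, so both calls keep all three rows here.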

    explode function

    explode gives each element of a list-like cell its own row, turning a wide table into a long one.

    df5 = pd.DataFrame({
        "sid":["s1","s2"],       
        "phones":[["华为","小米","一加"],["三星","苹果"]]
                       })
    df5
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>phones</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>[华为, 小米, 一加]</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>[三星, 苹果]</td>
    </tr>
    </tbody>
    </table>

    </div>

    df5.explode("phones")
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>phones</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>华为</td>
    </tr>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>小米</td>
    </tr>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>一加</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>三星</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>苹果</td>
    </tr>
    </tbody>
    </table>

    </div>

    df5
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>phones</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>[华为, 小米, 一加]</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>[三星, 苹果]</td>
    </tr>
    </tbody>
    </table>

    </div>
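
    After explode, rows that came from the same list share an index label (0, 0, 0, 1, 1 in the output shown). Assuming pandas 1.1 or later, the ignore_index parameter renumbers the result; a minimal sketch:

```python
import pandas as pd

df5 = pd.DataFrame({
    "sid": ["s1", "s2"],
    "phones": [["华为", "小米", "一加"], ["三星", "苹果"]]})

# One row per phone, with a fresh 0..n-1 index.
long_df = df5.explode("phones", ignore_index=True)
print(long_df)
```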

    fillna function

    Fills in missing values.

    df4
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>xiaoming</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>NaN</td>
    </tr>
    <tr>
    <th>2</th>
    <td>NaN</td>
    <td>Mike</td>
    </tr>
    </tbody>
    </table>

    </div>

    df4.fillna({"sid":"s3","name":"Peter"})
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>xiaoming</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>Peter</td>
    </tr>
    <tr>
    <th>2</th>
    <td>s3</td>
    <td>Mike</td>
    </tr>
    </tbody>
    </table>

    </div>
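
    Besides a per-column dict, missing values can be replaced with a single scalar or carried forward from the previous row. A sketch on the same df4; the placeholder string "unknown" is an arbitrary choice for illustration:

```python
import numpy as np
import pandas as pd

df4 = pd.DataFrame({
    "sid": ["s1", "s2", np.nan],
    "name": ["xiaoming", np.nan, "Mike"]})

filled_scalar = df4.fillna("unknown")  # same replacement value everywhere
filled_forward = df4.ffill()           # copy the last valid value downward

print(filled_scalar)
print(filled_forward)
```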

    groupby function

    Groups rows by key and computes statistics within each group.

    # Reuse the explode result from above
    df6 = df5.explode("phones")
    df6
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>phones</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>华为</td>
    </tr>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>小米</td>
    </tr>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>一加</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>三星</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>苹果</td>
    </tr>
    </tbody>
    </table>

    </div>

    df6.groupby("sid")["phones"].count()
    
    sid
    s1    3
    s2    2
    Name: phones, dtype: int64
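
    A group can be summarized with several statistics at once. The sketch below uses named aggregation on the exploded data, rebuilt directly here so the block is self-contained; the output column names n_phones and first_phone are made up for this example:

```python
import pandas as pd

df6 = pd.DataFrame({
    "sid": ["s1", "s1", "s1", "s2", "s2"],
    "phones": ["华为", "小米", "一加", "三星", "苹果"]})

summary = df6.groupby("sid").agg(
    n_phones=("phones", "count"),     # number of rows per sid
    first_phone=("phones", "first"),  # first value seen in each group
)
print(summary)
```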
    

    head function

    Shows the first few rows of the data; the default is the first 5 rows.

    df7 = pd.DataFrame({
        "sid":list(range(10)),                
        "name":list(range(80,100,2))})
    df7
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>0</td>
    <td>80</td>
    </tr>
    <tr>
    <th>1</th>
    <td>1</td>
    <td>82</td>
    </tr>
    <tr>
    <th>2</th>
    <td>2</td>
    <td>84</td>
    </tr>
    <tr>
    <th>3</th>
    <td>3</td>
    <td>86</td>
    </tr>
    <tr>
    <th>4</th>
    <td>4</td>
    <td>88</td>
    </tr>
    <tr>
    <th>5</th>
    <td>5</td>
    <td>90</td>
    </tr>
    <tr>
    <th>6</th>
    <td>6</td>
    <td>92</td>
    </tr>
    <tr>
    <th>7</th>
    <td>7</td>
    <td>94</td>
    </tr>
    <tr>
    <th>8</th>
    <td>8</td>
    <td>96</td>
    </tr>
    <tr>
    <th>9</th>
    <td>9</td>
    <td>98</td>
    </tr>
    </tbody>
    </table>

    </div>

    df7.head()   # first 5 rows by default
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>0</td>
    <td>80</td>
    </tr>
    <tr>
    <th>1</th>
    <td>1</td>
    <td>82</td>
    </tr>
    <tr>
    <th>2</th>
    <td>2</td>
    <td>84</td>
    </tr>
    <tr>
    <th>3</th>
    <td>3</td>
    <td>86</td>
    </tr>
    <tr>
    <th>4</th>
    <td>4</td>
    <td>88</td>
    </tr>
    </tbody>
    </table>

    </div>

    df7.head(3)  # first 3 rows
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>0</td>
    <td>80</td>
    </tr>
    <tr>
    <th>1</th>
    <td>1</td>
    <td>82</td>
    </tr>
    <tr>
    <th>2</th>
    <td>2</td>
    <td>84</td>
    </tr>
    </tbody>
    </table>

    </div>
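
    head has a counterpart, tail, for the last rows, and a negative n means "all rows except the last |n|". A brief sketch on the ten-row df7:

```python
import pandas as pd

df7 = pd.DataFrame({
    "sid": list(range(10)),
    "name": list(range(80, 100, 2))})

last_three = df7.tail(3)       # rows with sid 7, 8, 9
all_but_last_7 = df7.head(-7)  # same rows as df7.head(3)

print(last_three)
print(all_but_last_7)
```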

    isnull function

    Flags which values are missing; a very commonly used function.

    df4
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>xiaoming</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>NaN</td>
    </tr>
    <tr>
    <th>2</th>
    <td>NaN</td>
    <td>Mike</td>
    </tr>
    </tbody>
    </table>

    </div>

    df4.isnull()  # True marks a missing value
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>name</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>False</td>
    <td>False</td>
    </tr>
    <tr>
    <th>1</th>
    <td>False</td>
    <td>True</td>
    </tr>
    <tr>
    <th>2</th>
    <td>True</td>
    <td>False</td>
    </tr>
    </tbody>
    </table>

    </div>

    df4.isnull().sum()  # number of missing values per column
    
    sid     1
    name    1
    dtype: int64
    
    df6.isnull().sum()   # no missing values
    
    sid       0
    phones    0
    dtype: int64
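
    The boolean mask returned by isnull can also be used to pull out the rows that contain a missing value. A minimal sketch on df4:

```python
import numpy as np
import pandas as pd

df4 = pd.DataFrame({
    "sid": ["s1", "s2", np.nan],
    "name": ["xiaoming", np.nan, "Mike"]})

# any(axis=1) is True for a row if at least one of its cells is missing.
rows_with_missing = df4[df4.isnull().any(axis=1)]

# Summing twice collapses the per-column counts into one total.
total_missing = int(df4.isnull().sum().sum())

print(rows_with_missing)
print(total_missing)
```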
    

    join function

    Joins DataFrames, aligning rows on the index by default:

    df7 = pd.DataFrame({
        'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
        'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
    df7
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>key</th>
    <th>A</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>K0</td>
    <td>A0</td>
    </tr>
    <tr>
    <th>1</th>
    <td>K1</td>
    <td>A1</td>
    </tr>
    <tr>
    <th>2</th>
    <td>K2</td>
    <td>A2</td>
    </tr>
    <tr>
    <th>3</th>
    <td>K3</td>
    <td>A3</td>
    </tr>
    <tr>
    <th>4</th>
    <td>K4</td>
    <td>A4</td>
    </tr>
    <tr>
    <th>5</th>
    <td>K5</td>
    <td>A5</td>
    </tr>
    </tbody>
    </table>

    </div>

    df8 = pd.DataFrame({
        'key': ['K0', 'K1', 'K2'],
        'B': ['B0', 'B1', 'B2']})
    df8
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>key</th>
    <th>B</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>K0</td>
    <td>B0</td>
    </tr>
    <tr>
    <th>1</th>
    <td>K1</td>
    <td>B1</td>
    </tr>
    <tr>
    <th>2</th>
    <td>K2</td>
    <td>B2</td>
    </tr>
    </tbody>
    </table>

    </div>

    df7.join(df8,lsuffix="_df7",rsuffix="_df8")
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>key_df7</th>
    <th>A</th>
    <th>key_df8</th>
    <th>B</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>K0</td>
    <td>A0</td>
    <td>K0</td>
    <td>B0</td>
    </tr>
    <tr>
    <th>1</th>
    <td>K1</td>
    <td>A1</td>
    <td>K1</td>
    <td>B1</td>
    </tr>
    <tr>
    <th>2</th>
    <td>K2</td>
    <td>A2</td>
    <td>K2</td>
    <td>B2</td>
    </tr>
    <tr>
    <th>3</th>
    <td>K3</td>
    <td>A3</td>
    <td>NaN</td>
    <td>NaN</td>
    </tr>
    <tr>
    <th>4</th>
    <td>K4</td>
    <td>A4</td>
    <td>NaN</td>
    <td>NaN</td>
    </tr>
    <tr>
    <th>5</th>
    <td>K5</td>
    <td>A5</td>
    <td>NaN</td>
    <td>NaN</td>
    </tr>
    </tbody>
    </table>

    </div>
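
    In that result the rows are matched on the index labels (0, 1, 2, ...), which is why both key columns survive with suffixes. To match on the values of key instead, one option is to move key into the index of both frames before joining; a sketch:

```python
import pandas as pd

df7 = pd.DataFrame({
    'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
df8 = pd.DataFrame({
    'key': ['K0', 'K1', 'K2'],
    'B': ['B0', 'B1', 'B2']})

# With key as the index on both sides, join aligns on key values (left join by default).
joined = df7.set_index('key').join(df8.set_index('key'))
print(joined)
```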

    kurt function

    Computes the kurtosis of each column.

    df9 = pd.DataFrame({
        "A":[12, 4, 5, 44, 1], 
        "B":[5, 2, 54, 3, 2], 
        "C":[20, 16, 7, 3, 8], 
        "D":[14, 3, 17, 2, 6]}) 
    df9
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>A</th>
    <th>B</th>
    <th>C</th>
    <th>D</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>12</td>
    <td>5</td>
    <td>20</td>
    <td>14</td>
    </tr>
    <tr>
    <th>1</th>
    <td>4</td>
    <td>2</td>
    <td>16</td>
    <td>3</td>
    </tr>
    <tr>
    <th>2</th>
    <td>5</td>
    <td>54</td>
    <td>7</td>
    <td>17</td>
    </tr>
    <tr>
    <th>3</th>
    <td>44</td>
    <td>3</td>
    <td>3</td>
    <td>2</td>
    </tr>
    <tr>
    <th>4</th>
    <td>1</td>
    <td>2</td>
    <td>8</td>
    <td>6</td>
    </tr>
    </tbody>
    </table>

    </div>

    df9.kurt()
    
    A    3.936824
    B    4.941512
    C   -1.745717
    D   -2.508808
    dtype: float64
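
    kurt reports the excess kurtosis (Fisher's definition, under which a normal distribution scores 0) using a bias-corrected sample estimator. As a check, the sketch below recomputes the value for column A by hand with the standard sample formula; the helper name sample_excess_kurtosis is hypothetical:

```python
import pandas as pd

def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (needs at least 4 values)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    m4 = sum((x - mean) ** 4 for x in values)  # sum of fourth-power deviations
    numerator = n * (n + 1) * (n - 1) * m4
    denominator = (n - 2) * (n - 3) * m2 ** 2
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return numerator / denominator - correction

a = [12, 4, 5, 44, 1]  # column A of df9
manual = sample_excess_kurtosis(a)
from_pandas = pd.Series(a).kurt()
print(manual, from_pandas)
```

    Both numbers should agree with the 3.936824 shown for column A.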
    

    loc function

    loc is short for location; it selects data by row and column labels.

    df9
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>A</th>
    <th>B</th>
    <th>C</th>
    <th>D</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>12</td>
    <td>5</td>
    <td>20</td>
    <td>14</td>
    </tr>
    <tr>
    <th>1</th>
    <td>4</td>
    <td>2</td>
    <td>16</td>
    <td>3</td>
    </tr>
    <tr>
    <th>2</th>
    <td>5</td>
    <td>54</td>
    <td>7</td>
    <td>17</td>
    </tr>
    <tr>
    <th>3</th>
    <td>44</td>
    <td>3</td>
    <td>3</td>
    <td>2</td>
    </tr>
    <tr>
    <th>4</th>
    <td>1</td>
    <td>2</td>
    <td>8</td>
    <td>6</td>
    </tr>
    </tbody>
    </table>

    </div>

    df9.loc[1,:]  # all columns of the row labeled 1 (the second row)
    
    A     4
    B     2
    C    16
    D     3
    Name: 1, dtype: int64
    
    df9.loc[1:3,"B"]  # column B for row labels 1 through 3 (end label included)
    
    1     2
    2    54
    3     3
    Name: B, dtype: int64
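
    Two details are easy to miss: loc selects by label and its slices include the end label, whereas iloc selects by position and excludes the end position. loc also accepts a boolean mask. A short sketch on df9:

```python
import pandas as pd

df9 = pd.DataFrame({
    "A": [12, 4, 5, 44, 1],
    "B": [5, 2, 54, 3, 2],
    "C": [20, 16, 7, 3, 8],
    "D": [14, 3, 17, 2, 6]})

by_label = df9.loc[1:3, "B"]                 # labels 1, 2, 3 (end included)
by_position = df9.iloc[1:3]["B"]             # positions 1, 2 (end excluded)
masked = df9.loc[df9["A"] > 10, ["A", "C"]]  # rows where A > 10, columns A and C

print(by_label.tolist(), by_position.tolist())
print(masked)
```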
    

    merge function

    Another function for combining data. It works like a SQL join and is the most powerful of the combining functions.

    df7
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>key</th>
    <th>A</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>K0</td>
    <td>A0</td>
    </tr>
    <tr>
    <th>1</th>
    <td>K1</td>
    <td>A1</td>
    </tr>
    <tr>
    <th>2</th>
    <td>K2</td>
    <td>A2</td>
    </tr>
    <tr>
    <th>3</th>
    <td>K3</td>
    <td>A3</td>
    </tr>
    <tr>
    <th>4</th>
    <td>K4</td>
    <td>A4</td>
    </tr>
    <tr>
    <th>5</th>
    <td>K5</td>
    <td>A5</td>
    </tr>
    </tbody>
    </table>

    </div>

    df8
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>key</th>
    <th>B</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>K0</td>
    <td>B0</td>
    </tr>
    <tr>
    <th>1</th>
    <td>K1</td>
    <td>B1</td>
    </tr>
    <tr>
    <th>2</th>
    <td>K2</td>
    <td>B2</td>
    </tr>
    </tbody>
    </table>

    </div>

    pd.merge(df7,df8)  # how defaults to "inner"
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>key</th>
    <th>A</th>
    <th>B</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>K0</td>
    <td>A0</td>
    <td>B0</td>
    </tr>
    <tr>
    <th>1</th>
    <td>K1</td>
    <td>A1</td>
    <td>B1</td>
    </tr>
    <tr>
    <th>2</th>
    <td>K2</td>
    <td>A2</td>
    <td>B2</td>
    </tr>
    </tbody>
    </table>

    </div>

    pd.merge(df7,df8,how="outer")  
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>key</th>
    <th>A</th>
    <th>B</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>K0</td>
    <td>A0</td>
    <td>B0</td>
    </tr>
    <tr>
    <th>1</th>
    <td>K1</td>
    <td>A1</td>
    <td>B1</td>
    </tr>
    <tr>
    <th>2</th>
    <td>K2</td>
    <td>A2</td>
    <td>B2</td>
    </tr>
    <tr>
    <th>3</th>
    <td>K3</td>
    <td>A3</td>
    <td>NaN</td>
    </tr>
    <tr>
    <th>4</th>
    <td>K4</td>
    <td>A4</td>
    <td>NaN</td>
    </tr>
    <tr>
    <th>5</th>
    <td>K5</td>
    <td>A5</td>
    <td>NaN</td>
    </tr>
    </tbody>
    </table>

    </div>
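
    Besides "inner" and "outer", how also accepts "left" and "right". Passing indicator=True adds a _merge column that records where each row came from; a sketch with the same df7 and df8:

```python
import pandas as pd

df7 = pd.DataFrame({
    'key': ['K0', 'K1', 'K2', 'K3', 'K4', 'K5'],
    'A': ['A0', 'A1', 'A2', 'A3', 'A4', 'A5']})
df8 = pd.DataFrame({
    'key': ['K0', 'K1', 'K2'],
    'B': ['B0', 'B1', 'B2']})

# Keep every row of df7; _merge is "both" or "left_only" for each row.
merged = pd.merge(df7, df8, on="key", how="left", indicator=True)
print(merged)
```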

    nunique function

    Counts the number of distinct values in each column.

    df10 = pd.DataFrame({
        "sid":list("acbdefg"),
        "score":[9,8,9,7,8,9,3]
                        })
    df10
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>score</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>a</td>
    <td>9</td>
    </tr>
    <tr>
    <th>1</th>
    <td>c</td>
    <td>8</td>
    </tr>
    <tr>
    <th>2</th>
    <td>b</td>
    <td>9</td>
    </tr>
    <tr>
    <th>3</th>
    <td>d</td>
    <td>7</td>
    </tr>
    <tr>
    <th>4</th>
    <td>e</td>
    <td>8</td>
    </tr>
    <tr>
    <th>5</th>
    <td>f</td>
    <td>9</td>
    </tr>
    <tr>
    <th>6</th>
    <td>g</td>
    <td>3</td>
    </tr>
    </tbody>
    </table>

    </div>

    df10.nunique()
    
    sid      7
    score    4
    dtype: int64
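
    nunique only gives the number of distinct values. To see which values occur and how often, unique and value_counts are the usual companions; a sketch on the score column:

```python
import pandas as pd

df10 = pd.DataFrame({
    "sid": list("acbdefg"),
    "score": [9, 8, 9, 7, 8, 9, 3]})

distinct_scores = df10["score"].unique()     # values in order of first appearance
score_counts = df10["score"].value_counts()  # occurrences of each value

print(distinct_scores)
print(score_counts)
```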
    

    pct_change function

    Computes the fractional change from the previous period to the current one.

    s = pd.Series([90, 91, 85])
    s
    
    0    90
    1    91
    2    85
    dtype: int64
    
    s.pct_change()
    
    0         NaN
    1    0.011111
    2   -0.065934
    dtype: float64
    
    (91 - 90) / 90
    
    0.011111111111111112
    
    (85 - 91) / 91
    
    -0.06593406593406594
    
    # Compare with the value two periods earlier
    s.pct_change(periods=2) 
    
    0         NaN
    1         NaN
    2   -0.055556
    dtype: float64
    
    # If there are missing values, use a fill method
    s = pd.Series([90, 91, None, 85])
    s  
    
    0    90.0
    1    91.0
    2     NaN
    3    85.0
    dtype: float64
    
    s.pct_change(fill_method='ffill')
    
    0         NaN
    1    0.011111
    2    0.000000
    3   -0.065934
    dtype: float64
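
    In pandas 2.1 and later the fill_method argument of pct_change is deprecated. Assuming forward filling is what is wanted, filling first and then calling pct_change gives the same numbers; a sketch:

```python
import pandas as pd

s = pd.Series([90, 91, None, 85])

# Forward-fill the gap explicitly, then compute the fractional change.
changes = s.ffill().pct_change()
print(changes)
```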
    

    query function

    Selects the rows that satisfy a condition expression.

    df10
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>score</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>a</td>
    <td>9</td>
    </tr>
    <tr>
    <th>1</th>
    <td>c</td>
    <td>8</td>
    </tr>
    <tr>
    <th>2</th>
    <td>b</td>
    <td>9</td>
    </tr>
    <tr>
    <th>3</th>
    <td>d</td>
    <td>7</td>
    </tr>
    <tr>
    <th>4</th>
    <td>e</td>
    <td>8</td>
    </tr>
    <tr>
    <th>5</th>
    <td>f</td>
    <td>9</td>
    </tr>
    <tr>
    <th>6</th>
    <td>g</td>
    <td>3</td>
    </tr>
    </tbody>
    </table>

    </div>

    df10.query("score >= 8")
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>score</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>a</td>
    <td>9</td>
    </tr>
    <tr>
    <th>1</th>
    <td>c</td>
    <td>8</td>
    </tr>
    <tr>
    <th>2</th>
    <td>b</td>
    <td>9</td>
    </tr>
    <tr>
    <th>4</th>
    <td>e</td>
    <td>8</td>
    </tr>
    <tr>
    <th>5</th>
    <td>f</td>
    <td>9</td>
    </tr>
    </tbody>
    </table>

    </div>
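
    Inside the expression, conditions can be combined with and/or, and a Python variable from the enclosing scope can be referenced with an @ prefix. A sketch; the variable name threshold is made up for this example:

```python
import pandas as pd

df10 = pd.DataFrame({
    "sid": list("acbdefg"),
    "score": [9, 8, 9, 7, 8, 9, 3]})

threshold = 8
# @threshold refers to the Python variable defined above.
selected = df10.query("score >= @threshold and sid != 'a'")
print(selected)
```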

    rank function

    Ranks values, similar to the ranking window functions in SQL:

    df10
    

    <div>

    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>score</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>a</td>
    <td>9</td>
    </tr>
    <tr>
    <th>1</th>
    <td>c</td>
    <td>8</td>
    </tr>
    <tr>
    <th>2</th>
    <td>b</td>
    <td>9</td>
    </tr>
    <tr>
    <th>3</th>
    <td>d</td>
    <td>7</td>
    </tr>
    <tr>
    <th>4</th>
    <td>e</td>
    <td>8</td>
    </tr>
    <tr>
    <th>5</th>
    <td>f</td>
    <td>9</td>
    </tr>
    <tr>
    <th>6</th>
    <td>g</td>
    <td>3</td>
    </tr>
    </tbody>
    </table>

    </div>

    df10["rank_10"] = df10["score"].rank()
    df10
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>score</th>
    <th>rank_10</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>a</td>
    <td>9</td>
    <td>6.0</td>
    </tr>
    <tr>
    <th>1</th>
    <td>c</td>
    <td>8</td>
    <td>3.5</td>
    </tr>
    <tr>
    <th>2</th>
    <td>b</td>
    <td>9</td>
    <td>6.0</td>
    </tr>
    <tr>
    <th>3</th>
    <td>d</td>
    <td>7</td>
    <td>2.0</td>
    </tr>
    <tr>
    <th>4</th>
    <td>e</td>
    <td>8</td>
    <td>3.5</td>
    </tr>
    <tr>
    <th>5</th>
    <td>f</td>
    <td>9</td>
    <td>6.0</td>
    </tr>
    <tr>
    <th>6</th>
    <td>g</td>
    <td>3</td>
    <td>1.0</td>
    </tr>
    </tbody>
    </table>

    </div>

    df10["rank_10_max"] = df10["score"].rank(method="max")
    df10
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>score</th>
    <th>rank_10</th>
    <th>rank_10_max</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>a</td>
    <td>9</td>
    <td>6.0</td>
    <td>7.0</td>
    </tr>
    <tr>
    <th>1</th>
    <td>c</td>
    <td>8</td>
    <td>3.5</td>
    <td>4.0</td>
    </tr>
    <tr>
    <th>2</th>
    <td>b</td>
    <td>9</td>
    <td>6.0</td>
    <td>7.0</td>
    </tr>
    <tr>
    <th>3</th>
    <td>d</td>
    <td>7</td>
    <td>2.0</td>
    <td>2.0</td>
    </tr>
    <tr>
    <th>4</th>
    <td>e</td>
    <td>8</td>
    <td>3.5</td>
    <td>4.0</td>
    </tr>
    <tr>
    <th>5</th>
    <td>f</td>
    <td>9</td>
    <td>6.0</td>
    <td>7.0</td>
    </tr>
    <tr>
    <th>6</th>
    <td>g</td>
    <td>3</td>
    <td>1.0</td>
    <td>1.0</td>
    </tr>
    </tbody>
    </table>

    </div>

    df10["rank_10_min"] = df10["score"].rank(method="min")
    df10
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>score</th>
    <th>rank_10</th>
    <th>rank_10_max</th>
    <th>rank_10_min</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>a</td>
    <td>9</td>
    <td>6.0</td>
    <td>7.0</td>
    <td>5.0</td>
    </tr>
    <tr>
    <th>1</th>
    <td>c</td>
    <td>8</td>
    <td>3.5</td>
    <td>4.0</td>
    <td>3.0</td>
    </tr>
    <tr>
    <th>2</th>
    <td>b</td>
    <td>9</td>
    <td>6.0</td>
    <td>7.0</td>
    <td>5.0</td>
    </tr>
    <tr>
    <th>3</th>
    <td>d</td>
    <td>7</td>
    <td>2.0</td>
    <td>2.0</td>
    <td>2.0</td>
    </tr>
    <tr>
    <th>4</th>
    <td>e</td>
    <td>8</td>
    <td>3.5</td>
    <td>4.0</td>
    <td>3.0</td>
    </tr>
    <tr>
    <th>5</th>
    <td>f</td>
    <td>9</td>
    <td>6.0</td>
    <td>7.0</td>
    <td>5.0</td>
    </tr>
    <tr>
    <th>6</th>
    <td>g</td>
    <td>3</td>
    <td>1.0</td>
    <td>1.0</td>
    <td>1.0</td>
    </tr>
    </tbody>
    </table>

    </div>
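
    Besides "average", "max" and "min", rank also accepts method="dense" and method="first", and ascending=False ranks the largest value first. These correspond roughly to DENSE_RANK() and ROW_NUMBER() in SQL. The sketch below rebuilds the score column as a standalone Series (an assumption made for illustration) rather than reusing df10.

```python
import pandas as pd

# Same scores as the score column displayed above
score = pd.Series([9, 8, 9, 7, 8, 9, 3], name="score")

# dense: ties share a rank and no rank is skipped afterwards;
# ascending=False gives rank 1 to the highest score
dense_desc = score.rank(method="dense", ascending=False)

# first: ties are ranked in the order in which they appear
first_asc = score.rank(method="first")

print(pd.DataFrame({"score": score,
                    "dense_desc": dense_desc,
                    "first_asc": first_asc}))
```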

    sort_values function

    Sorts the rows by the values of one or more columns.

    df9
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>A</th>
    <th>B</th>
    <th>C</th>
    <th>D</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>12</td>
    <td>5</td>
    <td>20</td>
    <td>14</td>
    </tr>
    <tr>
    <th>1</th>
    <td>4</td>
    <td>2</td>
    <td>16</td>
    <td>3</td>
    </tr>
    <tr>
    <th>2</th>
    <td>5</td>
    <td>54</td>
    <td>7</td>
    <td>17</td>
    </tr>
    <tr>
    <th>3</th>
    <td>44</td>
    <td>3</td>
    <td>3</td>
    <td>2</td>
    </tr>
    <tr>
    <th>4</th>
    <td>1</td>
    <td>2</td>
    <td>8</td>
    <td>6</td>
    </tr>
    </tbody>
    </table>

    </div>

    df9.sort_values("A")  # ascending order by default
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>A</th>
    <th>B</th>
    <th>C</th>
    <th>D</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>4</th>
    <td>1</td>
    <td>2</td>
    <td>8</td>
    <td>6</td>
    </tr>
    <tr>
    <th>1</th>
    <td>4</td>
    <td>2</td>
    <td>16</td>
    <td>3</td>
    </tr>
    <tr>
    <th>2</th>
    <td>5</td>
    <td>54</td>
    <td>7</td>
    <td>17</td>
    </tr>
    <tr>
    <th>0</th>
    <td>12</td>
    <td>5</td>
    <td>20</td>
    <td>14</td>
    </tr>
    <tr>
    <th>3</th>
    <td>44</td>
    <td>3</td>
    <td>3</td>
    <td>2</td>
    </tr>
    </tbody>
    </table>

    </div>

    # Sort by B ascending; where B is tied, sort by D descending
    
    df9.sort_values(["B","D"], ascending=[True,False])  
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>A</th>
    <th>B</th>
    <th>C</th>
    <th>D</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>4</th>
    <td>1</td>
    <td>2</td>
    <td>8</td>
    <td>6</td>
    </tr>
    <tr>
    <th>1</th>
    <td>4</td>
    <td>2</td>
    <td>16</td>
    <td>3</td>
    </tr>
    <tr>
    <th>3</th>
    <td>44</td>
    <td>3</td>
    <td>3</td>
    <td>2</td>
    </tr>
    <tr>
    <th>0</th>
    <td>12</td>
    <td>5</td>
    <td>20</td>
    <td>14</td>
    </tr>
    <tr>
    <th>2</th>
    <td>5</td>
    <td>54</td>
    <td>7</td>
    <td>17</td>
    </tr>
    </tbody>
    </table>

    </div>
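
    sort_values returns a new DataFrame and keeps the original index labels, which is why the index looks shuffled in the outputs above. A minimal sketch of a descending sort followed by renumbering the rows is shown below; the frame is a hypothetical reconstruction of df9 from the displayed values.

```python
import pandas as pd

# Hypothetical reconstruction of df9 from the values displayed above
frame = pd.DataFrame({
    "A": [12, 4, 5, 44, 1],
    "B": [5, 2, 54, 3, 2],
    "C": [20, 16, 7, 3, 8],
    "D": [14, 3, 17, 2, 6],
})

# Sort by A from largest to smallest, then renumber the rows 0..n-1
top = frame.sort_values("A", ascending=False).reset_index(drop=True)

print(top)
print(frame["A"].tolist())  # the original frame is left unsorted
```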

    tail function

    Shows the last rows of the data (5 rows by default).

    df7.tail()
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>key</th>
    <th>A</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>1</th>
    <td>K1</td>
    <td>A1</td>
    </tr>
    <tr>
    <th>2</th>
    <td>K2</td>
    <td>A2</td>
    </tr>
    <tr>
    <th>3</th>
    <td>K3</td>
    <td>A3</td>
    </tr>
    <tr>
    <th>4</th>
    <td>K4</td>
    <td>A4</td>
    </tr>
    <tr>
    <th>5</th>
    <td>K5</td>
    <td>A5</td>
    </tr>
    </tbody>
    </table>

    </div>

    df7.tail(3)
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>key</th>
    <th>A</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>3</th>
    <td>K3</td>
    <td>A3</td>
    </tr>
    <tr>
    <th>4</th>
    <td>K4</td>
    <td>A4</td>
    </tr>
    <tr>
    <th>5</th>
    <td>K5</td>
    <td>A5</td>
    </tr>
    </tbody>
    </table>

    </div>
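
    tail also accepts a negative n, in which case it returns every row except the first |n|. The sketch below uses an illustrative six-row frame shaped like df7; the K0/A0 row is an assumption, since only the last five rows are displayed above.

```python
import pandas as pd

# Illustrative frame shaped like df7 (the first row is assumed)
keys = pd.DataFrame({
    "key": [f"K{i}" for i in range(6)],
    "A": [f"A{i}" for i in range(6)],
})

last_two = keys.tail(2)          # the last 2 rows
skip_first_four = keys.tail(-4)  # everything except the first 4 rows

print(last_two.equals(skip_first_four))
```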

    unique function

    Returns the unique values of a column, in order of first appearance.

    df10
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>score</th>
    <th>rank_10</th>
    <th>rank_10_max</th>
    <th>rank_10_min</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>a</td>
    <td>9</td>
    <td>6.0</td>
    <td>7.0</td>
    <td>5.0</td>
    </tr>
    <tr>
    <th>1</th>
    <td>c</td>
    <td>8</td>
    <td>3.5</td>
    <td>4.0</td>
    <td>3.0</td>
    </tr>
    <tr>
    <th>2</th>
    <td>b</td>
    <td>9</td>
    <td>6.0</td>
    <td>7.0</td>
    <td>5.0</td>
    </tr>
    <tr>
    <th>3</th>
    <td>d</td>
    <td>7</td>
    <td>2.0</td>
    <td>2.0</td>
    <td>2.0</td>
    </tr>
    <tr>
    <th>4</th>
    <td>e</td>
    <td>8</td>
    <td>3.5</td>
    <td>4.0</td>
    <td>3.0</td>
    </tr>
    <tr>
    <th>5</th>
    <td>f</td>
    <td>9</td>
    <td>6.0</td>
    <td>7.0</td>
    <td>5.0</td>
    </tr>
    <tr>
    <th>6</th>
    <td>g</td>
    <td>3</td>
    <td>1.0</td>
    <td>1.0</td>
    <td>1.0</td>
    </tr>
    </tbody>
    </table>

    </div>

    df10["score"].unique()
    
    array([9, 8, 7, 3])
    
    df10["rank_10"].unique()
    
    array([6. , 3.5, 2. , 1. ])
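
    Two closely related methods are nunique, which only counts the distinct values, and drop_duplicates, which returns them as a Series and keeps the index of each first occurrence. A short sketch on a standalone copy of the score column (rebuilt here for illustration):

```python
import pandas as pd

# Same scores as the score column displayed above
score = pd.Series([9, 8, 9, 7, 8, 9, 3], name="score")

distinct = score.unique()               # NumPy array of distinct values
n_distinct = score.nunique()            # number of distinct values
first_seen = score.drop_duplicates()    # Series; index marks first occurrences

print(distinct)
print(n_distinct)
print(first_seen)
```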
    

    value_counts function

    Counts how many times each unique value occurs in a column.

    df6
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>sid</th>
    <th>phones</th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>华为</td>
    </tr>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>小米</td>
    </tr>
    <tr>
    <th>0</th>
    <td>s1</td>
    <td>一加</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>三星</td>
    </tr>
    <tr>
    <th>1</th>
    <td>s2</td>
    <td>苹果</td>
    </tr>
    </tbody>
    </table>

    </div>

    df6["sid"].value_counts()
    
    s1    3
    s2    2
    Name: sid, dtype: int64
    
    df6["phones"].value_counts()
    
    华为    1
    苹果    1
    三星    1
    一加    1
    小米    1
    Name: phones, dtype: int64
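
    value_counts can also return proportions (normalize=True) or list the least frequent values first (ascending=True). The sketch below uses a standalone Series that mirrors the sid column shown above; the variable names are illustrative.

```python
import pandas as pd

# Illustrative Series mirroring the sid column displayed above
sid = pd.Series(["s1", "s1", "s1", "s2", "s2"], name="sid")

shares = sid.value_counts(normalize=True)        # relative frequencies
rarest_first = sid.value_counts(ascending=True)  # smallest count first

print(shares)
print(rarest_first)
```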
    

    where function

    Keeps the values of a Series or DataFrame that satisfy a condition and replaces the others (with NaN by default).

    w = pd.Series(range(7))
    w
    
    0    0
    1    1
    2    2
    3    3
    4    4
    5    5
    6    6
    dtype: int64
    
    # Values that satisfy the condition are kept; the others become NaN
    w.where(w>3)
    
    0    NaN
    1    NaN
    2    NaN
    3    NaN
    4    4.0
    5    5.0
    6    6.0
    dtype: float64
    
    # Values that fail the condition are replaced with 8
    w.where(w > 1, 8)
    
    0    8
    1    8
    2    2
    3    3
    4    4
    5    5
    6    6
    dtype: int64
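
    The counterpart of where is mask, which replaces the values for which the condition is true. Both also work element-wise on a DataFrame. A minimal sketch follows; the replacement value -1 and the small frame are made up for illustration.

```python
import pandas as pd

w = pd.Series(range(7))

kept = w.where(w > 3, -1)    # keep values > 3, replace the rest with -1
masked = w.mask(w > 3, -1)   # the inverse: replace values > 3 with -1

# Element-wise on a DataFrame: negative entries are replaced with 0
frame = pd.DataFrame({"x": [1, -2, 3], "y": [-4, 5, -6]})
non_negative = frame.where(frame >= 0, 0)

print(kept.tolist())
print(masked.tolist())
print(non_negative)
```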
    

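    xs function

    Returns a cross-section of data with a multi-level index: a key argument selects the rows at a given label, optionally at a specific level of the MultiIndex.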

    d = {'num_legs': [4, 4, 2, 2],
         'num_wings': [0, 0, 2, 2],
         'class': ['mammal', 'mammal', 'mammal', 'bird'],
         'animal': ['cat', 'dog', 'bat', 'penguin'],
         'locomotion': ['walks', 'walks', 'flies', 'walks']}
    # Build the DataFrame
    df11 = pd.DataFrame(data=d)
    # Move three columns into a multi-level index
    df11 = df11.set_index(['class', 'animal', 'locomotion'])
    df11
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th></th>
    <th></th>
    <th>num_legs</th>
    <th>num_wings</th>
    </tr>
    <tr>
    <th>class</th>
    <th>animal</th>
    <th>locomotion</th>
    <th></th>
    <th></th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th rowspan="3" valign="top">mammal</th>
    <th>cat</th>
    <th>walks</th>
    <td>4</td>
    <td>0</td>
    </tr>
    <tr>
    <th>dog</th>
    <th>walks</th>
    <td>4</td>
    <td>0</td>
    </tr>
    <tr>
    <th>bat</th>
    <th>flies</th>
    <td>2</td>
    <td>2</td>
    </tr>
    <tr>
    <th>bird</th>
    <th>penguin</th>
    <th>walks</th>
    <td>2</td>
    <td>2</td>
    </tr>
    </tbody>
    </table>

    </div>

    # Get the rows under a given label of the first index level
    df11.xs('mammal')  
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th></th>
    <th>num_legs</th>
    <th>num_wings</th>
    </tr>
    <tr>
    <th>animal</th>
    <th>locomotion</th>
    <th></th>
    <th></th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>cat</th>
    <th>walks</th>
    <td>4</td>
    <td>0</td>
    </tr>
    <tr>
    <th>dog</th>
    <th>walks</th>
    <td>4</td>
    <td>0</td>
    </tr>
    <tr>
    <th>bat</th>
    <th>flies</th>
    <td>2</td>
    <td>2</td>
    </tr>
    </tbody>
    </table>

    </div>

    # Get the rows matching labels on several leading index levels
    df11.xs(('mammal', 'dog'))
    
    /Applications/downloads/anaconda/anaconda3/lib/python3.7/site-packages/IPython/core/interactiveshell.py:2881: PerformanceWarning: indexing past lexsort depth may impact performance.
      return runner(coro)
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>num_legs</th>
    <th>num_wings</th>
    </tr>
    <tr>
    <th>locomotion</th>
    <th></th>
    <th></th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>walks</th>
    <td>4</td>
    <td>0</td>
    </tr>
    </tbody>
    </table>

    </div>

    # Get the rows for a given label at a given level
    
    df11.xs('cat', level=1)
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th></th>
    <th>num_legs</th>
    <th>num_wings</th>
    </tr>
    <tr>
    <th>class</th>
    <th>locomotion</th>
    <th></th>
    <th></th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>mammal</th>
    <th>walks</th>
    <td>4</td>
    <td>0</td>
    </tr>
    </tbody>
    </table>

    </div>

    df11
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th></th>
    <th></th>
    <th>num_legs</th>
    <th>num_wings</th>
    </tr>
    <tr>
    <th>class</th>
    <th>animal</th>
    <th>locomotion</th>
    <th></th>
    <th></th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th rowspan="3" valign="top">mammal</th>
    <th>cat</th>
    <th>walks</th>
    <td>4</td>
    <td>0</td>
    </tr>
    <tr>
    <th>dog</th>
    <th>walks</th>
    <td>4</td>
    <td>0</td>
    </tr>
    <tr>
    <th>bat</th>
    <th>flies</th>
    <td>2</td>
    <td>2</td>
    </tr>
    <tr>
    <th>bird</th>
    <th>penguin</th>
    <th>walks</th>
    <td>2</td>
    <td>2</td>
    </tr>
    </tbody>
    </table>

    </div>

    # Get the rows for several labels at several levels
    df11.xs(('bird', 'walks'),level=[0, 'locomotion'])
    

    <div>
    <table border="1" class="dataframe">
    <thead>
    <tr style="text-align: right;">
    <th></th>
    <th>num_legs</th>
    <th>num_wings</th>
    </tr>
    <tr>
    <th>animal</th>
    <th></th>
    <th></th>
    </tr>
    </thead>
    <tbody>
    <tr>
    <th>penguin</th>
    <td>2</td>
    <td>2</td>
    </tr>
    </tbody>
    </table>

    </div>

    # Get a cross-section along the columns (axis=1)
    df11.xs('num_wings', axis=1)
    
    class   animal   locomotion
    mammal  cat      walks         0
            dog      walks         0
            bat      flies         2
    bird    penguin  walks         2
    Name: num_wings, dtype: int64
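
    By default xs drops the index level that was used for the selection, as the outputs above show. Passing drop_level=False keeps it. The sketch below rebuilds the same data under a different name (animals, chosen here for illustration) so that it runs on its own.

```python
import pandas as pd

d = {'num_legs': [4, 4, 2, 2],
     'num_wings': [0, 0, 2, 2],
     'class': ['mammal', 'mammal', 'mammal', 'bird'],
     'animal': ['cat', 'dog', 'bat', 'penguin'],
     'locomotion': ['walks', 'walks', 'flies', 'walks']}
animals = pd.DataFrame(data=d).set_index(['class', 'animal', 'locomotion'])

# Default: the selected level (locomotion) is removed from the result
dropped = animals.xs('walks', level='locomotion')

# drop_level=False keeps all three index levels
kept = animals.xs('walks', level='locomotion', drop_level=False)

print(list(dropped.index.names))
print(list(kept.index.names))
```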
    

          Article title: 23 Selected Pandas Functions (精选23个Pandas函数)

          Article link: https://www.haomeiwen.com/subject/lpydcrtx.html