Python, Part 1: Filtering Specific Rows and Columns with pandas


Author: 夕颜00 | Published: 2020-05-27 14:33
    1. Import the pandas module
    >>> import pandas as pd
    
    
    2. Load the data
    >>> titanic = pd.read_csv(r'C:\Users\Administrator\Desktop\titanic.csv')
    
    
    3. Select a single column
    >>> ages = titanic["Age"]
    >>> ages.head()
    0    22.0
    1    38.0
    2    26.0
    3    35.0
    4    35.0
    Name: Age, dtype: float64
    >>> type(ages)
    <class 'pandas.core.series.Series'>
    
    

    When no row count is given, head() shows the first 5 rows. A single selected column is a Series.
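The default of head() and the Series type can be checked on a small frame. This is a minimal sketch that uses a made-up stand-in table instead of titanic.csv; the names and values in it are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data (not the real file).
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, None, 54.0],
})

ages = df["Age"]            # single brackets with one label give a Series
default_head = ages.head()  # no argument: the first 5 rows
first_two = ages.head(2)    # an explicit row count overrides the default

print(type(ages).__name__)                # Series
print(len(default_head), len(first_two))  # 5 2
```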

    4. Select multiple columns
    >>> age_sex = titanic[["Name", "Age", "Sex"]]
    >>> age_sex.head()
                                                    Name   Age     Sex
    0                            Braund, Mr. Owen Harris  22.0    male
    1  Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0  female
    2                             Heikkinen, Miss. Laina  26.0  female
    3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  35.0  female
    4                           Allen, Mr. William Henry  35.0    male
    >>> type(age_sex)
    <class 'pandas.core.frame.DataFrame'>
    >>> age_sex.shape
    (891, 3)
    
    

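    Selecting two or more columns still gives a DataFrame. shape is an attribute rather than a method: on a DataFrame it returns (rows, columns), and on a Series it returns a one-element tuple (rows,).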
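The difference in shape between a DataFrame and a Series is easy to see side by side. The sketch below uses a small made-up frame (an assumption, not the Titanic file) and also shows that a one-item list of labels still yields a DataFrame.

```python
import pandas as pd

# Hypothetical three-row sample.
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "Age": [22.0, 38.0, 26.0],
    "Sex": ["male", "female", "female"],
})

two_cols = df[["Name", "Age"]]  # list of labels -> DataFrame
one_col_frame = df[["Age"]]     # one-item list -> still a DataFrame
one_col_series = df["Age"]      # bare label -> Series

print(two_cols.shape)        # (3, 2)
print(one_col_frame.shape)   # (3, 1)
print(one_col_series.shape)  # (3,)
```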

    5. Filter rows on a single condition
    >>> age_35 = titanic[titanic["Age"] > 35]
    >>> age_35.head()
        PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
    1             2         1       1  ...  71.2833   C85         C
    6             7         0       1  ...  51.8625   E46         S
    11           12         1       1  ...  26.5500  C103         S
    13           14         0       3  ...  31.2750   NaN         S
    15           16         1       2  ...  16.0000   NaN         S
    >>> age_35.shape      # shape after filtering
    (217, 12)
    >>> titanic.shape     # shape before filtering
    (891, 12)
    
    

    The condition can use any of the comparison operators: >, >=, ==, !=, <, <=.
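A comparison on a column produces a boolean Series, and indexing with it keeps the rows where it is True. The following sketch, on made-up ages, also shows how a missing value behaves: it fails >= but passes !=.

```python
import pandas as pd

# Hypothetical ages, including one missing value.
df = pd.DataFrame({"Age": [22.0, 35.0, 38.0, None, 54.0]})

mask = df["Age"] >= 35        # boolean Series; NaN compares as False here
older = df[mask]              # keeps rows where the mask is True

not_35 = df[df["Age"] != 35]  # NaN != 35 is True, so the NaN row is kept

print(mask.tolist())             # [False, True, True, False, True]
print(len(older), len(not_35))   # 3 4
```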

    6. Filter rows on multiple conditions
    >>> class_23 = titanic[titanic["Pclass"].isin([2, 3])]
    >>> class_23.head()
       PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
    0            1         0       3  ...   7.2500   NaN         S
    2            3         1       3  ...   7.9250   NaN         S
    4            5         0       3  ...   8.0500   NaN         S
    5            6         0       3  ...   8.4583   NaN         Q
    7            8         0       3  ...  21.0750   NaN         S
    
    [5 rows x 12 columns]
    >>> class_23.shape
    (675, 12)
    >>> class_23 = titanic[(titanic["Pclass"] == 2) | (titanic["Pclass"] == 3)]
    >>> class_23.shape
    (675, 12)
    
    

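    isin() is equivalent to joining several == conditions with or; it cannot express and. When you write the conditions out yourself, use "|" and "&" in place of or and and, and wrap each condition in its own "()".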
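The equivalence between isin() and an "|" chain, and the need for parentheses with "&", can be sketched on a small made-up frame (the values are assumptions, not taken from titanic.csv).

```python
import pandas as pd

# Hypothetical sample with a class and an age per passenger.
df = pd.DataFrame({
    "Pclass": [1, 2, 3, 3, 2, 1],
    "Age": [38.0, 30.0, 22.0, 40.0, 55.0, 19.0],
})

by_isin = df[df["Pclass"].isin([2, 3])]
by_or = df[(df["Pclass"] == 2) | (df["Pclass"] == 3)]
same = by_isin.equals(by_or)  # both select exactly the same rows

# "&" keeps rows that satisfy both conditions. The parentheses are needed
# because & binds more tightly than == and >.
class3_over_35 = df[(df["Pclass"] == 3) & (df["Age"] > 35)]

print(same)                           # True
print(class3_over_35.index.tolist())  # [3]
```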

    7. Keep only non-null values
    >>> age_no_na = titanic[titanic["Age"].notna()]
    >>> age_no_na.shape
    (714, 12)
    >>> titanic.shape
    (891, 12)
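notna() returns a boolean mask, so it can be used like any other condition. As a sketch on made-up data, the example below also shows its complement isna() and the dropna(subset=...) call, which gives the same rows as the notna() filter.

```python
import pandas as pd

# Hypothetical sample with two missing ages.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, None, 26.0, None],
})

known_age = df[df["Age"].notna()]       # rows where Age is present
missing_age = df[df["Age"].isna()]      # rows where Age is missing
via_dropna = df.dropna(subset=["Age"])  # same rows as known_age

print(known_age.index.tolist())      # [0, 2]
print(missing_age.index.tolist())    # [1, 3]
print(known_age.equals(via_dropna))  # True
```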
    
    
    8. Select specific rows and columns
    >>> adult_names = titanic.loc[titanic["Age"] > 35, ["Name", "Age"]]
    >>> adult_names.head()
                                                     Name   Age
    1   Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0
    6                             McCarthy, Mr. Timothy J  54.0
    11                           Bonnell, Miss. Elizabeth  58.0
    13                        Andersson, Mr. Anders Johan  39.0
    15                   Hewlett, Mrs. (Mary D Kingcome)   55.0
    >>>
    >>> titanic.iloc[9:25, 2:5]    # rows 10 through 25 and columns 3 through 5, counting from 1
        Pclass                                               Name     Sex
    9        2                Nasser, Mrs. Nicholas (Adele Achem)  female
    10       3                    Sandstrom, Miss. Marguerite Rut  female
    11       1                           Bonnell, Miss. Elizabeth  female
    12       3                     Saundercock, Mr. William Henry    male
    13       3                        Andersson, Mr. Anders Johan    male
    14       3               Vestrom, Miss. Hulda Amanda Adolfina  female
    15       2                   Hewlett, Mrs. (Mary D Kingcome)   female
    16       3                               Rice, Master. Eugene    male
    17       2                       Williams, Mr. Charles Eugene    male
    18       3  Vander Planke, Mrs. Julius (Emelia Maria Vande...  female
    19       3                            Masselmani, Mrs. Fatima  female
    20       2                               Fynney, Mr. Joseph J    male
    21       2                              Beesley, Mr. Lawrence    male
    22       3                        McGowan, Miss. Anna "Annie"  female
    23       1                       Sloper, Mr. William Thompson    male
    24       3                      Palsson, Miss. Torborg Danira  female
    >>> titanic.iloc[0:3, 3] = "anonymous"  # set the first three elements of the fourth column (Name, position 3) to "anonymous"
    >>> titanic.head()
       PassengerId  Survived  Pclass                                          Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked
    0            1         0       3                                     anonymous    male  22.0      1      0         A/5 21171   7.2500   NaN        S
    1            2         1       1                                     anonymous  female  38.0      1      0          PC 17599  71.2833   C85        C
    2            3         1       3                                     anonymous  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S
    3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000  C123        S
    4            5         0       3                      Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S
    
    

    About loc/iloc[rows, columns]: loc selects data by row (column) label.
    iloc selects data by row (column) integer position, counting from 0.
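The label/position distinction is clearest when the index is not 0, 1, 2, ... The sketch below uses a made-up frame with index labels 10 to 40 (an assumption for illustration). Note that a loc label slice includes both ends, while an iloc positional slice excludes the end.

```python
import pandas as pd

# Hypothetical sample whose index labels differ from the row positions.
df = pd.DataFrame(
    {"Name": ["A", "B", "C", "D"], "Age": [22.0, 38.0, 26.0, 35.0]},
    index=[10, 20, 30, 40],
)

by_label = df.loc[10:30, "Name"].tolist()  # labels 10..30, both ends included
by_position = df.iloc[0:2, 0].tolist()     # positions 0 and 1, end excluded

# Assigning through loc with a boolean mask, done on a copy so that
# the original frame stays unchanged.
masked = df.copy()
masked.loc[masked["Age"] > 30, "Name"] = "anonymous"

print(by_label)                 # ['A', 'B', 'C']
print(by_position)              # ['A', 'B']
print(masked["Name"].tolist())  # ['A', 'anonymous', 'C', 'anonymous']
```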
