Python Part 1: Filtering Specific Rows and Columns with pandas

Author: 夕颜00 | Published: 2020-05-27 14:33

1. Import the pandas module

>>> import pandas as pd

2. Load the data

>>> titanic = pd.read_csv(r'C:\Users\Administrator\Desktop\titanic.csv')

3. Select a single column

>>> ages = titanic["Age"]
>>> ages.head()
0    22.0
1    38.0
2    26.0
3    35.0
4    35.0
Name: Age, dtype: float64
>>> type(ages)
<class 'pandas.core.series.Series'>

When called without a row count, head() shows the first 5 rows. A single selected column is a Series.
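To try this without the CSV file, here is a minimal sketch. The small DataFrame `df` is made-up sample data standing in for titanic.csv, not the real dataset.

```python
import pandas as pd

# Hypothetical sample data standing in for titanic.csv (not the real dataset).
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Age": [22.0, 38.0, 26.0, 35.0, 35.0, None, 54.0],
})

ages = df["Age"]            # single brackets with one label return a Series
first_five = ages.head()    # no argument: the first 5 rows
first_two = ages.head(2)    # an explicit row count

print(type(ages).__name__)              # Series
print(len(first_five), len(first_two))  # 5 2
```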

4. Select multiple columns

>>> age_sex = titanic[["Name", "Age", "Sex"]]
>>> age_sex.head()
                                                Name   Age     Sex
0                            Braund, Mr. Owen Harris  22.0    male
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0  female
2                             Heikkinen, Miss. Laina  26.0  female
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  35.0  female
4                           Allen, Mr. William Henry  35.0    male
>>> type(age_sex)
<class 'pandas.core.frame.DataFrame'>
>>> age_sex.shape
(891, 3)

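A selection of two or more columns is still a DataFrame. shape is an attribute, not a method: on a DataFrame it returns (rows, columns), and on a Series it returns a one-element tuple (rows,).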
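The difference between the two bracket styles and their shapes can be checked with a short sketch. The DataFrame `df` below is made-up sample data, not the Titanic dataset.

```python
import pandas as pd

# Hypothetical sample data (not the real Titanic dataset).
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, 38.0, 26.0, 35.0],
    "Sex": ["male", "female", "female", "male"],
})

subset = df[["Name", "Age"]]   # list of labels -> DataFrame
one_col_frame = df[["Age"]]    # a one-item list still gives a DataFrame
series = df["Age"]             # a bare label gives a Series

print(subset.shape)         # (4, 2)
print(one_col_frame.shape)  # (4, 1)
print(series.shape)         # (4,)
```

Note that shape is read without parentheses, since it is an attribute.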

5. Filter rows by a single condition

>>> age_35 = titanic[titanic["Age"] > 35]
>>> age_35.head()
    PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
1             2         1       1  ...  71.2833   C85         C
6             7         0       1  ...  51.8625   E46         S
11           12         1       1  ...  26.5500  C103         S
13           14         0       3  ...  31.2750   NaN         S
15           16         1       2  ...  16.0000   NaN         S
>>> age_35.shape      # shape after filtering
(217, 12)
>>> titanic.shape     # shape before filtering
(891, 12)

Conditional expressions support all the comparison operators: >=, >, ==, !=, <, <=.
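What happens under the hood is that the comparison produces a boolean Series, and indexing with it keeps only the rows marked True. A minimal sketch, using made-up sample data in place of the Titanic file:

```python
import pandas as pd

# Hypothetical sample data (not the real Titanic dataset).
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E"],
    "Age": [22.0, 38.0, 35.0, None, 54.0],
})

mask = df["Age"] >= 35   # boolean Series, one value per row
older = df[mask]         # keeps only the rows where mask is True

print(mask.tolist())     # [False, True, True, False, True]
print(older.shape)       # (3, 2)
```

The row with a missing Age compares as False here, so it is dropped by the filter.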

6. Filter rows by multiple conditions

>>> class_23 = titanic[titanic["Pclass"].isin([2, 3])]
>>> class_23.head()
   PassengerId  Survived  Pclass  ...     Fare Cabin  Embarked
0            1         0       3  ...   7.2500   NaN         S
2            3         1       3  ...   7.9250   NaN         S
4            5         0       3  ...   8.0500   NaN         S
5            6         0       3  ...   8.4583   NaN         Q
7            8         0       3  ...  21.0750   NaN         S

[5 rows x 12 columns]
>>> class_23.shape
(675, 12)
>>> class_23 = titanic[(titanic["Pclass"] == 2) | (titanic["Pclass"] == 3)]
>>> class_23.shape
(675, 12)

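isin() is equivalent to chaining several == comparisons with or. When you combine conditions yourself, use "|" and "&" in place of or and and, and wrap each condition in its own "()".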
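A short sketch of both forms side by side, plus an "&" example that isin() cannot express. The DataFrame `df` is made-up sample data, not the Titanic dataset.

```python
import pandas as pd

# Hypothetical sample data (not the real Titanic dataset).
df = pd.DataFrame({
    "Pclass": [1, 2, 3, 3, 1, 2],
    "Age": [40.0, 28.0, 19.0, 45.0, 62.0, 36.0],
})

# Two ways to say "Pclass is 2 or 3".
by_isin = df[df["Pclass"].isin([2, 3])]
by_or = df[(df["Pclass"] == 2) | (df["Pclass"] == 3)]
print(by_isin.equals(by_or))   # True

# "&" requires both conditions to hold; each one is in its own parentheses
# because "&" and "|" bind more tightly than the comparison operators.
third_class_over_35 = df[(df["Pclass"] == 3) & (df["Age"] > 35)]
print(third_class_over_35.shape)   # (1, 2)
```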

7. Keep rows with non-missing values

>>> age_no_na = titanic[titanic["Age"].notna()]
>>> age_no_na.shape
(714, 12)
>>> titanic.shape
(891, 12)
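As a sketch of the same idea on made-up sample data (not the Titanic file): notna() builds a boolean mask, isna() is its opposite, and dropna(subset=...) is another way to get the same rows.

```python
import pandas as pd

# Hypothetical sample data (not the real Titanic dataset).
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, None, 26.0, None],
})

with_age = df[df["Age"].notna()]        # rows where Age is present
without_age = df[df["Age"].isna()]      # rows where Age is missing
also_with_age = df.dropna(subset=["Age"])  # same rows as with_age

print(with_age.shape)                   # (2, 2)
print(without_age.shape)                # (2, 2)
print(with_age.equals(also_with_age))   # True
```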

8. Select specific rows and columns

>>> adult_names = titanic.loc[titanic["Age"] > 35, ["Name", "Age"]]
>>> adult_names.head()
                                                 Name   Age
1   Cumings, Mrs. John Bradley (Florence Briggs Th...  38.0
6                             McCarthy, Mr. Timothy J  54.0
11                           Bonnell, Miss. Elizabeth  58.0
13                        Andersson, Mr. Anders Johan  39.0
15                   Hewlett, Mrs. (Mary D Kingcome)   55.0
>>>
>>> titanic.iloc[9:25, 2:5]    # rows 10 through 25 and columns 3 through 5, counting from 1
    Pclass                                               Name     Sex
9        2                Nasser, Mrs. Nicholas (Adele Achem)  female
10       3                    Sandstrom, Miss. Marguerite Rut  female
11       1                           Bonnell, Miss. Elizabeth  female
12       3                     Saundercock, Mr. William Henry    male
13       3                        Andersson, Mr. Anders Johan    male
14       3               Vestrom, Miss. Hulda Amanda Adolfina  female
15       2                   Hewlett, Mrs. (Mary D Kingcome)   female
16       3                               Rice, Master. Eugene    male
17       2                       Williams, Mr. Charles Eugene    male
18       3  Vander Planke, Mrs. Julius (Emelia Maria Vande...  female
19       3                            Masselmani, Mrs. Fatima  female
20       2                               Fynney, Mr. Joseph J    male
21       2                              Beesley, Mr. Lawrence    male
22       3                        McGowan, Miss. Anna "Annie"  female
23       1                       Sloper, Mr. William Thompson    male
24       3                      Palsson, Miss. Torborg Danira  female
>>> titanic.iloc[0:3, 3] = "anonymous"  # assign "anonymous" to the first three rows of the column at position 3 (the fourth column, Name)
>>> titanic.head()
   PassengerId  Survived  Pclass                                          Name     Sex   Age  SibSp  Parch            Ticket     Fare Cabin Embarked
0            1         0       3                                     anonymous    male  22.0      1      0         A/5 21171   7.2500   NaN        S
1            2         1       1                                     anonymous  female  38.0      1      0          PC 17599  71.2833   C85        C
2            3         1       3                                     anonymous  female  26.0      0      0  STON/O2. 3101282   7.9250   NaN        S
3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1      0            113803  53.1000  C123        S
4            5         0       3                      Allen, Mr. William Henry    male  35.0      0      0            373450   8.0500   NaN        S

About loc/iloc[rows, columns]: loc selects data by row and column labels.
iloc selects data by row and column integer positions, counting from 0.

Original article: https://www.haomeiwen.com/subject/bnfaahtx.html
